Notes on Apache Redirect/Rewrite

Some Notes on Using Apache2 Redirect/Rewrite

A need recently arose to replace one domain name with a different one, which in turn required moving the web pages of the former domain to the new domain. To provide a reasonably smooth transition, the decision was made to retain the old site for a short period, with every attempt to access any location on it directed to a page displaying a simple message indicating the new location.

To state the requirements a slightly different way, a web site needed to be set up wherein a request to any arbitrary web page would be directed to a single existing web page that would display a message indicating that the web page had moved.

The tools best suited to producing the desired results in Apache2 are the "redirect" and "rewrite" families of directives. The "redirect" directives, "Redirect" and "RedirectMatch", are provided by the "mod_alias" module, which is included in a default Apache2 build. The "rewrite" directives are part of the "mod_rewrite" module, which needs to be loaded if that has not already been done. This can be accomplished using the following Apache2 configuration directives:

    <IfModule !rewrite_module>
        LoadModule rewrite_module /usr/local/lib/apache2/mod_rewrite.so
    </IfModule>

In a nutshell, these directives state: if the rewrite module is not already loaded, load it. The path to "mod_rewrite.so" varies between installations and should be adjusted to match.

For the remainder of this document, it is assumed that the web site resides at domain "www.example.com". It is also assumed that the web page with which all requests will be replaced is in file "index.html" at the root directory of the web site. Thus, the only available web page will be at:

    https://www.example.com/index.html

The domain name and file name will, of course, need to be replaced with the appropriate values when these examples are put to use.

Redirect HTTP to HTTPS

Since the original web site responded to both HTTP and HTTPS requests, the first step is to simplify the task by directing all requests to only one protocol, in this case: HTTPS. This can be done in more than one way.

Method 1

The simplest approach is to use the "Redirect" directive in the configuration for "http://www.example.com":

    Redirect permanent / https://www.example.com/

This directive takes a request for a web page and redirects it, using HTTPS, to the same location on the web site for "www.example.com". It does not change the path of the web page being requested. This directive must NOT, of course, be placed in the configuration for "https://www.example.com" itself, or it would create an infinite loop.
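
As a rough sketch of where this directive might live, assume the HTTP site is defined as a name-based virtual host on port 80. The "VirtualHost" wrapper and the "ServerName" line are assumptions for illustration, not taken from the original configuration:

```apache
# Hypothetical HTTP-only virtual host; adjust the address and names
# to match the actual setup.
<VirtualHost *:80>
    ServerName www.example.com

    # The rest of the requested path is appended to the target, so
    # http://www.example.com/docs/a.html is sent to
    # https://www.example.com/docs/a.html
    Redirect permanent / https://www.example.com/
</VirtualHost>
```

Keeping the directive inside the port-80 host is what keeps it out of the HTTPS configuration and so avoids the loop.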

Method 2

The directive above does not, as indicated, change the URL path name of the location requested. Since the ultimate objective is to direct all traffic to a single specific web page, that approach, therefore, relies on some other mechanism to achieve the required effect. A more efficient way would be to attempt the full redirection directly at this stage, using a slightly modified version of the previous directive:

    RedirectMatch permanent ^ https://www.example.com/index.html

In this case, the entire URL is replaced with the new version, even though the pattern "^" matches only the (empty) start of the original path. Unlike "Redirect", "RedirectMatch" uses the target exactly as given and does not append the remainder of the request path. As a result, EVERY request received is redirected using HTTPS to "index.html" at the root directory of the web site.
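
For contrast, a path-preserving form of "RedirectMatch" needs an explicit capture group and backreference, which may help show why the bare "^" pattern discards the original path. The first alternative below is purely illustrative and is not part of the setup described here; only one of the two would be used at a time:

```apache
# Alternative A (illustration only): capture everything after the
# leading "/" and re-insert it as $1, so /docs/a.html is sent to
# https://www.example.com/docs/a.html
RedirectMatch permanent ^/(.*)$ https://www.example.com/$1

# Alternative B (the form used in this method): the target contains no
# backreference, so nothing from the original path survives and
# /docs/a.html is sent to https://www.example.com/index.html
#RedirectMatch permanent ^ https://www.example.com/index.html
```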

Method 3

An alternative approach is to use the "mod_rewrite" module:

    RewriteEngine on
    RewriteRule ^ https://www.example.com/index.html [L,R=301]

This produces exactly the same result as the previous method. Note that "mod_rewrite" requires the rewrite engine to be turned on with "RewriteEngine on" before any rule takes effect, unlike "Redirect" and "RedirectMatch", which need no such switch.
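
If the HTTP and HTTPS sites happen to share a single configuration, one possible variation is to guard the rule with a condition on the "HTTPS" server variable so that it fires only for plain HTTP requests. This is a sketch under that assumption rather than part of the setup described above:

```apache
RewriteEngine on

# %{HTTPS} is "on" for SSL/TLS connections and "off" otherwise, so the
# rule below is skipped for requests that already arrived over HTTPS.
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://www.example.com/index.html [L,R=301]
```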

Redirect HTTPS Requests to a Single Web Page

The second task is to redirect a request for any web page on a web site (in this case HTTPS-based) so that it responds with a specific single web page. This cannot be easily accomplished using the "redirect" directives, as they are too limited. In particular, they offer no easy way to exclude certain web page requests, which is required in order to prevent infinite loops. Thus, it is necessary to use the "mod_rewrite" family of directives.

Method 1

The most straightforward way to achieve the needed result is as follows:

    RewriteEngine On

    RewriteCond   %{REQUEST_URI}  !=/index.html [NC]
    RewriteCond   %{HTTP_REFERER} !^https://www\.example\.com/ [NC]
    RewriteRule   ^ /index.html [L,R=301]

The "RewriteRule" directive is essentially the same as that of Method 3, above, for redirecting HTTP to HTTPS, except that the target is a local path. By itself, it would create an infinite loop, since the redirected request for "/index.html" would match the rule again. The "RewriteCond" directives are included to prevent that from happening. The first one makes sure that a request for "index.html" at the root directory of the web site is NOT redirected. The second one ensures that any request whose "Referer" header names a page on the given web site is also not redirected, thereby making sure that any components needed by "index.html" are provided as required.
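
The "Referer" header is supplied by the client and may be absent, so the second condition is not guaranteed to match every component request. One possible alternative, sketched here with hypothetical file names ("style.css" and "logo.png" are assumptions, not files from the original site), is to list explicitly the files that are allowed through:

```apache
RewriteEngine On

# Let index.html and its (hypothetical) components pass untouched;
# every other URL path is redirected to /index.html.
RewriteCond   %{REQUEST_URI}  !^/(index\.html|style\.css|logo\.png)$ [NC]
RewriteRule   ^ /index.html [L,R=301]
```

The trade-off is that the list has to be kept in step with whatever "index.html" actually references.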

Method 2

One of the key features of all of the suggestions given so far is that they send a response back to the web client directing it to make a new request for a new destination. The "redirect" directives do this implicitly, whereas the "rewrite" directives make use of the 'R' flag to accomplish that result.

In the case of redirecting HTTP to HTTPS, there seems to be no way to avoid this as a result of the change in protocol. However, when redirecting a request to a single web page, there is an intriguing alternative. Assume that the destination web page, along with any other web pages that it needs, all reside at the root directory of the web site. In that event, the following set of directives will work:

    RewriteEngine On

    RewriteRule   ^.*/([^/]*)$ /$1

    RewriteCond   %{HTTP_REFERER} !^https://www\.example\.com/ [NC]
    RewriteRule   ^ /index.html [L]

The effect of the first "RewriteRule" directive is to remove any directory portion from the URL path name, leaving only the actual file name portion of the web page location. This is then the equivalent of the full path name of the web page if it resided at the root directory of the web site. The "RewriteCond" directive checks to see if the (original) request was a result of a referral by a web page on the local web site. If so, the rewritten request is passed through as-is. Otherwise, it is replaced by "index.html" from the root directory of the web site, which is then processed.

The net effect of these directives is, thus, that any arbitrary web page request is answered with "index.html" which, along with any of its needed components, resides at the root directory of the web site. A simple modification (left as an exercise) allows that location to be changed to any arbitrary directory, but ALL components MUST be in the same directory, as a consequence of the first "RewriteRule".
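
One way the exercise might be approached, assuming a hypothetical directory named "moved" directly below the web site root that holds "index.html" and all of its components, and assuming the rules sit in the server or virtual host configuration:

```apache
RewriteEngine On

# Strip any directory portion and map the bare file name into the
# (hypothetical) /moved directory: /a/b/style.css becomes /moved/style.css
RewriteRule   ^.*/([^/]*)$ /moved/$1

# Requests not referred by a page on this site get the notice page instead.
RewriteCond   %{HTTP_REFERER} !https://www.example.com/ [NC]
RewriteRule   ^ /moved/index.html [L]
```

As before, no 'R' flag is used, so the client never sees the "moved" directory in the URL.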

The appeal of this approach is that it does not send a response back to the web browser telling it to request a new location. The contents of the new location are simply returned. This is more efficient in that the server does ALL of the work of redirection. A side effect is that the client is unaware of the rewrite and retains the original request location, which it (potentially) uses as the base for any subsequent relative links; the resulting locations may or may not exist. This is the reason for the first "RewriteRule" and the reason why all local components of the destination web page must reside in the same directory on the web site.