IISRewrite Documentation - www.qwerksoft.com

Table of Contents

Summary

Credits

Shareware Version Limitations

System Requirements 

Installation
    Downloading
    Unpacking
    Installing the DLL
    Configuring IIS
    Testing

Configuration
    Making Changes
    The Configuration File
    Supporting Multiple Virtual Servers

Back References

Configuration Directives
    RewriteEngine
    RewriteLog
    RewriteLogLevel
    RewriteCond
    RewriteRule

Environmental Variables

Regular Expressions

Practical Uses
    ASP based downloads
    Security Screen
    Protect files on the site
    Making dynamic sites indexable


Summary

IISRewrite is a rule-based rewriting engine that allows a Webmaster to manipulate URLs on the fly based on regular expressions. URLs are rewritten before IIS hands the request off for processing, so requests for graphics, HTML files, and even entire directory structures can be passed to ASP files for processing.

IISRewrite was written to solve practical problems that are nearly impossible to solve with IIS and ASP alone. It solves compatibility issues when doing dynamic downloads with ASP, it allows portions of dynamic sites to be indexed by search engines as if they were static HTML, and it can customize static HTML based on which browser a client is using without JavaScript. See the "Practical Uses" section for details and more ideas.

IISRewrite is a stripped-down implementation of Apache's mod_rewrite module for IIS. Webmasters who have used mod_rewrite will find the configuration file format and much of the functionality familiar. Some of the more esoteric functions have been left out in the interest of simplicity and speed, but the basic functionality remains.


Credits

IISRewrite uses Regex++ by Dr John Maddock copyright 1998-2000. Regex++ can be found at http://www.boost.org and is an excellent regular expression library. We thank Dr Maddock for making it available.


Shareware Version Limitations


The shareware version of the library has all the functionality of the full version but will stop rewriting after two hundred URL manipulations until the web server is restarted.


System Requirements

IISRewrite is compatible with Microsoft's ISAPI specification and has been tested on Windows NT 4 running IIS v4 and Windows 2000 running IIS v5.


Installation


Downloading

To download a free trial of IISRewrite visit www.qwerksoft.com/products/iisrewrite/download.asp

Unpacking

IISRewrite is distributed as a zip file. Unpacking the zip file creates an IISRewrite directory containing this documentation file, a sample configuration file (rewrite.ini), a test html file (helloworld.html), and the IISRewrite DLL.

Installing the DLL

Installing the program files is simply a matter of making sure that the IISRewrite DLL and its configuration file are in the same directory. We currently use the WINNT\System32\inetsrv\IISRewrite directory. For security reasons it is highly recommended that the program files NOT be installed in the web server's home directory.

The default configuration file rewrites all requests containing helloworld to /helloworld.html for testing. To run the test, copy the html file (helloworld.html) into the web server's home directory.

Configuring IIS
To load IISRewrite into IIS follow these steps:
· Open up Internet Services Manager
· Expand Internet Information Services by double-clicking
· Expand the server name by clicking on the + symbol to the left of the name
· Right click on the name of the website that will be using IISRewrite
· Click properties
· Select the ISAPI Filters tab
· Click Add
· In the Filter Name box enter IISRewrite
· Click Browse and select IISRewrite.dll from the directory where it was installed (for example WINNT\System32\inetsrv\IISRewrite)
· Click OK
· Click OK again.
· Restart the web server.

Testing

Open a browser, connect to the website, and request /test/helloworld/. If the test is successful you should see the helloworld page that was copied to the web server's home directory during installation. If the test fails, check that rewrite.ini, IISRewrite.dll, and helloworld.html are in the correct directories with the correct permissions. For issues related to your test configuration, tech support can be contacted at support@qwerksoft.com or through our website at www.qwerksoft.com.


Configuration

Making Changes
IISRewrite uses a text-based configuration file that can be edited with Notepad or any other text editor. Changes to the configuration file do not take effect until IIS is restarted: right-click the machine in Internet Services Manager and select "Restart IIS".

The Configuration File
Each line of the configuration file contains a configuration directive followed by one or more arguments. Lines that begin with hash characters '#' are considered comments and are ignored. A simple configuration file that consists of two configuration directives might look like this: 

    #My Configuration File
    RewriteEngine On
    RewriteRule ^/global\.asa /trap.asp

This configuration file consists of a single comment and two configuration directives. The first directive "RewriteEngine" turns on URL rewriting (required but not interesting). The second directive "RewriteRule" is where most of the work gets done. 

The first argument to RewriteRule, called the pattern, is a regular expression that is matched against the requested URL. If there is a match, the URL is completely replaced with the second argument to RewriteRule, called the substitution. In this case any request that starts with /global.asa would be rewritten to /trap.asp, which may be useful for logging hacking attempts against your web server.

The substitution used in a RewriteRule can be made dynamic by using back references. When a pattern is matched against a URL, the portions of the pattern in parentheses are stored in special variables called back references, $N (N=1-9). These back references can be used to fill in portions of the substitution. A configuration file using back references might look like this:

    #My Configuration File
    RewriteEngine On
    RewriteRule /class/(.*)/calendar/(.*)/ /classcalendar.asp?c=$1&m=$2 

In this example a URL /class/chem101/calendar/June/ would be rewritten as /classcalendar.asp?c=chem101&m=June. With back references it is possible to create easy-to-remember URLs for pages that are dynamically generated by scripts. These URLs are also more search engine friendly and allow more of a site to be indexed.
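The following sketch shows how a single RewriteRule with back references behaves on this example. It uses Python's re module as a stand-in for Regex++, so edge cases may differ, and the helper name apply_rule is hypothetical, not part of IISRewrite.

```python
import re


def apply_rule(pattern, substitution, url):
    """Return the rewritten URL, or None if pattern does not match url.

    Hypothetical helper: the whole URL is replaced by substitution, with
    each $N (N=1-9) expanded to the text captured by group N.
    """
    match = re.search(pattern, url)
    if match is None:
        return None

    def expand(ref):
        index = int(ref.group(1))
        if index > match.re.groups:
            return ""  # the pattern has no such group
        return match.group(index) or ""  # an unmatched group expands to ""

    return re.sub(r"\$([1-9])", expand, substitution)


print(apply_rule(r"/class/(.*)/calendar/(.*)/",
                 "/classcalendar.asp?c=$1&m=$2",
                 "/class/chem101/calendar/June/"))
# /classcalendar.asp?c=chem101&m=June
```

Using re.search rather than re.match mirrors the unanchored pattern in the example; a pattern that must match from the start of the URL should begin with ^.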

Once a URL is matched against a RewriteRule and rewritten, the new URL continues to be matched against the remaining RewriteRules until the end of the configuration file is reached. The last flag [L] can be used to stop further rewriting when its rule matches. A web server designed to log potential automated intrusion attempts might have a configuration file like this:

    #My Configuration File
    RewriteEngine On
    RewriteRule ^/$ - [L]
    RewriteRule ^/web/ - [L]
    RewriteRule /.* /logtrap.asp

This rule set allows requests to the root directory "/" and to the web directory "/web/", and sends any other request to an ASP page that logs the request. The "-" substitution has special meaning: it tells IISRewrite to leave the requested URL alone if there is a match. The full details of the configuration directives are documented below.
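The processing order described above can be modelled in a few lines. This is a minimal sketch, assuming rules are held as hypothetical (pattern, substitution, flags) tuples; it uses Python's re in place of Regex++ and omits back references and RewriteCond handling to keep the control flow visible.

```python
import re

# Hypothetical in-memory form of the configuration file above.
RULES = [
    (r"^/$", "-", {"L"}),
    (r"^/web/", "-", {"L"}),
    (r"/.*", "/logtrap.asp", set()),
]


def run_rules(url, rules):
    """Pass url through each rule in order and return the final URL."""
    for pattern, substitution, flags in rules:
        if re.search(pattern, url) is None:
            continue  # no match: try the next rule
        if substitution != "-":
            url = substitution  # "-" means leave the URL alone
        if "L" in flags:
            break  # last rule: stop rewriting
    return url


for requested in ["/", "/web/index.html", "/scripts/root.exe"]:
    print(requested, "->", run_rules(requested, RULES))
```

Because a matching rule without [L] feeds its output to the next rule, the order of rules in the file matters: the two exemptions must come before the catch-all /.* rule.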

Supporting Multiple Virtual Servers
Once IISRewrite is installed and working with a single web site on a machine it is a simple matter to install IISRewrite on other web sites on the same machine. The simplest method of supporting additional web sites is to install multiple copies of IISRewrite in different directories according to the installation instructions. Each web site will get its own instance of IISRewrite and its own configuration file.

IISRewrite also has a more advanced option where multiple web sites can share the same DLL and configuration file but have customized rule sets. This can simplify administration for web servers that support hundreds of web sites and don't want multiple copies of IISRewrite on the server. It can also be useful in cases where an administrator wants different rules to fire based on ports, IP addresses, or host header name.

To allow multiple web sites to share the same configuration file IISRewrite uses configuration blocks that look like this:

    #IP based virtual host on port 80
    <VirtualHost 192.168.0.4>
    RewriteEngine Off
    </VirtualHost>

    #Name Based virtual host on port 443
    <VirtualHost 192.168.0.3:443>
    NameVirtualHost name1.qwerksoft.com name2.qwerksoft.com
    RewriteEngine On
    RewriteRule ^/global\.asa /test.asp
    </VirtualHost>

Each block starts and ends with a VirtualHost tag. Each configuration block can contain any valid configuration directive plus the optional NameVirtualHost directive. NameVirtualHost is used with name-based virtual hosts, where multiple web sites share the same IP address. Its arguments are the host names by which the site is known. They should match the names in the "Host Header Name" field of the "Advanced Multiple Web Site Configuration" pane in Microsoft's IIS Internet Services Manager.

When IISRewrite loads, it parses these configuration blocks and compiles a rule set for each block. It then creates an index of these rule sets based on the IP address, port, and NameVirtualHost directive. For the above configuration file the index would look something like this:

    Index Key                                Rule Set
    192.168.0.4:80                           Rule Set 1
    192.168.0.3:443:name1.qwerksoft.com      Rule Set 2
    192.168.0.3:443:name2.qwerksoft.com      Rule Set 2

When a request is made, IISRewrite first looks in the index for a match on the IP address, port, and host header. If there is a match, the corresponding rule set is fired. If no match is found, IISRewrite looks for a match on the IP address and port alone and fires that rule set. If IISRewrite cannot find a match, it passes the request to IIS unchanged.
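The two-step lookup can be sketched as a dictionary search with a fallback. The key format follows the index table above; the function name find_rule_set is hypothetical, and how IISRewrite actually stores its index internally is not documented here.

```python
# Index from the table above: "ip:port" or "ip:port:host" -> rule set.
INDEX = {
    "192.168.0.4:80": "Rule Set 1",
    "192.168.0.3:443:name1.qwerksoft.com": "Rule Set 2",
    "192.168.0.3:443:name2.qwerksoft.com": "Rule Set 2",
}


def find_rule_set(ip, port, host, index):
    """Return the rule set to fire, or None to leave the request to IIS."""
    # First try IP address, port, and host header together.
    if host:
        rule_set = index.get(f"{ip}:{port}:{host}")
        if rule_set is not None:
            return rule_set
    # Then fall back to IP address and port alone.
    return index.get(f"{ip}:{port}")


print(find_rule_set("192.168.0.3", 443, "name2.qwerksoft.com", INDEX))
print(find_rule_set("192.168.0.3", 443, "other.example", INDEX))
```

The second call returns None because no block was defined for 192.168.0.3:443 without a host name, so that request would reach IIS unchanged.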


Back References
When parts of a regular expression are placed in parentheses, the portion of the string that each part matches is placed in a temporary holding area. By using back references, these holding areas can be used in the substitution string of RewriteRule directives. The notation for back references is $N (N=1-9). When the string "/1763/640x480/engine.gif" is matched against the regular expression ^/(.*)/(.*)/.*, the back references $1 and $2 would contain 1763 and 640x480 respectively.
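The example can be reproduced with Python's re module, used here as a stand-in for Regex++ (results may differ in edge cases). It also shows why (.*) can surprise: because it is greedy, an extra directory level shifts what $1 and $2 capture, while [^/]* stops at the next slash.

```python
import re

PATTERN = r"^/(.*)/(.*)/.*"

match = re.match(PATTERN, "/1763/640x480/engine.gif")
dollar1, dollar2 = match.group(1), match.group(2)  # $1 and $2
print(dollar1, dollar2)  # 1763 640x480

# Greedy (.*) takes as much as it can, so a deeper path shifts the groups.
deeper = re.match(PATTERN, "/a/b/c/d.gif").groups()
print(deeper)  # ('a/b', 'c')

# [^/]* cannot cross a slash, so each group holds one path segment.
strict = re.match(r"^/([^/]*)/([^/]*)/.*", "/a/b/c/d.gif").groups()
print(strict)  # ('a', 'b')
```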


Configuration Directives

RewriteEngine
Syntax: RewriteEngine on|off
The RewriteEngine directive enables or disables IISRewrite. When RewriteEngine is set to off, IISRewrite does no runtime processing at all.

This directive can be used to disable IISRewrite without uninstalling the module. If the configuration file does not contain a RewriteEngine directive, IISRewrite defaults to off.

Example:
RewriteEngine on

RewriteLog
Syntax: RewriteLog filename
The RewriteLog directive sets the name of the file to which the server logs any rewriting actions it performs. If the name does not begin with a drive letter then it is assumed to be relative to the Windows NT system32 directory. The directive should occur only once per configuration file. Commenting out the RewriteLog directive or setting RewriteLogLevel to 0 will disable logging.

Example:
RewriteLog "C:\Winnt\system32\logs\rewrite.log"

RewriteLogLevel
Syntax: RewriteLogLevel 0-9
The RewriteLogLevel directive sets the verbosity level of IISRewrite's log file. The default level 0 means no logging while 9 means that nearly everything is logged. Logging can be disabled by setting RewriteLogLevel to 0.

Example:
RewriteLogLevel 9

RewriteCond
Syntax: RewriteCond TestString CondPattern [flags]
A RewriteRule directive can be preceded by one or more RewriteCond directives, which determine whether the RewriteRule that immediately follows them is applied. When a RewriteCond directive is encountered, its TestString is matched against the CondPattern argument. A positive match results in the RewriteCond directive being evaluated as TRUE.

The TestString argument can contain constructs of the form %{Variable} in addition to plain text, where Variable is taken from the following list:

HTTP Headers:      HTTP_USER_AGENT, HTTP_REFERER, HTTP_COOKIE, HTTP_FORWARDED, HTTP_HOST, HTTP_ACCEPT
Connection:        REMOTE_ADDR, REMOTE_HOST, REMOTE_USER, REQUEST_METHOD
Server Internals:  SERVER_PORT_SECURE, SERVER_PROTOCOL, LOCAL_ADDR

These variables correspond to the variables available through the Request.ServerVariables collection in ASP. If a variable isn't defined for the connection (for example, the browser didn't send a Referer header, so there is no HTTP_REFERER), it is treated as if it had a null value. The CondPattern argument is a regular expression that is matched against the TestString after its %{Variable} constructs have been replaced with their values. A RewriteCond can be negated by prefixing the CondPattern with a '!' character.
The [flags] argument of the RewriteCond directive can be used to combine RewriteCond directives with OR instead of the implicit AND. Currently the only valid flag is [OR], which means "or the next condition." The OR and implicit AND flags cannot be combined.

Example:
Send common email harvesting robots to my bait page:
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro 
RewriteRule ^.*$ /bait.asp [L]

Explanation: If the client is EmailSiphon, EmailWolf, or ExtractorPro send them to my /bait.asp page which generates fake email addresses for them. (For a more complete list of email harvesters check: http://mosa.unity.ncsu.edu/brabec/antispam.html)
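The way conditions combine can be sketched as follows. This is a minimal model, not IISRewrite's code, assuming that conditions are ANDed by default, that [OR] joins a condition with the next one, that a leading '!' negates CondPattern, and that undefined variables expand to an empty string; expand_vars and conditions_pass are hypothetical names, and Python's re stands in for Regex++.

```python
import re


def expand_vars(test_string, variables):
    """Replace each %{NAME}; undefined variables become an empty string."""
    return re.sub(r"%\{(\w+)\}",
                  lambda m: variables.get(m.group(1), ""), test_string)


def conditions_pass(conditions, variables):
    """Evaluate (test_string, cond_pattern, flags) tuples in order."""
    group_matched = False  # has any condition in the current OR group matched?
    for test_string, cond_pattern, flags in conditions:
        negate = cond_pattern.startswith("!")
        pattern = cond_pattern[1:] if negate else cond_pattern
        text = expand_vars(test_string, variables)
        matched = re.search(pattern, text) is not None
        group_matched = group_matched or (matched != negate)
        if "OR" in flags:
            continue  # the group continues with the next condition
        if not group_matched:
            return False  # an AND-ed group failed
        group_matched = False  # start a new group
    # A trailing [OR] leaves the last group open; it must still have matched.
    if conditions and "OR" in conditions[-1][2]:
        return group_matched
    return True


# The harvester example above, as hypothetical tuples.
HARVESTERS = [
    ("%{HTTP_USER_AGENT}", "^EmailSiphon", {"OR"}),
    ("%{HTTP_USER_AGENT}", "^EmailWolf", {"OR"}),
    ("%{HTTP_USER_AGENT}", "^ExtractorPro", set()),
]

print(conditions_pass(HARVESTERS, {"HTTP_USER_AGENT": "EmailWolf 1.00"}))
print(conditions_pass(HARVESTERS, {"HTTP_USER_AGENT": "Mozilla/4.0"}))
```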

RewriteRule
Syntax: RewriteRule Pattern Substitution [flags]
The RewriteRule is where most of the work gets done. IISRewrite processes RewriteRule directives in the order in which they appear in the configuration file. IISRewrite attempts to match the regular expression Pattern against the current path info + query string. If the preceding RewriteCond directives are satisfied and the Pattern matches, the current path info + query string is replaced with Substitution.

The Substitution argument is the string which replaces the current path info + query string. This can be plain text but can also contain back references to the groupings of the matched pattern represented by $N (N=1-9). There is also a special substitution string '-' which means no substitution. This can be used in combination with the [L] flag to exempt some strings from being rewritten that would normally be rewritten by some later rewrite rule.

The RewriteRule directive can be negated (if the pattern does not match) by preceding the pattern with the '!' character (NOT character). When using the NOT character it is impossible to use back references in the Substitution.

IISRewrite will continue to operate on each RewriteRule until it reaches the end of the configuration file. The effects of the RewriteRule directives are cumulative with each RewriteRule taking as input the substitution of the previous RewriteRule directive. This can be short circuited with the last [L] flag, which tells IISRewrite to ignore the following RewriteRule directives when the current RewriteRule is a match.

The behavior of a RewriteRule can be modified by adding a comma-separated list of flags, enclosed in square brackets, as the last argument. IISRewrite supports the following flags:

· last|L (Last Rule)
The Last flag stops the URL rewriting and causes IISRewrite to pass the request back to IIS without applying additional rules.

· Chain|C (Chain to next Rule)
The Chain flag ties several rewrite rules together. If a chained rule matches, processing continues normally with the next rule. If it doesn't match, all the following chained rules are skipped.

· Forbidden|F (Forbidden)
Immediately return a 403 (Forbidden) response to the client.

· Redirect|R[=code] (Redirect)
Force an external redirection to the URL in the substitution, with the response code equal to code (defaults to 302, Moved Temporarily). If the substitution isn't an external URL (doesn't start with http), it is prefixed with http://servername:serverport/ before being sent to the client.

· UnMangleLog|U (Unmangle Log)
Log the URL as it was originally requested and not as the URL was rewritten.
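The flag syntax and the Redirect behavior described above can be sketched as follows. The helper names are hypothetical, and stripping the substitution's leading slash before adding the http://servername:serverport/ prefix is an assumption made here to avoid a double slash; IISRewrite's exact handling is not documented.

```python
# Long flag names mapped to their short forms, as listed above.
ALIASES = {"last": "L", "chain": "C", "forbidden": "F",
           "redirect": "R", "unmanglelog": "U"}


def parse_flags(text):
    """Turn a flag list such as "[R=301,L]" into {"R": "301", "L": None}."""
    flags = {}
    for item in text.strip("[]").split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        name = ALIASES.get(name.lower(), name.upper())  # long or short form
        flags[name] = value or None
    return flags


def redirect_status(flags):
    """R without =code defaults to 302 (Moved Temporarily)."""
    return int(flags["R"]) if flags.get("R") else 302


def redirect_location(substitution, server_name, server_port):
    """External URLs pass through; others get the server prefix."""
    if substitution.startswith("http"):
        return substitution
    return f"http://{server_name}:{server_port}/{substitution.lstrip('/')}"


flags = parse_flags("[R=301,L]")
print(redirect_status(flags), redirect_location("/new.html", "www.somesite.com", 80))
```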

Example:
To rewrite this URL:

    /download/17269/banner.gif 

to

    /download.asp?id=17269

Use the following RewriteRule:

    RewriteRule ^/download/(.*)/.* /download.asp?id=$1



Environmental Variables
IISRewrite adds an additional (non-standard) server variable named HTTP_SCRIPT_URL that contains the original path info + query string before any manipulation. It can be retrieved in ASP with Request.ServerVariables("HTTP_SCRIPT_URL").

Regular Expressions

Many books and papers have been written about regular expressions, and this section can give no more than a cursory overview of the syntax. Regular expressions are formulas for matching a string against a pattern. A regular expression is made up of normal text and metacharacters, which have special meaning. The table below lists the metacharacters and their meanings.

Metacharacter

Description

.

Match any single character.  The regular expression m.re would match more or mare but not moore.

$

Matches the end of a line.  The regular expression great$ would match “This is great” but not “This is the greatest”.

^

Matches the beginning of a line.  The regular expression ^IIS would match “IIS 21 days” but not “Mastering IIS”.  The regular expression ^$ is used to match a null value.

*

Matches zero or more occurrences of the preceding character.  The regular expression a*b would match “ab” “aab” or “b”.  The regular expression .* is commonly used as wildcard and will match any number of characters.

\

This character is used to “turn off” the special meaning of other metacharacters.  To have the ‘.’ character mean a period and not “any character”, precede it with a backslash as in the regular expression index\.html.  Forgetting to escape the ‘.’ character is a common mistake.

[ ]

Match any one of the characters in brackets.  The regular expression ^[bcf]at would match bat, cat, or fat but not mat.

[C1-C2]

Match against a range of characters in the brackets.  The regular expression ^[A-Z0-9] would match any string that begins with an uppercase letter or a digit.

[^C1-C2]

Match anything but the characters in the brackets.  The regular expression [^0-9A] would match any character except a digit or the uppercase letter ‘A’.

|

Match either text.  For example (dog|cat) would match the string “she has a dog” or “she has a cat” but not “she has a bird.”

+

Match one or more occurrences of the preceding character.  The regular expression a+b would match “ab” or “aaaab” but not “b”

?

Match zero or one occurrences of the preceding character.  The expression ^a?b would match “ab” or “b” but not “aab”.

( )

Treat the characters as a group.  When used in a RewriteRule it saves the group in a temporary holding area that can be referenced as $N where N=1-9.
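The examples in the table can be checked mechanically. This sketch runs each one through Python's re module, assuming case-sensitive, unanchored matching (a search anywhere in the string); Regex++ may behave differently in edge cases. An empty result means every example behaves as the table describes.

```python
import re

# (pattern, strings it should match, strings it should not match)
EXAMPLES = [
    (r"m.re", ["more", "mare"], ["moore"]),
    (r"great$", ["This is great"], ["This is the greatest"]),
    (r"^IIS", ["IIS 21 days"], ["Mastering IIS"]),
    (r"^$", [""], ["x"]),
    (r"a*b", ["ab", "aab", "b"], []),
    (r"index\.html", ["index.html"], ["indexhhtml"]),
    (r"^[bcf]at", ["bat", "cat", "fat"], ["mat"]),
    (r"^[A-Z0-9]", ["Apple", "9lives"], ["apple"]),
    (r"[^0-9A]", ["a"], ["A", "7"]),
    (r"(dog|cat)", ["she has a dog", "she has a cat"], ["she has a bird"]),
    (r"a+b", ["ab", "aaaab"], ["b"]),
    (r"^a?b", ["ab", "b"], ["aab"]),
]


def check(examples):
    """Return the (pattern, text) pairs whose result differs from the table."""
    failures = []
    for pattern, good, bad in examples:
        for text in good:
            if re.search(pattern, text) is None:
                failures.append((pattern, text))
        for text in bad:
            if re.search(pattern, text) is not None:
                failures.append((pattern, text))
    return failures


print(check(EXAMPLES))  # []
```

Note that without the ^ anchor, a?b would also match “aab”, because the search finds “ab” inside it.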

For an in depth look at regular expressions O'Reilly's Mastering Regular Expressions (ISBN 1-56592-257-3) is highly recommended. 


Practical Uses


ASP based downloads
Sites that have thousands of files to manage often find it simpler to store these files in a database instead of in the document root of the web server. Active Server Pages can serve these files using readily available file download components. The URLs for the pages look something like this:

http://www.somesite.com/getfile.asp?fileid=1007

In the past these types of URLs have caused problems with various proxy servers and browsers. Many proxy servers will refuse to cache a file with a query string in the URL. Many browsers with download managers installed will save files under the name of the ASP page instead of the file name that was set in the MIME headers.

The solution to this problem is to make the files look as if they are in the document root of the web server instead of a database. With IISRewrite and this configuration file:

RewriteEngine On
RewriteRule ^/filearea/(.*)/.* /getfile.asp?fileid=$1

URLs that look like this:

http://www.somesite.com/filearea/1007/download.exe

are handed to ASP as:

http://www.somesite.com/getfile.asp?fileid=1007

No more problems with proxy servers or browser compatibility.

Security Screen
Anyone who has worked with Microsoft's IIS server is probably aware of the many "show code" vulnerabilities. With a stock installation of IIS 4.0 it was possible to make requests like:

http://www.somesite.com/global.asa+.htr

and IIS would dump the contents of the global.asa file, including any passwords stored in it. Patches and workarounds are available for these problems, but with IISRewrite it is simple to create a trap for anyone who is probing your web server. Create an ASP page that sends an email alert to the webmaster and dumps a fake global.asa file. The following configuration file is all that is needed to set the trap:

RewriteEngine On
RewriteRule global\.asa /trap.asp

Protect files on the site
A site that offers interesting things for download, such as freeware or graphics, is usually dependent on advertising revenue to support those activities. It is not uncommon for other sites to link directly to these files from their own pages, which allows users to bypass the pages with the advertising. A simple configuration file can solve this problem:

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^$ [OR]
RewriteCond %{HTTP_REFERER} ^http://www\.mysite\.com/.*
RewriteRule /filearea/.* - [L]

RewriteRule /filearea/.* /filearea/nofile.jpg [L]

This says, "If HTTP_REFERER isn't set, or the request comes from my site, send the file; otherwise send a file that doesn't exist."

Making dynamic sites indexable
Many search engines will refuse to index pages whose URLs contain query strings. With IISRewrite these pages can be made to look static so they can be indexed. For a site that has URLs of the form:

http://www.somesite.com/realestateagent.asp?id=124873

IISRewrite can allow you to use the following URL instead:

http://www.somesite.com/agents/124873/index.htm

A simple configuration file:

RewriteEngine On
RewriteRule /agents/(.*)/.* /realestateagent.asp?id=$1

Can make that portion of the site indexable.