Thursday, December 14, 2006

MOD_REWRITE for Dummies

Author: Bobby Handzhiev

A short article in human language

This article is not a complete guide to Apache's mod_rewrite or to .htaccess. Its purpose is to help you - the webmaster - create "mod_rewritten" versions of your dynamic web pages even if you have limited technical knowledge. I won't show you every tip and trick - my aim is to boil the complexity of Apache's documentation down to 1-2 pages of plain language - easy and fast.

What is mod_rewrite?

Mod_rewrite is an Apache module that lets you "rewrite" the URLs of your web pages. If your server supports it (most Linux web hosts do nowadays), you can rewrite virtually any URL into anything you like. It is most often used for the URLs of dynamically generated web pages such as www.mywebsite.com/index.php?par1=1&par2=2&par3=2... Such a URL can easily be 'translated' into www.mywebsite.com/par1/par2/par3.

Why mod_rewrite?

- Search engine optimization - there is a lot of debate on this topic, but static-looking links still tend to rank better than dynamic ones. Here is a confirmation from Google on the topic:

"Your pages are dynamically generated. We're able to index dynamically generated pages. However, because our web crawler could overwhelm and crash sites that serve dynamic content, we limit the number of dynamic pages we index. In addition, our crawlers may suspect that a URL with many dynamic parameters might be the same page as another URL with different parameters. For that reason, we recommend using fewer parameters if possible. Typically, URLs with 1-2 parameters are more easily crawlable than those with many parameters."

- User-friendliness - Some users remember URLs visually. Even if they bookmark a page, they can more easily recognize a link like www.mywebsite.com/services.html than www.mywebsite.com/index.php?task=12, for example.

- Security - mod_rewrite helps you hide the parameters passed to the application. Basically, your dynamic pages should be secure enough even without mod_rewrite, but hiding the parameters will reduce the danger of attack.

How to use it?

Mod_rewrite is really powerful if you are familiar with the regular expressions it uses. But learning the whole pattern syntax can be quite complicated, especially for a non-technical user. That's why I'll teach you just a few simple patterns, which are quite enough to get your website's URLs rewritten.

Let's start: First you need to create a file called .htaccess and place it in exactly the folder where you want the rewriting to take effect (it will also apply to all subfolders). If you already have a .htaccess file, you can simply add the lines to it (but take care if it already contains mod_rewrite directives, because the new lines can interfere with them). Open it in a plain text editor and start with:

Options +FollowSymLinks
RewriteEngine on
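If you are not sure that your host has mod_rewrite loaded, a more defensive variant is sketched below. It is an optional extra, not part of the recipe itself: the <IfModule> wrapper makes Apache skip the block when the module is missing, instead of answering every request in that folder with a server error. The price is that your friendly URLs then silently stop working, so treat it as a safety net rather than a fix.

```apache
# Optional sketch: apply the rewrite settings only if mod_rewrite is available
<IfModule mod_rewrite.c>
    Options +FollowSymLinks
    RewriteEngine on
    # your RewriteRule lines go here
</IfModule>
```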

Now the rewrite engine is switched on. You can now start adding as many rewrite rules as you want. The format is simple:

RewriteRule rewrite_from rewrite_to

Here "RewriteRule" is fixed text, i.e. you should not change it. "rewrite_from" is the address that will be typed in the browser (more precisely, a pattern matched against the part of the address that comes after the folder holding the .htaccess file), and "rewrite_to" is the page the server will actually run. Both of these can contain "masks", but in "rewrite_to" we will only use $, so most of the discussion is about the "rewrite_from" part. Let me introduce you to the very few masks you'll need and give you some samples. You'll see how easy it is.

Let's stop talking theory and look at an example. Imagine your server runs an e-shop which uses URLs like index.php?task=categories to list the categories, index.php?task=category&id=5 to show a category's contents, and other values of 'task' to do other things.

RewriteRule ^(.*).html index.php?task=$1

What does all that mean? This is a rewrite rule that makes your URLs look "static". In this example categories.html will be "translated" to index.php?task=categories. So you no longer need a dynamic URL to list the categories; you can write categories.html instead.

But what do all these strange characters mean?

- ^ - This character marks the beginning, i.e. you tell the server that it should not expect anything before it.

- (.*) - This combination is the most often used one, and it literally means "everything". So everything you type before ".html" (i.e. your fake file name) will be passed on as $1.

- $1 - This is a parameter saying where the text matched by the first mask should be put. If you have more than one mask (a mask is anything you use to stand for dynamic text or file names), you can use $2, $3, etc. You'll see more in the following examples.

So categories.html will be translated into index.php?task=categories, services.html into index.php?task=services, etc.
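The rule above is deliberately loose. As a sketch of a stricter version (the extra symbols are additions to the simple recipe, not a requirement): a bare dot in a pattern means "any single character", so \. is used to demand a real dot; $ at the end of the pattern marks the end of the address, just as ^ marks the beginning; and the [L] flag tells the engine to skip the rules that follow once this one has matched.

```apache
# Stricter variant of the first example:
#   \.   a literal dot (a bare . would match any character)
#   $    nothing may follow ".html"
#   [L]  last rule: skip the rules below once this one matches
RewriteRule ^(.*)\.html$ index.php?task=$1 [L]
```

With this version categories.html still becomes index.php?task=categories, while an address such as categories.html.bak or categoriesXhtml is no longer rewritten.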

What if you have more than one parameter? First, you should pick a character to use as a delimiter:

RewriteRule ^(.*)-(.*).html index.php?task=$1&language=$2

Here is how you can pass both task and language. For example, categories-english.html will be translated into index.php?task=categories&language=english.

IMPORTANT: If you put RewriteRule ^(.*).html index.php?task=$1 first, the second rule may not work, because the simpler pattern also matches categories-english.html and passes the whole name as the task. Always order your rules from the most complicated to the simplest.
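To make the ordering concrete, here is a sketch with both rules in the right order. Two details go beyond the text above and are worth checking on your own site: the [L] flag stops rule processing as soon as a rule matches, and ([^-]+) means "one or more characters that are not a hyphen", which is a tighter alternative to (.*) when the parts are separated by hyphens. With the loose (.*)-(.*) mask, a name containing two hyphens, such as new-products-english.html, is split at the last hyphen; with the tighter mask it does not match the two-part rule at all and falls through to the simple one.

```apache
# Most complicated rule first: exactly two parts separated by a hyphen
RewriteRule ^([^-]+)-([^-]+)\.html$ index.php?task=$1&language=$2 [L]

# Simplest rule last: everything before ".html" becomes the task
RewriteRule ^(.*)\.html$ index.php?task=$1 [L]
```

Here categories-english.html is caught by the first rule, while categories.html has no hyphen and is handled by the second.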

Make it Better: The (.*) mask is too general and can often get in the way of more complicated rewriting rules. So it is recommended that you "limit" your rules to something more concrete. Here are a couple of tips:

- Use the "OR" operator. In our e-shop example only a few possible "tasks" are passed to index.php. Let's say:

index.php?task=categories
index.php?task=category
index.php?task=product
index.php?task=services

What will happen if you want to use your static file about.html? It will be rewritten into index.php?task=about and won't work. So you can use the OR operator and limit the rewriting only to the cases you need:

RewriteRule ^(categories|category|product|services).html index.php?task=$1

This tells the server to rewrite only if the file name is categories.html OR category.html OR product.html OR services.html, so about.html is left alone.
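If listing every task by hand becomes tedious, another common approach (an alternative to the OR list, not something the example shop needs) is to let Apache check whether the requested file really exists. The sketch below uses a RewriteCond line, which applies only to the RewriteRule directly after it. It assumes that about.html is a real file in the folder and that categories.html is not: the rule is skipped for the real file, which is served as usual, and applied to the fake name.

```apache
# Rewrite only when the requested name is NOT an existing file (!-f)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)\.html$ index.php?task=$1 [L]
```

The trade-off is that any non-existing .html name is now passed to index.php, so the script itself has to cope with unknown tasks.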

- Using "numbers". You can easily limit a rule so that it rewrites only when it finds numbers at a certain place:

RewriteRule ^category-([0-9]*).html index.php?task=category&id=$1

With the ([0-9]*) mask you tell the rewrite engine to expect only digits at that place. So if it sees category-english.html, it won't rewrite it to index.php?task=category&id=english but to index.php?task=category&language=english (because of the rule shown above: RewriteRule ^(.*)-(.*).html index.php?task=$1&language=$2).
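One detail about the mask above: * means "zero or more", so ([0-9]*) also accepts an empty number, and category-.html would be rewritten with an empty id. A possible tightening, offered as a sketch, is to use + ("one or more") together with an escaped dot (\. matches a real dot only) and an end marker ($ means nothing may follow):

```apache
# Require at least one digit; [L] skips the rules below after a match
RewriteRule ^category-([0-9]+)\.html$ index.php?task=category&id=$1 [L]
```

With this rule category-5.html becomes index.php?task=category&id=5, while category-.html and category-english.html do not match and are left for your other rules.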

Complete example: Here is how the final .htaccess file for our imaginary e-shop will look:

--------
Options +FollowSymLinks
RewriteEngine on

# Most specific rule first, so category-5.html is not caught by the general hyphen rule
RewriteRule ^category-([0-9]*).html index.php?task=category&id=$1
RewriteRule ^(.*)-(.*).html index.php?task=$1&language=$2
RewriteRule ^(categories|category|product|services).html index.php?task=$1

About the author: The author is a Senior Developer at PIM Team Bulgaria and a Consultant at SEO PIM Team.
