Last updated by tbritton

Regular Expressions in .htaccess

Category:.htaccess
Category:Regular Expressions
Category:RegEx
Category:SEO
Category:Google

The .htaccess file is a handy hidden configuration file read by the Apache web server, which is found on most Unix-based hosts. It can be used to:

1) Redirect pages whose template group names or template names have changed.

2) Redirect old bookmarks and search engine crawls pointing to former HTML versions of pages on your site to your new, sparkly EE templates.

3) Force older bookmarked index.htm landings to your site’s index.php.

4) Force the search engines to treat the non-www and the www versions of your URL as a single website (which also combines the inbound links count together, improving your search engine rankings).
... among other things.

Here I’ll share a few redirects I’ve used. They include some fancy ones that successfully redirected a hundred and fifty or so HTML pages to their new database-driven equivalents.
———————————————————————————————————————
Start with this directive at the top of your .htaccess file:
RewriteEngine On
———————————————————————————————————————
1) Redirect pages whose template group names or template names have changed.
The following directives redirect all requests for the old template_group/template structure to the new names for those templates:

RewriteRule issues_section\/issues_articles_community\/(.*)$    http://www.mysite.com/index.php/community_section/community_articles_news/$1 [L,R=Permanent]
RewriteRule issues_section\/issues_articles_controversy\/(.*)$    http://www.mysite.com/index.php/community_section/community_articles_passages/$1 [L,R=Permanent]
RewriteRule issues_section\/archives\/(.*)$    http://www.mysite.com/index.php/community_section/archives/$1 [L,R=Permanent]
RewriteRule issues_section\/archives_community\/(.*)$    http://www.mysite.com/index.php/community_section/archives_news/$1 [L,R=Permanent]
RewriteRule issues_section\/archives_controversy\/(.*)$    http://www.mysite.com/index.php/community_section/archives_passages/$1 [L,R=Permanent]
RewriteRule issues_section\/results\/(.*)$    http://www.mysite.com/index.php/community_section/results/$1 [L,R=Permanent] 

Remember to also update the path settings in the Weblog preferences in the Admin area, so that search works correctly. The last rule (results) redirects stored search-result URLs, such as those saved in Google’s index or in visitors’ bookmarks.
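To see what these rules do to a request, the short Python sketch below imitates two of them. It uses Python’s re module as a stand-in for Apache’s regular-expression engine; the two behave the same for simple patterns like these. It assumes the .htaccess file sits in the document root, where mod_rewrite tests the pattern against the URL-path with the leading slash removed. The helper names rewrite and redirect and the sample article path are hypothetical.

```python
import re

# Hypothetical helper: mimics a per-directory (.htaccess) RewriteRule.
# The pattern is searched anywhere in the path (it is unanchored unless ^ or $
# is used), and $1, $2, ... in the target are replaced by the captured groups.
def rewrite(pattern, target, path):
    match = re.search(pattern, path)
    if match is None:
        return None  # rule does not apply; Apache would try the next rule
    return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))) or "", target)

# Two of the rules above, copied verbatim (Python accepts the \/ escapes too).
RULES = [
    (r"issues_section\/issues_articles_community\/(.*)$",
     "http://www.mysite.com/index.php/community_section/community_articles_news/$1"),
    (r"issues_section\/results\/(.*)$",
     "http://www.mysite.com/index.php/community_section/results/$1"),
]

# Hypothetical driver: tries each rule in order, like the [L] flag
def redirect(path):
    for pattern, target in RULES:
        url = rewrite(pattern, target, path)
        if url is not None:
            return url  # stop at the first matching rule
    return None

print(redirect("index.php/issues_section/issues_articles_community/my_article/"))
# prints http://www.mysite.com/index.php/community_section/community_articles_news/my_article/
```

Whatever follows the old template name is carried over unchanged by $1. This is why one rule per renamed template is enough.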
———————————————————————————————————————
2) Redirect old bookmarks and search engine crawls pointing to former HTML versions of pages on your site to your new, sparkly EE templates
The following rules redirect the older static HTML versions of some chapter pages to their EE template-driven dynamic equivalents:

RewriteRule asheville\/asheville\.htm   http://www.mysite.org/index.php/chapters/asheville [L,R=Permanent]
RewriteRule charlotte\/charlotte\.htm   http://www.mysite.org/index.php/chapters/charlotte [L,R=Permanent]
RewriteRule triangle\/triangle\.htm   http://www.mysite.org/index.php/chapters/triangle [L,R=Permanent]

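The back-slash “escapes” a character that would otherwise have a special meaning on the matching (left-hand) side of the rule. Above, the forward slash is escaped. That is harmless but not strictly required, because the forward slash has no special meaning in Apache’s regular expressions. The character that really needs escaping is the dot: an unescaped . matches any single character, so \.htm is what matches a literal “.htm”.
The [L,R=Permanent] flags do two jobs. R=Permanent sends a 301 “Moved Permanently” redirect, which tells search engine crawlers to drop the old path from their index and add the new one. L makes this the last rule processed for that request.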
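The difference an escaped dot makes can be checked quickly in Python, again with re standing in for Apache’s engine. The mistyped path ashevilleXhtm is a made-up example. Without a $ anchor, both patterns also match asheville.html, because a RewriteRule pattern can match anywhere in the path.

```python
import re

loose = r"asheville\/asheville.htm"    # "." matches any single character
strict = r"asheville\/asheville\.htm"  # "\." matches only a literal dot

# Illustrative request paths; "ashevilleXhtm" is made up to show the loose match
for path in ["asheville/asheville.htm", "asheville/ashevilleXhtm", "asheville/asheville.html"]:
    print(path, bool(re.search(loose, path)), bool(re.search(strict, path)))
```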
———————————————————————————————————————

The next two directives handle a difficult situation. First, folders within folders containing an antique HTML file that served as a sort of category/TOC page. Second, folders within folders within folders holding the target files. In this case we had a plant gallery. Before the EE conversion, each plant’s individual HTML file lived in its own plant-name folder (plant_name was always spelled with an underscore). That folder lived in a single-letter folder, and the letter folder lived in the plants folder. So the old path looked like this:

mysite.org/plants/p/plant_name/plant_name.htm 

This project required two changes. One points the old letter pages to the new TOC generated dynamically from the alphabetical categories. The other points each plant_name page to its dynamically generated EE template_group/template/plant-name equivalent.

Within the letter folder was a table-of-contents style table that listed all the plants of that letter-category with links to each. (I have so much respect for the diligent former designer of this site who did the entire thing in HTML pages and tables!) Its path was like this:

mysite.org/plants/p/p.htm 

This had to be converted on-the-fly to fit the new template and category structure. First, let’s point to the “listings” template. I used the word “of” as the category indicator, with alpha letters as category names, so the new template path showing a table of all plants with scientific names starting with “p” is now:

mysite.org/index.php/plants/listings/of/p

Here is how “RegEx” or Regular Expressions are used to do this:

RewriteRule plants\/([a-zA-Z])\/?$ http://www.mysite.org/index.php/plants/listings/of/$1 [L,R=Permanent] 

This says: “redirect plants, a slash, any single letter (captured by the parentheses), and an optional slash (the question mark makes the slash optional), with nothing after it (the dollar sign marks the end), to the EE listings template, using the captured letter $1 as the category name in the URL.”
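A quick Python check of this pattern shows which requests it catches. Python’s re stands in for Apache’s engine, and the sample paths and the helper name toc_redirect are illustrative. As written, the pattern matches the letter folder itself (plants/p or plants/p/) but not plants/p/p.htm. The $ requires the path to end right after the optional slash.

```python
import re

PATTERN = r"plants\/([a-zA-Z])\/?$"
TARGET = "http://www.mysite.org/index.php/plants/listings/of/{}"

# Hypothetical helper mimicking the rule above
def toc_redirect(path):
    match = re.search(PATTERN, path)
    # group(1) is the single letter captured by the parentheses ($1 in Apache)
    return TARGET.format(match.group(1)) if match else None

# Illustrative request paths
for path in ["plants/p/", "plants/p", "plants/pp/", "plants/p/p.htm"]:
    print(path, "->", toc_redirect(path))
```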
———————————————————————————————————————
Now, how to redirect the plant detail pages:

mysite.org/plants/p/plant_name/plant_name.htm 

to point to the new details template:

mysite.org/index.php/plants/details/plant-name 

This was slightly complicated by the fact that the new database structure uses hyphens instead of underscores in article URL titles. So the text on the left and right sides of the underscore had to be captured first and then recombined with a hyphen between them. Here’s how that went.

RewriteRule plants\/[a-zA-Z]\/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)\/.*  http://www.mysite.org/index.php/plants/details/$1-$2 [L,R=Permanent] 

This selects plants and any alpha-named folder afterwards (a single letter):

plants\/[a-zA-Z]\/

then “grabs” the first part of the plant name, using parentheses to capture it:

([a-zA-Z0-9]*) 

(the brackets match any letter or digit, and the asterisk means “zero or more of them”)

...then matches the underscore that follows, but does not grab it.

...then grabs the second part of the plant_name. The combination of these results in:

([a-zA-Z0-9]*)_([a-zA-Z0-9]*) 

...then matches the forward slash and everything after it (the dot matches any character, and the asterisk repeats it any number of times):

\/.* 

Finally, it replaces what it found with the template_group/template path followed by the hyphen-separated scientific name:

http://www.mysite.org/index.php/plants/details/$1-$2 [L,R=Permanent] 

This was a rough one, but regular expressions made it possible. To summarize, this rule says: “redirect plants, slash, any single letter, slash, grab any alphanumeric text, skip the underscore, grab any alphanumeric text, slash, and anything following that, to the EE plants/details page. Use the first grab $1, a hyphen, and the second grab $2 as the article name in the URL.” Whew!!
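The same walk-through can be traced in Python, with re standing in for Apache’s engine. The helper name details_redirect and the folder names pinus_strobus and acer_rubrum_var are hypothetical. The sketch also shows a limit of the pattern. Neither bracketed class includes the underscore, so a folder name with more than one underscore does not match, and such pages would need a rule of their own.

```python
import re

PATTERN = r"plants\/[a-zA-Z]\/([a-zA-Z0-9]*)_([a-zA-Z0-9]*)\/.*"
TARGET = "http://www.mysite.org/index.php/plants/details/{}-{}"

# Hypothetical helper mimicking the rule above
def details_redirect(path):
    match = re.search(PATTERN, path)
    if match is None:
        return None
    # group(1) and group(2) ($1 and $2) are the text on either side of the
    # underscore; the underscore itself is matched but not captured
    return TARGET.format(match.group(1), match.group(2))

# Hypothetical folder names: one underscore (matches) and two underscores (does not)
print(details_redirect("plants/p/pinus_strobus/pinus_strobus.htm"))
print(details_redirect("plants/a/acer_rubrum_var/acer_rubrum_var.htm"))
```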
———————————————————————————————————————
3) Force bookmarked and search-engine index.htm landings to instead go to your site’s index.php
The next directives are simple. They redirect index.htm or index.html to index.php:

RewriteRule index\.htm$    http://www.mysite.org/index.php/ [L,R=Permanent]
RewriteRule index\.html$    http://www.mysite.org/index.php/ [L,R=Permanent]

This sends old bookmarks pointing to index.htm or index.html to the index.php version instead.
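A RewriteRule pattern can match anywhere in the path, so an unanchored pattern catches more than it appears to. This Python sketch compares the unanchored index.htm with anchored, dot-escaped versions; re stands in for Apache’s engine and the paths are illustrative. The unanchored one also matches index.html. Under the [L] flag it would therefore handle both requests before a separate index.html rule is ever reached.

```python
import re

PATHS = ["index.htm", "index.html", "index.php/"]  # illustrative request paths
PATTERNS = [r"index.htm", r"index\.htm$", r"index\.html$"]

# For each pattern, collect the request paths it would catch
results = {pattern: [p for p in PATHS if re.search(pattern, p)] for pattern in PATTERNS}
for pattern, hits in results.items():
    print(pattern, "->", hits)
```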
———————————————————————————————————————
4) Force the search engines to treat the non-www and the www versions of your URL as a single website (which also combines the inbound links count together).
This was mentioned elsewhere in the Wiki, but for completeness, here is the redirect that forces the non-www version of your site’s URL to the www version:

RewriteCond %{HTTP_HOST} ^mysite\.org [NC]
RewriteRule ^(.*)$ http://www.mysite.org/$1 [L,R=301]
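A RewriteCond applies to the RewriteRule immediately after it. The redirect happens only when both the rule’s pattern and the condition match. The Python sketch below traces that logic. In it, re stands in for Apache’s engine, re.IGNORECASE plays the role of [NC], and www_redirect is a hypothetical helper name. It escapes the dot in the host pattern so that it matches only a literal dot.

```python
import re

# Hypothetical helper mimicking the RewriteCond/RewriteRule pair above
def www_redirect(host, path):
    # RewriteRule ^(.*)$ ...: in .htaccess the path has no leading slash,
    # and this pattern captures all of it as $1
    match = re.search(r"^(.*)$", path)
    # RewriteCond %{HTTP_HOST} ^mysite\.org [NC]; [NC] means case-insensitive
    if match is None or not re.search(r"^mysite\.org", host, re.IGNORECASE):
        return None  # condition fails: no redirect
    return "http://www.mysite.org/" + match.group(1)

print(www_redirect("mysite.org", "index.php/plants/listings/of/p"))
print(www_redirect("www.mysite.org", "index.php/plants/listings/of/p"))
```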

Be forewarned that you may have to write download links using the complete www version of the URL (www.mysite.org in the example above) for file downloads from your server to work. This affects only file downloads, however.
———————————————————————————————————————
You can find a great video tutorial and a downloadable cheat-sheet for regular expressions in the blog archives of the e-texteditor.com website.
———————————————————————————————————————

Terry

Category:EE1
Category:EE2