ExpressionEngine CMS
Open, Free, Amazing

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Help Creating a robots.txt File

November 08, 2010 1:46pm

  • #1 / Nov 08, 2010 1:46pm

    Unifusion

    103 posts

    Hi All,

    My site is a real-estate listing site, so the client will be regularly adding and deleting various listings on the site. I’m using the Structure module and I have the real-estate listings set up as Structure listings.

    My client’s SEO person has told me to write a robots.txt file that excludes all the actual listings from being crawled. I’m not sure how to do this since those pages are generated dynamically and change regularly. Can anyone give me some guidance? Thanks!

  • #2 / Nov 08, 2010 7:12pm

    box-head

    177 posts

    Basically, you will need to create a text file called robots.txt and put it in the root directory of the site.

    In it, you list which robots the rules apply to (* means all robots) and which directories/files to block from them.

    Example

    User-agent: *
    Disallow: /listings/

    This would stop all robots accessing any file within the “listings” template group.
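As a rough check of the example above, the sketch below feeds the same two lines to Python's standard-library `urllib.robotparser` and asks whether a few URLs may be fetched. The URLs under http://www.examplesite.com/ and the crawler name `ExampleBot` are made up for illustration, and real crawlers may match rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /listings/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, used only to show which paths the rule covers.
    for url in (
        "http://www.examplesite.com/listings/some-listing",
        "http://www.examplesite.com/about/",
    ):
        print(url, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

The rule is compared against the start of the URL path, so anything beginning with /listings/ is covered while other pages stay crawlable.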

  • #3 / Nov 08, 2010 7:20pm

    Unifusion

    103 posts

    Wouldn’t that keep search engines from crawling the “listings” directory at the root of the site? Shouldn’t it be “/system/expressionengine/templates/listings/”?

  • #4 / Nov 08, 2010 7:26pm

    box-head

    177 posts

    OK, let’s look at this a little differently.
    Which template group(s) and template(s) need to be blocked from the search engines?

    The example I gave would stop search engines accessing anything beyond http://www.examplesite.com/listings/

    This presumably contains the template files you would find in the /system/expressionengine/templates/listings/ directory on the server?

  • #5 / Nov 08, 2010 7:37pm

    Unifusion

    103 posts

    I understand the concept, but wouldn’t writing a robots.txt file that just said “Disallow: /listings/” tell search engines not to crawl the directory named “listings” at the root of my site?

    What if I had two directories named “listings” in two different places on my site? How would the search engine know which one to crawl and which one to not crawl?

    So shouldn’t the robots.txt file look like this?:

    User-agent: *
    Disallow: /system/expressionengine/templates/listings/

  • #6 / Nov 08, 2010 7:49pm

    box-head

    177 posts

    Yep, that’s right: search engines wouldn’t crawl http://www.examplesite.com/listings/

    I’m not sure I understand why/how there are 2 directories in the same place with the same name on your site…?
    Can you list which specific directories or individual files/templates need to be blocked? We can work from there.

  • #7 / Nov 08, 2010 8:07pm

    Unifusion

    103 posts

    They’re not in the same place.

    ./listings/

    is different from

    ./system/expressionengine/templates/listings/

    One is at the root of the site, the other is in the templates folder.

    However all this talk about template groups is moot because I’m using the Structure module, which bypasses the standard template system.
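The difference between the two paths can be sketched with Python's standard-library `urllib.robotparser`. Rules are compared against the path part of the requested URL, not against locations on the server's disk. The listing URL and the crawler name `ExampleBot` below are assumptions for illustration only; the thread does not give the real URLs at this point.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Rule written against the public URL path.
URL_PATH_RULE = "User-agent: *\nDisallow: /listings/\n"

# Rule written against the template folder's location on the server.
SERVER_PATH_RULE = (
    "User-agent: *\n"
    "Disallow: /system/expressionengine/templates/listings/\n"
)

# Hypothetical public URL of a listing page.
LISTING_URL = "http://www.examplesite.com/listings/some-listing"

if __name__ == "__main__":
    print("URL-path rule allows listing:", is_allowed(URL_PATH_RULE, LISTING_URL))
    print("Server-path rule allows listing:", is_allowed(SERVER_PATH_RULE, LISTING_URL))
```

Under these assumptions, only the rule written against the URL path keeps a crawler away from the listing page; the server-path rule would matter only if pages were actually requested at URLs starting with /system/expressionengine/templates/listings/.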

  • #8 / Nov 09, 2010 5:01am

    box-head

    177 posts

    Ah yeah, I forgot that you mentioned using Structure.
    Looking at it from a browser’s point of view, then, which URLs should the spiders not be indexing?

  • #9 / Nov 09, 2010 12:13pm

    Unifusion

    103 posts

  • #10 / Nov 09, 2010 12:58pm

    box-head

    177 posts

    This should probably do it, then:

    User-agent: *
    Disallow: /find_a_community/community_profiles/

  • #11 / Nov 09, 2010 2:17pm

    Unifusion

    103 posts

    Correct me if I’m wrong, but wouldn’t that instruct search engines not to index files in a directory called “community_profiles”? Obviously there’s no directory like that on the site, since the pages are generated dynamically.

    Is there a way to write a robots.txt file using EE tags?
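The thread does not settle whether EE tags can be used for this. As a language-neutral illustration of the underlying idea, here is a minimal Python sketch that builds a robots.txt from a list of URL path prefixes and checks it with the standard-library `urllib.robotparser`. The helper names `build_robots_txt` and `is_allowed`, the crawler name `ExampleBot`, and the profile URL are all hypothetical; only the prefix /find_a_community/community_profiles/ comes from the thread.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_prefixes, agent="*"):
    """Build robots.txt text with one Disallow line per URL path prefix."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Disallow: {prefix}" for prefix in disallowed_prefixes]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(["/find_a_community/community_profiles/"])
    print(robots)

    # Hypothetical dynamically generated page; no such directory exists on disk.
    profile_url = (
        "http://www.examplesite.com/find_a_community/community_profiles/some-community"
    )
    print("profile allowed:", is_allowed(robots, profile_url))
```

Because a rule is a prefix match on the URL path, a single line covers listings that are added or removed later, whether or not a matching directory exists on the server.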

  • #12 / Nov 09, 2010 6:47pm

    box-head

    177 posts

    Hmmm, good question. I’m not sure, so if anyone else wants to chime in and clarify, please do.
