Since this is not directly EE-related, I thought the Lounge was the appropriate place for this question…
I know there is no perfect answer here, but I was looking for guidance (from smarter folks than me) on what a "good" robots.txt file contains. Do you block all robots by default? Which ones do you allow? Just looking for opinions here. Using a robots.txt generator, I came up with this:
----
# robots.txt generated at http://www.mcanerin.com
User-agent: Googlebot
Disallow:

User-agent: MSNBot
Disallow:

User-agent: Slurp
Disallow:

User-agent: Teoma
Disallow:

User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
Crawl-delay: 20
Disallow: /cgi-bin/

Sitemap: http://ebohling.com/bryce/sitemap
----
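If it helps anyone answering, here is a quick way to see how one parser reads the file above. This is a minimal sketch using Python's standard-library `urllib.robotparser`, which is my choice of tool and not something the generator provides (`site_maps()` needs Python 3.8+). The helper names `load_rules` and `report`, the page URL, and the crawler name `SomeOtherBot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post (comment line included; the parser strips it).
ROBOTS_TXT = """\
# robots.txt generated at http://www.mcanerin.com
User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Teoma
Disallow:
User-agent: ia_archiver
Disallow:
User-agent: *
Disallow: /
Crawl-delay: 20
Disallow: /cgi-bin/
Sitemap: http://ebohling.com/bryce/sitemap
"""


def load_rules(text):
    """Hypothetical helper: parse robots.txt text without any network fetch."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def report(parser, agents, url):
    """Hypothetical helper: (agent, allowed?, crawl delay) for each crawler name."""
    return [(agent, parser.can_fetch(agent, url), parser.crawl_delay(agent))
            for agent in agents]


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    page = "http://ebohling.com/bryce/index.html"  # illustrative page URL
    # "SomeOtherBot" is a made-up name standing in for any crawler not listed.
    for agent, allowed, delay in report(rules, ["Googlebot", "Slurp", "SomeOtherBot"], page):
        print(f"{agent:13} allowed={allowed} crawl-delay={delay}")
    print("sitemaps:", rules.site_maps())
```

Run against the file above, it should show Googlebot and Slurp allowed with no crawl delay, and the unlisted SomeOtherBot blocked with a 20-second delay. Real crawlers use their own parsers, so this is only one interpretation; Crawl-delay in particular is not honored by every engine.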
good? bad? indifferent?
thanks,
—bb—