ExpressionEngine CMS

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.


How to re-write our URL structure to combat scraper sites?

March 01, 2011 9:49am

  • #1 / Mar 01, 2011 9:49am

    Miami Beach 411

    28 posts

    Hello,

    Google is having a hard time indexing our blog posts, which I think has to do with how our weblog urls were originally configured.

    I'm not sure why, but when the developer set up our weblogs, she gave each blog post two URLs. One links to the summary of the post, like this -

    http://www.miamibeach411.com/news/index.php?/news/miami-techies/

    The other links to the full blog post, like this -

    http://www.miamibeach411.com/news/index.php?/news/comments/miami-techies/

    The problem is, Google is only indexing this page -

    miamibeach411.com/news/index.php?/news/miami-techies/

    It’s causing us problems because it is allowing scraper sites to appear as the original creator of the post.

    Any advice on how to rewrite our URLs?

    How do I make our URLs look prettier and more search engine friendly?

    Thanks,

    Gus

  • #2 / Mar 03, 2011 6:24am

    joost

    62 posts

    If I search for cache:http://www.miamibeach411.com/news/index.php?/news/comments/miami-techies/ Google says it crawled the page 2 days after the post went up (19 February), so the full post is indexed.

    What I would do is add a rel=canonical link to the summary page, http://www.miamibeach411.com/news/index.php?/news/miami-techies/. This is possible with the String plugin (EE2): http://www.emarketsouth.com/add-ons/string-plugin/examples/#canonical-links. For EE1, I believe there is a plugin on Devot:ee called Canonical.

    On both the summary page and the article page, include: <link rel="canonical" href="http://www.miamibeach411.com/news/index.php?/news/comments/miami-techies/" />

    This tells Google to index the full article rather than the summary page.
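
    To make the mapping concrete, here is a rough Python sketch of how the canonical tag could be derived from either URL. It only assumes the two URL patterns quoted in this thread; the function names are made up for illustration and are not part of ExpressionEngine or either plugin.

```python
from html import escape

# Assumed pattern, taken from the URLs quoted in this thread:
#   summary:      .../index.php?/news/<slug>/
#   full article: .../index.php?/news/comments/<slug>/
MARKER = "index.php?/news/"


def canonical_url(url: str) -> str:
    """Return the full-article URL for a summary or full-article URL."""
    head, sep, tail = url.partition(MARKER)
    if not sep or tail.startswith("comments/"):
        # Not a weblog URL, or already the full-article URL: leave it alone.
        return url
    return f"{head}{MARKER}comments/{tail}"


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of both pages."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    summary = "http://www.miamibeach411.com/news/index.php?/news/miami-techies/"
    print(canonical_link_tag(summary))
```

    Both the summary URL and the full-article URL produce the same tag, which is the point: every variant of the page names one preferred address.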

    You can fight (RSS) scrapers by using absolute links in your RSS feeds and by including the full link to the post, with anchor text, at the end of every article.

    This might help.
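
    As an illustration of the absolute-link idea, the sketch below rewrites relative href and src attributes in a feed item's HTML and appends an attribution link. It is a simplified, regex-based example in Python rather than an ExpressionEngine template; the helper names and the footer wording are assumptions, and it only handles double-quoted attributes.

```python
import re
from html import escape
from urllib.parse import urljoin

# Matches double-quoted href/src attributes only (a deliberate simplification).
HREF_OR_SRC = re.compile(r'\b(href|src)="([^"]*)"')


def absolutize_links(html: str, base_url: str) -> str:
    """Rewrite relative href/src values so they resolve against base_url."""
    def fix(match: re.Match) -> str:
        attr, value = match.group(1), match.group(2)
        return f'{attr}="{urljoin(base_url, value)}"'

    return HREF_OR_SRC.sub(fix, html)


def attribution_footer(title: str, url: str) -> str:
    """Hypothetical footer: a full link back to the post with anchor text."""
    return (
        f'<p>Originally published as '
        f'<a href="{escape(url, quote=True)}">{escape(title)}</a>.</p>'
    )


if __name__ == "__main__":
    base = "http://www.miamibeach411.com/news/"
    body = '<p><img src="/images/a.jpg"> <a href="http://example.com/">x</a></p>'
    post_url = base + "index.php?/news/comments/miami-techies/"
    print(absolutize_links(body, base) + attribution_footer("Miami Techies", post_url))
```

    If a scraper republishes the feed item as-is, the links and the footer still point back to the original site.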

    To overcome indexing problems you could take a look at the Solspace Tag module to get a decent site/link structure, or include related posts at the end of each article so that older articles also get fresh links. Also, be sure to link from your new articles to older related articles. Indexing is about (internal) link structure. You can use Xenu Link Sleuth (http://home.snafu.de/tilman/xenulink.html) to see which articles and sections have the fewest incoming links (and are thus less important in Google's eyes).
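
    The "fewest incoming links" check can also be sketched in a few lines of Python once you have a map of which page links to which (exported from a crawler, for example). The data structure and function names here are assumptions for illustration, not Xenu's output format.

```python
from collections import Counter


def inbound_link_counts(links: dict[str, list[str]]) -> Counter:
    """Count internal inbound links per page.

    `links` maps each page to the pages it links to. Self-links and
    repeated links from the same source page are counted at most once.
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def least_linked(links: dict[str, list[str]], n: int = 3) -> list[str]:
    """Return the n pages with the fewest inbound links (ties by name)."""
    counts = inbound_link_counts(links)
    return sorted(counts, key=lambda page: (counts[page], page))[:n]


if __name__ == "__main__":
    site = {
        "/": ["/a", "/b"],
        "/a": ["/b", "/a"],
        "/b": ["/"],
        "/old": ["/a"],
    }
    print(least_linked(site, 2))
```

    Pages that come out at the top of this list are the ones to link to from new articles or related-post blocks.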

  • #3 / Mar 03, 2011 6:56am

    Miami Beach 411

    28 posts

    Hi Joost,

    Great ideas!

    Thank you very much for telling me about the plugins, software, and mods… I’m researching them now.

    If you ever come to Miami, please contact me in advance. I want to make sure you get the VIP treatment! =)

    Gus
