ExpressionEngine CMS

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Removing phantom URLs

November 18, 2010 10:58pm

  • #1 / Nov 18, 2010 10:58pm

    JCDerrick

    57 posts

    I have been meaning to ask this for a long time. I have noticed Google is indexing pages on my EE site that do not exist. I call these phantom pages. One such page is this:
    /index.php/maui/list/Kauai/P130/

    Everything is legit up until the /Kauai/P130/ part. A real paginated /maui/list url would be as follows:
    /index.php/maui/list/category/maui_sights/P10/
    or
    /index.php/maui/list/category/maui_beaches/P30/

    Basically, /index.php/template_group/template/category/category_name/pagination/
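    The real URL structure can be expressed as a pattern. The Python sketch below is purely illustrative and is not ExpressionEngine code. It assumes category names use lowercase letters, digits and underscores (as in maui_sights and maui_beaches) and that the optional pagination segment looks like P10 or P30. VALID_LIST_URL and is_valid_list_url are hypothetical names.

```python
import re

# Hypothetical pattern for the real /maui/list URLs:
#   /index.php/template_group/template/category/category_name/pagination/
# Assumption: category names are lowercase letters, digits and underscores,
# and the optional pagination segment is "P" followed by digits.
VALID_LIST_URL = re.compile(r"/index\.php/maui/list/category/[a-z0-9_]+/(?:P\d+/)?")


def is_valid_list_url(path: str) -> bool:
    """Return True if the whole path follows the real /maui/list structure."""
    return VALID_LIST_URL.fullmatch(path) is not None


if __name__ == "__main__":
    for path in (
        "/index.php/maui/list/category/maui_sights/P10/",
        "/index.php/maui/list/category/maui_beaches/P30/",
        "/index.php/maui/list/Kauai/P130/",
    ):
        print(path, is_valid_list_url(path))
```

    Under these assumptions, the two real URLs print True and the phantom /Kauai/P130/ URL prints False, because it lacks the /category/ segment.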

    My problem is that for all those fake pages, a page DOES actually load (no 404), yet nowhere on the site does such a page actually exist. It’s a phantom page, and I can’t help but think pages like this are hurting me SEO-wise, because Google indexes these fake pages over and over again. Because of pagination, it indexes hundreds of them, like the one below, which is the neighboring paginated page of the fake URL above.
    /index.php/maui/list/Kauai/P120/

    I have Strict URLs turned on, but these pages continue to load. Is there anything I can put in my code to make these fake URLs return 404s? Surely the real URL structure can be enforced so that Google stops indexing all these fake URLs.

    Any help would be greatly appreciated.

    Thanks,
    John

  • #2 / Nov 18, 2010 11:03pm

    JCDerrick

    57 posts

    Some of these fake URLs are downright strange, like this one I just found:

    /index.php/maui/list/Maui/content/feedback/content/Guide859.html/P100/

    Everything after the /maui/list/ is all imaginary and doesn’t actually exist.

  • #3 / Nov 19, 2010 11:16am

    Ingmar

    29245 posts

    Google doesn’t usually make up links, existing or not, so it’s likely encountering them somewhere (on your site or elsewhere). Using Strict URLs or the require_entry="yes" parameter would be a good way to start.

  • #4 / Nov 19, 2010 11:24am

    JCDerrick

    57 posts

    Thanks, I did some more reading and came up with a solution, albeit not a perfect one. I removed pagination from the /list/ pages and put this in the header to send all the fake URLs to a 404:

    {if segment_5 || segment_3!="category" || segment_4=="..."}{redirect="404"}{/if}

    So far that seems to be working… I just have long pages of content in places now. But I figure that’s better than having a lot of incorrect URLs indexed.
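    For readers unfamiliar with ExpressionEngine’s segment variables, here is a rough Python model of the conditional above, as we understand it: segment_1 is the first URL segment after index.php, so for /index.php/maui/list/category/maui_sights/ segment_3 is category and segment_4 is maui_sights. The function names are hypothetical, and the third clause (segment_4=="...") is left out because its value is elided in the original.

```python
def url_segments(path: str) -> list[str]:
    """Split a path into segments, skipping index.php (assumed to mirror EE's segment_N numbering)."""
    parts = [p for p in path.split("/") if p]
    if parts and parts[0] == "index.php":
        parts = parts[1:]
    return parts


def should_404(path: str) -> bool:
    """Model of {if segment_5 || segment_3!="category"}: True means redirect to 404."""
    segs = url_segments(path)

    def segment(n: int) -> str:
        # Segments are 1-indexed; a missing segment is treated as empty (falsy)
        return segs[n - 1] if len(segs) >= n else ""

    return segment(5) != "" or segment(3) != "category"


if __name__ == "__main__":
    for path in (
        "/index.php/maui/list/category/maui_sights/",
        "/index.php/maui/list/category/maui_sights/P10/",
        "/index.php/maui/list/Kauai/P130/",
    ):
        print(path, should_404(path))
```

    With pagination removed, any fifth segment (including a real P10 offset) now triggers the 404, which is the trade-off described above. Like the original conditional, this also rejects a bare /index.php/maui/list/ URL, since segment_3 is empty there.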

    I suspect a broken link somewhere on my site started this mess. Unfortunately, the size of the site makes it difficult to find.

  • #5 / Nov 19, 2010 5:16pm

    Sue Crocker

    26054 posts

    John, it sounds like you have a workaround. Don’t hesitate to post again as needed.

