ExpressionEngine CMS
Open, Free, Amazing


This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.


Odd, invalid urls indexed by Google

February 10, 2012 7:55am

  • #1 / Feb 10, 2012 7:55am

    Linda A

    666 posts

    Looking up my site in Webmaster Tools, I am finding quite a few odd, invalid links indexed by Google.

    Some examples:

    http://www.westeros.org/ASoWS/News/00/P975/

    http://www.westeros.org/ASoWS/News/Entry/3578/2>previously reported</a> that award-winning artist <a

  • #2 / Feb 10, 2012 7:58am

    Linda A

    666 posts

    Ack, the forum, of course, strips parts of the second URL. Somehow, though, that URL seems to have a partial link embedded in it.

  • #3 / Feb 11, 2012 9:17am

    Sean C. Smith

    3818 posts

    Linda,

    Have you set up a 404 page and configured EE to use it (see the docs)? I’m sure that, with a 404 page set up correctly, Google will stop indexing non-existent pages.

    Sean

  • #4 / Feb 11, 2012 9:22am

    Linda A

    666 posts

    I’m afraid I am stuck on 1.x which doesn’t have those options, does it?

  • #5 / Feb 13, 2012 5:33pm

    Dan Decker

    7338 posts

    Hi Linda,

    “I’m afraid I am stuck on 1.x which doesn’t have those options, does it?”

    Indeed, it does. Go to Templates->Global Template Preferences and you will see the options to enable strict URLs and set a 404 template.

    Cheers!

  • #6 / Feb 13, 2012 5:44pm

    Linda A

    666 posts

    Ah, thank you, will give that a try and see if it makes a difference. 😊

  • #7 / Feb 14, 2012 5:52pm

    Shane Eckert

    7174 posts

    Hi Linda,

    We will be here if you need us. Just let us know.

    Cheers,

  • #8 / Mar 31, 2012 3:25pm

    Linda A

    666 posts

    Hi,

    Revisiting this, since it seems that search bot hits (I presume that’s what they are, anyway) on invalid pages add a lot of load to our site. Initially, I didn’t want to try the Strict URLs feature, since I suspect some of my pages will break, but I figured I would give it a try.

    However, it doesn’t seem to be working. I enabled it under Global Template Preferences and selected a 404 template, but if I try something like http://www.westeros.org/Updates/Fnark (Updates is a template group, but it doesn’t contain a template called Fnark), I don’t get a 404.

    Am I misunderstanding something? Or could it be affected by how I am doing my index.php removal and various other rewrites?

    Also, is my assumption correct that the following URL ought to be disallowed by Strict URLs as well:

    http://www.westeros.org/Contact/, where Contact is a template in the Westeros template group, and I use an .htaccess rewrite to avoid displaying the group name in the URL.

  • #9 / Apr 04, 2012 2:27pm

    Shane Eckert

    7174 posts

    Hey Linda,

    Can you show me the contents of your .htaccess file?

    ExpressionEngine will only display the 404 Template if the requested Template Group in the URL does not exist.

    So if the template group is valid, the 404 will not be thrown.

    Does that help?

    Cheers,
    Shane

  • #10 / Apr 04, 2012 2:39pm

    Linda A

    666 posts

    Doesn’t Strict URLs mean the URL has to contain both a template group and a template? Otherwise, it’s a moot point, since most of the invalid URLs I keep having issues with do contain a valid template group.

    The rewrites in my .htaccess look like this:

    RewriteRule ^/?((403|404|About|Chat|Contact|Copyright|Search)(/.*)?)$ /Westeros/$1 [L]
    RewriteRule ^/?((Index|Digest|Updates|Westeros|ASoIaF|Shared|Store)(/.*)?)$ /index.php/$1 [L]
    RewriteRule ^/?GoT/((Index|Characters|Episodes|Features|Gallery|News|Westeros|GoT|Shared)(/.*)?)$ /GoT/index.php/$1 [L]

    The first line removes the template group for templates that are in the “Westeros” template group, so that I can have http://www.westeros.org/About/ instead of http://www.westeros.org/Westeros/About.

    The second line removes index.php in the top-level directory for the specified template groups.

    The third line removes the index.php in the GoT directory (essentially, a subsite) for the specified template groups.
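
    To make the rewrite chain easier to reason about, here is a rough Python model of the three RewriteRule lines above. It is a sketch under stated assumptions, not how Apache works internally: it assumes the rules live in the document-root .htaccess with no extra RewriteCond lines, and it approximates per-directory behaviour by re-applying the rule set to each rewritten path (in .htaccess context, [L] only ends the current pass). The function name `rewrite` is hypothetical.

```python
import re

# The three RewriteRule lines from the .htaccess above, as (pattern, target).
# Python's re writes the backreference as \1 where Apache uses $1.
RULES = [
    (r"^/?((403|404|About|Chat|Contact|Copyright|Search)(/.*)?)$",
     r"/Westeros/\1"),
    (r"^/?((Index|Digest|Updates|Westeros|ASoIaF|Shared|Store)(/.*)?)$",
     r"/index.php/\1"),
    (r"^/?GoT/((Index|Characters|Episodes|Features|Gallery|News|Westeros|GoT|Shared)(/.*)?)$",
     r"/GoT/index.php/\1"),
]


def rewrite(path: str, max_passes: int = 10) -> str:
    """Approximate the internal path a request ends up at (hypothetical model).

    Assumption: after a rule with [L] fires in .htaccess, Apache re-runs the
    rule set on the new URL, so we loop until no rule matches.
    """
    for _ in range(max_passes):
        # Per-directory patterns see the path without its leading slash.
        subject = path.lstrip("/")
        for pattern, target in RULES:
            if re.match(pattern, subject):
                path = re.sub(pattern, target, subject)
                break  # [L]: stop this pass, then start a new one
        else:
            return path  # no rule matched: this is the final path
    return path


for url in ("/About/", "/Contact/", "/Updates/Fnark", "/GoT/News/123/"):
    print(url, "->", rewrite(url))
```

    In this model, /Updates/Fnark reaches EE as index.php/Updates/Fnark and /Contact/ as index.php/Westeros/Contact/, so the first segment EE sees is a real template group in both cases. If that holds on the live server, the rewrites themselves would not be what prevents the 404.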

  • #11 / Apr 05, 2012 11:02am

    Shane Eckert

    7174 posts

    Hi Linda,

    Yes and no. Strict URLs should work, but it doesn’t always, because of how dynamic ExpressionEngine is. That’s actually a warning on the 404 instructions page.

    Can you move your .htaccess file aside and test again? I have seen .htaccess rules interfere with ExpressionEngine’s ability to manage 404 messages.

    Please let me know.

    Cheers,

  • #12 / Apr 05, 2012 11:46am

    Linda A

    666 posts

    I’m afraid the site is much too busy right now for me to bring it down, since it depends on those rewrites. I’ll try this during a quieter hour.

    I had been confused, I see, by this blog post: http://ellislab.com/blog/entry/better_404s_with_strict_urls/

    It seemed to suggest that a template would always be required as well.

    However, I see that, at least in the legacy docs (I am on EE 1.7.*), the following code is mentioned:

    {if segment_2 != ''}
      {redirect="404"}
    {/if}

    So that might be a workaround for the one template group that seems to be having most of the issues.

  • #13 / Apr 06, 2012 3:45pm

    Robin Sowell

    13255 posts

    Indeed, Strict URLs requires the template group, though not the template. (And the bare domain name will still pull up the default.) The segment check with a redirect sounds like the way to go in your situation. Right now, the page number issue could still be problematic: if the page number doesn’t pull back entries, it defaults to treating it as the first page. But that’s something I’d like to see change sooner rather than later as well.

    Are you all set with getting the options you do have working? I think that redirect should handle most of it.
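
    A simplified model of the Strict URLs rule as described here may help with the earlier confusion: only the first segment has to be a valid template group. This is not ExpressionEngine’s code; the group names are taken from the .htaccess rules earlier in the thread, and the function name and return labels are hypothetical.

```python
# Hypothetical group list, copied from the .htaccess rules in this thread.
TEMPLATE_GROUPS = frozenset(
    {"Index", "Digest", "Updates", "Westeros", "ASoIaF", "Shared", "Store"}
)


def strict_urls_outcome(uri: str, groups: frozenset = TEMPLATE_GROUPS) -> str:
    """Rough model of Strict URLs as described in this thread (not EE's code).

    Encodes only two stated facts: an empty URI serves the default template,
    and a 404 is shown only when segment 1 is not an existing template group.
    """
    # Split the path EE sees (after index.php) into non-empty segments.
    segments = [s for s in uri.strip("/").split("/") if s]
    if not segments:
        return "default"  # bare domain: the site's default template
    if segments[0] not in groups:
        return "404"  # unknown template group: the 404 template is shown
    # Valid group: something is rendered even if segment 2 is not a template.
    return "served"


print(strict_urls_outcome("/Updates/Fnark"))
print(strict_urls_outcome("/Fnark/"))
```

    Under this model, /Updates/Fnark is served rather than 404’d, which matches what was observed above; catching that case needs a template-level check such as the segment conditional.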

  • #14 / Apr 06, 2012 4:37pm

    Linda A

    666 posts

    Actually, just one question about the redirect. Looking at it again, what exactly does it say? If segment 2 doesn’t match… empty, then redirect? I’m not quite sure how this ensures a valid template.

    Edited to add: It does seem to work, though, yes, it kills pagination. But for this template, I can do without it.

  • #15 / Apr 06, 2012 5:01pm

    Robin Sowell

    13255 posts

    Oh, sorry, Linda. Yes, that’s just an example. You’d want to modify it to match the logic of the site. So for a single-entry page? It makes good sense. Or you could use it when URLs have too many segments, like segment_4 != ''. Basically, it’s a way to tap into what the URL pattern should look like and, if a URL doesn’t match, 404 it.
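
    The segment check described above boils down to counting segments. The Python below only illustrates that logic; in EE it would be written as a template conditional like the segment_2 example quoted earlier. `should_404` and `max_segments` are hypothetical names, ignoring empty segments is an assumption, and the segments counted are presumably those EE sees after any .htaccess rewriting.

```python
def should_404(uri: str, max_segments: int) -> bool:
    """Hypothetical helper mirroring {if segment_N != ''}{redirect="404"}{/if}.

    With max_segments=3 this behaves like testing segment_4: any URI with a
    fourth (non-empty) segment is treated as invalid.
    """
    # Assumption: empty segments (e.g. from a trailing slash) are ignored.
    segments = [s for s in uri.strip("/").split("/") if s]
    # segment_N is empty exactly when there are fewer than N segments.
    return len(segments) > max_segments


# Like the segment_2 guard: only the bare template group is allowed.
print(should_404("/Updates/", 1))
print(should_404("/Updates/Fnark", 1))
```

    Choosing max_segments per template is what “match the logic of the site” amounts to here: a single-entry template might allow two segments, a listing template more.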

