This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Better 404s with Strict URLs

October 09, 2008 6:25pm

  • #16 / Oct 10, 2008 4:34am

    evasion

    58 posts

    You’re supposed to reference the URL in EE like so:

    http://example.com/index.php/my_site/news/

    But for whatever reason a person will skip the initial URL segment and specify that the URLs do this instead:

     

    But what EE is really rendering is my_site/news/. This works in the current version of EE but will not work if Strict URLs is enabled.

    Does it mean that the default group (i.e. currently “site”) will appear in the URL?
    What about URLs like:

    http://example.com/example/

    Will they look like:

    http://example.com/example/index/

    ?

    It seems a bit complicated to me. Why don’t you simply check whether the page exists and, if it doesn’t, send 404 headers?

    By the way, another thing you can do to improve SEO is to automatically add a trailing slash. This could be done with a .htaccess file:

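    # Add trailing slash
    RewriteEngine On
    # Leave requests for real files and directories alone
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # Skip URLs that already end in a slash or contain a dot
    RewriteCond %{REQUEST_URI} !/$
    RewriteCond %{REQUEST_URI} !\.
    RewriteRule ^(.*) %{REQUEST_URI}/ [R=301,L]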
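
For anyone who finds the rewrite conditions hard to read, here is a rough Python restatement of the same decision. It is only a sketch: the function name `trailing_slash_redirect` and the `is_file` / `is_dir` flags (standing in for the `!-f` and `!-d` filesystem checks) are made up for illustration.

```python
def trailing_slash_redirect(uri, is_file=False, is_dir=False):
    """Return the 301 target for a request URI, or None if no redirect applies.

    Hypothetical helper mirroring the four RewriteCond lines above.
    is_file / is_dir stand in for Apache's -f and -d filesystem tests.
    """
    if is_file or is_dir:
        return None  # real files and directories are left alone
    if uri.endswith("/"):
        return None  # already has a trailing slash
    if "." in uri:
        return None  # contains a dot, e.g. /css/style.css
    return uri + "/"


print(trailing_slash_redirect("/my_site/news"))   # /my_site/news/
print(trailing_slash_redirect("/my_site/news/"))  # None
```

One consequence of the dot condition: a URL that still contains index.php has a dot in it, so under these conditions it would not be redirected. The rule as written only helps on sites where index.php has been removed from the URLs.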


    Cheers 😊

  • #17 / Oct 10, 2008 5:28am

    Jamie Pittock

    108 posts

    Looks good, although I’m not sure if it’ll get in the way of how I sometimes design my templates. 

    I’ll sometimes tack on segments to a URL to display different content on the same template.  Using your example URL (http://example.com/index.php/news/sprts/) I’d maybe only have a news/index template but completely different content might be displayed on that page if the segment ‘sprts’ is attached to the URL.  No template called ‘sprts’ would exist though.

    If you’re currently using the 404 Page feature, then Strict URLs will be enabled when you run the update.

    Derek, can I disable the Strict URLs then and still keep the 404 page feature?

    I’ve come across times in the past when I’ve wished EE would display a 404 page and it hasn’t.  I guess you have to be careful what you wish for…I’ve gotten so used to the flexibility of EE’s segments and I might have actually been abusing their usefulness all along.

    It’s great to see you’re working on stuff like this though, I’m looking forward to having a play around when it’s released to see how it does affect me.

  • #18 / Oct 10, 2008 7:55am

    Carlos Pero

    6 posts

    Yes yes yes yes yes yes yes yes yes! 

    This will be a big help.  I spent a few hours last weekend trying to “lock down” my own URLs this way, and I mostly achieved success using current 1.6.4 tools (and not hacking via PHP), but this kind of improvement is the right way to do it.  THANK YOU.

    The fact that the weblog:entries tag was trying to do double or triple-duty depending on what was in the URL was really bad for SEO and having a single view of content.  I’d still like to experiment and see how this plays with the magical category tags….I didn’t have much success there, and decided to resort back to the more stable “C#” pathing, so let me know if you want a beta tester for this!

  • #19 / Oct 10, 2008 9:34am

    GDmac - expocom

    350 posts

    My guess is that segment_1 is the template group at all times.
    example.com/index.php (opens index in default group)
    example.com/index.php/my_site (opens index of template group my_site)
    example.com/index.php/my_site/welcome/ (template welcome)
    example.com/index.php/my_site/welcome/seg3/seg4/ (plus other segments)

    Segment_1 is always considered to be a template group
    Segment_2 is always considered to be a template name.
    Segment_3 and further are: pagination, archives, post_id, url_title, categories and other stuff you put in there.

    Is the following true?
    wrong: example.com/index.php/my_site/category/bakery (404 because no template named: category)
    right: example.com/index.php/my_site/news/category/bakery (OK. template my_site/news)

    And that’s an interesting question regarding trailing slashes, how strict should we be about them?

  • #20 / Oct 10, 2008 10:38am

    Benoît Marchal

    204 posts

    The principle looks good and helps address potential problems. Like others on this thread I have a number of questions on how this would work with default templates but let’s wait until the module is available for the questions.

    Side question: another source of duplicate content is calling the index page with a URL title, which displays the single entry and so duplicates the single-entry page… are you planning to rule this out as well?

  • #21 / Oct 10, 2008 11:20am

    Derek Jones

    7561 posts

    Is there anything that can be done about trailing slashes? I notice WordPress cleans this up and is very strict. In EE…

    mysite.org/templategroup/template/url-title

    and

    mysite.org/templategroup/template/url-title/

    both work and I’ve had pages excluded from ranking for duplicate content before.

    In the above URIs, ‘url-title’ is in the 3rd segment in both cases.  Trailing slash behavior of path variables and such has not changed in 1.6.5.  evasion’s suggestion above on dealing with this via .htaccess is the correct way to address this, if it concerns you.

    Removing index.php is, in its most basic form, a server trick.  I mean that ExpressionEngine is only executed when index.php is requested from the server; mod_rewrite merely lets you hide this fact from the visitor.  ExpressionEngine still sees the URL segments that your .htaccess rules provide, so as long as the rewritten request follows the above conventions, it does not matter whether you have renamed or removed index.php from your visitor’s URLs.

    But this won’t work with sites that have removed the template group by mod_rewrite, will it?

    While that’s not something we support, it too is still just a server trick.  mod_rewrite interrupts the process between the browser and Apache and modifies how Apache, and hence EE, sees the request.  As long as the rewritten request meets the above criteria, it doesn’t matter what the visitor is seeing in their browser’s address bar.

    Does it mean that the default group (i.e. currently “site”) will appear in the URL?
    What about URLs like:

    http://example.com/example/

    Will they look like:

    http://example.com/example/index/
    ?

    It means that requests for anything other than your site’s main index page must include a valid template group.  If you’re using a template group named ‘site’, then yes, ‘site’ would appear in your URLs.  That said, the pre-installed template group is nothing more than an example of how templates are constructed.  We never intended that people would continue to use that template group (or its name) after learning how to build sites with ExpressionEngine.  You should always build your template groups based on your site’s desired URL structure; indeed it defines it.

    ‘index’ however, will not appear in your URLs.  It is analogous to an index.html file within a physical directory which is serving a directory index.
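
As a rough illustration of the mapping described above, this Python sketch splits a request path into a template group, a template name, and any remaining segments, with 'index' implied when the second segment is absent. The function name `split_segments` and the way it drops `index.php` are assumptions made for the example, not ExpressionEngine's actual code.

```python
def split_segments(path):
    """Split a request path into (group, template, extra_segments).

    Hypothetical helper: follows the convention discussed in this thread,
    where segment 1 is the template group and segment 2 the template.
    """
    segments = [s for s in path.split("/") if s]
    # Drop a leading index.php, which mod_rewrite may be hiding anyway.
    if segments and segments[0] == "index.php":
        segments = segments[1:]
    if not segments:
        return (None, "index", [])  # the site's main index page
    group = segments[0]
    # 'index' is implied when no template segment is present.
    template = segments[1] if len(segments) > 1 else "index"
    return (group, template, segments[2:])


print(split_segments("/example/"))
# ('example', 'index', [])
print(split_segments("/index.php/my_site/news/category/bakery"))
# ('my_site', 'news', ['category', 'bakery'])
```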

    It seems a bit complicated to me. Why don’t you simply check whether the page exists and, if it doesn’t, send 404 headers?

    Well that’s precisely what we are doing with Strict URLs.  Checking to see if the template group and template being requested actually exist.

    Looks good, although I’m not sure if it’ll get in the way of how I sometimes design my templates. 

    I’ll sometimes tack on segments to a URL to display different content on the same template.  Using your example URL (http://example.com/index.php/news/sprts/) I’d maybe only have a news/index template but completely different content might be displayed on that page if the segment ‘sprts’ is attached to the URL.  No template called ‘sprts’ would exist though.

    With Strict URLs you could not do that.  The index template cannot be used for both multiple and single entry pages, because the second segment does not correspond to a valid template.

    Derek, can I disable the Strict URLs then and still keep the 404 page feature?

    I’ve come across times in the past when I’ve wished EE would display a 404 page and it hasn’t.  I guess you have to be careful what you wish for…I’ve gotten so used to the flexibility of EE’s segments and I might have actually been abusing their usefulness all along.

    Yes, you can use the 404 Page without enabling Strict URLs.  In this case, it will only validate the first segment, the template group.  We realize that developers have come to know and love ExpressionEngine’s intelligence in working very hard to serve proper content no matter what is requested.  It gives tremendous flexibility both in how you design your site and in how visitors can interact with it, with the software still serving relevant content.  The downside is a somewhat loose 404 interpretation, and our community is now full of people in both camps.  This new feature satisfies the newer segment of the community but does not leave the older behind.  If you prefer, leaving Strict URLs off while enabling a 404 Page will continue to behave as you’ve come to expect.
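
The difference between the two modes can be sketched in a few lines of Python. This is an illustration only: the `TEMPLATES` dictionary, its contents, and the `is_404` function are hypothetical, and the logic follows the description in this thread rather than ExpressionEngine's source.

```python
# Hypothetical site structure: template group -> set of template names.
TEMPLATES = {
    "my_site": {"index", "news", "welcome"},
}


def is_404(segments, strict_urls, templates=TEMPLATES):
    """Return True if the request should get a 404 under the rules above.

    segments is the list of URL segments after index.php.
    """
    if not segments:
        return False  # the site's main index page
    group = segments[0]
    if group not in templates:
        return True  # the template group is validated in both modes
    if strict_urls and len(segments) > 1:
        # Strict URLs: the second segment must name a real template.
        return segments[1] not in templates[group]
    return False


# The 'bakery' URLs from earlier in the thread:
print(is_404(["my_site", "category", "bakery"], strict_urls=True))          # True
print(is_404(["my_site", "category", "bakery"], strict_urls=False))         # False
print(is_404(["my_site", "news", "category", "bakery"], strict_urls=True))  # False
```

With Strict URLs off, only an unknown first segment produces a 404, which is why tacking arbitrary segments onto an existing group keeps working in that mode.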

    Is the following true?
    wrong: example.com/index.php/my_site/category/bakery (404 because no template named: category)
    right: example.com/index.php/my_site/news/category/bakery (OK. template my_site/news)

    Correct!

    And that’s an interesting question regarding trailing slashes, how strict should we be about them?

    It’s a personal preference.  I’ve certainly never found it to be a problem, though aesthetically, the trend is towards not having a trailing slash.

    Side question: another source of duplicate content is calling the index page with a URL title, which displays the single entry and so duplicates the single-entry page… are you planning to rule this out as well?

    Correct, with Strict URLs you will not be able to use URL titles in either the first or second URL segments under any circumstances.

  • #22 / Oct 10, 2008 11:34am

    Benoît Marchal

    204 posts

    Maybe it’s too late, but could you offer an option to redirect to the strict URL instead of returning a 404?
    Returning a 404 for a URL that was valid previously simply penalizes users, whereas a redirect silently resolves the SEO problem and gives users a better experience overall.

  • #23 / Oct 10, 2008 11:41am

    Derek Jones

    7561 posts

    Maybe it’s too late, but could you offer an option to redirect to the strict URL instead of returning a 404?
    Returning a 404 for a URL that was valid previously simply penalizes users, whereas a redirect silently resolves the SEO problem and gives users a better experience overall.

    Benoît, it sounds like you fall into the category of those who will choose to use a 404 Page but disable Strict URLs.  Strict URLs is a yes/no preference; it is not a template that you define.  Your 404 Page is, and redirecting to that template without sending 404 headers would be bad for user-agents, including search bots.  Plus, most people’s 404 Page has a ‘404 file not found’ message and design, which would be an odd thing to see if it were not treated like a real 404.

  • #24 / Oct 10, 2008 11:53am

    Benoît Marchal

    204 posts

    Derek, thank you for the answer.
    Sorry for the description, I realize I did not express myself clearly.

    By redirect I don’t mean redirect to the 404 but redirect to the strict URL, e.g. I access:

    example.com/index.php/group/

    redirects to the correct URL:

    example.com/index.php/group/index/

    For existing sites, a number of incoming links will be to URLs that the strict URL option will make incorrect but it’s not the visitor’s fault! I always think it’s best to gently correct the mistake instead of issuing an error message.

  • #25 / Oct 10, 2008 11:58am

    Derek Jones

    7561 posts

    And how would ExpressionEngine know where to send them?  Your example URLs are both valid, though there’s never a need for index to be in the URL, and EE will not put it there for you in any of its path variables.

  • #26 / Oct 10, 2008 12:05pm

    Benoît Marchal

    204 posts

    And how would ExpressionEngine know where to send them?

    Using the same default rules that EE currently uses. Maybe the example was poorly chosen, because I am not sure I understand exactly how your strict URL will work, but the suggestion is just this: use the same rules that EE currently uses to find what to display if the URL is not strict AND redirect to the strict URL.

    Since the defaulting currently in place manages to find a group and a template, I presume that you could “just” redirect to the group/template when given a non-strict URL.

    The benefit is that a URL that currently returns content will continue to return content, so one is not losing anything in terms of incoming links, referencing, etc. BUT since it displays content through a redirect, you avoid the penalty of duplicate content.

  • #27 / Oct 10, 2008 12:11pm

    Derek Jones

    7561 posts

    That would defeat the purpose of the Strict URL, Benoît.  What you’re proposing would essentially mean that ExpressionEngine would never deliver a 404, and would take an invalid segment, redirect to a valid template group and template with the invalid segments tacked on.  You would also not be avoiding duplicate content display, since any and every bad request would be routed to a valid request.  Indeed you could exacerbate the duplicate content problem.

  • #28 / Oct 10, 2008 12:18pm

    Benoît Marchal

    204 posts

    Maybe I misunderstood the purpose. My understanding is that the goal is to address duplicate content to avoid a Google penalty.

    To the best of my knowledge, there are two solutions to avoid duplicate content and they are 404 (error) and 301 (redirect). You implement 404 and I’m suggesting you give the other option as well because it has some benefits and some sites may appreciate those benefits.

  • #29 / Oct 10, 2008 12:25pm

    Derek Jones

    7561 posts

    I understand your suggestion, but I don’t think you’re seeing the ramifications.  The point of Strict URL is not solely to avoid duplicate content, it is just a direct benefit of locking down your URLs in a more controlled fashion for proper 404 delivery.  Consider:

    http://example.com/index.php/jwelkjfslkjse

    With your suggestion, this would use a 301 redirect to:

    http://example.com/index.php/default_group/index/jwelkjfslkjse

    So you’ve told the user-agent that the request was good, but that it should go to this new location.  Problem is, that new location is rubbish, and is in fact the same content that would be rendered for any other rubbish request.

    http://example.com/index.php/weoijwelfjwf

    301 to:

    http://example.com/index.php/default_group/index/weoijwelfjwf

    So, now you have an infinite number of pages with duplicate content on them, because you’re expressly telling the user-agents that what they requested was fine, just in the wrong location.
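
To make the point concrete, here is a small Python sketch of the proposed "redirect instead of 404" behaviour. Both functions are hypothetical and only model the example URLs above: `redirect_fallback` prepends the default group and index template to an unrecognised path, and `rendered_template` reports which template would serve a URL.

```python
def redirect_fallback(path, default_group="default_group"):
    """Hypothetical 'redirect instead of 404' rule: prepend the default
    group and the index template to any unrecognised path."""
    junk = path.strip("/")
    return f"/index.php/{default_group}/index/{junk}"


def rendered_template(url):
    """Return the (group, template) pair that serves a URL.

    Only the first two segments after index.php decide this.
    """
    segments = url.strip("/").split("/")[1:]  # drop index.php
    return tuple(segments[:2])


targets = [redirect_fallback(p) for p in ("jwelkjfslkjse", "weoijwelfjwf")]
for target in targets:
    print(target, "->", rendered_template(target))
# /index.php/default_group/index/jwelkjfslkjse -> ('default_group', 'index')
# /index.php/default_group/index/weoijwelfjwf -> ('default_group', 'index')
```

Every distinct junk request yields a distinct redirect target, yet all of them are served by the same template, which is the duplicate content problem in miniature.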

  • #30 / Oct 10, 2008 1:30pm

    Benoît Marchal

    204 posts

    Ah yes, I see.
    So far I have never had a problem with totally bogus URLs; I only get duplication when people pass the URL title to a multi-entry page instead of a single-entry page, and similar problems.

    You could filter the case you are suggesting (redirect to http://example.com/index.php/default_group/index/ when you identify that the end is bogus) but I can see that this could lead to very complicated rules.
