Bug Report

Query strings interpreted as first segment on the index page

Date: 01/20/2010
Show-stopper?:
Status: Unidentified
Reporter: Ian Beck
Reported in Version: EE 1.6.8
Assigned To: Not Assigned
Keywords: Performance Issues
Support Thread:

Details

When a query string is appended to an EE homepage URL, EE interprets it somewhere in its guts as a segment and improperly shows a 404 page. Here’s what my testing has uncovered:

- Given a URL like http://onecrayon.com/ EE loads the homepage correctly (whether it’s a Page, Structure page, or just defaulting to the site/index template)

- Given a URL like http://onecrayon.com/?something=random EE refuses to load a Page or Structure page (if there is one defined for the “/” URI), and will only load the site/index template if strict URLs is off and there is no 404 page specified in the global template preferences.

To try and figure out where the problem lies, I stuck this code in my footer:

<!-- Page: <?php global $IN; echo $IN->URI; ?> -->

What’s odd is that EE fails because it interprets the query string as the first segment. For the example above, this is what appears in my footer when loading the page with the random query string:

<!-- Page: /something=random/ --> 

This only affects the homepage.  I have been unable to find any other page where appending random query string info affects EE in this way (it is instead correctly discarded as far as the EE URI segments are concerned).
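The thread never shows EE’s actual URI-detection code, so the following is only a hypothetical PHP model that reproduces the symptoms above. It assumes EE prefers PATH_INFO and falls back to the query string when PATH_INFO is empty (the fallback that query-string-only servers would need). The function names `resolve_uri` and `uri_segments` are invented for illustration.

```php
<?php
// HYPOTHETICAL model, not ExpressionEngine's real source. It only mimics
// the behavior reported above; all names are invented.

// Pick the string that URI segments are parsed from. A non-empty PATH_INFO
// wins; otherwise fall back to the query string (assumed to exist for
// servers that can only pass index.php?/group/template style URLs).
function resolve_uri(array $server): string
{
    $pathInfo = $server['PATH_INFO'] ?? '';
    if ($pathInfo !== '') {
        return $pathInfo;
    }

    $query = $server['QUERY_STRING'] ?? '';
    if ($query === '') {
        return '';
    }
    // On a bare homepage request this also captures unrelated query
    // strings: "something=random" becomes "/something=random/".
    return '/' . trim($query, '/') . '/';
}

// Split a URI into its non-empty segments, re-indexed from 0.
function uri_segments(string $uri): array
{
    $parts = explode('/', $uri);
    return array_values(array_filter($parts, function ($s) {
        return $s !== '';
    }));
}

// Homepage with a query string: the query string becomes segment 1.
print_r(uri_segments(resolve_uri(['QUERY_STRING' => 'something=random'])));

// Any other page: PATH_INFO is set, so the query string is ignored.
print_r(uri_segments(resolve_uri([
    'PATH_INFO'    => '/site/index/',
    'QUERY_STRING' => 'something=random',
])));
```

Under this assumed model, only the bare homepage reaches index.php with an empty PATH_INFO, which would explain why other pages discard the query string, and why a URL like /index.php/?something=random (PATH_INFO of "/") loads the homepage normally.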

I have tested this both on servers with .htaccess index.php removal and on ones without, and the behavior is the same in both cases. The only fix suggested in the forums was to add an .htaccess override for a specific query string, which is a band-aid that doesn’t address the root issue.

Hopefully you guys can get this fixed so that things like Google ad campaigns don’t break EE homepages. I’m not sure whether this is also a problem in 2.0; I haven’t had a chance to test it.

Comment on Bug Report

Posted by: Robin Sowell on 11 February 2010 5:14pm

Hrm, I’m not able to replicate this on my local install:
http://127.0.0.1:8888/index.php/?something=random

I added a var_dump to your debug code, and the URI comes back as string(0) “”. It also worked with a 404 page set and Strict URLs on.

It might be something server-related. That said, trying the same pattern on your site did pull back what looks to be the homepage. Note that this URL has index.php specified:

http://onecrayon.com/index.php/?something=random

Could you try it on a test site? No .htaccess, include index.php, and don’t make it a Page or anything; just use a blank test index template containing something like:

Homepage
<hr>

<?php global $IN; echo $IN->URI; echo 'testing'; var_dump($IN->URI); print_r($IN->SEGS); ?>

If anything odd happens, paste the results here and we may need to look at server differences.

Posted by: Jarrett Barnett on 17 February 2010 5:08pm

I’ve run into this same issue a few different times (in particular, with Google Campaign query strings in email campaigns).

There’s a good article by Leevi Graham that can be found here:
http://newism.com.au/blog/post/56/expressionengine-campaign-monitor-google-analytics-and-url-rewrites/

It’s pointed out that “The problem arises when the link directs the user to the homepage URL without a template group and template.”

So it’s an .htaccess issue, and not an ExpressionEngine issue.

There are a few ways to tackle the problem (the article above describes one of them). Creating a rule that redirects homepage requests carrying a query string to the equivalent template URL, with the query string appended, should fix it.

For example, redirect

http://www.website.com/?utm_campaign=EXAMPLE

to

http://www.website.com/index.php/site/index/?utm_campaign=EXAMPLE
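The same redirect can also be sketched in PHP, which does not depend on mod_rewrite. This is a swapped-in alternative to the .htaccess rule, not code from this thread or the linked article; `homepage_redirect_target` is a hypothetical name, and the default `/index.php/site/index/` target is taken from the example above.

```php
<?php
// HYPOTHETICAL workaround, not part of ExpressionEngine: send a bare
// homepage request that carries a query string to the explicit default
// template URL, keeping the query string intact.
// Assumes the default template is site/index, as in the example above.
function homepage_redirect_target(string $requestUri, string $templatePath = '/index.php/site/index/'): ?string
{
    $parts = explode('?', $requestUri, 2);
    $path = $parts[0];
    $query = $parts[1] ?? '';

    // Only the site root with a non-empty query string needs redirecting;
    // every other URL is left alone.
    if ($path !== '/' || $query === '') {
        return null;
    }
    return $templatePath . '?' . $query;
}

// Usage: place at the very top of the site's index.php, before EE loads.
$target = homepage_redirect_target($_SERVER['REQUEST_URI'] ?? '/');
if ($target !== null) {
    header('Location: ' . $target, true, 301);
    exit;
}
```

This redirects every query string on the root URL, so it would need adjusting on a site that deliberately uses root-level query-string URLs.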

Posted by: Ian Beck on 17 February 2010 10:19pm

Hey Robin,

Sorry for taking so long to get back to you on this; I must have missed the notification that there was a comment.

You weren’t able to replicate it because you didn’t reproduce the conditions that caused the problem. I’m seeing proper behavior when index.php is included in the URL (as per your link to OneCrayon), but that isn’t a normal use case. The main reason this bug is so problematic is that people who do everything right (whether they set up proper rewriting or leave rewriting off) and then direct Google ad campaigns or anything else to their homepage will have it mysteriously break. I don’t know of anyone who would think to add /index.php/ to the homepage URL by default (except perhaps EE engineers), since it’s a safe assumption that a server will automatically serve the index when nothing else is specified. EE reinforces this assumption by having the user select a default template group.

EE’s current behavior punishes standard user assumptions about how the software and the internet function. To prevent that, you’ll need to either add a workaround for this specific problem or adjust user expectations. Having the root URL work only some of the time is not a good middle ground, to my mind; I think accessing the root domain should either always work (whether or not random query info is tacked onto the end) or never work (in which case users would have to either include /index.php/ in all URLs or rewrite/redirect to their main template group rather than selecting a default template group in the admin). I don’t see how you’re going to keep people from falling afoul of the problem otherwise.

jmbelite: even if this were an htaccess issue, it would still be an ExpressionEngine issue, since EE works on servers without htaccess (where it is just as afflicted with the problem, unless you happen to know that you need a trailing slash after index.php to get random query strings to work).

Posted by: Robin Sowell on 18 February 2010 11:10am

Thanks to both of you for the discussion and elaboration. I talked to the crew about this one, and the behavior boils down to a compromise made for server compatibility reasons. Pascal did point out that, if your server supports it, switching $qtype to 1 (path_info) in the main index.php file will make it work as desired. But that support isn’t universal.
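The change Pascal describes is a single variable assignment in the site’s main index.php. Only the variable name and the value 1 (path_info) come from this thread; check the comments next to $qtype in your own index.php for what the other values mean before editing it.

```php
<?php
// In the site's main index.php (EE 1.6.x). Per this thread, 1 selects
// path_info for fetching the URI. Only use it if your server supports
// PATH_INFO; otherwise leave the shipped value in place.
$qtype = 1;
```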

I agree that consistent handling of whether the root URL works is desirable. But disabling root URL access entirely is just not an option I think anyone would support. Whether it works without the trailing slash simply comes down to server support. Keep in mind that some servers still require ‘force query strings’. Bleh.

But managing expectations is a good point. In truth, this is an edge case that hasn’t come up much. I’m thinking a user note in the ‘Understanding URLs’ section might be the best approach. The info would be there for folks who need it, understand it, or run into the problem, without overwhelming those who don’t.
