ExpressionEngine CMS

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Odd, invalid urls indexed by Google

February 10, 2012 7:55am

  • #16 / Apr 06, 2012 5:06pm

    Linda A

    666 posts

    Ok, I see. 😊 It's a bit of a tricky one given the way the incorrect URLs are coming out. In quite a few instances, they consist of domain + template group + invalid stuff. So it seems I do need to block any second segment at all and just do without pagination.

  • #17 / Apr 08, 2012 7:26pm

    Linda A

    666 posts

    Looking over Webmaster Tools, the redirect seems to have helped somewhat. But I still have a few other oddities.

    For example, things like http://www.westeros.org/ASoWS/News/P600/P960/P480/

    How on earth does it end up with urls like that? And is there anything I can do to get those to return 404s?
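
    One possible stopgap for stacked pagination segments like the one above, sketched in the same .htaccess style used later in this thread. It assumes Apache with mod_rewrite, and it assumes that no legitimate URL on the site ever contains two pagination segments in a row; neither assumption is confirmed in the thread. The rule would need to sit above any rule that routes requests to index.php.

    ```apache
    <IfModule mod_rewrite.c>
            RewriteEngine On

            # Assumption: a valid URL carries at most one Pxxx pagination segment,
            # so two in a row (for example P600/P960) can be refused outright.
            # With a non-3xx status in the R flag, Apache drops the substitution
            # and stops rewriting, answering with that status code.
            RewriteRule (^|/)P[0-9]+/P[0-9]+(/|$) - [R=404,L]
    </IfModule>
    ```

    The 404 in this sketch comes from Apache itself rather than from an ExpressionEngine template, so it only keeps such URLs from being served; it does not explain how they were generated in the first place.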

  • #18 / Apr 09, 2012 3:57pm

    Dan Decker

    7338 posts

    Hi Linda,

    I followed that link and ended up at a list of paginated results, which is what I expected based on the “Pxxx”.

    What was odd was that there seemed to be “layers” of pagination. Each click of “First” led to a lower “layer” until I was finally back at the beginning.

    Can you share the template code? I don’t suspect anything is out of the ordinary, but it would be nice to have a look.

    I tried to replicate the multiple “Pxx” segments on my install and it generated a 404 for me, so we should be able to get this sorted out.

    Cheers,

  • #19 / Apr 09, 2012 4:08pm

    Linda A

    666 posts

    Well, it's not entirely impossible, since I am using an old replacement technique in the pagination. It was recommended ages ago for when you’re taking out index.php via .htaccess, but I am guessing it wasn’t foolproof. I haven’t noticed these errors before, though, which is what confuses me, since the template has been like this for years.

    {paginate}
    <div class="Pagination">Page {current_page} of {total_pages} pages {exp:replace find="/index.php" replace=""}{pagination_links}{/exp:replace}</div>
    {/paginate}
  • #20 / Apr 10, 2012 4:01pm

    Dan Decker

    7338 posts

    Hi Linda A,

    That may indeed be the culprit. We do offer a supported means of removing index.php.

    While the User Guide outlines the process for ExpressionEngine 2.x, the principles are the same in 1.x.

    I would suggest that you be on the latest release, 1.7.2.

    <IfModule mod_rewrite.c>
            RewriteEngine On
    
            # Removes index.php
            RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
            RewriteCond %{REQUEST_FILENAME} !-f
            RewriteCond %{REQUEST_FILENAME} !-d
            RewriteRule ^(.*)$ /index.php/$1 [L]
    
            # If 404s, "No Input File" or every URL returns the same thing
            # make it /index.php?/$1 above (add the question mark)
    </IfModule>

    Use that in your .htaccess, and you should be able to remove the replace plugin from the picture.

    I look forward to your reply!

    Cheers,

  • #21 / Apr 10, 2012 5:57pm

    Linda A

    666 posts

    Thanks! Just a couple of questions before I go ahead and change my .htaccess:

    1) Does that method for removing index.php assume that the URL structure is http://www.domain.com/index.php/templategroup/template? Or would it also work if it's http://www.domain.com/directory/index.php/templategroup/template? The way my site is set up, it has several “subsites” that reside in directories with their own index.php and path.php, though they all use the same .htaccess since they are closely integrated otherwise.

    2) My templates are… a few years old, to say the least, meaning there’s some legacy crud in there. When I put them together, it seems the pagination code behaved in a different way and was inserting an incorrectly placed index.php, hence the plugin. So it wasn’t actually used instead of an .htaccess rewrite to remove index.php; it was used in addition to one because the pagination code wasn’t working with the rewrites. But I am guessing that got fixed along the way?

    Either way, I should be able to ditch that plugin everywhere and possibly also switch to the more concise rewrite you provided, depending on how it works with the url structure I mention above. So thank you. 😊

  • #22 / Apr 11, 2012 2:19pm

    Dan Decker

    7338 posts

    Hi Linda,

    Does that method for removing index.php assume that the URL structure is http://www.domain.com/index.php/templategroup/template? Or would it also work if it's http://www.domain.com/directory/index.php/templategroup/template? The way my site is set up, it has several “subsites” that reside in directories with their own index.php and path.php, though they all use the same .htaccess since they are closely integrated otherwise.

    It has been my experience that a modified .htaccess would need to be in each sub-directory. I did a project with a similar setup once. Additional index.php files were used for a multi-lingual site. It worked a bit like this:

    <IfModule mod_rewrite.c>
            RewriteEngine On
    
            # Removes index.php
            RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
            RewriteCond %{REQUEST_FILENAME} !-f
            RewriteCond %{REQUEST_FILENAME} !-d
            RewriteRule ^(.*)$ /sub-dir/index.php/$1 [L]
    
            # If 404s, "No Input File" or every URL returns the same thing
            # make it /sub-dir/index.php?/$1 above (add the question mark)
    </IfModule>

    Note the addition of sub-dir in the RewriteRule.

    Of course, I totally recommend any amount of testing you can do to make sure this fits your needs 😊

    Cheers,
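
    For a setup where several sub-sites share one .htaccess in the web root, the same idea might be written as a single rule keyed on the directory name. This is an untested sketch, not something confirmed in the thread: `subsite-a` and `subsite-b` are hypothetical directory names, each assumed to hold its own index.php, and the sub-directories are assumed to carry no rewrite rules of their own.

    ```apache
    <IfModule mod_rewrite.c>
            RewriteEngine On

            # Sub-sites: $1 is the directory, $2 is the rest of the path.
            # subsite-a and subsite-b are placeholder names.
            RewriteCond $2 !\.(gif|jpe?g|png)$ [NC]
            RewriteCond %{REQUEST_FILENAME} !-f
            RewriteCond %{REQUEST_FILENAME} !-d
            RewriteRule ^(subsite-a|subsite-b)/(.*)$ /$1/index.php/$2 [L]

            # Main site: everything else goes to the root index.php.
            RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
            RewriteCond %{REQUEST_FILENAME} !-f
            RewriteCond %{REQUEST_FILENAME} !-d
            RewriteRule ^(.*)$ /index.php/$1 [L]
    </IfModule>
    ```

    Order matters in this sketch: the sub-site rule has to come before the catch-all, otherwise requests for a sub-site would be handed to the root index.php.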

  • #23 / Apr 11, 2012 2:22pm

    Linda A

    666 posts

    Ah, excellent, thank you, I will definitely be experimenting with that. It would be a good deal easier to manage than the current method, which basically means I specify each template that needs rewriting. I think it used to be called the inclusive method. 😊

  • #24 / Apr 12, 2012 1:03pm

    Dan Decker

    7338 posts

    Hi Linda,

    You are correct!

    If you run into any trouble down the road, just let us know in a new thread.

    Happy testing!

    Cheers,
