ExpressionEngine CMS
Open, Free, Amazing

Thread

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Entry having two URIs: one correct, one non-existent (not anymore)

September 17, 2009 7:51am

  • #1 / Sep 17, 2009 7:51am

    ErikV

    126 posts

    Hi,

    Can anyone help me with this very odd behaviour? In Google Webmaster Tools, I see some entries with duplicate titles, meta data, etc. When I open these URLs, it’s almost always an entry that has a correct URL AND an incorrect one. For example (screenshot included), the story on SanDisk has a correct URL that looks like this:

    http://www.it-enquirer.com/main/ite/more/something_sandisk

    AND one that looks like this:

    http://www.it-enquirer.com/main/ite/more/mailtags_21

    The first one is correct. The second one points to the same page, but the entry it is supposed to point to was removed about three months ago…

    I have run the Optimize Tables command in EE’s control panel, but that doesn’t seem to help much, if at all.

  • #2 / Sep 17, 2009 7:59am

    Sue Crocker

    26054 posts

    Have you considered using Google’s Remove Indexed Pages Tool?

  • #3 / Sep 17, 2009 8:02am

    ErikV

    126 posts

    I forgot to add that if I enter the wrong address in my browser’s address bar, it returns the SanDisk story—the wrong thing—instead of my 404 page or my index page…

  • #4 / Sep 17, 2009 8:04am

    Sue Crocker

    26054 posts

    Ahhh… different issue, then.

    Where is your 404 supposed to go, and what about your .htaccess file? Do you have one?

  • #5 / Sep 17, 2009 8:04am

    silenz

    1651 posts

    It’s like

    http://www.it-enquirer.com/main/ite/more/noodle-soup-with-ham-and-eggs

    If there is no entry with that url_title, it will show the latest one.

    To stop that, you have to use the require_entry="yes" parameter in your weblog tag(s); you can then handle dead pages with the {if no_results}{/if} conditional.
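    The lookup silenz describes can be modelled in a few lines of Python. This is a simplified sketch, not ExpressionEngine’s actual code: it assumes a single {exp:weblog:entries} tag with limit="1", and the Entry class, the resolve_entry function and the entry_date field are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical stand-in for a weblog entry; not an EE data structure.
    url_title: str
    title: str
    entry_date: int  # larger value = newer entry


def resolve_entry(entries: List[Entry], url_title: str,
                  require_entry: bool = False) -> Optional[Entry]:
    """Simplified model of a limit="1" weblog:entries tag reading a url_title segment."""
    for entry in entries:
        if entry.url_title == url_title:
            return entry  # exact match: show that entry
    if require_entry:
        return None  # no match: nothing is shown, so {if no_results} applies
    # No match and require_entry is off: fall back to the newest entry.
    return max(entries, key=lambda e: e.entry_date, default=None)


if __name__ == "__main__":
    entries = [
        Entry("something_sandisk", "SanDisk story", 200),
        Entry("older_story", "Older story", 100),
    ]
    print(resolve_entry(entries, "mailtags_21").title)  # SanDisk story
    print(resolve_entry(entries, "mailtags_21", require_entry=True))  # None
```

    In this model a deleted url_title such as mailtags_21 simply resolves to whatever entry is newest; with require_entry switched on, the same request yields no entry, and the {if no_results} branch is where a redirect to the 404 page would go.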

  • #6 / Sep 17, 2009 8:12am

    ErikV

    126 posts

    @Sue:

    My htaccess file looks like this:

    DirectoryIndex main index.php index.shtml index.html index.htm
    
    <Files main>
    ForceType application/x-httpd-php
    </Files>
    
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} !FeedBurner 
    RewriteRule ^main/ite/rss_2$ http://feeds.feedburner.com/itenquirer [R,L]
    
    ##EE Spam Block
    (...)
    ##End EE Spam Block
    
    RewriteCond %{HTTP_HOST} ^it-enquirer.com$ [OR]
    RewriteCond %{HTTP_HOST} ^www.it-enquirer.com$
    RewriteRule ^index\.php$ "http\:\/\/www\.it\-enquirer\.com\/" [R=301,L]
    (...)
    RewriteCond %{HTTP_HOST} ^it-enquirer.com$
    RewriteRule ^/?$ "http\:\/\/www\.it\-enquirer\.com" [R=301,L]

    My 404 is set to go to /main/pages/pagenotfound/
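    Sue’s .htaccess question can be checked by tracing the rules above against a request. The Python sketch below is only a rough approximation of how mod_rewrite evaluates the visible rules; the (...) sections are elided in the post and are not modelled, and rewrite is a hypothetical helper. It assumes the per-directory (.htaccess) context, where the matched path has no leading slash, and that a bare [R] flag issues a 302.

```python
import re
from typing import Optional, Tuple


def rewrite(host: str, path: str, user_agent: str) -> Optional[Tuple[int, str]]:
    """Rough model of the visible .htaccess rules (the "(...)" parts are not modelled).

    `path` is the per-directory path, i.e. without a leading slash.
    Returns (status, location) for a redirect, or None if no rule matched.
    """
    # RewriteCond %{HTTP_USER_AGENT} !FeedBurner
    # RewriteRule ^main/ite/rss_2$ http://feeds.feedburner.com/itenquirer [R,L]
    if not re.search(r"FeedBurner", user_agent) and re.search(r"^main/ite/rss_2$", path):
        return 302, "http://feeds.feedburner.com/itenquirer"  # bare [R] means 302

    # Two host conditions joined with [OR], then RewriteRule ^index\.php$ ... [R=301,L]
    # (the backslash-escaped target in the original resolves to this plain URL)
    host_ok = re.search(r"^it-enquirer.com$", host) or re.search(r"^www.it-enquirer.com$", host)
    if host_ok and re.search(r"^index\.php$", path):
        return 301, "http://www.it-enquirer.com/"

    # RewriteCond %{HTTP_HOST} ^it-enquirer.com$ + RewriteRule ^/?$ ... [R=301,L]
    if re.search(r"^it-enquirer.com$", host) and re.search(r"^/?$", path):
        return 301, "http://www.it-enquirer.com"

    return None  # no visible rule applies; the request continues to EE unchanged


if __name__ == "__main__":
    print(rewrite("www.it-enquirer.com", "main/ite/more/mailtags_21", "Googlebot"))  # None
    print(rewrite("it-enquirer.com", "index.php", "Mozilla"))  # (301, 'http://www.it-enquirer.com/')
```

    Under these assumptions, a request for main/ite/more/mailtags_21 matches none of the visible rules and reaches EE untouched, which points to the template, not .htaccess, as the source of the fallback.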

    @silenz:

    I used to use that parameter and it then redirected to random other pages, which is certainly not what I want.

    BTW, if an entry on MailTags is not in the database anymore, I don’t see why its old URL should still lead to the SanDisk story, which IS NEW. I think you understood my problem the other way round.

  • #7 / Sep 17, 2009 8:21am

    Sue Crocker

    26054 posts

    Thanks for the assist, Silenz.

    Quoting ErikV: “BTW, if an entry on MailTags is not in the database anymore, I don’t see why the story that covers SanDisk and IS NEW, should point to a URL that does not exist anymore. It’s the other way round that you understood to be my problem, I think.”

    It doesn’t matter that the URL used to refer to an old entry. What matters now is that EE is “seeing” a limit of one and, finding no entry with that url_title, picks the most recent one.

    Silenz’s suggestion is correct. Have you tried what he suggests?

    You’d still want to have Google remove the page, but the ultimate goal would be to add require_entry="yes".

  • #8 / Sep 17, 2009 8:33am

    ErikV

    126 posts

    The URL I gave the SanDisk story is different from the MailTags URL. The address for the SanDisk story reads /something_sandisk, while the MailTags URL (whose entry isn’t there anymore) points to the same story. It’s not as if I reused the MailTags URL name for the SanDisk story, or am I misunderstanding everything you say?

    When I activate required_entry = "yes", the page now just redirects to the second-newest entry; as I said, it redirects to random pages. And yes, I keep Google updated at all times, but if the crawler finds the page, there isn’t much I can do about it, is there? I also have a robots.txt file in which I forbid Google to crawl member pages etc., but it doesn’t seem to care and crawls those anyway.
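    One thing worth ruling out is whether the robots.txt rules actually cover the member URLs. Python’s standard urllib.robotparser can test this offline. The rules and the /main/member/ path below are hypothetical, since the thread shows neither the real robots.txt nor the member URL scheme; substitute the real file’s contents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; replace with the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /main/member/
"""


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_blocked("http://www.it-enquirer.com/main/member/profile/12"))  # True
    print(is_blocked("http://www.it-enquirer.com/main/ite/more/something_sandisk"))  # False
```

    Disallow lines match by path prefix, so a rule whose path differs from the live member URLs (for example, one missing the main/ segment) would leave those pages crawlable.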

  • #9 / Sep 17, 2009 8:57am

    silenz

    1651 posts

    Whether it’s “mailtags_21” or “bumblebee” doesn’t make a difference.
    It only matters whether it’s a currently existing url_title or not.

    Quoting ErikV: “When I activate required_entry = "yes", the page just redirects to the second newest entry now.”

    This should lead to nothing being displayed, unless you do something with {if no_results}.

    How is the template set up? Are there multiple {exp:weblog:entries} tags showing the main content under different conditions?
    Can you share some of the code?

  • #10 / Sep 17, 2009 9:04am

    ErikV

    126 posts

    I don’t mind at all—here it is, with the exception of a long list of custom entry fields:

  • #11 / Sep 18, 2009 3:26am

    John Henry Donovan

    12339 posts

    ErikV,

    Have you tried a combination of require_entry and the no_results tag?

    <div id="content">
    {exp:weblog:entries
       weblog="osx|av|print|online|production|pubdesign|extra|store|news"
       limit="1"
       require_entry="yes"}
    
    {if no_results}
    {redirect="404"}
    {/if}
    
    ...
    
    {/exp:weblog:entries}

    Also, what version and build of EE are you using?

  • #12 / Sep 18, 2009 3:42am

    ErikV

    126 posts

    John,

    I’ll try that. But can you also explain why I should explicitly add this code to get to the 404 page? If memory serves, it used to be automatic, or am I wrong?

    Always the very latest build…

  • #13 / Sep 18, 2009 12:01pm

    Ingmar

    29245 posts

    EE’s default behavior is to show one entry, any entry, if your url_title doesn’t match anything. By default, it’s the most recent entry. I think it makes perfect sense, but if you do want to throw a 404 in such cases, require_entry is the way to go. Give that a try, please.

  • #14 / Sep 18, 2009 12:07pm

    ErikV

    126 posts

    I have given that a try. However, I still don’t see the point of having the system serve an existing page to people who search for that page with a URL_title that doesn’t exist anymore.

    It’ll be me, I’m sure. Thanks for the answers and have a nice weekend.

  • #15 / Sep 18, 2009 12:09pm

    Ingmar

    29245 posts

    Quoting ErikV: “However, I still don’t see the point of having the system serve an existing page to people who search for that page with a URL_title that doesn’t exist anymore.”

    I am sorry, but that’s just how it works. Did the require_entry parameter work as intended? Case closed, so to speak?
