ExpressionEngine CMS
Open, Free, Amazing


This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.


Weird numbers in the URL / Duplicated pages

February 06, 2013 10:28am

  • #1 / Feb 06, 2013 10:28am

    DorinV

    38 posts

    I’m having some problems with EE v2.5.2.

    I have a website with 1,000+ articles.

    My Global Template settings:
    Enable Strict URLs - YES
    404 Page - home/404

    Every article/page is duplicated in a weird way.
    Here are some of the pages with this problem:

    http://44fashionstreet.com/article/1347890187000
    http://44fashionstreet.com/article/1345212246000
    http://44fashionstreet.com/article/1345228374000

    The list is huge. I thought: OK, let’s “force” EE to send pages without an article to the 404 page. I did this:

    {if no_results}
       {redirect="404"}
    {/if}

    (I removed the code for now)

    It worked fine, but now I get at least 1,000 errors every time Google crawls the website. Google Webmaster Tools is full of 404 response codes.

    I have done some research without finding a solution. Some users on Google’s forums said they solved the problem by removing the Disqus comment system. I tried that, but it didn’t solve the problem.

    I’m not sure what to do, as this might affect my ranking in Google.
    If you go to Google and search for site:44fashionstreet.com, you’ll see that almost all the results have those weird numbers in the URL, and if you visit one of those pages, there’s no article on it.

    Did you experience something like this before?
    Thank you!

  • #2 / Feb 07, 2013 6:09pm

    You’re going to have to provide some more detail, specifically the channel entries tag in your article/index template, before we can figure out what’s going on.

  • #3 / Feb 07, 2013 6:27pm

    DorinV

    38 posts

    It looks something like this (I had to remove a lot of code inside {exp:channel:entries}):

    {embed="includes/header" default_title="44 Fashion Street - Most popular street on fashion, beauty and celebrities."}
    {embed="includes/top"}
    {embed="includes/toptab"}
    
    {exp:channel:entries channel="articles_v2|articles" limit="1" disable="member_data|pagination|category_fields" dynamic="yes" status="zone1|zone2|zone3|zone4|zone5|zone6|zone7|zone8|Open|toptab|Closed" }
    
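    {!-- Show Closed (draft) entries only to admin or member group 6; with != every visitor passed this check --}
    {if status != "Closed" OR username == "admin" OR member_group == "6"}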
    {if status == "Closed"}This is a draft{/if}
    
    {if channel_short_name == "articles_v2"}
      - stuff -
    {if:elseif channel_short_name == "articles"}
      -stuff-
    {/if}
    
    {/if}
    {/exp:channel:entries}

  • #4 / Feb 07, 2013 6:55pm

    There’s nothing blindingly obvious there. Judging by the page titles that Google has indexed, and what’s showing now, I suspect the issue is fixed on your end and you now just need to purge Google’s index. For example, Google’s page title for 44fashionstreet.com/article/1346983149000 is ‘Topshop Fall/Winter 2012: country grunge ... - 44 Fashion Street’, but if I go to the page now it’s different.

    The fact that you’re using canonical links suggests to me that your article category pages at some point had an incorrect canonical link (possibly with the entry date in it?), that Google indexed it, and that you’ve since resolved the problem.
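    One way to test the entry-date guess: every number in these URLs is 13 digits long, which is the size of a Unix timestamp in milliseconds (the form JavaScript’s Date.getTime() returns). The minimal Python sketch below assumes that interpretation and decodes the segments quoted in this thread to UTC dates; the function name decode_ms_timestamp is hypothetical.

```python
from datetime import datetime, timezone

# The numeric URL segments quoted in this thread.
SEGMENTS = ["1347890187000", "1345212246000", "1345228374000", "1346983149000"]


def decode_ms_timestamp(segment: str) -> datetime:
    """Read a URL segment as Unix epoch milliseconds (an assumption), in UTC."""
    return datetime.fromtimestamp(int(segment) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    for seg in SEGMENTS:
        print(seg, decode_ms_timestamp(seg).isoformat())
```

    Decoded this way, the segments fall in August and September 2012; the Topshop Fall/Winter 2012 URL, for example, gives 2012-09-07 01:59:09 UTC. If those dates line up with the articles’ entry dates, that supports the idea that some template once emitted the entry date where the URL title belonged.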

    In that case, you just need to add the site to Google Webmaster Tools, where you’ll be able to use ‘Remove URLs’. I would also set up a robots.txt to deny access to those specific incorrect URLs.
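    A robots.txt rule can be sanity-checked offline with Python’s standard urllib.robotparser before it goes live. This is a minimal sketch assuming a hypothetical rule that disallows the /article/134 prefix shared by the URLs reported above; the title-based path is also hypothetical. urllib.robotparser does plain prefix matching, while Googlebot additionally understands * and $ wildcards, so this only approximates how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every path starting with /article/134,
# the prefix shared by the numeric URLs reported in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /article/134
"""

SITE = "http://44fashionstreet.com"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # The second path is a made-up title-based URL, used for comparison.
    for path in ["/article/1347890187000", "/article/topshop-fall-winter-2012"]:
        allowed = rp.can_fetch("Googlebot", SITE + path)
        print(path, "allowed" if allowed else "blocked")
```

    A robots.txt rule only stops compliant crawlers from fetching those URLs; it does not change what the server returns for them.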

  • #5 / Feb 07, 2013 7:05pm

    DorinV

    38 posts

    I don’t really know what’s wrong.
    Right now I have 1,561 404 errors in Google, all with weird numbers in the URLs :|

    I marked them as fixed two days ago... Google came, crawled the website, and found those pages again. Putting them into robots.txt won’t help, because every new article gets two pages/URLs: one “normal” page with the article title in the URL and one with those weird numbers. The one with the weird numbers has no content at all: no article, no related articles, nothing.

  • #6 / Feb 07, 2013 8:06pm

    Marking them as fixed only removes them from the error report in Google Webmaster Tools; it does not prevent Google from trying to reindex those pages.

    Some action on your part is required; see this Google Webmaster Central Blog article for more info: http://googlewebmastercentral.blogspot.com.au/2011/05/easier-url-removals-for-site-owners.html

  • #7 / Feb 08, 2013 5:46am

    DorinV

    38 posts

    Thanks, Michael, but I think there must be another solution. Today I have 1,772 errors.

    I repeat: Every time I post an article another page is created, a fake page with a weird number in the URL, and that page has no content at all. That’s how I end up with 1,772 pages that are not really there.

    Manually adding them to “Blocked URLs” in Google Webmaster Tools is not a solution.

  • #8 / Feb 08, 2013 5:43pm

    I’m telling you, I’m definitely right! 😊

    Run the following search for one of your new articles, and there’s no problem.

    site:44fashionstreet.com "bcbg max azria autumn/winter 2013-14 collection"

    However, run this search for an old article and you’ll see the problem.

    site:44fashionstreet.com "Dior Grand Bal 2012 Holiday Makeup Collection"

    So at some point in the past your RSS feed, or some other page, had a recursive problem that was linking to incorrect URLs within the site. You’ve since fixed it, probably without realizing it, and now you need to fix Google’s index.

  • #9 / Feb 08, 2013 7:15pm

    DorinV

    38 posts

    Cool Michael, got it 😊

    I’ll put the links in .htaccess to deny them and see what happens.

    Thanks a lot!

  • #10 / Feb 08, 2013 7:28pm

    DorinV

    38 posts

    Do you know a way to block them all in .htaccess without having to do it line by line?

  • #11 / Feb 08, 2013 7:45pm

    Something like the following will do the trick; it returns 404 Not Found for any URL that starts with “/article/134”:

    RedirectMatch 404 ^/article/134

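    It is worth checking what a RedirectMatch pattern actually matches before deploying it. In a regular expression, * repeats only the character before it, so a pattern like ^/article/134* means “/article/13 followed by any number of 4s” and, with no end anchor, matches every path that begins with /article/13. The Python sketch below compares three candidate patterns. It assumes Python’s re behaves like Apache’s PCRE for these simple expressions; apart from the two URLs reported in this thread, the sample paths are hypothetical.

```python
import re
from typing import List

# Sample request paths. The first two are URLs reported in this thread;
# the other three are hypothetical, added only for comparison.
PATHS = [
    "/article/1347890187000",
    "/article/1345212246000",
    "/article/1360000000000",               # hypothetical newer 13-digit URL
    "/article/13-going-on-30-style-guide",  # hypothetical title-based URL
    "/article/topshop-fall-winter-2012",    # hypothetical title-based URL
]

# Candidate RedirectMatch patterns.
PATTERNS = {
    "trailing star": r"^/article/134*",           # '*' repeats only the '4'
    "literal prefix": r"^/article/134",           # paths starting with /article/134
    "13 digits only": r"^/article/[0-9]{13}/?$",  # whole segment is 13 digits
}


def matches(pattern: str, paths: List[str]) -> List[str]:
    """Return the paths the pattern matches (unanchored search, as RedirectMatch does)."""
    return [p for p in paths if re.search(pattern, p)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(f"{name:15} {pattern:28} -> {matches(pattern, PATHS)}")
```

    Here the trailing-star pattern also catches the hypothetical title-based URL beginning with “13”, while the anchored 13-digit pattern catches only purely numeric segments, including ones outside the 134… range. It does assume no real article’s URL title is exactly 13 digits. As an .htaccess line it would read RedirectMatch 404 ^/article/[0-9]{13}/?$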
  • #12 / Feb 08, 2013 8:06pm

    DorinV

    38 posts

    Thanks a lot!

  • #13 / Feb 12, 2013 10:28am

    DorinV

    38 posts

    This partly solved the problem, but those pages are still there and Google keeps finding them.

    Now I have 2,200 404 errors. Is that OK with Google?

  • #14 / Feb 13, 2013 6:49am

    DorinV

    38 posts

    “Google detected a significant increase in the number of URLs that return a 404 (Page Not Found) error. Investigating these errors and fixing them where appropriate ensures that Google can successfully crawl your site’s pages.”

    That doesn’t sound good :(

    Today I have 3.8k 404 pages.

