This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Is it possible to intercept external garbage URLs to a search routine instead of 404 page?

April 11, 2011 2:00am

  • #1 / Apr 11, 2011 2:00am

    John_Betong_002

    407 posts

     
    Google Webmaster Tools is complaining of numerous links returning with “An Error Was Encountered [400]”.

    Edit - start:

    Just noticed that my original post was truncated :(

    What I would like to do is to somehow trap the URL before it fails the routing tests, etc.

    I would like to use the following code:

    $bad_chars = array('width', 'height', '=', ':', '<', '>', 'alt', '//', 'etc');

    // Replace each junk fragment with a slash (REQUEST_URI could be used instead of REDIRECT_URL)
    $good_words = str_replace($bad_chars, '/', $_SERVER['REDIRECT_URL']);

    header('Location: /my_search_routine/' . $good_words);
    exit;

    Edit - end:
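
The idea in the snippet above can be sketched as a small function, so it can be tried without sending a redirect. This is only a sketch: the function name `junk_url_to_search_terms` is hypothetical, and decoding the URL and URL-encoding each term are assumptions that are not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): turns a junk-laden
// request path into a slash-separated string of search terms.
function junk_url_to_search_terms($url, array $bad_chars)
{
    // Decode %22, %3C and friends so the junk can be matched literally.
    $decoded = rawurldecode($url);

    // Same idea as the snippet above: swap every junk fragment for a slash.
    $replaced = str_replace($bad_chars, '/', $decoded);

    // Split on slashes and whitespace, dropping the empty pieces.
    $parts = preg_split('#[/\s]+#', $replaced, -1, PREG_SPLIT_NO_EMPTY);

    // Encode each term so the result is safe to place in a Location header.
    return implode('/', array_map('rawurlencode', $parts));
}
```

The result could then be appended to `/my_search_routine/` in the `Location` header, as in the post.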


    The following is an example of a URL which results in application/errors/error_general.php being displayed:

     
    http://website.com/afiles/images/santa-email.jpg" width="100" height="50" alt="image"></a> </div> <div class="c0 r"><a

  • #2 / Apr 11, 2011 2:42am

    WanWizard

    4475 posts

    Make sure your images exist?

  • #3 / Apr 11, 2011 3:12am

    John_Betong_002

    407 posts

     
    The images all exist, the problem is the trailing junk.

    Try appending the junk onto a known image URL on your site and see what happens.


    I have just tried appending the junk onto your avatar and the response I get is “Oops! This link appears to be broken.”

     

    http://ellislab.com/images/avatars/uploads/avatar_78055.jpg
    Oops! This link appears to be broken.

     
     
     

  • #4 / Apr 11, 2011 3:31am

    InsiteFX

    6819 posts

    The problem comes from the href content, which starts with a single quote but is erroneously closed with a double quote, so it’s not actually closed until another single quote is found further down. So all of http://www.snapshotjourneys.com/uploads/images/BORNEO/borneo-kota-kinabalu-malaysia/borneo-kota-kinabalu-malaysia-3-university.jpg" width="81" height="50" alt="image"></a> </div> <div class="c0 r"><a

  • #5 / Apr 11, 2011 4:22am

    John_Betong_002

    407 posts

     
    Just updated my original post to include the requirements which were truncated.

     
     
     

  • #6 / Apr 11, 2011 1:28pm

    WanWizard

    4475 posts

    I still don’t see how you can get into this situation other than invalid HTML or invalid links.
    Which is your problem as a developer, and you should fix that, not work around it.

  • #7 / Apr 11, 2011 2:13pm

    John_Betong_002

    407 posts

    I got into this situation by other webmasters using incorrect hotlinks. I have no control over these other sites but it appears I am being penalised by Google for not having corresponding landing pages for the bad URLs.

    Here are the first two of the eighteen web sites that Google Webmaster Tools reports as having invalid links:

    http://ezentials.com/eqk-7-days-before-santa-rfc.html

    http://fivestarsmarketplace.com/lov-diagram-santa-pictures-printables.html

    Search the source code of those pages for “afiles/images”: you will see that the first part of each image URL is correct but the complete URL is invalid.

    I was hoping to find a way to test the URL before CI routed the URL to an error page. This would also be ideal for filtering all the other hotlinked images.
     
     
     

  • #8 / Apr 11, 2011 2:32pm

    WanWizard

    4475 posts

    Ok. So this is about other sites linking to your site?

    Then instead of a standard CI 404, route to a 404 controller that returns a 200 status and displays a 404 page with links to important parts of your application.
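
A rough sketch of the page such a controller might output is shown below. It is framework-independent on purpose: the function name `render_not_found_page` and the link list are hypothetical, and wiring it into CodeIgniter's routing is left out.

```php
<?php
// Hypothetical helper: builds a "not found" page that points visitors
// (and crawlers) at useful parts of the site. The link list is an assumption.
function render_not_found_page($requested_path, array $links)
{
    $html  = "<h1>Page not found</h1>\n";
    // Escape the requested path: it may contain quotes and angle brackets.
    $html .= '<p>Nothing matches '
           . htmlspecialchars($requested_path, ENT_QUOTES, 'UTF-8')
           . "</p>\n<ul>\n";
    foreach ($links as $href => $label) {
        $html .= '  <li><a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
               . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

// Possible usage in whatever handles unmatched requests:
// header('HTTP/1.1 200 OK');
// echo render_not_found_page($_SERVER['REQUEST_URI'], array('/joke/search/' => 'Search'));
```

Escaping the requested path matters here because the bad URLs in question carry HTML fragments.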

  • #9 / Apr 11, 2011 3:01pm

    John_Betong_002

    407 posts

    Ah, the penny has dropped.

    I was curious to know why my code in /application/errors/error_general.php was being ignored. I will try commenting out the header("HTTP/1.1 404 Not Found"); line and report back tomorrow… now it is way past my bed time 😊

    Many thanks.
     
     
     

  • #10 / Apr 12, 2011 12:19pm

    John_Betong_002

    407 posts

    Nearly there but cannot get both conditions to work together.

    What I would like to do is to somehow trap the external URL before it fails the routing tests, etc.

    The following .htaccess in the images folder is supposed to:
    1. accept image links from my own site
    2. intercept all external links and divert them to ./images/index.php
      (where the URL is parsed and routed to a search routine).

    .htaccess

    RewriteEngine on
    
      # this line redirects everything to index.php including links from my own site
      # RewriteRule (.*) index.php
    
      RewriteCond %{HTTP_REFERER} !^$
      RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?johns-jokes\.com [NC]
    
      # RewriteRule \.$ ./index.php 
      # RewriteRule (.*) index.php/$1 [R,NC,L]


     
    ./images/index.php

    <?php
      // this works fine
      // 1. parses the URI
      // 2. formats the results
      // 3. redirects the results to my search routine with parameters

      $x = $_SERVER['REQUEST_URI'];
      if (strpos($x, '.') !== false)
      {
        // bad link used for testing
        // $x = 'http://johns-jokes.com//afiles/images/days-before-christmas.png" width="39" alt="image">';

        $x  = substr($x, 15);      // drop the first 15 characters (presumably the '/afiles/images/' prefix)
        $i2 = strpos($x, '.');     // position of the dot before the file extension
        $x  = substr($x, 0, $i2);  // keep only the file name

        $x = str_replace('-', '/', $x);  // hyphens become URI segments
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://johns-jokes.com/joke/search/' . $x, TRUE, 301);
        exit;
      }
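
The parsing step above relies on a fixed offset of 15 characters. A variation that looks for the image file name instead might look like the sketch below. The function name `image_uri_to_search_terms` and the list of image extensions are assumptions, not part of the original script.

```php
<?php
// Hypothetical helper: derives search terms from the image name in a
// hotlinked URI, ignoring whatever junk follows the file extension.
function image_uri_to_search_terms($request_uri)
{
    // Decode first so encoded quotes and spaces do not hide the file name.
    $decoded = rawurldecode($request_uri);

    // First path segment that ends in an (assumed) image extension.
    if (!preg_match('#/([^/.]+)\.(?:jpe?g|png|gif)#i', $decoded, $m)) {
        return null; // no image-like segment found
    }

    // Hyphens become URI segments, as in the original script.
    return str_replace('-', '/', $m[1]);
}
```

A non-null result could be appended to the search URL in the `Location` header, and a null result could fall through to the normal error page.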

     
     
     
