We run a program to pull in all URLs for a website, and it has been picking up malformed URLs with extra "https:/" segments appended to the end. We've checked this on 5 of our EE sites, and it is occurring on every one of them.
Example:
http://www.shorepinerentals.com/accommodations/property/the-refuge/https:/https:/https:/https:/https:
The problem is that bots pick these up as additional URLs, which leaves us with duplicate pages and duplicate content issues.
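In case it helps anyone reproduce this, here is a rough Python sketch of how the affected URLs could be flagged in a crawl export. The function names are made up for illustration. It assumes the stray pieces always appear as whole path segments such as "https:" (as in the example above), and stripping them may also drop a trailing slash that the real page URL had.

```python
import re
from urllib.parse import urlsplit

# Assumption: a "stray" segment is a path segment that is exactly
# "http:" or "https:" (case-insensitive), as in the example URL.
SCHEME_SEGMENT = re.compile(r"^https?:$", re.IGNORECASE)


def has_stray_scheme_segments(url: str) -> bool:
    """Return True if any path segment of the URL is a bare scheme like 'https:'."""
    path = urlsplit(url).path
    return any(SCHEME_SEGMENT.match(segment) for segment in path.split("/"))


def strip_stray_scheme_segments(url: str) -> str:
    """Return the URL with bare-scheme path segments removed (hypothetical helper)."""
    parts = urlsplit(url)
    kept = [s for s in parts.path.split("/") if not SCHEME_SEGMENT.match(s)]
    return parts._replace(path="/".join(kept)).geturl()


if __name__ == "__main__":
    crawled = [
        "http://www.shorepinerentals.com/accommodations/property/the-refuge/"
        "https:/https:/https:/https:/https:",
    ]
    for url in crawled:
        if has_stray_scheme_segments(url):
            print("malformed:", url)
            print("probably :", strip_stray_scheme_segments(url))
```

This only finds and cleans the bad entries after the crawl; it does not explain where the extra segments come from.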
Has anyone seen this before? Is there a fix?