ExpressionEngine CMS
Open, Free, Amazing

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

RSS 2.0 Feed Sucking Bandwidth

October 05, 2008 4:30pm

  • #1 / Oct 05, 2008 4:30pm

    mayest

    293 posts

    I recently noticed that bandwidth usage on my site has skyrocketed (it has more than doubled). Since I am on shared hosting with EngineHosting, I contacted them to see what was going on. They were very kind (as always - I love them) and pointed out that my RSS_2.0 template was probably the culprit. They also hooked me up with Urchin, which gives me a bit more detail than Google Analytics.

    So, I’ve been using Urchin to investigate. My RSS 2.0 feed was responsible for about 500 MB of data in the last few days. Wow! I also noticed that Google is reporting that my subscriber numbers have more than tripled. Seems odd, but I’ll take it. Anyway, looking at my template, I finally noticed that it was sending out entire posts—even though I had thought that I was only sending 600 characters per post (using the TruncHTML plugin). Well, I found that the description tag contained all of the text. Here’s what the line looked like (note the {body} tag):

    <description>{exp:xml_encode}{body}{/exp:xml_encode}</description>

    I changed that to:

    <description>{exp:xml_encode}{summary}{/exp:xml_encode}</description>

    That significantly reduced the amount of text in the source of the feed, and I hope it will dramatically reduce the bandwidth usage.

    Anyway, it still seems very strange that a text feed could result in a hundred or so megabytes a day. I did find one IP address from France that was hitting it hard, so I banned it.

    Any other ideas? Does anybody think that sending out less text will have a significant impact? How much bandwidth should an RSS feed consume on a popular blog? Mine isn’t terribly popular (a couple of hundred visitors a day - most from search, not feed readers, so they aren’t hitting the feed), so I’m just trying to get an idea of what would be reasonable.

    Thanks!
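
For a rough sense of what would be reasonable, feed bandwidth can be estimated as feed size times number of polling clients times polls per client per day. The sketch below uses made-up numbers (a 100 KB full-text feed versus a 15 KB summary feed, 50 clients polling hourly); none of these are measurements from this site, and it assumes every poll downloads the whole feed, with no compression and no "not modified" responses.

```python
def daily_feed_megabytes(feed_size_kb, clients, polls_per_day):
    """Estimate MB/day served if every poll downloads the whole feed."""
    total_bytes = feed_size_kb * 1024 * clients * polls_per_day
    return total_bytes / (1024 * 1024)


# Hypothetical numbers for illustration only.
full_posts = daily_feed_megabytes(feed_size_kb=100, clients=50, polls_per_day=24)
summaries = daily_feed_megabytes(feed_size_kb=15, clients=50, polls_per_day=24)

print(f"full posts: {full_posts:.1f} MB/day")
print(f"summaries:  {summaries:.1f} MB/day")
```

Under these assumptions the full-text feed comes to roughly 117 MB a day and the summary feed to roughly 18 MB a day, so a modest number of frequent pollers is enough to produce the kind of usage described above.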

  • #2 / Oct 05, 2008 5:23pm

    soxhead

    69 posts

    Text is cheap in terms of bandwidth, especially with the number of visitors you’re getting.

    IMO there is a strong possibility that your feed is being scraped. There are a few tools out there that can help you with this, but for starters, try taking some of the feed text, wrapping it in quotes, and plopping it into Google. Is yours the only listing returned? If not, you probably have a scraping issue.

    I would strongly recommend using Feedburner if you aren’t already. Feedburner caches your feeds (less burden on your server) and has some tools that will alert you if your feeds are being used in a suspicious manner.
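
Another way to check for scrapers is to total the bytes served per client for the feed URL straight from the access log. This is only a sketch: it assumes an Apache-style combined log format, and the feed path `/index.php/site/rss_2.0/` and the sample log lines are invented for illustration, not taken from the site in question.

```python
import re
from collections import Counter

# Matches the start of an Apache common/combined log line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def bytes_per_client(lines, feed_path):
    """Sum response bytes per client IP for requests under feed_path."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or not match.group("path").startswith(feed_path):
            continue
        size = match.group("size")
        # A size of "-" means no body was sent (for example a 304 response).
        totals[match.group("ip")] += 0 if size == "-" else int(size)
    return totals


# Invented sample lines using documentation-only IP addresses.
sample = [
    '203.0.113.7 - - [05/Oct/2008:16:30:00 -0500] "GET /index.php/site/rss_2.0/ HTTP/1.1" 200 120000 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [05/Oct/2008:16:31:00 -0500] "GET /index.php/site/rss_2.0/ HTTP/1.1" 200 120000 "-" "ExampleBot/1.0"',
    '198.51.100.4 - - [05/Oct/2008:16:32:00 -0500] "GET /index.php/site/rss_2.0/ HTTP/1.1" 304 - "-" "ExampleReader/2.0"',
    '198.51.100.9 - - [05/Oct/2008:16:33:00 -0500] "GET /images/chart.png HTTP/1.1" 200 50000 "-" "ExampleBrowser/3.0"',
]

for ip, total in bytes_per_client(sample, "/index.php/site/rss_2.0/").most_common(5):
    print(ip, total)
```

A client that sits far above the rest in this list, or that never receives a "not modified" response, is a reasonable candidate for a closer look before banning.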

  • #3 / Oct 05, 2008 6:22pm

    mayest

    293 posts

    Thanks. Scrapers were the first thing that came to mind. In fact, the French IP address that I banned almost has to be a scraper. I also found that NewsGator was hitting the feed really hard and temporarily banned them. Those moves seem to have helped, but it is hard to tell since it is now a weekend and traffic always drops a lot on weekends.

    My entire blog (pictures, text, spreadsheets) is somewhere around 2 MB in total size. So, the recent bandwidth surge equates to about 75 complete downloads per day. The weird part is that this bandwidth usage is simply from the RSS template and it doesn’t send out anything except text. None of the individual picture or spreadsheet files are showing all that much usage (as far as I can tell).

    I do use FeedBurner, and haven’t noticed any “Uncommon Uses” during this period. Whatever it is, it is hitting my feed directly. It sounds like you are suggesting that I redirect (using .htaccess) my RSS 2.0 template to my feeds.feedburner.com/ExcelBlog feed. Is that correct? If I do that, will feed readers automatically pick it up? I would think so, but I’m not sure. I think I’ll try that and see how it goes.

    Thanks.
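
The redirect described above needs one exception: FeedBurner's own fetcher still has to reach the original feed, otherwise it would be redirected back to itself. The decision logic can be sketched as below. It is written in Python purely to show the rule; on Apache the same check would normally be expressed as rewrite conditions in .htaccess. The local feed path and the `http://` scheme are assumptions, and matching on the substring "feedburner" in the user agent is a simplification.

```python
FEEDBURNER_URL = "http://feeds.feedburner.com/ExcelBlog"  # scheme assumed
LOCAL_FEED_PATH = "/index.php/site/rss_2.0/"  # hypothetical template path


def redirect_target(path, user_agent):
    """Return the URL to redirect to, or None to serve the request locally."""
    if not path.startswith(LOCAL_FEED_PATH):
        return None  # not a feed request, leave it alone
    if "feedburner" in (user_agent or "").lower():
        return None  # let FeedBurner's fetcher read the original feed
    return FEEDBURNER_URL


print(redirect_target("/index.php/site/rss_2.0/", "ExampleReader/2.0"))
print(redirect_target("/index.php/site/rss_2.0/", "FeedBurner/1.0"))
```

Feed readers that follow HTTP redirects should end up on the FeedBurner copy without subscribers changing anything, though how each reader treats a temporary versus a permanent redirect varies.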
