RSS Feed and Character Nightmare - are there solutions?
Posted: 11 March 2008 03:53 PM
Lab Assistant

My site involves members copying and pasting snippets of news content from around the internet. I’m always running into weird stuff like

country’s 1994

in my articles. With the RSS feeds, people find that my feeds break their sites when an entry contains characters their parsers do not accept:

Warning: Invalid character at line 46, column 227 in....

FeedValidator always says my RSS feeds aren’t valid:

column 22: ‘utf8’ codec can’t decode byte 0x92 in position 1253: unexpected code byte (maybe a high-bit character?)

line 37, column 21: description contains bad characters
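
The validator message can be reproduced in isolation. Byte 0x92 is never valid on its own in UTF-8, but in Windows-1252 it is the curly apostrophe that word processors and many news sites emit. That suggests (as an assumption, since the thread does not show the raw bytes) that pasted Windows-1252 text is ending up in a feed declared as UTF-8. A minimal Python sketch, with made-up sample text:

```python
# A byte string as it might arrive from a pasted snippet saved as Windows-1252.
# The sample text is invented for illustration.
raw = b"the country\x92s 1994 budget"

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as err:
    decoded_ok = False
    bad_offset = err.start  # index of the first undecodable byte

# The same bytes decode cleanly once they are read as Windows-1252.
as_text = raw.decode("cp1252")

# Re-encoding as UTF-8 yields bytes that a UTF-8 feed can carry.
as_utf8 = as_text.encode("utf-8")
```

The point of the sketch is only that the bytes and the declared encoding have to agree; which side to change depends on where the text comes from.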

Is there anything I can do to end this nightmare?

Thanks,
Ryan

Posted: 12 March 2008 01:41 AM   [ # 1 ]
Moderator

Use UTF-8 throughout. On your site, in your feeds, everywhere.

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.

Posted: 12 March 2008 02:52 AM   [ # 2 ]
Summer Student

I agree. UTF-8 is the best solution.

 Signature 

http://www.serafico.net

Posted: 12 March 2008 10:20 AM   [ # 3 ]
Lab Assistant

You mean having this line?

<meta http-equiv="content-type" content="text/html; charset=utf-8">

Because I do have it in all my pages; only the RSS feed template lacks it.

Posted: 12 March 2008 10:24 AM   [ # 4 ]
Moderator

Yes, everywhere. I think that

country’s

is actually UTF-8 encoded text displayed as ISO-8859-1.
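
That diagnosis can be checked with a few lines of Python. The sketch below encodes a word containing a typographic apostrophe as UTF-8 and then misreads the bytes as Windows-1252. Using `cp1252` instead of strict ISO-8859-1 is an assumption: browsers commonly treat pages labeled ISO-8859-1 as Windows-1252, which is what makes the garbage visible.

```python
original = "country\u2019s"  # "country’s" with a typographic apostrophe

# The apostrophe becomes three bytes in UTF-8: E2 80 99.
utf8_bytes = original.encode("utf-8")

# Misreading those bytes as Windows-1252 turns one character into three.
misread = utf8_bytes.decode("cp1252")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = misread.encode("cp1252").decode("utf-8")
```

If text in the database already looks like `misread`, the round trip in the last line is one way to recover it, provided every byte survived storage.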

Posted: 15 March 2008 02:14 PM   [ # 5 ]
Lab Assistant

On the RSS feed I have

<?xml version="1.0" encoding="{encoding}"?>

and in my weblog preferences I have
XML Character Encoding: utf-8

So my RSS feeds have this encoding too. So far I appear to be doing everything I’m supposed to. Is there something I can do with the SAEF to fix these problems before they occur? Or is there something else that I need to do?
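
One way to catch such problems before they are stored is to normalize every submission to UTF-8 at the point of entry. The following is a sketch in Python rather than ExpressionEngine code; the function name `normalize_submission` is hypothetical, and the fallback to Windows-1252 is an assumption about where pasted text usually comes from, not something the thread confirms.

```python
import re

# Among the characters XML 1.0 forbids: C0 controls other than tab, LF and CR.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def normalize_submission(raw: bytes) -> str:
    """Return text that can later be emitted in a UTF-8 feed."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: input that is not valid UTF-8 was pasted as Windows-1252.
        # Bytes undefined in that code page become U+FFFD instead of failing.
        text = raw.decode("cp1252", errors="replace")
    # Drop control characters that would make the feed ill-formed XML.
    return _XML_ILLEGAL.sub("", text)
```

For example, `normalize_submission(b"country\x92s")` returns the word with a proper typographic apostrophe, while input that is already valid UTF-8 passes through unchanged apart from the control-character filter.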

Thanks.

Posted: 26 May 2008 05:19 PM   [ # 6 ]
Lab Assistant

Wow, that’s good advice. But is UTF-8 also suitable for Dutch?

It should at least contain characters like:

é ë ç ï í ó ú ü á â “ ‘ €

Just to be sure, I attached those as a bitmap as well.

My website is already 3 years old. Is it smart to simply change the character encoding (it is now set to ISO-8859-1, but I don’t have a clue why)? Can I just change it?

Image attachment: ee chars.gif
 Signature 

Erwin van Lun
Marketeer on a Mission

Posted: 26 May 2008 05:28 PM   [ # 7 ]
Research Assistant

UTF-8 is suitable for almost every language you can think of (except Klingon, perhaps ... but I’m not sure about this).
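
For the Dutch sample characters listed earlier in the thread, this is easy to verify. The sketch below checks each character against UTF-8 and against ISO-8859-1; the helper name `fits` is made up for the example.

```python
dutch_sample = "é ë ç ï í ó ú ü á â “ ‘ €".split()


def fits(ch: str, encoding: str) -> bool:
    """True if the character can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Every sample character can be encoded as UTF-8.
utf8_ok = [ch for ch in dutch_sample if fits(ch, "utf-8")]

# The curly quotes and the euro sign have no code point in ISO-8859-1.
latin1_missing = [ch for ch in dutch_sample if not fits(ch, "iso-8859-1")]
```

So the accented letters exist in both encodings, but the typographic quotes and the euro sign are only safe in UTF-8.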

 Signature 

Markus Stolpmann • Expression Engine 1.6.4
Current project: kamaco.info - (german) website for Austrian publication coach K. Matejcek

Posted: 26 May 2008 05:39 PM   [ # 8 ]
Lab Assistant

Thanks, ms! That gives me a good feeling that the solution is coming up.

Just the other question:
My website is already 3 years old. Is it smart to simply change the character encoding (it is now set to ISO-8859-1, but I don’t have a clue why)? Can I just change it? That would mean setting the character encoding in EE to UTF-8 and the server setting in the .htaccess to UTF-8. No drawbacks? No corrupted postings from 2006? No problems with the RSS feed?
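
Whether old postings get corrupted depends on whether the stored bytes are converted or only relabeled. The sketch below assumes the existing entries are stored as raw ISO-8859-1 bytes, which the thread does not confirm; the sample text is invented.

```python
# One stored posting as ISO-8859-1 bytes (sample text invented for illustration).
stored = "coördinatie café".encode("iso-8859-1")

# Relabeling only: the old bytes are not valid UTF-8, so readers choke on them.
try:
    stored.decode("utf-8")
    relabel_works = True
except UnicodeDecodeError:
    relabel_works = False

# Transcoding: decode with the old charset, then encode with the new one.
converted = stored.decode("iso-8859-1").encode("utf-8")
```

Under that assumption, changing the setting alone would break every entry that contains an accented character, while entries that are pure ASCII would be unaffected because ASCII is byte-identical in both encodings.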

Posted: 18 June 2008 04:47 PM   [ # 9 ]
Lab Assistant

Regarding my question above, I have found some threads about converting existing weblogs to UTF-8.

Here is my next dilemma: a weblog that is entirely in UTF-8, but whose feed doesn’t validate at all. Worse, it is not accepted by any reader.

You can find the live RSS feed on: http://www.chatbots.org/ww/feed

Here is the code:

{assign_variable:master_weblog_name="chatbots"}
{exp:rss:feed weblog="{master_weblog_name}"}

<?xml version="1.0" encoding="{encoding}"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">

    <channel>

    <title>{exp:xml_encode}{weblog_name}{/exp:xml_encode}</title>
    <link>{weblog_url}</link>
    <description>{weblog_description}</description>
    <dc:language>{weblog_language}</dc:language>
    <dc:creator>{author}</dc:creator>
    <dc:rights>Copyright {gmt_date format="%Y"}</dc:rights>
    <dc:date>{gmt_date format="%Y-%m-%dT%H:%i:%s%Q"}</dc:date>
    <admin:generatorAgent rdf:resource="http://expressionengine.com/" />

{exp:weblog:entries weblog="{master_weblog_name}" limit="10" rdf="off" dynamic_start="on" disable="member_data|trackbacks"}

    <item>
      <title>{exp:xml_encode}{title}{/exp:xml_encode}</title>
      <link>{title_permalink=chatbot}</link>
      <guid>{title_permalink=chatbot}#When:{gmt_entry_date format="%H:%i:%sZ"}</guid>
      <content:encoded><![CDATA[{EnglishDescription}<p>Language:<a href="http://www.chatbots.org/language/{Language}">{Language}</a></p>]]></content:encoded>

      <dc:subject>{exp:xml_encode}{categories backspace="1"}{category_name}, {/categories}{/exp:xml_encode}</dc:subject>
      <dc:date>{gmt_entry_date format="%Y-%m-%dT%H:%i:%s%Q"}</dc:date>
    </item>
{/exp:weblog:entries}

    </channel>
</rss>

{/exp:rss:feed}

Here is what feedvalidator.org says:

Sorry
This feed does not validate.

line 57, column 25: 'utf8' codec can't decode bytes in position 2629-2631: invalid data (maybe a high-bit character?)

<title>Coach Carri?</title>

In addition, interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 25, column 6: Ensure description precedes content:encoded (10 occurrences)

<content:encoded><![CDATA[The virtual employee Maud answers all questi ...

line 57, column 24: title contains bad characters (2 occurrences)

<title>Coach Carri?</title>

line 60, column 61: content:encoded contains bad characters (12 occurrences)

<content:encoded><![CDATA[A la recherche d'un nouveau d?? Le Coach Car ...

line 130, column 4: Missing atom:link with rel="self"

</channel>

Auto-Convert High-Ascii is switched off.

So what could the problem be? Any suggestions are more than welcome.
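
The `Coach Carri?` excerpt suggests that the accented character is not stored as UTF-8, although that is an assumption because the raw bytes are not shown. A small Python helper can locate the first offending byte in a downloaded copy of the feed. The name `find_invalid_utf8` and the sample bytes are hypothetical stand-ins.

```python
def find_invalid_utf8(data: bytes, context: int = 20):
    """Return (start, end, surrounding bytes) for the first invalid sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        lo = max(err.start - context, 0)
        return err.start, err.end, data[lo:err.end + context]
    return None


# Stand-in for a downloaded feed: an accented letter encoded as ISO-8859-1
# inside otherwise ASCII markup.
sample = b"<title>Coach Carri\xe8re</title>"
hit = find_invalid_utf8(sample)
```

Printing the third element of `hit` shows the raw bytes around the problem, which makes it possible to tell whether the entry holds ISO-8859-1 or Windows-1252 bytes and therefore needs converting at the source.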

Posted: 16 August 2008 09:02 AM   [ # 10 ]
Lab Assistant

Just checking: has anyone had time to look at the problem above? It still occurs and is still very strange.

Posted: 17 August 2008 05:19 PM   [ # 11 ]
Lab Assistant

Next observation: in another weblog on another website I manage, the character encoding is UTF-8.

-Feed validator says the feed is okay, but gives a warning that the server is reporting ISO-8859-1.
-IE 7 says it contains an error.
-On a partner site that is also built in EE and uses Magpie to import the feed, a PHP error is returned, causing the partner site to crash (which leads to an unhappy partner).

The feed itself contains an accent in the title, as attached. What should I do? The site is encoded in UTF-8.
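
The validator warning points at a mismatch between the HTTP `Content-Type` header and the XML declaration. For XML served over HTTP, a charset given in the header generally takes precedence over the declaration inside the document, so a reader that trusts the header would decode UTF-8 bytes as ISO-8859-1. The sketch below compares the two; the header and declaration values are hypothetical examples matching the situation described, and the function names are made up.

```python
import re
from email.message import Message


def header_charset(content_type: str):
    """Charset parameter of a Content-Type header value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_charset(xml_head: bytes):
    """Encoding named in the XML declaration, lowercased, or None."""
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', xml_head)
    return m.group(1).decode("ascii").lower() if m else None


# Hypothetical values: the server says one thing, the feed says another.
http = header_charset("text/xml; charset=ISO-8859-1")
xml = declared_charset(b'<?xml version="1.0" encoding="utf-8"?>')
mismatch = http is not None and http != xml
```

If `mismatch` is true for the live feed, the server configuration that sets the header is the place to look, since changing the template alone will not alter what the server reports.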

-----

Another problem: when I try to upload a GIF containing a screen dump, this forum returns:

Error Message:  The file you are attempting to upload has invalid content for its MIME type.

You can check for yourself at www.erwinvanlun.com/ww/feed
