UTF-8 mangling characters
Posted: 27 June 2006 09:40 AM
Lab Assistant

We have a number of international sites (German, Spanish, French, Italian) that run off of EE using charset=iso-8859-1, but with a Japanese site looming I’m trying to convert over to utf-8. It’s not going so well. Static files on our server with UTF-8 work fine, but every time I try to run some content through EE with a Weblog’s preferences set to UTF-8, it’s a mess. All the special characters and accents common to languages such as Spanish just get mangled with UTF-8. I’ve turned off EE headers and made sure to set charset correctly in the templates, but nothing works.

Now the Default Character Set under General Configuration is still set to ISO-8859-1 because, well, I have live sites running and I’d prefer not to tinker with the global settings I started out with. Even so, shouldn’t a Weblog’s preferences override this? Also, I was wondering whether these settings even matter if EE headers are turned off.

Any suggestions out there?

Posted: 27 June 2006 10:18 AM   [ # 1 ]
Moderator

So on the rendered page, you’ve got the character set specified as utf-8, and the weblog specified as utf-8 (in the admin). Hm, are you using autoconvert high-ascii?  (Switching character encoding is not my strong point, so bear with me.)

Posted: 27 June 2006 10:38 AM   [ # 2 ]
Lab Assistant

Yes, utf-8 is specified in the weblog, and no, auto-convert is not on. I thought about changing this setting, but it’s a site-wide setting and I’m a little fearful of changing that without being sure I know what’s going to happen to the live sites.

Posted: 27 June 2006 10:56 AM   [ # 3 ]
Administrator

Do you have a live page you can link to?  It could be an issue with your database using a specific character set, but I’d like to rule out HTML errors and/or browser auto-detection overriding your charset setting.

Posted: 27 June 2006 11:33 AM   [ # 4 ]
Administrator

Thanks for the link via PM, Todd.  The encoding for the entries is set upon submission, so I gather what’s happening is that you have data stored in your database in ISO-8859-1, and when you try to display that as UTF-8 the characters are getting wonked.

You will actually need to convert everything that was previously submitted to UTF-8 in order for it to display correctly as UTF-8.  The General Configuration’s setting of “Default Character Set” needs to be set to UTF-8 to ensure that submitted entries are stored in the database as such.  The developers reminded me that the XML Character Set setting in your Weblog Preferences is only used in RSS/Feed templates, and does not affect the data stored in the database.
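
To make the mismatch concrete, here is a minimal sketch in Python (not EE or PHP code; the sample string is made up for illustration) of what happens when bytes stored as ISO-8859-1 are read back as UTF-8, and how re-encoding fixes them.

```python
# Hypothetical entry text containing accented characters.
text = "España, café"

# What a site running with charset=iso-8859-1 would have stored:
# each accented character is a single byte (0xF1 for ñ, 0xE9 for é).
stored = text.encode("iso-8859-1")

# Reading those bytes as UTF-8: the lone high bytes are not valid UTF-8
# sequences, so they turn into replacement characters (the "mangling").
mangled = stored.decode("utf-8", errors="replace")

# Decoding with the charset the data was really written in, then
# encoding as UTF-8, gives bytes that display correctly as UTF-8.
converted = stored.decode("iso-8859-1").encode("utf-8")

print(mangled)
print(converted.decode("utf-8"))
```

Plain ASCII characters are identical in both encodings, which is why only the accents and special characters break.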

If you have a lot of entries that need to be converted, you might need to write a conversion script with PHP and MySQL. Otherwise, you can change that setting in your General Configuration and begin going through the entries.  Don’t forget to back up!!
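
As a rough idea of what the per-value step of such a script could look like, here is a sketch in Python rather than PHP. The function name `to_utf8` is hypothetical, and it assumes every value is either valid UTF-8 already or ISO-8859-1.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw as UTF-8 bytes.

    Assumption: the input is either already valid UTF-8 or ISO-8859-1.
    """
    try:
        # If the bytes decode cleanly as UTF-8, leave them untouched so
        # that already-converted entries are not encoded a second time.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise treat them as ISO-8859-1 and re-encode.
        return raw.decode("iso-8859-1").encode("utf-8")


print(to_utf8("café".encode("iso-8859-1")))
print(to_utf8("日本語".encode("utf-8")))
```

The check is a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so it is worth spot-checking the results against a backup.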

Posted: 27 June 2006 11:36 AM   [ # 5 ]
Moderator

What Derek says. I actually have a conversion script somewhere, so if it’s a lot of entries we are talking about, let me know and I’ll dig it up for you. It might not work out of the box, but it should give you a head start.

EDIT: And yes, you’ll be glad you made the switch. I was hesitant at first, but I never looked back.

Posted: 27 June 2006 11:48 AM   [ # 6 ]
Lab Assistant

Thanks. So y’all think just “flipping the switch” with the General Config in EE won’t work, right? I was sort of hesitant to do this because I didn’t think it would work, and heaven forbid it mess things up across all the live web sites.

The conversion script might be nice because I am closing in on roughly 600 pages. Q, though: I’ve got a lot of customized weblogs ... matter too much?

Posted: 27 June 2006 11:57 AM   [ # 7 ]
Moderator

Oh, it’ll work all right for new content. But the old content is still encoded as Latin-1, and it is now suddenly being pulled out of the database and told to behave as Unicode, so to speak. That won’t work. So you will need to convert your database, too. I’ll see if I can find this script; it will simply open your database, convert everything to UTF-8, and (keep your fingers crossed) write everything back. Customized weblogs won’t matter at all, since all the work will be done on the db level.
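
The read, convert, write-back loop described above can be sketched roughly as follows. This is an illustration only, not the attached script: it is written in Python, it uses an in-memory SQLite database as a stand-in for MySQL, and the table and column names (`entries`, `id`, `body`) are invented for the demo.

```python
import sqlite3


def convert_table(conn, table, key, column):
    """Re-encode every value of `column` from ISO-8859-1 to UTF-8 in place.

    `table`, `key` and `column` are interpolated into the SQL text, so they
    must come from a trusted list, never from user input.
    Returns the number of rows that were rewritten.
    """
    rows = conn.execute(f"SELECT {key}, {column} FROM {table}").fetchall()
    updated = 0
    for row_id, raw in rows:
        if raw is None:
            continue  # nothing to convert in empty fields
        fixed = bytes(raw).decode("iso-8859-1").encode("utf-8")
        conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE {key} = ?",
            (fixed, row_id),
        )
        updated += 1
    conn.commit()
    return updated


# Demo with a made-up table holding one Latin-1 encoded value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, body BLOB)")
conn.execute(
    "INSERT INTO entries (body) VALUES (?)",
    ("Müller".encode("iso-8859-1"),),
)
count = convert_table(conn, "entries", "id", "body")
body = conn.execute("SELECT body FROM entries").fetchone()[0]
print(count, body.decode("utf-8"))
```

Note that a loop like this is not safe to run twice: a second pass would treat the new UTF-8 bytes as Latin-1 and encode them again, which is one more reason to work from a backup.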

Posted: 27 June 2006 12:58 PM   [ # 8 ]
Moderator

OK, the problem is this: I found the PHP script, and I remember modifying it to work with EE (it was originally written for WP), but I can’t remember if I ever successfully tested it.

So bear this in mind, make a backup of your database, and give it a try if you will. I won’t have the time to take a deeper look at it before the weekend, but I understand that you won’t want to wait until then. So please, use at your own risk, if you’re so inclined, but do make a backup.

File Attachments
convert.txt  (File Size: 4KB - Downloads: 112)
 
Posted: 27 July 2006 01:53 PM   [ # 9 ]
Lab Assistant

Been a few weeks, but I’m now finally forced to deal with the task at hand. Boy, I’ve been procrastinating this one as much as possible. This is such a hairy problem. The more I dig, the more I realize I’ve got some work to do not just with this particular database, but with how MySQL is configured on the server.

http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html
http://textsnippets.com/posts/show/84

Posted: 05 November 2007 05:02 PM   [ # 10 ]
Research Assistant

Hi, Ingmar.

I ran your convert script, and it seemed to work, but I did get a lot of warnings…

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 82

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 82

Warning: mysql_fetch_assoc(): supplied argument is not a valid MySQL result resource in .../system/core/core.functions.php(637) : eval()'d code on line 45
exp_actions
Notice: Undefined variable: decode in .../system/core/core.functions.php(637) : eval()'d code on line 60

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 82

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 83

Notice: Undefined variable: decode in .../system/core/core.functions.php(637) : eval()'d code on line 60

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 82

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 83

Notice: Undefined variable: decode in .../system/core/core.functions.php(637) : eval()'d code on line 60

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 82

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 83

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 82

Warning: mysql_real_escape_string() expects parameter 2 to be resource, null given in .../system/core/core.functions.php(637) : eval()'d code on line 83
Invalid query: Unknown column '' in 'where clause'
