Character encoding woes
Posted: 14 August 2008 11:01 AM   [ Ignore ]  
Lab Assistant
RankRank
Total Posts:  171
Joined  06-18-2007

It’s like a recurring nightmare for me…

I’m moving servers, and the database engines are slightly different:

Old setup (info from PHPmyadmin)

MySQL 4.1.12
MySQL charset:  UTF-8 Unicode (utf8)
MySQL connection collation: utf8_general_ci
Table collation: latin1_swedish_ci

New setup (info from PHPmyadmin)

MySQL 5.0.32
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_general_ci
Table collation: latin1_swedish_ci

The story:

After encountering database encoding issues when dumping the DB, I used the database dump utility in the CP. The database dumped fine. I imported it on the new server, via the command line, with—default_character_encoding set to UTF-8, and it imported correctly. I can view the DB fine in PHPmyadmin, all the character encoding is correct. But when viewed on the front end, the High Ascii characters are corrupted.

So my guess is that the actual database connection between the CMS and the database is at fault. The difference between the MySQL connection collation and the table collation is worrying, but I don’t understand why it works on the old server.

Please help!

Profile
 
 
Posted: 14 August 2008 01:09 PM   [ Ignore ]   [ # 1 ]  
Lab Assistant
RankRank
Total Posts:  171
Joined  06-18-2007

just to add that converting table collation to utf8_general_ci did not help.

Profile
 
 
Posted: 14 August 2008 07:38 PM   [ Ignore ]   [ # 2 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  12390
Joined  04-29-2002

Richard, I’m by no means an expert, but are you certain you’re using charset=“utf-8” or the variable {charset} on the front end of the site?

 Signature 

Quick Reference - EE Trial Options - EE Wiki - Docs for updating a build

Profile
MSG
 
 
Posted: 15 August 2008 01:02 AM   [ Ignore ]   [ # 3 ]  
Lab Assistant
RankRank
Total Posts:  171
Joined  06-18-2007

yep

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

I’d like to know the exact connection string that Expression Engine uses to connect to the database; what character set and collation it uses to connect to the database. This may help with my troubleshooting.

Profile
 
 
Posted: 18 August 2008 08:49 AM   [ Ignore ]   [ # 4 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  12390
Joined  04-29-2002

Richard, you can find that information in the file called db.mysql.php found in the /system/db folder.

EE reads the contents of config.php to grab the login info for MySQL. There is no assumed character set and collation as far as I know.

EE doesn’t make the MySQL database. That’s done beforehand.

 Signature 

Quick Reference - EE Trial Options - EE Wiki - Docs for updating a build

Profile
MSG
 
 
Posted: 02 September 2008 12:08 PM   [ Ignore ]   [ # 5 ]  
Lab Assistant
RankRank
Total Posts:  131
Joined  04-02-2006

The following thread may help: http://expressionengine.com/forums/viewthread/81472/

Profile
 
 
Posted: 02 September 2008 01:57 PM   [ Ignore ]   [ # 6 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  15379
Joined  05-15-2004

Thank you, Keith. Richard, are you still facing this issue?

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.

Profile
MSG
 
 
Posted: 13 September 2008 11:23 AM   [ Ignore ]   [ # 7 ]  
Lab Assistant
RankRank
Total Posts:  171
Joined  06-18-2007

Hi

Finally got to fixing this. Thanks Keith, that was the answer. For anyone else who may have experienced the same problem, this is what worked for me:

1. Make a database dump. Open the SQL file in a text editor and search and replace “DEFAULT CHARSET=latin1” with “DEFAULT CHARSET=utf8”.

2. Import DB via the command line (SSH). I find this the most effective, although you could probably do it via PHPmyadmin.

mysql -uUSERNAME -pPASSWORD --default_character_set utf8 DB_NAME < DBIMPORT.SQL

The default character set parameter probably isn’t required, but I did it anyway.

3. Open system/db/db.mysql.php and find function db_connect (around line 173). After

$this->server_info = @mysql_get_server_info();

ADD:

$this->query("SET NAMES 'utf8'");
$this->query("SET CHARACTER SET utf8");
$this->query("SET COLLATION_CONNECTION=utf8_general_ci");

Read this for more background: http://expressionengine.com/forums/viewthread/81472/
Remember to backup!

Profile
 
 
Posted: 14 September 2008 02:49 AM   [ Ignore ]   [ # 8 ]  
Lab Assistant
RankRank
Total Posts:  131
Joined  04-02-2006

The MySQL documentation suggests that “SET NAMES ‘utf8’” alone would be sufficient,
and also suggests that it is necessary “to execute SET NAMES every time you start up
only if you want to use a character set that is different from the default”, and suggests that—
if you want to use a character set that is different from the default—it is preferable to
“add the—default-character-set option setting to your mysql statement line, or in your
option file (see docs)” rather than execute SET NAMES as a separate query.

Profile
 
 
Posted: 14 September 2008 10:18 AM   [ Ignore ]   [ # 9 ]  
Moderator
Avatar
RankRankRankRankRankRankRank
Total Posts:  23523
Joined  05-20-2002

Thanks for laying out what you did, Richard.  And thanks to KeithW for elaborating.  (I suspect I’ll be using this myself.)  Glad all is now squared away w/the charset issues!

 Signature 

AKA rob1

Help Request TipsPro Network

Profile
 
 
   
 
 
Post Marker Legend
New Topic New posts Hot Topic Hot Topic with new posts New Poll New Poll Moved Topic Moved Topic Sticky Topic Sticky topic
Old Topic No new posts Hot Old Topic Hot Topic with no new posts Old Poll Old Poll Closed Topic Closed Topic Announcement Announcements
Theme
Change Theme
Visitor Statistics
The most visitors ever was 1149, on July 16, 2007 09:33 AM
Total Registered Members: 64935 Total Logged-in Users: 54
Total Topics: 81900 Total Anonymous Users: 31
Total Replies: 440273 Total Guests: 268
Total Posts: 522173    
Members ( View Memberlist )