ExpressionEngine CMS
Open, Free, Amazing

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

The active forums are here.

1.6.9 to 2.1.3 Upgrade: Problem with character encoding

April 11, 2011 5:10pm

  • #1 / Apr 11, 2011 5:10pm

    cafenoir

    43 posts

    This question may be related to a resolved thread.

    I’m upgrading from 1.6.9 core to 2.1.3 non-profit. I’m doing this locally using MAMP as a test before upgrading the live site. The site is in French, and after the upgrade, all the accented characters are turned into question marks.

    For example the word Événements becomes ?v?nements.

    The original site character set was utf-8, and the new one is too, and I’ve included the following in my templates:

    <meta http-equiv="Content-Type" content="text/html; charset={charset}" />

    I’ve tried importing the database as utf-8 and as latin-1. In both cases, if I look at the structure of the db using phpMyAdmin, all the tables are using latin1_swedish_ci collation before the upgrade to EE 2.1.3.

    After the upgrade most are using utf8_general_ci. I’ve also tried manually changing any tables that were not changed to utf8_general_ci.

    I’ve tried editing the db flat file using a text editor before importing and replacing CHARSET=latin1 with CHARSET=utf8.

    What else can I try?

  • #2 / Apr 11, 2011 10:33pm

    Nicolas Bottari

    143 posts

    Bonsoir Café Noir,

    I had problems with upgrades from EE 1.x to 2.x like these before with foreign characters (Japanese in my case). Encoding problems are a mess, but the good news is that once you get the site on EE2 and UTF-8, you’re set 😊

    Do you have “Convert High ASCII as entities” enabled on EE 1.6.9?
    Is your EE 1.6.9 database table collation latin1_swedish_ci? I’m speculating that even at this stage, “?” can already be seen in the database for foreign characters, as latin1_swedish_ci tables cannot handle some characters (and display “?” instead). In this situation, does the exported database show “?”s when you look at its contents?
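
    To illustrate where a “?” can come from, here is a minimal Python sketch. It assumes the loss happens during a charset conversion that substitutes “?” for anything the target charset cannot represent; Python's errors="replace" handler only stands in for that behaviour and is not EE or MySQL code. It also shows that French accented letters do fit in Latin-1, whereas Japanese text does not.

```python
# Illustration only: Python's "replace" handler stands in for a lossy
# charset conversion; this is not what EE or MySQL literally runs.
word = "\u00c9v\u00e9nements"  # "Événements"

# Latin-1 can represent French accented letters, so the round trip is lossless.
latin1_bytes = word.encode("latin-1")
assert latin1_bytes.decode("latin-1") == word

# A target charset that lacks the characters turns each of them into "?".
ascii_lossy = word.encode("ascii", errors="replace").decode("ascii")

# Japanese ("イベント") has no Latin-1 representation at all.
japanese = "\u30a4\u30d9\u30f3\u30c8"
japanese_lossy = japanese.encode("latin-1", errors="replace").decode("latin-1")

print(ascii_lossy)     # ?v?nements
print(japanese_lossy)  # ????
```

    If that reading is right, a “?” on a French site suggests the text passed through a conversion step that was narrower than Latin-1, or one that misjudged the source encoding, rather than a limit of the latin1_swedish_ci tables themselves.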

    Have you tried this: how about converting your EE 1.6.9 tables to the utf8_general_ci collation and changing table columns to utf-8, followed by import or upgrade into EE2?

    I have also tried turning “Convert High ASCII as entities” on in EE 1.6.9, checking that entities were being saved in the database, and then going on with the upgrade to EE2. That worked with some success, except for categories, which still gave me garbled text. Still, at least entry titles, custom fields, etc. were OK!

  • #3 / Apr 11, 2011 11:02pm

    cafenoir

    43 posts

    Bonsoir,

    Thanks for the reply.

    No, I don’t have ‘Convert High ASCII as entities’ enabled. The db connection collation is utf8_general_ci. The database file doesn’t show any errors when read in a text editor. I see the text written correctly, accents included.
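
    A text editor usually guesses the encoding when it opens a file, so seeing correct accents does not say which bytes are actually in the dump. A rough Python sketch for checking the raw bytes follows; the function name and the labels it returns are made up for this example.

```python
def classify_dump_encoding(data: bytes) -> str:
    """Return a rough label for the byte encoding of a SQL dump."""
    if all(b < 0x80 for b in data):
        return "ascii"      # no accented characters in the file at all
    try:
        data.decode("utf-8")
        return "utf-8"      # every non-ASCII sequence is well-formed UTF-8
    except UnicodeDecodeError:
        return "not-utf-8"  # possibly Latin-1 or Windows-1252


word = "\u00c9v\u00e9nements"  # "Événements"
print(classify_dump_encoding(word.encode("utf-8")))    # utf-8
print(classify_dump_encoding(word.encode("latin-1")))  # not-utf-8
```

    On a real export, the bytes would come from something like open("dump.sql", "rb").read(), where dump.sql is a placeholder file name.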

    In CP Home ›  Admin ›  System Preferences ›  General Configuration, I have the default character set as iso-8859-1.

    I have tried to convert the db using the command line and iconv, with no luck. It fails, but there is no error message.

    I have also tried turning “Convert High ASCII as entities” on in EE 1.6.9, but the entities are not saved in the database. I still see the accents.

  • #4 / Apr 11, 2011 11:19pm

    Nicolas Bottari

    143 posts

    I see. The db connection collation may be utf8_general_ci, but are the tables all utf8_general_ci as well? Have a look at exp_weblog_titles or exp_weblog_data’s table collation and field encodings, for example. Converting all of these to unicode might work.

    Also, how about turning the character set to utf8 in the CP before the upgrade?

    As for “Convert High ASCII as entities”, turning this on doesn’t convert existing data to entities; only future data with foreign characters will be saved as entities. Luckily this option is neither present nor needed in EE2. I really wish the conversion would propagate to all db data in EE 1.x, though.

  • #5 / Apr 11, 2011 11:33pm

    cafenoir

    43 posts

    All the tables are latin1_swedish_ci prior to upgrading.

    If I change the default character set to utf-8 in the admin, all the text on the live site breaks and I get diamonds for the accented characters.

    I’ve already tried to convert the db using the command line and iconv (as explained here), with no luck. It fails, but there is no error message.
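
    For comparison, here is a sketch of the same Latin-1 to UTF-8 conversion in Python, which raises an error instead of failing silently. It assumes the dump really is Latin-1; the function names are hypothetical, and the guard against input that is already UTF-8 is only a heuristic, since a Latin-1 file can occasionally happen to be valid UTF-8 too.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def transcode_dump(data: bytes) -> bytes:
    """Re-encode a Latin-1 SQL dump as UTF-8 and update its CHARSET declarations."""
    has_non_ascii = any(b >= 0x80 for b in data)
    if has_non_ascii and is_valid_utf8(data):
        # Decoding UTF-8 bytes as Latin-1 would double-encode every accent.
        raise ValueError("dump already looks like UTF-8; refusing to transcode")
    text = data.decode("latin-1")  # Latin-1 maps every byte, so this cannot fail
    text = text.replace("CHARSET=latin1", "CHARSET=utf8")
    return text.encode("utf-8")


sample = "CREATE TABLE t (x TEXT) DEFAULT CHARSET=latin1; -- \u00c9v\u00e9nements"
converted = transcode_dump(sample.encode("latin-1"))
print(converted.decode("utf-8"))
```

    Replacing CHARSET=latin1 in the file without also re-encoding its bytes, or the other way round, leaves the declaration and the data out of step, which is one way to end up with broken accents after the import.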

  • #6 / Apr 11, 2011 11:39pm

    Nicolas Bottari

    143 posts

    Seems that all the tables are latin1_swedish_ci

    Try changing them to utf8_general_ci. In phpMyAdmin, I believe you open the database, select your table, and go to “Operations” to change the table collation.
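
    One thing to watch: in MySQL, changing a table's default character set or collation only affects columns added later, while ALTER TABLE ... CONVERT TO CHARACTER SET also converts the existing text columns. The Python sketch below just builds those statements for a list of table names; the helper name is hypothetical, and the two table names are the ones mentioned earlier in this thread. Note that CONVERT TO assumes the stored bytes really match each column's declared charset, so it is worth testing on a copy of the database first.

```python
def convert_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        if "`" in table:
            # Keep the identifier quoting below safe.
            raise ValueError(f"unexpected backtick in table name: {table!r}")
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements


for sql in convert_statements(["exp_weblog_titles", "exp_weblog_data"]):
    print(sql)
```

    The printed statements can be pasted into the phpMyAdmin SQL tab or piped to the mysql client.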

    “If I change the default character set to utf-8 in the admin, all the text on the live site breaks and I get diamonds for the accented characters.”

    I see. Maybe this won’t happen once the table collations are in utf8_general_ci? The diamonds may be from EE trying to display data stored in a latin1_swedish_ci table as utf-8, which it fails to do correctly. I think I ran into something similar in the past, but not 100% sure at this time.
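
    The diamond effect is easy to reproduce outside EE. A minimal Python sketch, assuming the stored bytes are Latin-1 and the page is then decoded as UTF-8 (the browser's replacement character U+FFFD is what shows up as a diamond):

```python
word = "\u00c9v\u00e9nements"  # "Événements"

# Bytes as a Latin-1 table would hold them: one byte per accented letter.
stored = word.encode("latin-1")

# Reading those bytes as UTF-8: each lone accent byte is an invalid
# sequence and is replaced by U+FFFD, the "diamond" character.
shown = stored.decode("utf-8", errors="replace")

print(shown)
print(shown == "\ufffdv\ufffdnements")  # True
```

    So diamonds point to Latin-1 bytes being labelled as UTF-8, whereas question marks point to characters that were already lost in a conversion.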

  • #7 / Apr 12, 2011 12:04am

    cafenoir

    43 posts

    Even if I change every table to UTF-8 manually, it has no effect. On the website there are still question marks.

    The database that I export after changing the character set to utf-8 is pretty much identical to the one I export when I don’t.

  • #8 / Apr 12, 2011 12:29am

    Nicolas Bottari

    143 posts

    Do the question marks still show up in EE1 or after upgrading to EE2?
    I bet you tried this, but would leaving the character set as it was originally, then setting the table collations to utf8_general_ci, work?

    Maybe one of the EllisLab team members can take a stab at this one?

  • #9 / Apr 12, 2011 6:47am

    John Henry Donovan

    12339 posts

    Hi cafenoir,

    I believe the issue here is that all your data was stored as Latin1 before the upgrade to UTF-8. I found an article here detailing the issue, along with a shell script to patch things up before your upgrade.

    Let us know if that helps.

    Original thread I found here
