This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Problem with encoding on importing SQL update file

September 30, 2010 2:22pm

  • #1 / Sep 30, 2010 2:22pm

    Hi everybody,

    I’ve been scouring the forums, but haven’t found anything with quite this kind of encoding issue. We’ve got a site in English that we’re translating to Spanish as well. I dumped the database to our translation bureau, who returned to me a SQL update file with only the Spanish parts, and all that data seems to import just fine into all the right places (custom fields, etc) in the database.

    When I look at the Spanish data through Sequel Pro (I’m on a Mac), all the accents and such seem to appear correctly in the body_es custom field (a Wygwam field) and in the title_es custom field (a simple text field). In the control panel, body_es still looks fine, but in title_es the “special” characters appear as white question marks in black diamonds.

    At this point, if I view the site, any Spanish characters appear corrupted (even the body_es ones that look good in CP), unless I force the browser to use Western (ISO Latin 1) encoding, which makes everything display correctly. The weird thing is that if I simply save the entry without making any edits at all, all the data still looks exactly the same in the Control Panel as it did before, but the site starts to display correctly under the default encoding (UTF-8) and no longer looks right under forced Western Latin. The exception is the “corrupted” titles, which retain their question marks both in CP and on the site.

    It’s almost as if the SQL import is bringing everything into the database correctly, but ExpressionEngine doesn’t recognize it as UTF-8 until I save the entry. Saving every entry individually isn’t really an option, clearly…

    As far as I can tell, I did my best from the beginning of the project to make everything UTF-8 the whole way through. The SQL dump has default encoding UTF-8 for all tables, and it seems still to be that way after the import.

    It’s a 1.6.9 install with MySQL 5.1.0. Any other data that might be helpful?

    Does anyone have any ideas about what might be going on?

    Thank you so much,
    Willhaus

  • #2 / Oct 01, 2010 10:30am

    Ingmar

    29245 posts

    At this point, if I view the site, any Spanish characters appear corrupted (even the body_es ones that look good in CP), unless I force the browser to use Western (ISO Latin 1) encoding which makes everything display correctly.

    This is most likely an encoding issue. What charset are you using? We recommend utf-8 for all new sites. You need to make sure that your translated data is in the same format when you import it.

    except the “corrupted” titles retain their question marks in CP and on the site.

    There’s something else going on with the titles, then. Are they different in any way? Do they come from the same file / db dump?

    It’s almost as if the SQL import is bringing everything into the database correctly

    As Latin-1, probably.

    but ExpressionEngine doesn’t recognize it as UTF-8 until I save the entry

    At which point it’s being rewritten to the db as utf-8.

    As far as I can tell, I did my best from the beginning of the project to make everything UTF-8 the whole way through. The SQL dump has default encoding UTF-8 for all tables, and it seems still to be that way after the import.

    Are you sure the file you got back from the translators was utf-8 encoded? Are you sure you used utf-8 encoding when importing the file?
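
    The two symptoms discussed above (question marks in black diamonds under UTF-8, correct text under forced Latin-1) are what a byte-level mismatch would produce. Here is a minimal sketch in Python, not taken from the thread; the sample word is an assumption chosen for illustration:

```python
# Illustrative only: the sample word is an assumption, not data from the thread.
sample = "título"

utf8_bytes = sample.encode("utf-8")      # b't\xc3\xadtulo' (the accent takes two bytes)
latin1_bytes = sample.encode("latin-1")  # b't\xedtulo'     (the accent takes one byte)

# Case 1: Latin-1 bytes interpreted as UTF-8.
# The lone byte 0xED is not a valid UTF-8 sequence, so a lenient decoder
# substitutes U+FFFD, the replacement character that browsers typically
# draw as a question mark in a black diamond.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Case 2: UTF-8 bytes interpreted as Latin-1.
# Every byte maps to some character, so nothing fails, but the accent
# turns into two unrelated characters.
utf8_read_as_latin1 = utf8_bytes.decode("latin-1")

# ascii() keeps the output readable on any terminal encoding.
print(ascii(latin1_read_as_utf8))
print(ascii(utf8_read_as_latin1))
```

    If the page displays properly only when the browser is forced to Latin-1, the bytes reaching the browser are probably Latin-1 (case 1), whatever the tables themselves declare.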

  • #3 / Oct 02, 2010 2:39am

    ¡Ingmar!

    I was hoping you might chime in - from reviewing the forums, you certainly have helped many others in this area. Your message made me think this through super carefully, and I think I’ve figured it out.

    The troubles displaying the Spanish body were caused by Wygwam somehow (no time to investigate why). When I turned it off for the body_es fields, they started to display on the site correctly. But they looked just like the title_es fields - white question marks in black diamonds all over the place.

    The solution to that was to add

    mysql_query("SET NAMES 'utf8'");

    to /system/db/db.mysql.php as described in this wiki post (although I deleted the hyphen from “utf-8”).
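
    A rough model of why that one statement can matter: `SET NAMES` tells MySQL which character set to use for the connection, including the one it converts results into before sending them to the client. The Python sketch below only simulates that conversion. Both function names are hypothetical, and it assumes the connection previously defaulted to latin1, which the thread does not confirm:

```python
def bytes_sent_to_client(stored_text: str, results_charset: str) -> bytes:
    """Hypothetical stand-in for the server converting a result row
    into the connection's character set before sending it."""
    # MySQL's latin1 is close to, but not identical to, Python's latin-1;
    # the difference does not matter for the character used here.
    python_codec = {"latin1": "latin-1", "utf8": "utf-8"}[results_charset]
    return stored_text.encode(python_codec)


def render_as_utf8_page(raw: bytes) -> str:
    """Hypothetical stand-in for a browser decoding a page served as UTF-8."""
    return raw.decode("utf-8", errors="replace")


# Assumed sample value; the column is taken to hold correct text.
stored = "Información"

# Assumed state before the fix: the connection delivers latin1 bytes.
before = render_as_utf8_page(bytes_sent_to_client(stored, "latin1"))

# After SET NAMES 'utf8': the connection delivers UTF-8 bytes.
after = render_as_utf8_page(bytes_sent_to_client(stored, "utf8"))

print(ascii(before))
print(ascii(after))
```

    Under these assumptions the stored text is the same in both cases; only the bytes handed to the page differ, which would fit the data looking fine in Sequel Pro while the site showed replacement characters.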

    That, in turn, exposed some characters in the English version of the site that weren’t properly encoded, but as soon as I fixed all of those in the CP entries, everything seems to be working well in UTF-8.

    Just in time to translate the site to Vietnamese! Perhaps I’ll be back in the forums again, depending on how that goes.

    Thanks very much, Ingmar, for your time. I love that I can post in these forums and feel pretty sure I’ll find some help!

    Sincerely,
    Willhaus

  • #4 / Oct 04, 2010 1:46am

    John Henry Donovan

    12339 posts

    Glad you are up and running again. Feel free to start a new thread if you have any more questions when it comes time to start your Vietnamese version.
