ExpressionEngine 2 upgrade breaks UTF-8 characters

September 01, 2011 4:56am

  • #1 / Sep 01, 2011 4:56am

    truthnews

    37 posts

    Hi,

    I have a website currently running in EE1.6.8.

    The weblog content is already in UTF-8 and the CP Default Character Set is UTF-8. All the UTF-8 characters display correctly on the existing site and can be seen correctly within phpMyAdmin.

    EXAMPLE:

    Toothpaste and kerosene, your car’s best friends

    In the above example the apostrophe is a curly apostrophe and can be seen as such within phpMyAdmin using utf8_general_ci.

    The table is definitely UTF-8, as shown here:

    ... show table status like "exp_channel_titles" \G

               Name: exp_channel_titles
             Engine: MyISAM
            Version: 10
         Row_format: Dynamic
               Rows: 14571
     Avg_row_length: 128
        Data_length: 1878872
    Max_data_length: 281474976710655
       Index_length: 1509376
          Data_free: 0
     Auto_increment: 61856
        Create_time: 2011-08-23 14:14:36
        Update_time: 2011-08-24 09:06:17
         Check_time: 2011-08-23 14:14:37
          Collation: utf8_general_ci
           Checksum: NULL
     Create_options:
            Comment:

    The problem is that after running the EE 2.1.3 updater, the curly apostrophe is turned into a “?”, e.g.:

    Toothpaste and kerosene, your car?s best friends
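
    A minimal Python sketch of how a curly apostrophe can end up as “?”. It assumes (the thread does not confirm this) that somewhere during the update the text passes through a Latin-1 conversion, which has no code point for U+2019. The variable names are illustrative only.

```python
# Illustration only: the updater's real internals are not shown in this thread.
title = "Toothpaste and kerosene, your car\u2019s best friends"

# Latin-1 cannot represent U+2019 (right single quotation mark), so a lossy
# conversion substitutes "?" for it. The original character is not recoverable
# from the result.
lossy = title.encode("latin-1", errors="replace").decode("latin-1")

print(lossy)
```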

    Note - I know 2.1.3 is not the latest version, but I am not at liberty to simply grab the latest updater. We are about ready to launch the new version and have done all our testing based on upgrading to this version.

    The site is: http://www.carsguide.com.au.

    Thanks in advance!

  • #2 / Sep 01, 2011 4:11pm

    Lisa Wess

    20502 posts

    Hi truthnews,

    Is your entire database UTF-8?  Also, do you know if you had any hacks in place for forcing UTF-8 in EE 1?  There was one hack in particular that caused some difficulties with the updater.

    Thanks!

  • #3 / Sep 01, 2011 6:37pm

    truthnews

    37 posts

    Hi Lisa,

    Actually it was just the weblog tables that were UTF-8 - and yes we were using the code hack.

    How do you recommend we proceed?

    cheers 😊

  • #4 / Sep 01, 2011 7:00pm

    Lisa Wess

    20502 posts

    Hrm, I’m not sure if there is a clean way to recover when you’ve used that hack; and that is one of many reasons we so strongly discourage hacking the core.

    Let me see if the devs have any ideas on this.

    Thank you.

  • #5 / Sep 02, 2011 11:34am

    Robin Sowell

    13255 posts

    Hrm- is it possible to re-import?  I’m thinking we could have it skip the weblog tables on the utf-8 conversion part- only weblog_titles and weblog_data were stored utf-8, yes?

    I also considered trying to just use the unupdated version and manually change a few field names, but there are some datetime conversions that need to happen, so that might well be messy.

    IF you haven’t added new channel data but have tweaked other things (templates, etc)- might could do the modified conversion and then just move over the resulting channel_titles/channel_data tables.

    Anyway- brainstorming.  First question is- is a re-import a viable option.  If not a total re-import, is replacing the current channel_data/title tables with the (re-converted) old ones an option?

  • #6 / Sep 02, 2011 12:49pm

    truthnews

    37 posts

    Hello Robin,

    Can you explain what you mean by “re-import”?

    Also, I don’t know what you mean by “re-converted”

    thanks!

  • #7 / Sep 02, 2011 1:02pm

    Robin Sowell

    13255 posts

    Bleh- ‘import’ was a total misnomer.  By that I mean- run through the 2.0 update- that’s where the utf-8 conversion happens.  The easiest fix I can think of is to update from 1.x -> 2.x again- and edit the update file so it does not do the utf-8 conversion on your weblog data.

    The next easiest solution- do the above, but then just substitute in the weblog/channel tables for the ones on your current version- this is a good option if you have added/edited templates and such- but NOT added/edited any of your channel data.

    If neither of those is an option, it gets messier.

    Hope that clarified.

  • #8 / Sep 02, 2011 1:22pm

    truthnews

    37 posts

    Sorry, still having trouble understanding ... do you mean we should hack the updater script? You said: “and edit the update file so it does not do the utf-8 conversion on your weblog data”

    cheers (:

  • #9 / Sep 02, 2011 2:11pm

    Robin Sowell

    13255 posts

    Right- but I won’t just say ‘hack the script’- I want to be sure the above is a viable option for you.  IF you can re-update (take your 1.x database and update it to whatever 2.x version you want), then I’ll walk you through what exactly to change so the channel data isn’t converted.

    I’m just trying to make sure that’s a good option- if it is, I’ll walk you through how to do it.

  • #10 / Sep 02, 2011 2:34pm

    truthnews

    37 posts

    Umm… yeah. We are currently running the updated version on a staging server; it is not live yet. Is that what you mean?

  • #11 / Sep 02, 2011 2:43pm

    Robin Sowell

    13255 posts

    Sort of- do you have the 1.x database and config file so you can run the update from 1.x to 2.x over again?  And will you lose important data added since you updated to 2.x if you do so?

  • #12 / Sep 04, 2011 8:13pm

    truthnews

    37 posts

    To answer your question - we have a staging server and a live server. The live server is running 1.6.8. The staging server is running 2.1.3.

    So we can start again with the content from the live server and convert it again. We also have backups etc… should anything go wrong.

  • #13 / Sep 05, 2011 2:19am

    truthnews

    37 posts

    Unfortunately, due to deadlines etc., we have been forced to move ahead with this - and it looks like we’ll have to live with the unwanted question marks in our articles.

    It’s a pity about the timing, because I’m sure we could have got a fix for this, but my paymasters have ordered me to proceed and they said they will just manually fix the bad characters.

    I’m still interested to know what your suggested fix is, but it will probably be too late for us to apply it by the time you get this after your public holiday.

    cheers!

  • #14 / Sep 06, 2011 3:29pm

    Robin Sowell

    13255 posts

    Argh- yes, bad on the timing.  What I’d have tried would be to edit installer/updates/ud_200.php around line 400:

    foreach ($tables as $table)
    {
        // edit: skip the tables that already hold UTF-8 data
        if ($table == 'exp_weblog_data' OR $table == 'exp_weblog_titles')
        {
            continue;
        }
        // end edit

        $progress = "Converting Database Table {$table}: %s";

    Then run the update from 1.x->2.x.  What this would do is skip converting for the weblog_data and weblog_titles tables (assuming the above were prefixed correctly).  Untested, but I believe it would have done the trick.
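
    For anyone adapting the edit, the skip logic can be sketched on its own. This is a hypothetical Python helper, not part of the updater: the function name and the `db_prefix` parameter are assumptions. It builds the two table names from the prefix so the check still matches on installs that do not use `exp_`.

```python
def tables_to_convert(tables, db_prefix="exp_"):
    """Return the tables that would still go through the UTF-8 conversion."""
    # Tables assumed to hold UTF-8 data already (weblog titles and data).
    skip = {db_prefix + "weblog_titles", db_prefix + "weblog_data"}
    return [t for t in tables if t not in skip]


tables = ["exp_members", "exp_weblog_titles", "exp_weblog_data", "exp_templates"]
print(tables_to_convert(tables))
```

    With a different prefix, pass it explicitly, e.g. `tables_to_convert(names, db_prefix="ee_")`.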
