ExpressionEngine CMS
Open, Free, Amazing

Thread

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

The active forums are here.

Converting ExpressionEngine databases to UTF-8 or other character sets

August 14, 2009 1:19pm

Subscribe [2]
  • #1 / Aug 14, 2009 1:19pm

    Magnus Wester

    38 posts

    In MySQL 5 and later, it is simple to convert a MySQL table to UTF-8:

    ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

    Use the collation sequence natural for your language, e.g. utf8_swedish_ci if your data is mostly viewed by Swedish users expecting å, ä and ö to be sorted in that order.

  • #2 / Sep 03, 2009 2:38pm

    bsjosten

    100 posts

    Magnus,

    Are you suggesting running the query on each table in the EE database? Or am I misunderstanding?

    Thanks,

    /Bo

  • #3 / Sep 03, 2009 3:38pm

    Magnus Wester

    38 posts

    There are numerous posts in this forum about converting the EE database between character sets, mainly converting to UTF-8. There are several examples on how to code such a conversion. My post was mainly intended to help someone searching for more information about character set conversion for EE databases. I have never considered such a conversion myself.

    For me personally the character set issue is really unimportant. I can understand that non-USASCII characters are converted back and forth between UTF-8 in the browser and latin1 in the EE database. As far as I can tell it works fine as it is. The most important thing for me is the collation, since it maintains the alphabetical sequence that I am used to.

    But just for the hell of it, I converted a few EE tables to UTF-8 using the command above, and they work great. I stayed away from any table that didn’t appear to contain any text, e.g. hash tables.

    If you really need to change the character set of your ExpressionEngine data I suggest you read all there is to read about it in these forums, then use the command above if it helps you avoid more complex solutions to the same problem. If all you want is a pure UTF-8 environment I suggest you wait for ExpressionEngine 2.0 before you do anything to your tables.

  • #4 / Sep 03, 2009 3:48pm

    bsjosten

    100 posts

    Magnus,

    My issue is also collation, for Swedish characters. As far as I can tell, the database is already UTF-8, but collates wrong with Å before A, O and Ö together, etc.

    Any quick fixes for that, short of database updates? Can’t really afford to wait for 2.0.

    /Bo

  • #5 / Sep 03, 2009 6:33pm

    Magnus Wester

    38 posts

    I’m sorry, I’m definitely not an MySQL expert. But be aware that MySQL has extensive character set support, and it operates not only on the database level but also on the server, table and column level. I guess the character set is inherited in a logical fashion. So instead of setting it for each column, you set a default for the whole table, etc.

    Also, ExpressionEngine communicates with MySQL expecting certain character sets. So if you change a particular column and index to UTF-8, EE could still be using latin1 data in WHERE clauses for that column, and then you would get unexpected key-not-found situations.

    For each column that you have a problem with, you must first determine where its character set is defined, and try to reset it to the default character set for that column as used by EE. Normally I believe EE 1.x was written for latin1.

    For each affected column, you must then select a collation appropriate for the column’s explicit or implicit character set. The most common Swedish collations are latin1_swedish_ci and utf8_swedish_ci. The most natural solution to your problem is to ALTER each table you have a problem with, setting the appropriate Swedish collation for the character set using the COLLATE keyword.

    ALTER TABLE tbl_name
        [[DEFAULT] CHARACTER SET charset_name]
        [COLLATE collation_name]

    Of course you can do this also on the server, database or column level if you prefer.

    Read more about MySQL character set support at http://dev.mysql.com/doc/refman/5.0/en/charset.html

.(JavaScript must be enabled to view this email address)

ExpressionEngine News!

#eecms, #events, #releases