ExpressionEngine CMS
Open, Free, Amazing

Thread

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

The active forums are here.

SQL encoding of foreign character sets

November 20, 2009 6:05pm

Subscribe [5]
  • #31 / Nov 23, 2009 9:26pm

    KeithW

    138 posts

    This FAQ item explains how to setup Filemaker to export UTF-8. Of course this applies for the MySQL ODBC connector, not just for the Actual Tech. one.  I submitted a “pre-purchase question” to Filemaker asking why they are continuing to recommend out-of-date and buggy versions of the MySQL connector as “supported” when there is supposed to be a “partnership” with MySQL.
    I also sent MySQL a copy of the question, and suggested that they chase it too.

  • #32 / Nov 24, 2009 6:34am

    Ingmar

    29245 posts

    I’m still not clear how to proceed

    You need to make sure that both FM and EE use the same character encoding. I have a feeling that it might be easier to have EE use Latin-1 than teaching Unicode to FM.

    So, go to Admin > System Preferences > General Configuration > Default Character Set and set it to iso-8859-1. Check that your template contains a line like this

    <meta http-equiv="Content-Type" content="text/html; charset={charset}" />

    Then post new articles, both via EE and by having FM do its thing.

    The first is academic, but something I would like to understand for future reference: why does the wiki have instructions on hacking EE to use utf-8 when in my case it seems to be doing so by default?

    You must distinguish between the encoding that is used in the control panel and your templates (utf-8 by default), and what is stored in the database (default depends on your database, the MySQL default is “latin1_swedish_ci”). The latter is usually irrelevant, as the encoding will be changed on the fly if needed. You can easily store UTF-8 encoded characters (or, really, any encoding) in a Latin-1 database. This is what EE does in 1.6.x. (EE 2.0 will have switched to utf-8 everywhere, including the database.)

    Why the hack in the wiki? Because some user put it there. Under very rare circumstances this encoding will matter for the target system. Yours should not be one of these cases, however, so no hacks required.

    The second is the one I need to resolve in order to follow your suggestion to give latin-1 a try: how do I get EE to do that, so that EE and FM are writing the same stuff to SQL, and sending the same stuff to the browser?

    Change the default encoding as outlined above, and make sure that the correct “Content-Type” header appears in your templates.

    what are the effects of all the variables shown in the EE SQL Manager, and how do I change them, if I need to? Screen shot from our CP attached, showing a worrying mix of the two charsets.

    These are system variables used by MySQL to determine if and how to convert your data. You have changed some of them, it seems (collation_database and character_set_database, at a quick glance). Changing such variables is usually not sufficient since you’d also need to convert the data, but at the same time almost never necessary. Best to leave them alone.

  • #33 / Nov 24, 2009 7:15am

    Chris Gill

    24 posts

    Thanks Ingmar. You’re either up early, or sitting somewhere in Europe?

    I will follow your instructions. I confess I am having some difficulty understanding why you think that in our case the character set used for storage in the database doesn’t matter, and therefore does not need to be matched in the two cases (EE and FM). I can see that if you are just using EE, all that matters is what displays in the CP and the site. But I’d have thought our case was one where encoding in the database is at the root of the problem.

    However, let’s see what happens when I follow your instructions.

    Chris

  • #34 / Nov 24, 2009 9:28am

    Ingmar

    29245 posts

    Thanks Ingmar. You’re either up early, or sitting somewhere in Europe?

    Yes, it’s no secret that I’m in Vienna 😊

    I confess I am having some difficulty understanding why you think that in our case the character set used for storage in the database doesn’t matter ...

    We tested this, correct? When I forced the browser in your second example to treat the characters as Latin-1, they displayed correctly. They only didn’t in the first place because your template declared them (incorrectly) to be utf-8.

    Just give it a try and let us know how it goes.

  • #35 / Nov 24, 2009 10:37am

    Chris Gill

    24 posts

    Ingmar:

    Very happy to report that this simple change in the EE CP has done the trick. With the default charset changed, FM data is displaying correctly in the site and CP, and EE-originated data is displaying correctly in FM. Thanks.

    At the risk of wasting another moment of your time, though ... it does seem to me clear that this charset change results in EE writing latin-1 to the database, rather than utf-8. If I examine any other ski resort entry than my two dummy ones (well, any other entry with fancy characters), what FM displays is the classic utf-8 pairs of characters in place of fancy characters. Those entries, of course, were saved in EE with the default charset of utf-8. The two entries saved while I briefly had the default charset of iso-blah look perfect in FM. So they must be encoded in iso-blah in the database, no?

    We now face the different challenge of converting all the utf-8-encoded stuff in the database into iso-blah, of course. Unfortunately, only part of the site’s content comes from FileMaker. So it goes.

    Anyway: the bottom line for anyone wanting to follow us down the path of integrating EE and FileMaker databases is:

    change EE’s charset to iso-8859-1
    sort this out before you fill your site with data.

    Until, that is, FileMaker starts to support a more recent MySQL ODBC driver that allows you to move to the wonderful world of utf-8.

    Thanks again.

    Chris

  • #36 / Nov 24, 2009 11:27am

    KeithW

    138 posts

    Filemaker should reply to my question about that (officially approving a more recent version of the MySQL connector) within 24 hours.
    Also remember that Ingmar said that the soon-to-be-released EE 2.0 will be UTF-8 by default.

  • #37 / Nov 24, 2009 11:35am

    Ingmar

    29245 posts

    Very happy to report that this simple change in the EE CP has done the trick. With the default charset changed, FM data is displaying correctly in the site and CP, and EE-originated data is displaying correctly in FM. Thanks.

    I am very glad to hear that. It means that we have successfully “synchronized” charsets, as it were, standardizing on Latin-1 in the process. I would have recommended to go with utf-8 under normal circumstances, but FM seems to have issues there.

    it does seem to me clear that this charset change results in EE writing latin-1 to the database, rather than utf-8.

    That is correct and, not to put too fine a point on it, the purpose of our changing it 😊

    Those entries, of course, were saved in EE with the default charset of utf-8. The two entries saved while I briefly had the default charset of iso-blah look perfect in FM. So they must be encoded in iso-blah in the database, no?

    That is also correct. Try to open and re-save such an entry to see how it works; it should then display correctly, too. Just how many such entries do you have?

    Basically, I see two options: 1) Continue to work with FM, their support and/or their community to get it to export utf-8 after all; in which case you’d simply revert to EE using utf-8. 2) Convert the content of your database to iso-8859-1. This often means exporting the content, converting charsets, and re-importing. It does not simply mean to change the collation of your db.

  • #38 / Nov 24, 2009 11:52am

    Chris Gill

    24 posts

    it does seem to me clear that this charset change results in EE writing latin-1 to the database, rather than utf-8.

    That is correct and, not to put too fine a point on it, the purpose of our changing it 😊

    Well ... I read your 04:34 post to mean that your advice was to focus only on the character set used in the CP and site pages, and ignore what is stored. But it now seems that simply changing the charset changes all three.

    Keith kindly reminds me that EE2 will by default be all utf-8. If we have gone down the latin-1 road, should we expect difficulty with EE2, or will it again be just a question of setting the character set?

  • #39 / Nov 24, 2009 3:49pm

    Ingmar

    29245 posts

    Well ... I read your 04:34 post to mean that your advice was to focus only on the character set used in the CP and site pages, and ignore what is stored. But it now seems that simply changing the charset changes all three.

    EE uses the default charset to display how to encode the characters in its control panel, including the ones entered there via a form, then stores them in the database. When you retrieve the data again, all you have to do is make sure that your document has the correct “Content-Type” header.

    Keith kindly reminds me that EE2 will by default be all utf-8.

    In the database, yes. You would still be able to specify a different encoding in the control panel. Let me repeat: if you possibly can, I’d still recommend to go with utf-8. If, as seems to be the case, this is not an option, choose iso-8859-1. Whatever you do, you must use the same encoding for both FM and EE.

  • #40 / Nov 24, 2009 4:28pm

    Chris Gill

    24 posts

    OK - thanks. I think that probably concludes this very long discussion.

    Thanks again.

  • #41 / Nov 24, 2009 4:33pm

    Ingmar

    29245 posts

    OK - thanks. I think that probably concludes this very long discussion.

    I agree, I think we covered most of the issues at hand. If you do need additional help, with this topic or any other, please do not hesitate to post again, we’re always happy to help. Thanks 😊

.(JavaScript must be enabled to view this email address)

ExpressionEngine News!

#eecms, #events, #releases