ExpressionEngine CMS
Open, Free, Amazing

Thread

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

The active forums are here.

SQL encoding of foreign character sets

November 20, 2009 6:05pm

Subscribe [5]
  • #1 / Nov 20, 2009 6:05pm

    Chris Gill

    24 posts

    We have successfully shifted a fair-sized site from a proprietary CMS to EE, but now have problems with non-ASCII characters.

    The database includes a lot of placenames in Austria, France, Switzerland and other countries which include accented characters. It also includes lots of fancy typesetting characters - en rules and curly quotes, mainly - because the content originates from an annual book. We use InCatalog to sync the InDesign book pages with a FileMaker database. Until now, we have annually exported the data from that database to spreadsheets, and imported those into the CMS. With the move to EE, we saw the opportunity to keep the FileMaker database and the site database in sync more elegantly and efficiently, because FileMaker can access an SQL database as an External SQL Source. We have such a system working, but we can’t use it in practice because we can’t get these fancy characters to work properly.

    We’re working on Macs, but our Filemaker database is remotely hosted on Windows. This has not caused us any problems up to now.
    We seem to have no control over how FileMaker encodes what it sends to the EE SQL database, because FileMaker does not support the latest version of the MySQL connector in which apparently that encoding can be specified.

    When we write text to an EE SQL field from the FM database, all looks fine from the FM end, but in EE (back end and site) we get the dreaded black diamond question marks replacing all the fancy characters. If we put fancy characters into the EE backend they appear correctly on the site, but look like garbage when inspected in MySQL via FileMaker.

    We do have control over the MySQL database behind our EE site, so we thought the solution would be to set the character set/collation of the affected tables or fields appropriately. We were surprised to find that the database as a whole is set to use UTF-8, but that everything else seemed to be set to a collation of latin1_swedish. But changing the collation of a test field has not solved the problem. We’ve tried utf8_general and macroman_general, and the results change but the problem is not resolved.

    I have read countless threads on this forum related to encoding difficulties with moving data from one system to another, which miraculously we seem to have managed without problems. But we do need to be able to edit the data behind the EE site directly, without producing garbage in the site. It is, surely, just a matter of rectifying some settings, somewhere. But where?

    Chris

  • #2 / Nov 20, 2009 6:44pm

    Adam Dorsey

    1439 posts

    Hi Chris-

    In order to get the “fancy” characters to display properly in most web browsers, they need to be converted to into its HTML entity. You can force this in ExpressionEngine by setting “Automatically Convert High ASCII Text to Entities” to YES.

    This option can be found at: Admin ›  Weblog Administration ›  Global Weblog Preferences

    However, this conversion happens when you submit entries, not at run time, so you’ll need to edit/re-save any entries containing High-ASCII values for this feature to work.

    Also, I’m not sure if FileMaker will display character encodings properly, but this is something outside the realm of what we can support.

    Here is a link to the docs, regarding “Automatically Convert High ASCII Text to Entities.”

    Does this help your situation at all?

  • #3 / Nov 20, 2009 7:14pm

    Chris Gill

    24 posts

    Adam:

    Sorry, you’re way off target.

    We have no problem displaying fancy characters in our site. The problem is nothing to do with browsers.

    The problem is that we need to control the encoding of these characters in the EE-SQL database, so that it is compatible with the encoding used by FileMaker.

    Chris

  • #4 / Nov 20, 2009 7:46pm

    Lisa Wess

    20502 posts

    Greetings, Chris,

    As you are likely aware, character encodings can be a difficult thing to work with.  I ask that you follow our TSS’ advice and try the items they recommend.  They may not fix these issues immediately but we’re working towards isolating a solution with you, and your cooperation will get us there more quickly.

    Also please include a link to a page on the site that shows the problematics characters.

    Thank you.

  • #5 / Nov 21, 2009 5:43am

    Chris Gill

    24 posts

    Hi Lisa

    OK - cooperation setting increased to 11 on a scale of 10.

    I have set EE to save high end chars as entities. This is not an avenue we want to go down long-term, but I’m cooperating.

    To show the problem as clearly as possible I have built a special page which includes a special field where we can tinker with encoding without affecting the database and site as a whole. I have created two dummy entries, and below are links to them. On the pages I am going to link to, the section headed Summary is a standard field with its collation still at the default setting of latin1_swedish; the section headed Testbed is my specially created field, which at present is set to a collation of macroman. (I have also tried utf9 - neither solves our problem.) So:

    On the Norton St Philip page the text has been created in EE, and is displaying fine. It displayed fine before we made the change to saving as entities, though.

    On the Frome page the text has been written to the SQL database from FileMaker, and is displaying garbage fancy characters. This is the mechanism we need to get working properly.

    I am attaching screenshots showing how the two relevant fields look when pulled out of the SQL DB by FM. In the Norton record, we are seeing html entities. In the Frome record, we are seeing the text as it should appear - not surprising, because it originated from FM.

    With cooperation still set at 11, I say again that our problem is about encoding in the SQL DB. We need to get EE and FM writing and reading the same encoding. I am attaching a further screenshot which may help: the EE SQL Manager variable display. Seems to me we have a bit of a mess here, with a variety of settings. A good start, I would have thought, would be an explanation of what each of these settings does, and how to change it.

    regards

    Chris

  • #6 / Nov 21, 2009 1:36pm

    Lisa Wess

    20502 posts

    Chris,

    While we are trying to help you, at this point it really is not an ExpressionEngine issue.  EE characters are saving properly and appearing properly on the pages.  Filemaker’s are not.  This is really a third-party issue and not something we can offer official support on.

    With that said, we have a few on our staff more advanced with character set issues - Ingmar Greil in particular.  He will be back on Monday and I will ask him to take a look at your post and see if he has any words of wisdom that might steer you towards a solution.

  • #7 / Nov 21, 2009 6:05pm

    Chris Gill

    24 posts

    Thanks Lisa - appreciate your help.

    Chris

  • #8 / Nov 22, 2009 6:42am

    Chris Gill

    24 posts

    Lisa/Ingmar:

    On reflection, I ought to raise again the following from one or two messages back:

    I am attaching a further screenshot which may help: the EE SQL Manager variable display. Seems to me we have a bit of a mess here, with a variety of settings. A good start, I would have thought, would be an explanation of what each of these settings does, and how to change it.

    Are you able to explain the various character_set… settings?

    Chris

  • #9 / Nov 22, 2009 9:37am

    KeithW

    138 posts

    You could try Googling for “Filemaker Character Encoding”, you’ll get a lot of hits. For example Migrating Filemaker data to UTF-8.
    A lot of people seem to be having similar problems with Filemaker. Your simplest solution may be to ensure that Filemaker data
    is converted (manually or otherwise) to UTF-8, and also ensure that EE is saving your data in the database as UTF-8 (i.e. I think
    you won’t be able to use HTML entities)—you can check the actual encoding being used in the database using phpMyAdmin,
    which allows you to change the display character set (if the display character set and the stored data use different encodings
    then you’ll see garbled characters).
    There are ways to force EE to use a particular character set encoding like UTF-8 in its MySQL database—see the
    mysql_query(“SET NAMES ‘utf8’”); comment below the Installation Instructions, for example. 
    You can also set the “lang” attribute in your web page HTML, and it may also help to set the EE charset encoding to UTF-8.
    Only your own EE data need be stored as UTF-8, it should not matter what other data is stored as.

  • #10 / Nov 22, 2009 12:44pm

    Chris Gill

    24 posts

    Thanks Keith

    I will certainly keep searching for a solution, via Google as well as the tech support facilities and forums of all the components of our system - FileMaker, our FileMaker hosts, MySql, our MySQL hosts, ExpressionEngine. Somebody, somewhere, must have already confronted this problem.

    Conversion of our FileMaker data is not a sensible option, unless we do it on the fly (and if we have to do that, we will - as a last resort). Our data is constantly being updated via links to InDesign pages.

    I have checked out the user note at the end of the install instructions that you refer to, and it does seem to suggest that the encoding used by EE can be controlled. I’ll be interested to see if Ingmar has any comments on this on Monday.

    I have also followed your second link, and it goes to documentation that appears to be flawed. It says:

    charset
    {charset}
    This variable will be substituted for the global character set preference under Admin > Control Panel Settings. It is typically used in your META tags to indicate the character encoding:
    <meta http-equiv=“Content-Type” content=“text/html; charset={charset}” />
    If you wish to use the character encoding you have specified for a particular weblog (in the Control Panel Admin > Weblog Management page) you may use the Weblog Information EE Tag instead.

    But there is no such place as Admin > Control Panel Settings. There is Admin > System Pref > Control Panel Settings but it does not contain such a setting. Admin > System Pref > General Config offers a Default Character Set setting.

    However ... in any event I’m not convinced this determines how characters are encoded in MySQL. I think it is likely to affect only how EE presents characters in the CP and the site.

    Anyway, thanks again for your suggestions. Right now I’m awaiting Ingmar’s contribution tomorrow.

    Chris

  • #11 / Nov 22, 2009 6:39pm

    KeithW

    138 posts

    You might also look at the EE wiki

  • #12 / Nov 22, 2009 7:49pm

    KeithW

    138 posts

    You might also look at the Filemaker forums. You should
    also check that your hosting service uses MySQL 5.x

  • #13 / Nov 23, 2009 4:26am

    Chris Gill

    24 posts

    Thanks Keith

    Most of the wiki stuff is about switching languages, but this entry is about the same configuration step you pointed me at in an earlier message (the one appended to the install instructions).

    So, subject to further input later today from encoding specialist Ingmar, that would seem to be the way forward.

    I have spent a lot of time in the FileMaker forums, and have posted to several threads there. Right now, in fact, it is looking increasingly likely that the FileMaker/SQL problem is going to be more difficult to resolve than the EE/SQL problem. It appears that the MySQL ODBC driver that FileMaker uses is not capable of writing high-end characters at all. If that is the case, and if FileMaker cannot move in the near future to a more up to date driver, it looks like we will have to resort to conversion to html entities.

  • #14 / Nov 23, 2009 10:16am

    KeithW

    138 posts

    The Filemaker Forum link that I posted suggests that the latest MySQL ODBC connector works
    perfectly for UTF-8 conversion, whether it’s officially approved by the Filemaker people or not.

  • #15 / Nov 23, 2009 10:20am

    Chris Gill

    24 posts

    Yeah. Trouble is our FM databases are remotely hosted on a shared server, and the hosting firm won’t use an unsupported driver.

.(JavaScript must be enabled to view this email address)

ExpressionEngine News!

#eecms, #events, #releases