We have successfully shifted a fair-sized site from a proprietary CMS to EE, but now have problems with non-ASCII characters.
The database includes a lot of placenames in Austria, France, Switzerland and other countries which include accented characters. It also includes lots of fancy typesetting characters - en rules and curly quotes, mainly - because the content originates from an annual book. We use InCatalog to sync the InDesign book pages with a FileMaker database. Until now, we have annually exported the data from that database to spreadsheets, and imported those into the CMS. With the move to EE, we saw the opportunity to keep the FileMaker database and the site database in sync more elegantly and efficiently, because FileMaker can access an SQL database as an External SQL Source. We have such a system working, but we can’t use it in practice because we can’t get these fancy characters to work properly.
We’re working on Macs, but our FileMaker database is remotely hosted on Windows. This has not caused us any problems up to now.
We seem to have no control over how FileMaker encodes what it sends to the EE SQL database. Apparently the encoding can be specified in the latest version of the MySQL connector, but FileMaker does not support that version.
When we write text to an EE SQL field from the FM database, all looks fine from the FM end, but in EE (back end and site) we get the dreaded black diamond question marks replacing all the fancy characters. If we put fancy characters into the EE back end, they appear correctly on the site but look like garbage when inspected in MySQL via FileMaker.
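A minimal Python sketch of how symptoms like these can arise when the two ends disagree about the encoding of the same bytes. The choice of Windows-1252 for the FileMaker side is purely an assumption for illustration (the real encoding in this setup is unknown), and the helper name `written_as_read_as` is made up for the example.

```python
# Assumption for illustration only: one side writes Windows-1252 bytes and
# the other side reads them as UTF-8 (and vice versa). The encodings
# actually used by FileMaker and the connector are not known.

def written_as_read_as(text, written, read):
    """Encode text in one charset, then decode the same bytes in another."""
    return text.encode(written).decode(read, errors="replace")

sample = "Zürich – Hôtel"

# Single-byte data read as UTF-8: the accented bytes are invalid UTF-8,
# so each becomes U+FFFD, the "black diamond question mark".
fm_to_ee = written_as_read_as(sample, "cp1252", "utf-8")

# UTF-8 data read as a single-byte charset: each multi-byte character
# turns into two or three unrelated characters.
ee_to_fm = written_as_read_as(sample, "utf-8", "cp1252")

print(fm_to_ee)
print(ee_to_fm)
```

If the garbage seen from the FileMaker side resembles the second line of output, the stored data is probably fine and the mismatch is in how it is read back, rather than in the table itself.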
We do have control over the MySQL database behind our EE site, so we thought the solution would be to set the character set/collation of the affected tables or fields appropriately. We were surprised to find that the database as a whole is set to use UTF-8, but that everything else seemed to be set to a collation of latin1_swedish_ci. But changing the collation of a test field has not solved the problem. We’ve tried utf8_general_ci and macroman_general_ci, and the results change but the problem is not resolved.
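One way to narrow this down might be to look at the raw bytes actually stored for a known character, for example with MySQL's `HEX()` function on the test field, and compare them with what each candidate encoding would produce. The sketch below is only an aid for that comparison. The function name `candidate_hex` and the list of candidate encodings are assumptions chosen for illustration, not anything taken from EE or FileMaker.

```python
def candidate_hex(text):
    """Return the hex bytes that text would have under a few encodings.

    The candidates are guesses: UTF-8, Windows-1252, MacRoman, and UTF-8
    that was misread as Windows-1252 and then encoded as UTF-8 again.
    """
    utf8 = text.encode("utf-8")
    return {
        "utf8": utf8.hex().upper(),
        "cp1252": text.encode("cp1252").hex().upper(),
        "macroman": text.encode("mac_roman").hex().upper(),
        # Works for characters such as ü; a few UTF-8 bytes have no
        # Windows-1252 meaning and would raise an error here.
        "double_encoded_utf8": utf8.decode("cp1252").encode("utf-8").hex().upper(),
    }

for label, hex_bytes in candidate_hex("ü").items():
    print(label, hex_bytes)
```

If the stored bytes for "ü" match the `utf8` entry, the data itself is sound and the problem lies in the connection encoding. If they match one of the other entries, the text was already in a different encoding, or already mangled, when it was written.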
I have read countless threads on this forum related to encoding difficulties with moving data from one system to another, which miraculously we seem to have managed without problems. But we do need to be able to edit the data behind the EE site directly, without producing garbage in the site. It is, surely, just a matter of rectifying some settings, somewhere. But where?
Chris