ExpressionEngine CMS


This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.


Foreign alphabet characters.

June 30, 2008 1:09pm

  • #31 / Jun 30, 2008 8:52pm

    julianps

    175 posts

    Surely though the function of the Default character set variable is to force the connection to behave according to a set of standards in the manner you earlier described?

    Not exactly.  Because ExpressionEngine version 1.x supports versions of MySQL all the way back to 3.23.32, it connects to your database with the default client connection character set, which in PHP’s case is Latin-1.  However, MySQL automagically converts this to the character set of your database when storing it.  It’s a little double-change dance that PHP and MySQL do, talking to each other in a common language, so to speak, but each still able to act independently on the data.  In other words, EE and MySQL know that they are working with UTF-8 characters, but PHP and MySQL use Latin-1 radios to talk to each other.

    I would not rule out the possibility that it might have some impact on the aforementioned MySQL sorting idiosyncrasies.  If you’re feeling bold, you might try implementing this hack and see if it makes a difference.  Mind you, it will not operate on existing data, so you’d need to create new entries with which to compare.
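    A minimal Python sketch of the “double-change dance” described above, under stated assumptions: the application sends UTF-8 bytes over a connection declared as Latin-1, and the column is UTF-8. Python's `cp1252` codec stands in for MySQL's `latin1` (the two are close but not identical: a few byte values MySQL accepts are rejected by Python's codec). `store` and `fetch` are hypothetical simulations, not MySQL or EE code.

```python
def store(sent: bytes) -> str:
    """Hypothetical write path: the server reads the incoming bytes as latin1
    (approximated by cp1252) and keeps the resulting characters in a utf8 column."""
    return sent.decode("cp1252")


def fetch(column_value: str) -> bytes:
    """Hypothetical read path: the server converts the utf8 column value back
    to the connection's latin1 (approximated by cp1252) before sending it."""
    return column_value.encode("cp1252")


original = "Île-de-France"
sent = original.encode("utf-8")      # what PHP actually puts on the wire
stored = store(sent)                 # what the column really contains
received = fetch(stored)             # the bytes PHP gets back

print(stored)                        # ÃŽle-de-France
print(received.decode("utf-8"))      # Île-de-France
```

    The round trip is lossless, so pages and the CP look right, but the stored value is double-encoded. That could explain why a tool reading the table directly shows Î as something like “ÃŽ”, and why the database sorts on those characters rather than on Î.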

    The article says /core/db/db.mysql.php; I assume that’s either an error, or the v164 structure has changed, or I’ve only got a partial install. 😉

    I can ignore the conversion stuff because we already tried the ISO->binary->UTF8 route and it only partly worked (the data was great, but we couldn’t add templates or upload folders, and we lost other functionality as well). I’ll do it on a fresh EH-based install.
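    Assuming “ISO->binary->UTF8” means the usual MySQL approach of changing a Latin-1 text column to a binary type and then to a UTF-8 type, this sketch shows why that route differs from converting directly. It assumes the Latin-1 column actually holds UTF-8 bytes, and again uses Python's `cp1252` as an approximation of MySQL's `latin1`; it simulates the byte handling and is not MySQL code.

```python
stored_bytes = "Franche-Comté".encode("utf-8")   # bytes on disk in the latin1 column
seen_as_latin1 = stored_bytes.decode("cp1252")    # how a latin1 column reads them

# Route 1: ALTER straight to utf8. Characters are converted one by one,
# so the mojibake survives, now re-encoded as utf8 bytes.
direct_bytes = seen_as_latin1.encode("utf-8")
direct_text = direct_bytes.decode("utf-8")

# Route 2: latin1 -> binary -> utf8. The bytes are never converted,
# only relabelled, so the original UTF-8 text reappears.
binary_text = stored_bytes.decode("utf-8")

print(direct_text)   # Franche-ComtÃ©
print(binary_text)   # Franche-Comté
```

    Route 2 only works cleanly if every value in the column is valid UTF-8; a value that was genuinely stored as Latin-1 would not decode as UTF-8 here.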

    I guess that, if the hack works, I’ll need to change the same file each time I upgrade, into perpetuity (I ‘ate core mods)?

    I will certainly give all of that a go in the morning (it’s 2am here and I have a full day tomorrow) and report back.

    jiF

  • #32 / Jun 30, 2008 10:36pm

    Derek Jones

    7561 posts

    The database driver is in /system/db/.  I don’t have details for you about all future versions, but can give you an assurance that it will not require a hack in 2.0 to accomplish what this wiki article covers.  Until then, yes, you’d need to note and maintain the hack if you choose to use it.

  • #33 / Jul 01, 2008 4:46am

    Ingmar

    29245 posts

    Aha - you cheated!

    What exactly do you mean? I have a few entries with high ASCII characters and have EE sort them, is that not what we are after?

    Here’s the code I whipped up quickly:

    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    {exp:weblog:entries orderby="{segment_3}" sort="asc" entry_id="197|195|196|198|199|200" dynamic="off" }
    {title}
    
    {/exp:weblog:entries}

    Fair enough? Here’s the result. If I change the orderby parameter to, say, “date”, the order of entries changes as well. So unless I completely misunderstood you, I don’t see where I should have “cheated” here.

  • #34 / Jul 01, 2008 5:20am

    Ingmar

    29245 posts

    Ingmar, can you PM me screen-caps of your table, rows and weblog_fields with those German/Austrian entries, so I can see how they look on your screen? Is your account on EH as well?

    This particular account is not on EH, but I don’t think it would make any difference. The collation and charsets are the same. These (largely nonsensical) German entries show the correct umlaut both on the frontend, as you have seen, and on the backend. I have a screenshot attached.

    I also ran a manual query in the CP:

    SELECT title FROM `exp_weblog_titles` WHERE entry_id >= '196' ORDER BY title DESC

    and it worked perfectly. I admit to not having used phpMyAdmin.
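    Why an `ORDER BY` on correctly stored data gives the German order seen here can be sketched in Python. `fold` is a hypothetical, rough approximation of an accent- and case-insensitive collation such as `utf8_general_ci` (it strips diacritics after NFD decomposition); MySQL's real weight tables differ in detail.

```python
import unicodedata


def fold(text: str) -> str:
    """Rough stand-in for an accent- and case-insensitive collation:
    decompose (NFD), drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


titles = ["Zachö", "Überkonto", "Île de Fleurön", "Ärger", "Zachä", "Österreich"]

by_code_point = sorted(titles)            # plain binary/code-point comparison
by_collation = sorted(titles, key=fold)   # accent-insensitive comparison

print(by_code_point)
print(by_collation)
```

    Compared by raw code point, every accented capital lands after Z; with an accent-folding key, Ärger sorts among the A's, Île among the I's, and so on, which is the order listed later in the thread.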

  • #35 / Jul 01, 2008 6:08am

    julianps

    175 posts

    The database driver is in /system/db/.  I don’t have details for you about all future versions, but can give you an assurance that it will not require a hack in 2.0 to accomplish what this wiki article covers.  Until then, yes, you’d need to note and maintain the hack if you choose to use it.

    Derek, I followed the links and suggestions, and nothing changed (you can see the stuff and nonsense at http://www.stockting.com).
    My only observation, however, was that forum member Sasha refers to a change that seems to have disappeared from the “thread”.

    Do you have any thoughts on:

    $this->query("SET CHARACTER SET utf8");
    $this->query("SET COLLATION_CONNECTION=utf8_general_ci");

    and where I might safely add that to test it?

    jiF

  • #36 / Jul 01, 2008 6:14am

    julianps

    175 posts

    Aha - you cheated!

    What exactly do you mean?

    I was teasing... Lisa’s point was to use her code and nothing more, to see a “raw” dump.

    This is because we already believe that the issue is installation-specific; what we do not know is why it is happening on a fresh installation of EE on an EH-based hosting account. That’s the clever answer. 😉

    Do you have any thoughts on “where”, “when” and “how” for:

    $this->query("SET CHARACTER SET utf8");
    $this->query("SET COLLATION_CONNECTION=utf8_general_ci");

    jiF

  • #37 / Jul 01, 2008 6:29am

    julianps

    175 posts

    Ingmar, can you PM me screen-caps of your table, rows and weblog_fields with those German/Austrian entries, so I can see how they look on your screen? Is your account on EH as well?

    This particular account is not on EH, but I don’t think it would make any difference. The collation and charsets are the same. These (largely nonsensical) German entries show the correct umlaut both on the frontend, as you have seen, and on the backend. I have a screenshot attached.

    I also ran a manual query in the CP:

    SELECT title FROM `exp_weblog_titles` WHERE entry_id >= '196' ORDER BY title DESC

    and it worked perfectly. I admit to not having used phpMyAdmin.

    Firstly, as I have said repeatedly, there is no issue with seeing foreign characters in either the CP or on the page; I attach my EDIT list of entries. The challenge lies in the underlying DB and the characters stored there (I changed the sort order to ASC). That entry that starts with AZ is what we all know and love as Île-de-France…

    Well, I will upload them once I can get around “Error Message:  The file you are attempting to upload has invalid content for its MIME type.” For now, you can see them here.

  • #38 / Jul 01, 2008 7:19am

    julianps

    175 posts

    To take this to the next level, I loaded Open-Realty, the sister application in our family of applications.

    I reset O-R to UTF8 and created two entries (city = Île-de-France and city = Franche-Comté).

    Looking in the DB, I see identical glyphs for Î and é, and using Ingmar’s SQL script I get the same sort-order result (based on glyph order).

    So both applications are communicating with the back-end DB in the same way and producing identically shoddy results.

    Two positives: (i) it’s not unique to EE, and (ii) the fix should improve both applications.

    One negative: this issue is not specific to EngineHosting; we have the same issue on the DS.

    Interestingly, when I run the same test on MySQL 5, I get errors like “Warning: cannot yet handle MBCS in html_entity_decode()!” Admittedly this is not an EE error, but if the automagical stuff is intended to do entity replication/replacement, that goes back to the heart of the issue: I can make EE work in French using ISO-8859-1, and I can make EEWiki work in French using UTF8, but there’s no way in the world to make them both work together, until someone can actually prove me wrong.

    jiF

  • #39 / Jul 01, 2008 8:34am

    julianps

    175 posts

    Aha - you cheated!

    What exactly do you mean? I have a few entries with high ASCII characters and have EE sort them, is that not what we are after?

    Here’s the code I whipped up quickly:

    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    {exp:weblog:entries orderby="{segment_3}" sort="asc" entry_id="197|195|196|198|199|200" dynamic="off" }
    {title}
    
    {/exp:weblog:entries}

    Fair enough? Here’s the result. If I change the orderby parameter to, say, “date”, the order of entries changes as well. So unless I completely misunderstood you, I don’t see where I should have “cheated” here.

    A false proof! Your list on the page looks like this:

    Ärger
    Île de Fleurön
    Österreich
    Überkonto
    Zachä
    Zachö

    Mine looks the same, because Î appears as AZ in my database and therefore comes after Ä in an alpha-sort.

    You need to add a value between A & I to prove your point.

    ...and doesn’t {segment_3} read a different field in the DB?
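    The test proposed here, adding a title that falls between A and I, can be simulated. This sketch assumes, as the “AZ” observation suggests, that the stored titles are double-encoded (UTF-8 bytes read as `cp1252`, standing in for MySQL's `latin1`), and uses a rough accent-folding key in place of the real collation. `fold`, `double_encode` and the extra title “Bonjour” are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Rough stand-in for an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def double_encode(text: str) -> str:
    """Simulate UTF-8 bytes being stored as if they were latin1 (cp1252) text."""
    return text.encode("utf-8").decode("cp1252")


titles = ["Île-de-France", "Bonjour", "Ärger", "Franche-Comté"]

correct = sorted(titles, key=fold)
mangled = sorted((double_encode(t) for t in titles), key=fold)

print(correct)   # ['Ärger', 'Bonjour', 'Franche-Comté', 'Île-de-France']
print(mangled)   # 'ÃŽle-de-France' now sorts before 'Bonjour'
```

    With correctly stored data, Bonjour lands between Ärger and Île. With double-encoded data, Île's “AZ…” key pulls it up among the A's, ahead of Bonjour, which is exactly the difference the proposed test would expose.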

    jiF

  • #40 / Jul 02, 2008 5:36pm

    Ingmar

    29245 posts

    We’re looking into this internally. I can reproduce that behavior to some degree.

  • #41 / Jul 03, 2008 4:22am

    julianps

    175 posts

    We’re looking into this internally. I can reproduce that behavior to some degree.

    Thank you. You could liaise with EngineHosting, as I have set up accounts exhibiting these errors on their servers and opened a ticket with them too. Please PM me if you would like the account number.

    I found this, though I am not sure what it means:

    http://bugs.mysql.com/bug.php?id=33791

    and:

    http://www.php.net/manual/en/intro.mbstring.php

    jiF


  • #42 / Oct 07, 2008 1:09pm

    Robin Sowell

    13255 posts

    Jules, my apologies for this one lingering so long without a resolution.  The development crew put in a good deal of work on the issue and 2.0 may be handling some of these character set issues differently.

    For now, are you still stuck on this?  At the moment, it may boil down to a MySQL limitation.

  • #43 / Oct 07, 2008 1:45pm

    julianps

    175 posts

    Thank you for your observations, Robin.

    In this particular case we moved to a different application; not as elegant but more effective.

    I look forward to v2 and the opportunity to revisit the decision.

    jiF

  • #44 / Oct 07, 2008 2:57pm

    Derek Jones

    7561 posts

    Thanks, Jules; sorry EE did not work for you in this instance.

