Ok, so the EH install sounds like a reliable test bed, thank you for the clarification. Can you verify with EngineHosting that the column collation is also set correctly? That environment value will take precedent over any server, database, and connection collation in this instance. I don’t know if you read that article, or have read other articles regarding MySQL’s handling of high ASCII characters with sorting, but it’s a bundle of insanity even when all the character sets and planets are aligned.
Unrelated to MySQL’s sorting, but you have some whitespace above your DOCTYPE (on the stockting site), which can cause some problems both with character display and DOM rendering in certain browsers and locales.
Unrelated to MySQL’s sorting, but you have some whitespace above your DOCTYPE (on the stockting site), which can cause some problems both with character display and DOM rendering in certain browsers and locales.
Thanks for the tip; yep, forgot to remove those pesky tags from the top of the page - we only installed it today and there’s been way too much going on.
Ok, so the EH install sounds like a reliable test bed, thank you for the clarification. Can you verify with EngineHosting that the column collation is also set correctly? That environment value will take precedent over any server, database, and connection collation in this instance. I don’t know if you read that article, or have read other articles regarding MySQL’s handling of high ASCII characters with sorting, but it’s a bundle of insanity even when all the character sets and planets are aligned.
I only keep the EH account for situations like this so you’re more than welcome to have the UN/PW if it helps.
Can I not see the collations in phpMyAdmin; they all look to be utf8-general-ci - there the ones when I’m looking at the individual rows aren’t they?
I’ve read a hundred links and more and bore-for-britain on the subject but the truth is that even with a clean clear EE install on EE’s servers we cannot see what, other than the equipment used to key the data into the EECP could possibly different (ie a circumstantial variable).
Ingmar, can you PM me with screen-caps of you table, rows and the weblog_fields with those German/Austrian entries to see how they look to you on your screen. Is your account on EH as well?
verify with EngineHosting that the column collation is also set correctly?
Not anything against your ability to read and examine MySQL settings, but it would be best to get the answer from the horse’s mouth.
Once that’s verified, then it’s a matter of discovering what’s being entered vs. being stored (phpMyAdmin has its own client connection and page charset settings that might distort the truth here as well), and finding out whether or not its acting in a manner consistent with expected behavior for the character set, the characters involved, and MySQL’s interpretation thereof of what comes before what.
verify with EngineHosting that the column collation is also set correctly?
Not anything against your ability to read and examine MySQL settings, but it would be best to get the answer from the horse’s mouth.
Once that’s verified, then it’s a matter of discovering what’s being entered vs. being stored (phpMyAdmin has its own client connection and page charset settings that might distort the truth here as well), and finding out whether or not its acting in a manner consistent with expected behavior for the character set, the characters involved, and MySQL’s interpretation thereof of what comes before what.
I’m a big boy; you can tell me when I’m not up to it (most of the time with this stuff, I’m afraid); I have a ticket open on this at EH and have asked Daniel for confirmation.
We’d had a frank exchange of views with our server admins over phpMyAdmin connections and never got an intelligent answer out them on whether our installed version was reputable and whether we looking at things through sh!t=coloured glasses a reality distortion field.
Surely though the function of the Default character set variable is to force the connection to behave according to a set of standards in the manner you earlier described?
As it is we’re going to wipe the EH server clean in the morning (no caches, do data, no nothing; I’d rather do a second install but that’ll cost another license and I’m feeling a bit mean right now) so we’ll see what that throws up.
Anything else you suggest?
jiF
You do recall (down Lisa, or we’ll go back over the subject of collations) that if I run ISO-8859-1 on the db (I cannot, Pages Module will not allow me) then the sort order is fine. So this certainly (woops; another assumption. sorry) has to do with the way data is getting from the client to mySQL.
Surely though the function of the Default character set variable is to force the connection to behave according to a set of standards in the manner you earlier described?
Not exactly. Because ExpressionEngine version 1.x supports versions of MySQL all the way back to 3.23.32, it connects to your database with the default client connection collation, in PHP’s case, this is Latin-1. However, MySQL automagically converts this to the character set of your database when storing it. It’s a little double-change dance that PHP and MySQL do, each talking to each other in a common language, so to speak, but still able to act independently on the data. In other words, EE and MySQL would know that they are working with UTF-8 characters, but PHP and MySQL would be using Latin-1 radios to talk to each other.
I would not rule out the possibility that it might have some impact on the aforementioned MySQL sorting idiosyncrasies. If you’re feeling bold, you might try implementing this hack and see if it makes a difference. Mind you, it will not operate on existing data, so you’d need to create new entries with which to compare.
Surely though the function of the Default character set variable is to force the connection to behave according to a set of standards in the manner you earlier described?
Not exactly. Because ExpressionEngine version 1.x supports versions of MySQL all the way back to 3.23.32, it connects to your database with the default client connection collation, in PHP’s case, this is Latin-1. However, MySQL automagically converts this to the character set of your database when storing it. It’s a little double-change dance that PHP and MySQL do, each talking to each other in a common language, so to speak, but still able to act independently on the data. In other words, EE and MySQL would know that they are working with UTF-8 characters, but PHP and MySQL would be using Latin-1 radios to talk to each other.
I would not rule out the possibility that it might have some impact on the aforementioned MySQL sorting idiosyncrasies. If you’re feeling bold, you might try implementing this hack and see if it makes a difference. Mind you, it will not operate on existing data, so you’d need to create new entries with which to compare.
The article says /core/db/db.mysql.php - I assume that’s an error or v164 structure changed or I’ve only got a partial install..
I can ignore the conversion stuff because we tried the ISO->binary->UTF8 and it’s only partial (data’s great but we couldn’t add templates, upload folders and lost other functionality as well); I’ll do it on a fresh EH-based install.
I guess that if the hack works I need to change the same file each time I upgrade, on into perpetuity (I ‘ate core mods’)?
I will certainly give all of that a go in the morning (we’re 2am now and I’m a full day tomorrow) and report back.
The database driver is in /system/db/. I don’t have details for you about all future versions, but can give you an assurance that it will not require a hack in 2.0 to accomplish what this wiki article covers. Until then, yes, you’d need to note and maintain the hack if you choose to use it.
Fair enough? Here’s the result. If I change the orderby parameter to, say, “date”, the order of entries changes as well. So unless I completely misunderstood you, I don’t see where I should have “cheated” here.
Ingmar, can you PM me with screen-caps of you table, rows and the weblog_fields with those German/Austrian entries to see how they look to you on your screen. Is your account on EH as well?
This particular account is not on EH, but I don’t think it would make any difference. The collation and charsets are the same. These (largely nonsensical) German entries show the correct umlaut both on the fronted, as you have seen, as well as on the backend. I have a screenshot attached.
I also ran a manual query in the CP
SELECT title FROM `exp_weblog_titles` WHERE entry_id >= '196' ORDER BY title DESC
and it worked perfectly. I admit to not having used phpMyAdmin.
The database driver is in /system/db/. I don’t have details for you about all future versions, but can give you an assurance that it will not require a hack in 2.0 to accomplish what this wiki article covers. Until then, yes, you’d need to note and maintain the hack if you choose to use it.
Derek, I followed the links and thoughts and nothing changes (you see the stuff and nonsense at http://www.stockting.com).
My only observation however was that Forum-member Sasha refers to a change that seems to have disappeared from the “thread”.
Do you have any thoughts on;
$this->query("SET CHARACTER SET utf8"); $this->query("SET COLLATION_CONNECTION=utf8_general_ci");
I was teasing….....Lisa’s point was to use her code and no more, to see a “raw” dump.
This is because we already believe that the issue is installation specific; what we do not know is why it is happening to a fresh installation of EE on an EH-based hosting account. That’s the clever answer..
Do you have any thoughts on “where”, “when” and “how” for;
$this->query("SET CHARACTER SET utf8"); $this->query("SET COLLATION_CONNECTION=utf8_general_ci");
Ingmar, can you PM me with screen-caps of you table, rows and the weblog_fields with those German/Austrian entries to see how they look to you on your screen. Is your account on EH as well?
This particular account is not on EH, but I don’t think it would make any difference. The collation and charsets are the same. These (largely nonsensical) German entries show the correct umlaut both on the fronted, as you have seen, as well as on the backend. I have a screenshot attached.
I also ran a manual query in the CP
SELECT title FROM `exp_weblog_titles` WHERE entry_id >= '196' ORDER BY title DESC
and it worked perfectly. I admit to not having used phpMyAdmin.
Firstly, as I have said repeatedly, there is no issue with seeing foreign characters in either the CP or on the page; I attach my EDIT list of entries. The challenge lies in the underlying DB and the characters there (I changed the sort order to ASC); That entry that starts AZ is what we all know and love as Île-de-France…
Well I will upload them when I can get around “Error Message: The file you are attempting to upload has invalid content for its MIME type.” For now you see them here