ExpressionEngine CMS

Alphabetical order involving special characters?

November 04, 2010 2:42pm

  • #1 / Nov 04, 2010 2:42pm

    fjldude

    50 posts

    EE 1.6.8, Build:  20100121

    My client just noticed that the order of entries is incorrect if they contain special characters.  This includes the order on webpages as well as the order shown via Playa in the control panel.  All special characters display correctly, they just don’t sort properly.

    Here is example text from one of my web pages, the template for which uses orderby = ACustomField.  In this case, entries starting with kö and kü are incorrectly listed before entries starting with kh and ki:

    ***************
    Körner, H. (1970). Morphologie und Taxonomie der Diatomeen Gattung Asterionella. Nova Hedwigia 20: 557-724. 
    Asterionella |

    Kützing, F.T. (1844). Die Kieselschaligen. Bacillarien oder Diatomeen. Nordhausen. 152 pp.

    Khursevich, G. and Stachura-Suchoples, K. (2008). The genus Pliocaenicus Round & Hakansson (Bacillariophyta): Morphology, taxonomy, classification and biogeography. Nova Hedwigia 86: 419-444. 
    Pliocaenicus |

    Kilham, S.C., Theriot, E.C. and Fritz, S.C. (1996). Linking planktonic diatoms and climate change in the large lakes of the Yellowstone ecosystem using resource theory. Limnology and Oceanography 41: 1052-1062. 
    Stephanodiscus |
    ***************

    I’ve checked my database, database table, EE preferences, HTML meta tags… all seem to be set to UTF-8.

    A peek at exp_weblog_data shows that Kützing is entered in the database as KÃ¼tzing… which probably explains why the sort order is incorrect.
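    To illustrate why a garbled stored value could produce exactly the order shown above, here is a small Python sketch. It is an approximation only: `accent_insensitive_key` is a hypothetical stand-in for an accent- and case-insensitive collation, not MySQL's actual collation rules, and `misread_utf8_as_latin1` simulates the suspected storage problem rather than anything confirmed about this install.

```python
import unicodedata


def misread_utf8_as_latin1(text):
    """Simulate UTF-8 bytes being interpreted as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")


def accent_insensitive_key(text):
    """Rough stand-in for an accent- and case-insensitive collation.

    Decomposes characters, drops combining marks, and folds case.
    This is an assumption for illustration, not MySQL's algorithm.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


names = ["Khursevich", "Kilham", "Körner", "Kützing"]

# Correctly stored names: ö and ü sort like o and u.
clean_order = sorted(names, key=accent_insensitive_key)

# Garbled names: "ö" has become "Ã¶", and "Ã" sorts like "A".
garbled = [misread_utf8_as_latin1(name) for name in names]
garbled_order = sorted(garbled, key=accent_insensitive_key)

print(clean_order)
print(garbled_order)
```

    In this sketch the garbled names come out as Körner, Kützing, Khursevich, Kilham (in their garbled spelling), because the leading "Ã" is compared like an "A" and so lands before "h" and "i".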

    An old forum thread seems to imply that this sorting problem might not have gotten fixed until EE2….

    Any suggestions on how to proceed, or on how to troubleshoot this issue?

    Thanks for any help you can offer.

  • #2 / Nov 05, 2010 4:37am

    John Henry Donovan

    12339 posts

    fjldude,

    An old forum thread seems to imply that this sorting problem might not have gotten fixed until EE2…

    This may well be the case, but let’s explore your situation a bit first.

    What type of field is ACustomField?

    How does the title of these entries differ from the custom field?

    Would orderby="url_title" work in this instance?

  • #3 / Nov 07, 2010 6:23pm

    fjldude

    50 posts

    Good idea.  I might be able to use url_title as a workaround in this instance. It wouldn’t solve the control panel sorting issues (playa and edit), but might just work for the website.

    Regarding the bigger situation, could you let me know how UTF-8 characters are supposed to be stored in the exp_weblog_titles table for EE 1.x?  In my case, any special character, such as those with umlauts, ends up being stored in the database with a different encoding.  For example, ü gets stored as Ã¼, but still shows up correctly as ü in both the control panel and website.  What happens on your testing setup?  Is ü actually stored as ü?

    Thanks!

  • #4 / Nov 08, 2010 11:23am

    Sue Crocker

    26054 posts

    fjldude, my use of EE1.7.0 is a very vanilla site. Special characters with umlauts are converted to a two letter designation.

    Körner, H. (1970) becomes koerner-h.-1970

  • #5 / Nov 08, 2010 11:33am

    fjldude

    50 posts

    Hi Sue. 

    I am also seeing that a title of Körner appears as a url_title of koerner. 

    Can you also check your install to see what a title of Körner looks like in the database itself?  What does the entry with a title of Körner look like in the exp_weblog_titles table?  Does it appear as Körner? Or is it encoded differently?

    Thanks

  • #6 / Nov 09, 2010 4:06am

    John Henry Donovan

    12339 posts

    fjldude,

    Körner appears as KÃ¶rner in the exp_weblog_titles table

  • #7 / Nov 09, 2010 5:41am

    Ingmar

    29245 posts

    Adding to what John Henry said: this is fine, and expected behavior. Umlauts and other such extended characters are a difficult subject. Since the English language doesn’t have the concept, anything that is not 7-bit ASCII sometimes looks a bit like an afterthought.

    So we know that high-ASCII characters need to be encoded specifically. For German (and in fact most Western and Central European languages) ISO-8859-1 (aka Latin-1) has long been a de facto standard. We’re currently in the middle of a transitional period towards Unicode and there are some growing pains, as it were:

    Your data is stored as utf-8 in your database and then pulled into the website, where it’s being treated as utf-8 and thus displayed correctly. If you take a look at the database “directly”, using some other encoding, you’ll get “Ã¶” instead of ö. (You would get the same result if you put something like

    <meta charset="iso-8859-1">

    in your webpage. Clearly not what you want.) As I’ve said in the previous paragraph, that’s expected behavior: just put on your Unicode goggles (tell phpMyAdmin to use utf-8 instead of iso-8859-1), and you should get the correct results.
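    As a rough illustration of the “goggles” point, the Python sketch below decodes the same stored bytes two ways. It only demonstrates the general UTF-8 versus Latin-1 effect; the variable names are made up for this example and nothing here queries a real database.

```python
# The bytes a UTF-8 database would hold for the title "Körner".
stored_bytes = "Körner".encode("utf-8")

# Read with UTF-8 goggles: the umlaut comes back intact.
as_utf8 = stored_bytes.decode("utf-8")

# Read with Latin-1 goggles: each byte of the two-byte "ö"
# is shown as its own character.
as_latin1 = stored_bytes.decode("latin-1")

print(stored_bytes)
print(as_utf8)
print(as_latin1)
```

    The bytes never change between the two reads; only the interpretation does, which is why the same row can look right on the website and wrong in a database tool.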

    So why does it not sort correctly? This is an issue of “database collation”. Which leads us to the question, just which one are you using? Try to run this SQL statement in your control panel:

    SELECT CHARSET(USER()), COLLATION(USER())

    What do you get?

  • #8 / Nov 10, 2010 1:19am

    fjldude

    50 posts

    Thank you John and Ingmar for helping move things forward…

    Ingmar seems to be saying that John did *not* have his phpMyAdmin set to Unicode and that EE 1.x really does store titles and other values as true Unicode.

    But when I run
    SELECT CHARSET(USER()), COLLATION(USER()) 
    I get
    utf8   utf8_general_ci

    And I’ve also been using a MySQL client called Navicat, which uses Unicode (UTF-8) goggles for viewing the database directly.  When I view other UTF-8 databases on the same MySQL install, I can see ö.  But the EE database only shows Ã¶. Moreover, I can use Navicat to directly change Ã¶ back to ö in the database.  The ö then looks fine in the database, but shows up as � in both the EE control panel and in the browser (meta set to utf-8).
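    One possible reading of these symptoms, sketched in Python below, is that the text was encoded twice on the way in and that the application's connection is not talking UTF-8. Both points are assumptions for illustration; nothing in this thread confirms how the EE connection is actually configured, and the variable names are hypothetical.

```python
# Assumption: the value was UTF-8 encoded, misread as Latin-1,
# then stored as UTF-8 again (double encoding).
double_encoded = "Körner".encode("utf-8").decode("latin-1")
stored_bytes = double_encoded.encode("utf-8")

# Reversing the misreading recovers the original text.
repaired = double_encoded.encode("latin-1").decode("utf-8")

# Assumption: a hand-corrected "ö" is delivered over a Latin-1
# connection, then rendered by a page that declares UTF-8.
sent_bytes = "Körner".encode("latin-1")
shown_in_browser = sent_bytes.decode("utf-8", errors="replace")

print(stored_bytes)
print(repaired)
print(shown_in_browser)
```

    Under those assumptions, a UTF-8 client sees the doubled form in the table, the site still displays the old rows correctly, and a manually corrected row turns into the replacement character, which matches what is described above.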

    Your ideas on how to proceed?  The lack of correct alphabetization in the EE control panel is messing up my clients’ workflow.

    Could my issue be related to a plug-in?  I’m using quite a few.

  • #9 / Nov 10, 2010 11:26am

    Sue Crocker

    26054 posts

    Which plugins/extensions are you using? Do you have a link we can look at?

  • #10 / Nov 10, 2010 2:27pm

    fjldude

    50 posts

    If possible, I’d like to first hear a confirmation from Ingmar that he has looked at his EE 1.x install and it really truly stores titles with umlauts as utf-8 in the database.  And I’d like to hear his followup thoughts on collation.

    Thanks.

  • #11 / Nov 11, 2010 8:58am

    Sue Crocker

    26054 posts

    Hi, fjldude. Please check your email in a few minutes.

  • #12 / Nov 21, 2010 2:23am

    fjldude

    50 posts

    Hello Sue et al.,

    Apologies for my slow response…
    I’m going to have to work on this later as I’ve been slammed with deadlines (incl. planning a new EE MSM-powered site).

    Fortunately this issue is not critical to my client’s workflow.  Nonetheless, it remains important to them.  Moreover, I suspect that I’ve set up something incorrectly, which could impact other websites.

    I’ll be back in touch when the timing is better.

    Thanks for all your help so far.

  • #13 / Nov 21, 2010 6:44am

    Yvonne Martinsson

    204 posts

    Hi fjldude,

    Unless things have changed recently, not being able to sort foreign characters alphabetically is a flaw in MySQL and thus not an EE issue. There may be some threads about this in the archived forums, if still around.

  • #14 / Nov 21, 2010 5:09pm

    Greg Salt

    3988 posts

    Hi fjldude,

    We’ll keep this thread open so please get back in touch when the timing is better to further explore this issue.

    Cheers

    Greg

    @Yvonne, thanks for the note.
