This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Search - problem with foreign characters

December 20, 2010 10:53am

  • #1 / Dec 20, 2010 10:53am

    gylfisteinn

    41 posts

    Hi,
    I’m having trouble with foreign characters in search. Icelandic characters like Á, Í, Æ, Ð, Ö, Þ, Ú, Ó, Ý, É are causing problems on a few websites.

    In this case the problem is as follows:

    If I search for “Ásgeir”, I get a search result if “Ásgeir” is in the title, but no result if “Ásgeir” is in the body text and not in the title. In other words, the search for foreign characters works for titles only.

    If I go to line 157 in mod.search.php and convert the string to HTML entities like this:

    $this->keywords = htmlentities($this->keywords, 1, "utf-8", 1);

    Then the situation is turned around. I can search for “Ásgeir” in the body text but get no result if “Ásgeir” is only in the title field. The body text is an MX UniEditor fieldtype.

    EE version: 1.6.8
    MySQL connection collation: utf8_general_ci
    MySQL charset: UTF-8 Unicode (utf8)

    Any ideas on this?

    regards,
    Gylfi
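
The asymmetry described in this post can be reproduced outside ExpressionEngine. The following is a minimal Python sketch, not EE's actual search code: `to_named_entities` is a hypothetical stand-in for PHP's `htmlentities`, and the sample title and body strings are made up for illustration. It assumes the title is stored as raw UTF-8 and the body as named HTML entities.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace non-ASCII characters with named HTML entities where one exists."""
    parts = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name is not None:
            parts.append("&" + name + ";")
        else:
            parts.append(ch)
    return "".join(parts)


# Hypothetical stored values: raw UTF-8 title, entity-encoded body.
title = "Ásgeir Jónsson"
body = "<p>" + to_named_entities("Ásgeir vinnur hér") + "</p>"

keyword = "Ásgeir"
encoded_keyword = to_named_entities(keyword)

# (found in title, found in body) for each form of the keyword.
raw_hits = (keyword in title, keyword in body)
encoded_hits = (encoded_keyword in title, encoded_keyword in body)
```

Under these assumptions the raw keyword matches only the title and the entity-encoded keyword matches only the body, which is the same flip reported above.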

  • #2 / Dec 20, 2010 11:24pm

    Lisa Wess

    20502 posts

    Hi, Gylfi -

    Can you post your search form code, please?  Also, EE 1.7 is a security release so I’d recommend updating to 1.7 before continuing troubleshooting, both to see if that helps, and to protect your site.

    Thank you!

  • #3 / Dec 20, 2010 11:31pm

    gylfisteinn

    41 posts

    Hi,
    here is the search form code:

    {exp:search:simple_form weblog="starfsmenn|vorur-bordplotur|vorur-annad|vorur-flisar|vorur-legsteinar|pages|news" no_result_page="search/noresult" search_in="everywhere" show_expired="yes" where="all" limit="1000"}                            
    <input class="text" name="keywords" id="keywords" type="text" value="Sláðu inn leitarorð" />
    <input class="btn-go" type="image" value="submit" src="/images/btn-go.gif" alt="Go" />
    {/exp:search:simple_form}
  • #4 / Dec 21, 2010 3:40pm

    Ingmar

    29245 posts

    Have you turned on “Convert HTML into character entities”? Please turn it off if that’s the case. Also, double check that your custom fields are actually searchable. An upgrade to EE 1.7 is recommended, too, as Lisa mentioned.

  • #5 / Dec 22, 2010 11:57am

    gylfisteinn

    41 posts

    I tried converting to high ASCII encoding, but that didn’t work because the body text is not in high ASCII. I need to be able to search in both entries and titles.

    I am sure the problem is that the body text (from the TinyMCE editor) is UTF-8 with HTML entity encoding. The character Í, for example, is stored in the exp_weblog_data table as &Iacute;. In the exp_weblog_titles table, on the other hand, the Í character is in some other format. See attachments. This must be why I can search in titles but not in entries, or vice versa.


    I have another site where the search is working fine. There the exp_weblog_data table has the same encoding, but the Í character is just a plain Í in the exp_weblog_titles table. Is there any way to control the formatting of the title? See attachments.

    Gylfi
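
One way to reason about the mismatch between the two tables is to bring both values to the same form before comparing them. This is a minimal Python sketch, assuming the body holds named entities such as `&Iacute;` and the title holds raw characters. `normalize` and `matches` are hypothetical helpers for illustration and are not part of ExpressionEngine.

```python
import html
import unicodedata


def normalize(text):
    """Decode HTML entities, then apply NFC so equal characters compare equal."""
    return unicodedata.normalize("NFC", html.unescape(text))


def matches(keyword, field_value):
    """Case-insensitive substring match on the normalized forms."""
    return normalize(keyword).casefold() in normalize(field_value).casefold()
```

With this approach `matches("Ísland", "&Iacute;sland er eyja")` and `matches("Ísland", "Ísland er eyja")` both succeed, because the entity is decoded before the comparison.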

  • #6 / Dec 22, 2010 3:07pm

    Ingmar

    29245 posts

    I tried converting to high ASCII encoding, but that didn’t work because the body text is not in high ASCII. I need to be able to search in both entries and titles.

    Yes, of course. As I’ve said, please turn it off if you have been using it.

    I am sure the problem is that the body text (from TinyMCE editor) is utf8 with html entity encoding. The character Í is for example in the table exp_weblog_data as &Iacute;

    I would agree. It looks as though TinyMCE performs that conversion, can you confirm that? Have you tried a regular EE custom field?

  • #7 / Dec 28, 2010 7:55am

    gylfisteinn

    41 posts

    I have convert to high ascii turned off.

    I would agree. It looks as though TinyMCE performs that conversion, can you confirm that? Have you tried a regular EE custom field?

    Yes, the biggest problem is the title field (in the exp_weblog_titles table). No TinyMCE used there, just a plain EE field. That’s the one that has strange encoding and I cannot control that (as far as I know). See attachment - there you can see the strange encoding on the title field.

    The TinyMCE field is not the problem, I can search and find those entries.

    Gylfi

  • #8 / Dec 28, 2010 3:12pm

    Ingmar

    29245 posts

    Yes, the biggest problem is the title field (in the exp_weblog_titles table). No TinyMCE used there, just a plain EE field. That’s the one that has strange encoding and I cannot control that (as far as I know).

    It’s not so strange: those are UTF-8 encoded characters shown as Latin-1. Make sure that phpMyAdmin (or whatever tool you are using) actually uses Unicode as well. How does this differ from a custom channel field? Do they both show correctly on your site?

    Please do try with TinyMCE, does that look like the title field then?
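
The "UTF-8 shown as Latin-1" effect is easy to reproduce. This is a minimal Python sketch with hypothetical helper names. It only illustrates how a viewing tool can misread correctly stored bytes, and it does not show whether the data in this particular database was stored correctly.

```python
def as_latin1_view(text):
    """Show how UTF-8 bytes look when a tool decodes them as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(mojibake):
    """Reverse the misreading and recover the original UTF-8 text."""
    return mojibake.encode("latin-1").decode("utf-8")


sample = "Sláðu inn leitarorð"
garbled = as_latin1_view(sample)   # each accented letter becomes two characters
restored = repair(garbled)         # round-trips back to the original string
```

Each non-ASCII letter takes two bytes in UTF-8, so it shows up as two Latin-1 characters. If a tool shows the title this way while the site renders it correctly, the stored bytes are likely fine and only the tool's connection or display encoding is off.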

  • #9 / Jan 06, 2011 10:50am

    gylfisteinn

    41 posts

    Please do try with TinyMCE, does that look like the title field then?

    Hi, I cannot use TinyMCE on the title field, can I? That is not a custom field that I can mess with (please correct me if I’m wrong).

    But in this case I killed two birds with one stone and upgraded to EE2. The search works fine there. I have a feeling that the database got corrupted while being exported and moved between servers.

    Gylfi

  • #10 / Jan 06, 2011 1:40pm

    Ingmar

    29245 posts

    I am very glad to hear that upgrading took care of things. Please don’t hesitate to post again as needed. Thanks.
