This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

Understanding search module and problems with HTML entities

October 26, 2011 8:31am

  • #1 / Oct 26, 2011 8:31am

    Timo Häkkinen

    25 posts

    Correct me if I’m wrong, but doesn’t the search module try to match the search query “as is” against content in the database that has been marked searchable? There’s no encoding or decoding taking place anywhere as far as the search is concerned?

    My problem is being Finnish and doing most of my work on sites that are not in English. Take my last name for example. I type it in as “Häkkinen” and that’s how it’s displayed, but because WYSIWYGs like to play it safe, the code segment below shows you what’s actually saved in the database. Again, it looks fine on a page, obviously, but it doesn’t show up in search results because, if I’m correct, search module searches for the query and not what it might look like encoded.

    H&auml;kkinen
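
    To illustrate the mismatch described above, here is a minimal PHP sketch. It assumes the editor stored the name with a named entity (`&auml;`); the variable names are made up for illustration, and a plain substring check stands in for whatever matching the search module really does.

```php
<?php
// Assumption: the WYSIWYG saved the name entity-encoded like this.
$stored = 'Timo H&auml;kkinen';
// What a visitor types into the search box (raw UTF-8, file saved as UTF-8).
$query = 'Häkkinen';

// Literal comparison: the raw query is not a substring of the encoded text.
$literal_match = strpos($stored, $query) !== false;

// Decoding the stored text first makes the same comparison succeed.
$decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$decoded_match = strpos($decoded, $query) !== false;

var_dump($literal_match); // bool(false)
var_dump($decoded_match); // bool(true)
```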

    I’m not pointing fingers, I’m just trying to understand if this is in fact how it works. And if so, is there maybe a setting I can change to make search work for me? I know I’m using a third-party add-on that’s doing this encoding, but I figured this is also something EllisLab should at least be aware of. I don’t claim to understand everything that goes into doing a search, but maybe this is easy to fix in the search module? Or maybe you don’t want to touch the search module and I should change my WYSIWYG settings instead?

    Again, looking for discussion, insight and suggestions.

  • #2 / Oct 26, 2011 7:50pm

    Dan Decker

    7338 posts

    Hi Timo Häkkinen,

    I’m going to post this question internally to our developers. They will be able to provide some insight on this for sure.

    Cheers,

  • #3 / Oct 27, 2011 11:23am

    Robin Sowell

    13255 posts

    What I would probably do?  Write a hook that converts the entity back to ascii so everything is saved that way.  If in some fields it’s going in as ascii and in others it’s being converted to entities, the search isn’t going to look for both.
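
    A rough sketch of the conversion step such a hook could run before saving. The function name is hypothetical and the wiring into an actual ExpressionEngine extension hook is left out; this is plain PHP that decodes entities to UTF-8 characters while leaving the entities HTML itself depends on (&amp;, &lt;, &gt; and quotes) untouched, so saved markup keeps its meaning.

```php
<?php
/**
 * Hypothetical helper (not part of ExpressionEngine): decode HTML entities
 * to UTF-8 characters, except those that are significant to the markup.
 */
function decode_text_entities($html)
{
    // Characters that must stay encoded so the HTML does not change meaning.
    $keep = array('&', '<', '>', '"', "'");

    return preg_replace_callback(
        // Named, decimal and hexadecimal entity references.
        '/&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);/',
        function ($m) use ($keep) {
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            // Leave markup-significant entities exactly as they were written.
            return in_array($char, $keep, true) ? $m[0] : $char;
        },
        $html
    );
}

echo decode_text_entities('<p>H&auml;kkinen &amp; co &#228;</p>');
// prints: <p>Häkkinen &amp; co ä</p>
```

    Existing entries would still need to be re-saved (or converted once in bulk) for the stored content to become consistent.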

    That said- in 1.x before we were utf8 all the way, there was a setting to convert ascii to entities.  That is deprecated in 2.x, but the code itself is still in there for legacy reasons.  And one thing the search does?  Is honor that setting.

    $this->keywords = ($this->EE->config->item('auto_convert_high_ascii') == 'y') ? ascii_to_entities($this->keywords) : $this->keywords;

    Basically- it converts high ascii in search terms to entities- which will then match what was saved in the database.

    Like I say- regardless of the setting you need to be sure what’s going into the database is consistent.  And if I had a choice in the matter, I would go with consistently ascii.  But it is possible to have the search terms converted to entities by adding a config variable for auto_convert_high_ascii and setting it to ‘y’.
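
    For reference, the config variable mentioned above would look something like the following. The file path is an assumption based on a default 2.x install, so adjust it to your setup.

```php
// system/expressionengine/config/config.php (assumed default location in 2.x)
$config['auto_convert_high_ascii'] = 'y';
```

    One thing worth verifying before relying on this: ascii_to_entities appears to produce numeric entities (for example &#228;), so the converted search term will only match stored content if the editor writes that same numeric form rather than a named entity.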

    So basically?  Two options.  I would go with trying to get the editor to allow the ascii and if not, switching it back to ascii when saving.  But I don’t know how easy that will be to do.

    Make sense?

  • #4 / Oct 29, 2011 5:56am

    Mark Bowen

    12637 posts

    Hi Timo,

    Just wanted to see if the information from Robin above helped at all?

    Thanks,

    Mark

  • #5 / Nov 01, 2011 4:10pm

    Kevin Smith

    4784 posts

    Hi Timo,

    We’ll keep this thread open a few more days to give you time to respond. Let us know if there’s anything else we can do to help!

  • #6 / Nov 18, 2011 1:27pm

    Timo Häkkinen

    25 posts

    For some reason I didn’t get reply notifications and thought no one had time to look into this.

    Robin, so basically as long as everything I need to have available for search is consistently either ascii or entities in the database, everything will work? With ascii it works out of the box, and with entities I just need to set the auto_convert_high_ascii config variable to y.

    Thanks, this helps me and now I just have to see what options Wygwam gives me and then decide if I want to go back and re-save all of my entries everywhere.

  • #7 / Nov 21, 2011 4:59pm

    Robin Sowell

    13255 posts

    Ah- you might have ‘smart notifications’ on.  That nails me sometimes!

    And- yes, exactly.  Except I really don’t like the idea of converting it to ascii given a utf8 option.  Ponders a bit… also, 1.x would convert the ascii so it looked less confusing in the forms.  Which 2.x doesn’t do- assuming if you wanted to put in ascii you want to see it as ascii.

    So if I had a choice with the editor- it wouldn’t be converting it to ascii.  But as long as it’s consistent, you shouldn’t have an issue.
