ExpressionEngine CMS
Open, Free, Amazing

Thread

This is an archived forum and the content is probably no longer relevant, but is provided here for posterity.

The active forums are here.

Suggestion: Search class?

April 13, 2011 6:49pm

Subscribe [8]
  • #16 / Apr 16, 2011 4:15pm

    CI Coder

    44 posts

    ...
    Question: why is “My tax form is late” relevant ( score > 0 ) and “My tax forms are late” is not ( score = 0 )? Shouldn’t they be at least close?

    Or, put another way, why is “My tax forms are late” just as relevant as “what is the formula for the area of a circle?” when you are searching for “form?”

    I read the whole boolean fulltext search article and could see no clue as to why “My tax form is late” is not close to “My tax forms are late” given that they both contain the word “form” which is main requirement of the * operator.

    It’s because when in natural language mode “*”, and all the other signs “+”, “-”, etc. are not logical operators like they are in boolean mode. The rule is that if it’s not alphabetic or numeric, with the exception of apostrophe (’) and underscore (_), then it is a word separator. Basically if you have “form*of*life” it’s the same as “form of life” as far as the parser is concerned.

    The query that @wh1tel1te was suggesting uses the “*” character in the SELECT statement (that is in natural language mode) where it is a word separator, therefore no role in the outcome (you might as well not use it) and conditions the presence of those records in the query output based on the usage of the “*” character as a wild card character. In other words the MATCH in the SELECT part of the query is basically looking for documents that contain only the word “form” and nothing else, and the MATCH in the WHERE clause is looking for documents that contain all the words that start with “form.” That is why the document that contains “formula” has a 0 score, because it doesn’t contain the word “form” but it is present in the recordset because it contains the a word that starts with “form.” That’s why the next two queries have the same output:

    mysql> select *, match (body) against ('form*' ) as score from test_searches order by score desc;
    +----+-------------------------------------------------------------------------------------+-------------------+
    | id | body                                                                                | score             |
    +----+-------------------------------------------------------------------------------------+-------------------+
    |  1 | This is about your tax form                                                         | 0.332646816968918 |
    |  8 | form, form form form form                                                           | 0.332646816968918 |
    | 11 | This contains form*                                                                 | 0.332646816968918 |
    | 10 | My tax form is late                                                                 | 0.328907370567322 |
    | 12 | This document is form*alized                                                        | 0.325251072645187 |
    |  2 | I am formulating a new theory about searching and about different forms it can take |                 0 |
    |  3 | what is the formula for the area of a circle?                                       |                 0 |
    |  4 | New forms of life                                                                   |                 0 |
    |  5 | some text that does not contain the word                                            |                 0 |
    |  6 | irrelevant text                                                                     |                 0 |
    |  7 | etc                                                                                 |                 0 |
    |  9 | My tax forms are late                                                               |                 0 |
    +----+-------------------------------------------------------------------------------------+-------------------+
    12 rows in set
    mysql> select *, match (body) against ('form' ) as score from test_searches order by score desc;
    +----+-------------------------------------------------------------------------------------+-------------------+
    | id | body                                                                                | score             |
    +----+-------------------------------------------------------------------------------------+-------------------+
    |  1 | This is about your tax form                                                         | 0.332646816968918 |
    |  8 | form, form form form form                                                           | 0.332646816968918 |
    | 11 | This contains form*                                                                 | 0.332646816968918 |
    | 10 | My tax form is late                                                                 | 0.328907370567322 |
    | 12 | This document is form*alized                                                        | 0.325251072645187 |
    |  2 | I am formulating a new theory about searching and about different forms it can take |                 0 |
    |  3 | what is the formula for the area of a circle?                                       |                 0 |
    |  4 | New forms of life                                                                   |                 0 |
    |  5 | some text that does not contain the word                                            |                 0 |
    |  6 | irrelevant text                                                                     |                 0 |
    |  7 | etc                                                                                 |                 0 |
    |  9 | My tax forms are late                                                               |                 0 |
    +----+-------------------------------------------------------------------------------------+-------------------+
    12 rows in set
  • #17 / Apr 16, 2011 4:47pm

    Ninjabear

    54 posts

    In the search “form” you would probably want “forms” to match better than “formulae” so boolean mode isn’t perfect by any stretch of the imagination. But I suppose at least this method will match the exact word you ask for (i.e. form) with similar words (forms, formulae) being present in the search although not ordered by relevancy (being given 0 relevancy).

  • #18 / May 08, 2011 8:04pm

    shoe-maker

    4 posts

    the problem raise when we have two mysql table to join

    we can not use FULLTEXT ( MATCH ... AGAINST ... )

    example

    create table article (
    id int(11) not null auto_increment primary key,
    title text,
    keyword text,
    content text
    );
    
    create table tags (
    article_id int(11),
    tags text
    );

    one article can have multiple tags

    but if we just

    SELECT * ,
    MATCH( a.title,a.keyword,a.content,t.tags) 
      AGAINST ( 'lorem ipsum' IN BOOLEAN MODE ) AS score
    
    FROM article a INNER JOIN tags t on a.id=t.article_id 
    WHERE 
    MATCH( a.title,a.keyword,a.content,t.tags) 
      AGAINST ( 'lorem ipsum' IN BOOLEAN MODE ) 
    GROUP BY a.id
    ORDER BY score DESC

    will result empty


    any suggestion?

  • #19 / May 09, 2011 1:23pm

    Ninjabear

    54 posts

    I think this is because if there is no link between article and tags then you end up with no rows, you would need to display the results of one table regardless of the connection with the other with left/right join, but then you have to know which one to display. No full outer join in mysql unfortunately, don’t know whether it would solve the problem if there was though.

  • #20 / May 10, 2011 7:15am

    shoe-maker

    4 posts

    i have little trick for this problem
    using nested query

    select * from article where id in 
    (
      select id from (
        select id, sum( match(title,keyword,content) against ('lorem ipsum' in boolean mode)) as score 
        from article 
        where match(title,keyword,content) against ('lorem ipsum' in boolean mode)
        UNION
        select id,sum( match(tags) against ('lorem ipsum' in boolean mode)) as score 
        from tags 
        where match(tags) against ('lorem ipsum' in boolean mode)
      ) as table_1
      group by id
      order by score desc
    )

    hope this will help

  • #21 / May 10, 2011 9:24am

    Ninjabear

    54 posts

    i have little trick for this problem
    using nested query

    select * from article where id in 
    (
      select id from (
        select id, sum( match(title,keyword,content) against ('lorem ipsum' in boolean mode)) as score 
        from article 
        where match(title,keyword,content) against ('lorem ipsum' in boolean mode)
        UNION
        select id,sum( match(tags) against ('lorem ipsum' in boolean mode)) as score 
        from tags 
        where match(tags) against ('lorem ipsum' in boolean mode)
      ) as table_1
      group by id
      order by score desc
    )

    hope this will help

    Please could you explain this code shoe-maker? What’s going on here?

  • #22 / May 10, 2011 9:15pm

    shoe-maker

    4 posts

    how to do fulltext search across multiple table in mySql.
    this is all about.

    coz, fulltext only work in “single” table

  • #23 / May 11, 2011 7:39am

    danmontgomery

    1802 posts

  • #24 / May 11, 2011 1:27pm

    Ninjabear

    54 posts

    how to do fulltext search across multiple table in mySql.
    this is all about.

    coz, fulltext only work in “single” table

    I know I was just hoping you could explain it a little instead of just pasting.

  • #25 / Jan 31, 2012 4:40pm

    mike7418

    8 posts

    Thanks for posting this. I am working on a site that this will help me. 😉

  • #26 / Jul 26, 2012 6:46am

    predat0r

    99 posts

    Is it really working for you? I get an error and can’t solve that :/
    check the ` character before the select’s against. Maybe that’s the problem?!

    Error Number: 1064
    
    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '` AGAINST ('"+lorem* +ipsum*"')) AS score FROM (`news`) WHERE (' at line 1
    
    SELECT `news`.*, (MATCH(`title`, `preview`, `content`, `tags`)` AGAINST ('"+lorem* +ipsum*"')) AS score FROM (`news`) WHERE (MATCH(`title`, `preview`, `content`, `tags`) AGAINST ('"+lorem* +ipsum*"')) AND `status` = 'active' ORDER BY `score` DESC LIMIT 10

    —-
    my query:

    $query = $this->db->select('news.*')
            ->select("(MATCH(`title`, `preview`, `content`, `tags`) AGAINST ('" . $phrase . "')) AS score") 
            ->where("(MATCH(`title`, `preview`, `content`, `tags`) AGAINST ('" . $phrase . "'))")
            ->where('status', 'active')        
            ->order_by('score', 'DESC')            
            ->limit($limit, $offset)
            ->get('news');

    I use 3.0dev
    thanks

    Fulltext searching is the best option, it is extremely fast and has extra features (e.g. score based on keyword relevancy). I would think though that creating a generic library would be quite difficult due to different database table structures. Here’s a mini-tutorial on how to implement fulltext searching:

    Example table:

    CREATE TABLE IF NOT EXISTS `blog_posts` (
      `post_id` int(9) NOT NULL auto_increment,
      `title` varchar(128) default NULL,
      `contents` text NOT NULL,
      `status` enum('Draft','Publish') default NULL,
      `creation` datetime NOT NULL,
      PRIMARY KEY  (`post_id`)
    ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

    First, determine which fields you are going to search. Note that this only works on varchar and text field types (someone please correct me if I’m wrong).
    For this table, we will use title and contents.

    ALTER TABLE `blog_posts` ADD FULLTEXT `search_index` (`title`, `contents`);

    Your table is now all set up for fulltext searching! Now you just need to run your SQL query like so:

    $results = $this->db
        ->select('blog_posts.*')
        ->select("(MATCH(`title`, `contents`) AGAINST ('" . $keywords . "')) AS score")
        ->where("(MATCH(`title`, `contents`) AGAINST ('" . $keywords . "'))")
        ->order_by('score', 'desc')
        ->get('blog_posts')
        ->result_array();

    As an added bonus, these results are ordered by relevancy, the most relevant result will be at the top.

.(JavaScript must be enabled to view this email address)

ExpressionEngine News!

#eecms, #events, #releases