...
Question: why is “My tax form is late” relevant (score > 0) while “My tax forms are late” is not (score = 0)? Shouldn’t they be at least close? Or, put another way, why is “My tax forms are late” exactly as relevant as “what is the formula for the area of a circle?” when you are searching for “form”?
I read the whole boolean fulltext search article and could find no clue as to why “My tax form is late” does not score close to “My tax forms are late”, given that they both contain the word “form”, which is the main requirement of the * operator.
It’s because in natural language mode “*” and all the other signs (“+”, “-”, etc.) are not operators like they are in boolean mode. The rule is that any character that is not alphabetic or numeric, with the exception of apostrophe (’) and underscore (_), is a word separator. So “form*of*life” is the same as “form of life” as far as the parser is concerned.
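To make the separator rule concrete, here is a rough Python sketch of it. This is not MySQL's actual parser: the helper name natural_language_words is made up for illustration, it only handles ASCII letters and the straight apostrophe, and it ignores stopwords and the minimum word length setting, both of which a real server also applies.

```python
import re

# Assumption: a "word" is a run of letters, digits, apostrophes or underscores.
# Every other character, including "*", "+" and "-", just separates words.
WORD_RE = re.compile(r"[A-Za-z0-9_']+")

def natural_language_words(text):
    """Hypothetical helper: split text by the separator rule described above."""
    return [word.lower() for word in WORD_RE.findall(text)]

# "*" behaves exactly like a space here.
print(natural_language_words("form*of*life"))
print(natural_language_words("form of life"))

# Searching for 'form*' is the same as searching for 'form'.
print(natural_language_words("form*"))

# "forms" is a different word from "form", so this document does not match.
print(natural_language_words("My tax forms are late"))
```

Under this rule “form*” and “form” produce the same single search word, and “This document is form*alized” splits into “form” and “alized”, so it does contain the word “form”.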
The query that @wh1tel1te suggested uses the “*” character in two places. In the MATCH in the SELECT list, which runs in natural language mode, “*” is only a word separator, so it plays no role in the score (you might as well leave it out). In the MATCH in the WHERE clause, “*” acts as a wildcard and decides which records appear in the output. In other words, the MATCH in the SELECT part looks for documents that contain the exact word “form”, and the MATCH in the WHERE clause looks for documents that contain any word starting with “form.” That is why the document that contains “formula” has a score of 0: it doesn’t contain the word “form”, but it is present in the recordset because it contains a word that starts with “form.” It is also why the next two queries have the same output:
mysql> select *, match (body) against ('form*' ) as score from test_searches order by score desc;
+----+-------------------------------------------------------------------------------------+-------------------+
| id | body | score |
+----+-------------------------------------------------------------------------------------+-------------------+
| 1 | This is about your tax form | 0.332646816968918 |
| 8 | form, form form form form | 0.332646816968918 |
| 11 | This contains form* | 0.332646816968918 |
| 10 | My tax form is late | 0.328907370567322 |
| 12 | This document is form*alized | 0.325251072645187 |
| 2 | I am formulating a new theory about searching and about different forms it can take | 0 |
| 3 | what is the formula for the area of a circle? | 0 |
| 4 | New forms of life | 0 |
| 5 | some text that does not contain the word | 0 |
| 6 | irrelevant text | 0 |
| 7 | etc | 0 |
| 9 | My tax forms are late | 0 |
+----+-------------------------------------------------------------------------------------+-------------------+
12 rows in set

mysql> select *, match (body) against ('form' ) as score from test_searches order by score desc;
+----+-------------------------------------------------------------------------------------+-------------------+
| id | body | score |
+----+-------------------------------------------------------------------------------------+-------------------+
| 1 | This is about your tax form | 0.332646816968918 |
| 8 | form, form form form form | 0.332646816968918 |
| 11 | This contains form* | 0.332646816968918 |
| 10 | My tax form is late | 0.328907370567322 |
| 12 | This document is form*alized | 0.325251072645187 |
| 2 | I am formulating a new theory about searching and about different forms it can take | 0 |
| 3 | what is the formula for the area of a circle? | 0 |
| 4 | New forms of life | 0 |
| 5 | some text that does not contain the word | 0 |
| 6 | irrelevant text | 0 |
| 7 | etc | 0 |
| 9 | My tax forms are late | 0 |
+----+-------------------------------------------------------------------------------------+-------------------+
12 rows in set