Personally I would say that is acceptable if you still wanted to return results such as “forms”, because that is the only way this will work. If you order by the “score” that MySQL returns (relevancy) descending, results such as “formulas” will display at the bottom because it isn’t as relevant as results with an exact match of “form”.
I guess it’s your choice really: non-boolean searching which will return exact matches only, or boolean searching which will return partial matches with a chance that it will return slightly irrelevant results.
Here is a little experiment:
Create the following table with the following data:
DROP TABLE IF EXISTS `test_searches`;
CREATE TABLE `test_searches` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`body` varchar(250) DEFAULT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `fulltext_search_index` (`body`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
-- ----------------------------
-- Records of test_searches
-- ----------------------------
INSERT INTO `test_searches` VALUES ('1', 'This is about your tax form');
INSERT INTO `test_searches` VALUES ('2', 'I am formulating a new theory about searching and about different forms it can take');
INSERT INTO `test_searches` VALUES ('3', 'what is the formula for the area of a circle?');
INSERT INTO `test_searches` VALUES ('4', 'New forms of life');
INSERT INTO `test_searches` VALUES ('5', 'some text that does not contain the word');
INSERT INTO `test_searches` VALUES ('6', 'irrelevant text');
INSERT INTO `test_searches` VALUES ('7', 'etc');
INSERT INTO `test_searches` VALUES ('8', 'form, form form form form');
Now I run the following query:
SELECT *, MATCH (body) AGAINST( 'form*' IN BOOLEAN MODE ) AS score FROM test_searches;
Here are the results:
+----+-------------------------------------------------------------------------------------+-------+
| id | body | score |
+----+-------------------------------------------------------------------------------------+-------+
| 1 | This is about your tax form | 1 |
| 2 | I am formulating a new theory about searching and about different forms it can take | 1 |
| 3 | what is the formula for the area of a circle? | 1 |
| 4 | New forms of life | 1 |
| 5 | some text that does not contain the word | 0 |
| 6 | irrelevant text | 0 |
| 7 | etc | 0 |
| 8 | form, form form form form | 1 |
+----+-------------------------------------------------------------------------------------+-------+
How do you differentiate those records?
I think that in this particular case, if you only need a one-term search, my initial example with the inflector helper is actually best, because it gives you only those docs that contain the term or its plural or its Saxon genitive. Here is an example:
SELECT *, MATCH (body) AGAINST( 'form forms form\'s forms\'' IN BOOLEAN MODE ) AS score FROM test_searches;
+----+-------------------------------------------------------------------------------------+-------+
| id | body | score |
+----+-------------------------------------------------------------------------------------+-------+
| 1 | This is about your tax form | 2 |
| 2 | I am formulating a new theory about searching and about different forms it can take | 2 |
| 3 | what is the formula for the area of a circle? | 0 |
| 4 | New forms of life | 2 |
| 5 | some text that does not contain the word | 0 |
| 6 | irrelevant text | 0 |
| 7 | etc | 0 |
| 8 | form, form form form form | 2 |
+----+-------------------------------------------------------------------------------------+-------+
In this case “what is the formula for the area of a circle?” has a score of 0, which is to be expected because someone who’s looking for “form” is certainly not looking for “formula,” or they would type in “formula.”
However, if you were to program a generic search library you would have to use all of these techniques, and many more, and combine them into something that would give relevant results in most situations. But then what if the tables are not MyISAM tables? For instance I use only InnoDB tables in my projects to ensure data integrity and guard against orphan records and other things like that which MyISAM doesn’t support. What if the database is not MySQL but Oracle, Postage or even MongoDB (like I used in my last two projects)? Just thinking about the task only “what ifs” pop up in my mind.
I do however enjoy the discussion. That’s why I pulled my glass of Pinot Noir close and I spent about an hour and a half on a Friday evening doing this experiment. 😊 Another reason is that this idea of “relevancy,” if solved in an unbiased way, has potential. So far Google’s Page Rank is the best solution to relevancy, but only inasmuch as democracy is the best solution to human society.