In a Lucene-based search engine, BM25 rewards shorter snippets of text. I’ve mentioned that term frequency might not matter, but it’s still good to bias towards shorter snippets of text with fewer terms.
Consider the case where a user searches for angularjs.
Which is more relevant?
- A book title mentions “angularJS” but also “web design” and “javascript”
- A book title JUST mentioning angularJS, and nothing else
The latter will be more “about” the concept than the one mentioning many topics. It’s a safer bet.
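To see the length bias in numbers, here is a minimal sketch of the textbook BM25 score for a single term. It is not Lucene's exact implementation, and the corpus statistics (`avg_doc_len`, `doc_freq`, `num_docs`) and title lengths are made up for illustration. The defaults `k1=1.2` and `b=0.75` match Lucene's defaults.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Simplified single-term BM25 score (a sketch, not Lucene's exact code)."""
    # Rarer terms get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Fields longer than average make this factor > 1, which lowers the score.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * tf / (tf + k1 * length_norm)


# Hypothetical corpus statistics, chosen only for illustration.
stats = dict(avg_doc_len=4, doc_freq=10, num_docs=1000)

# "angularjs" appears once in both titles; only the title length differs.
short_title = bm25_term_score(tf=1, doc_len=1, **stats)  # title is just "AngularJS"
long_title = bm25_term_score(tf=1, doc_len=5, **stats)   # title also covers other topics

print(short_title > long_title)
```

With the same term frequency in both titles, the one-term title scores higher purely because of the length normalization. Setting `b=0` turns that normalization off, and the two titles then score the same.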
-Doug
This is part of Doug’s Daily Search tips - subscribe here