You may know BM25 lets you tune two parameters:
- k1: how quickly to saturate document term frequency’s contribution
- b: how much to bias towards below-average-length docs
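For reference, here is a minimal Python sketch of the classic BM25 term weight, showing where k1 and b enter. The function name and the defaults (k1=1.2, b=0.75, idf=1.0) are illustrative assumptions. Real engines differ in details such as the IDF formula, and some drop the constant (k1 + 1) factor because it does not change rankings.

```python
def bm25_term_weight(tf: float, doc_len: float, avg_doc_len: float,
                     idf: float = 1.0, k1: float = 1.2, b: float = 0.75) -> float:
    """Classic BM25 contribution of one term to one document's score (illustrative)."""
    # b: 0 disables length normalization, 1 fully normalizes by doc_len / avg_doc_len
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1: larger values let repeated occurrences keep adding score longer before saturating
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Same term frequencies, documents of different lengths (average length = 100)
for doc_len in (50, 100, 200):
    print(doc_len, [round(bm25_term_weight(tf, doc_len, 100), 3) for tf in (1, 2, 5, 20)])
```

As tf grows, the weight approaches idf * (k1 + 1) but never exceeds it, and for the same tf a shorter-than-average document scores higher than a longer one.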
What you may NOT know is that there is another parameter: k3.
What does k3 do? It handles repeated query terms.
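Old papers suggest k3=100 to 1000. With k3 that large, the query term factor barely saturates at all: a repeated term's weight is almost exactly its raw query term frequency. That’s why Lucene ignores k3. It just uses the query term frequency. Some other search engines, like Terrier, set it to 8.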
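To see how k3 controls saturation, here is a small sketch of the standard BM25 query-side factor ((k3 + 1) * qtf) / (k3 + qtf), where qtf is how many times a term appears in the query. The function name is hypothetical, and qtf is assumed to be at least 1.

```python
def query_term_weight(qtf: int, k3: float) -> float:
    """BM25 query-side factor for a term appearing qtf (>= 1) times in the query."""
    return ((k3 + 1) * qtf) / (k3 + qtf)


# Compare saturation across a few k3 settings
for k3 in (0, 8, 100, 1000):
    weights = [round(query_term_weight(qtf, k3), 3) for qtf in (1, 2, 3, 5)]
    print(f"k3={k3:>4}: {weights}")
```

With k3=0 every term counts once no matter how often it repeats (immediate saturation). At k3=1000 the weights stay within about half a percent of qtf for these values, which is why treating k3 as "very large" is the same as using the raw query term frequency.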
So for the query “Best dog toys for rambunctious dog”:
- Lucene-based engines count dog twice
- Terrier, with k3=8, would count it 1.8 times: ((8 + 1) * 2) / (8 + 2) = 18 / 10 = 1.8
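Applied to that example query, the comparison might be sketched like this. The function name is hypothetical, and the tokenization (lowercase, split on whitespace, no stemming or stopword removal) is a simplifying assumption; a real engine's analyzer would do more. Passing k3=None counts each repeat in full, as the post describes for Lucene; k3=8 matches Terrier's setting.

```python
from collections import Counter
from typing import Dict, Optional


def query_term_weights(query: str, k3: Optional[float]) -> Dict[str, float]:
    """Query-side weight per term; k3=None counts each repeat in full."""
    # Naive analysis (assumption): lowercase and split on whitespace
    counts = Counter(query.lower().split())
    if k3 is None:
        # Linear in qtf: a term repeated twice contributes twice
        return {term: float(qtf) for term, qtf in counts.items()}
    # BM25 k3 saturation: ((k3 + 1) * qtf) / (k3 + qtf)
    return {term: ((k3 + 1) * qtf) / (k3 + qtf) for term, qtf in counts.items()}


query = "Best dog toys for rambunctious dog"
print(query_term_weights(query, k3=None)["dog"])  # 2.0
print(query_term_weights(query, k3=8)["dog"])     # 1.8
```

Terms that appear once get a weight of 1.0 either way, so k3 only changes the score when a query repeats a term.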
Which is right? For traditional search queries, it hardly matters, so we can ignore k3: a few keywords rarely repeat a term.
In today’s question-answering world, though, it’s reasonable to wonder: should we bring k3 back?
If you’ve used k3 in search, let me know; I’d love to hear your story!
-Doug
This is part of Doug’s Daily Search tips - subscribe here