The higher the scale, the stronger the incentive to simplify your retrieval.

There are two conflicting incentives:

  • Improving relevance: requires more complex retrieval to get all the best candidates
  • Improving reliability: requires consistent latency and throughput, plus a system that's easier for an infra engineer to manage and debug

What does “simpler retrieval” look like?

  • Single vector retrieval with a few filters
  • A first-pass BM25 retrieval with a recency boost
  • An assumption that you’re fetching the top 1000 and reranking outside the search engine
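As a rough illustration of the last two bullets, here's a minimal sketch of a cheap first pass followed by a rerank outside the engine. Everything here is an assumption for illustration: the `Hit` type, the 30-day half-life, and the function names are hypothetical, and in practice the BM25 score and the boost would be computed by your search engine, not in Python.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    bm25: float      # first-pass score, assumed to come from the search engine
    age_days: float  # document age, used for the recency boost


def recency_boost(age_days, half_life_days=30.0):
    """Exponential decay: 1.0 for a brand-new doc, 0.5 at one half-life."""
    return 0.5 ** (age_days / half_life_days)


def first_pass(hits, k=1000):
    """Cheap, predictable stage: BM25 times a recency boost, keep the top k."""
    ranked = sorted(
        hits,
        key=lambda h: h.bm25 * recency_boost(h.age_days),
        reverse=True,
    )
    return ranked[:k]


def rerank(candidates, score_fn, n=10):
    """Expensive stage, run outside the engine on a bounded candidate set."""
    return sorted(candidates, key=score_fn, reverse=True)[:n]
```

The point of the split is that the search engine only ever does the simple, bounded work. Whatever model sits behind `score_fn` can change without touching the engine's latency profile.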

Of course, how much relevance you sacrifice for reliability is something you have to measure. That means deploying your retrieval changes early, then measuring them under real load using shadow traffic.
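Here's a minimal sketch of what a shadow comparison might record, assuming you can call both the production and the candidate retrieval with the same query. The names `shadow_compare`, `prod_search`, and `shadow_search` are hypothetical, and overlap@k is only a crude proxy for how much results change, not a relevance judgment. A real setup would mirror live requests asynchronously rather than loop over queries in sequence.

```python
import time
from statistics import mean, quantiles


def timed(search_fn, query):
    """Run one search and return (results, latency in milliseconds)."""
    start = time.perf_counter()
    results = search_fn(query)
    return results, (time.perf_counter() - start) * 1000.0


def overlap_at_k(a, b, k):
    """Fraction of the top k that both result lists share."""
    return len(set(a[:k]) & set(b[:k])) / k


def p95(values):
    """95th percentile; quantiles() needs at least two data points."""
    return quantiles(values, n=20)[-1]


def shadow_compare(queries, prod_search, shadow_search, k=10):
    prod_ms, shadow_ms, overlaps = [], [], []
    for q in queries:
        prod_results, p = timed(prod_search, q)
        # Shadow results are only measured, never shown to users.
        shadow_results, s = timed(shadow_search, q)
        prod_ms.append(p)
        shadow_ms.append(s)
        overlaps.append(overlap_at_k(prod_results, shadow_results, k))
    return {
        "prod_p95_ms": p95(prod_ms),
        "shadow_p95_ms": p95(shadow_ms),
        "overlap_at_k": mean(overlaps),
    }
```

If the simpler retrieval holds its p95 under load and the top results barely move, that's a decent sign the relevance you gave up was small. If overlap drops a lot, that's your cue to look at the queries that changed.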

-Doug

This is part of Doug’s Daily Search tips - subscribe here


Doug Turnbull
