Relevant retrieval w/ predictable latency

The higher the scale, the stronger the incentive to simplify your retrieval.

There are two conflicting incentives:

Improving relevance: requires more complex retrieval to get all the best candidates
Improving reliability: requires consistent latency and throughput, plus a system that's easier for an infra engineer to manage and debug

What does “simpler retrieval” look like?

Single vector retrieval with a few filters
A first pass BM25 retrieval with a recency boost
An assumption you’re fetching top 1000 and reranking outside the search engine
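To make that last pattern concrete, here's a minimal sketch of a cheap first pass followed by reranking outside the engine. Everything in it is a stand-in I made up for illustration: `Doc`, `first_pass`, and `rerank` are hypothetical names, the term-overlap score is a toy substitute for BM25, and the recency weight is arbitrary. In a real system the first pass would be a query to your search engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    doc_id: str
    text: str
    age_days: float


def first_pass(query: str, docs: List[Doc], k: int = 1000) -> List[Doc]:
    """Cheap, predictable first pass: term overlap (toy stand-in for BM25)
    plus a small recency boost. Returns at most k candidates."""
    terms = set(query.lower().split())

    def score(doc: Doc) -> float:
        overlap = len(terms & set(doc.text.lower().split()))
        recency = 1.0 / (1.0 + doc.age_days / 30.0)  # arbitrary decay, an assumption
        return overlap + 0.5 * recency

    # Only docs sharing at least one query term are candidates
    matches = [d for d in docs if terms & set(d.text.lower().split())]
    return sorted(matches, key=score, reverse=True)[:k]


def rerank(query: str, candidates: List[Doc],
           scorer: Callable[[str, Doc], float], n: int = 10) -> List[Doc]:
    """Rerank outside the engine with whatever (possibly expensive) scorer you like."""
    return sorted(candidates, key=lambda d: scorer(query, d), reverse=True)[:n]
```

The point of the split is that the engine only ever runs the simple query, so its latency stays predictable. The expensive, experimental part lives in `scorer`, where you can change it without touching the search cluster.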

Of course, how much relevance you sacrifice for reliability requires measurement. That means actually deploying your retrieval changes early, then measuring under actual load using shadow traffic.
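As a rough sketch of what measuring with shadow traffic could look like: serve every query from the current retrieval, send the same query to the candidate, and compare latency percentiles. The names here (`shadow_run`, `summarize`, `percentile`) are hypothetical, and this version runs the candidate sequentially in-process for simplicity. A production setup would more likely mirror traffic asynchronously so the shadow call can never slow down the served response.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def timed(fn: Callable[[str], object], query: str) -> Tuple[object, float]:
    """Run fn(query) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(query)
    return result, (time.perf_counter() - start) * 1000.0


def shadow_run(queries: Sequence[str],
               baseline: Callable[[str], object],
               candidate: Callable[[str], object]):
    """Serve results from baseline; send the same queries to candidate and
    record latency for both. Candidate results are discarded."""
    served: List[object] = []
    latencies: Dict[str, List[float]] = {"baseline": [], "candidate": []}
    for q in queries:
        result, ms = timed(baseline, q)
        served.append(result)
        latencies["baseline"].append(ms)
        try:
            _, ms = timed(candidate, q)
            latencies["candidate"].append(ms)
        except Exception:
            pass  # a failing shadow call must never affect what users see
    return served, latencies


def summarize(latencies: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """p50 and p99 per retrieval path, skipping paths with no samples."""
    return {
        name: {"p50": percentile(ms, 50), "p99": percentile(ms, 99)}
        for name, ms in latencies.items() if ms
    }
```

Comparing p99 rather than just the median is the part that matters for predictability: a more complex retrieval can look fine on average and still blow up the tail.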

Liberating Search from the Search Engine

-Doug

This is part of Doug’s Daily Search tips - subscribe here

Enjoy softwaredoug in training course form!

Starting May 18!

Signup here - http://maven.com/softwaredoug/cheat-at-search

I hope you join me at Cheat at Search with Agents to learn to use agents in search, build better RAG, and use LLMs in query understanding.

Doug Turnbull
