Evaluating search?

Don’t jump to complex labeling systems; just do simple side-by-sides.

When I worked at Reddit, I would socialize a spreadsheet of search results. Each row showed side-by-side results for a test query. One side was “control” - autogenerated from prod. The other was “test” - my fancy new algorithm.

But the reviewers didn’t know which was which. They were blind.

What they saw: one side labeled “Pepsi,” the other “Coke” (referencing the old commercial). They’d give me a preference over several dozen queries, isolating the types of queries my change impacted. This gave me a good sense of whether deeper evaluation (i.e. an A/B test) was worth it.
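Here’s a minimal sketch of that setup in Python, assuming you can fetch ranked results from both systems. The functions fetch_control and fetch_test, the query list, and the file names are hypothetical stand-ins for your own plumbing. It flips a coin per query so reviewers can’t learn which cola is which:

```python
# Minimal sketch of a blind "Pepsi vs. Coke" side-by-side spreadsheet.
# fetch_control / fetch_test are placeholders for your own search calls.
import csv
import random

def fetch_control(query: str) -> list[str]:
    # Placeholder: call your prod search here.
    return [f"prod result {i} for '{query}'" for i in range(1, 4)]

def fetch_test(query: str) -> list[str]:
    # Placeholder: call your new algorithm here.
    return [f"new result {i} for '{query}'" for i in range(1, 4)]

queries = ["cat pictures", "best mechanical keyboard", "how to grill salmon"]

# Write the blinded sheet reviewers see, plus a private answer key.
with open("side_by_side.csv", "w", newline="") as sheet, \
     open("answer_key.csv", "w", newline="") as key:
    sheet_w = csv.writer(sheet)
    key_w = csv.writer(key)
    sheet_w.writerow(["query", "Pepsi", "Coke", "preference (Pepsi/Coke/tie)"])
    key_w.writerow(["query", "Pepsi is"])
    for q in queries:
        control = " | ".join(fetch_control(q))
        test = " | ".join(fetch_test(q))
        # Coin flip per query so no side is consistently one system.
        if random.random() < 0.5:
            sheet_w.writerow([q, control, test, ""])
            key_w.writerow([q, "control"])
        else:
            sheet_w.writerow([q, test, control, ""])
            key_w.writerow([q, "test"])
```

Once reviewers fill in the preference column, join it back to answer_key.csv and count how often the “test” side wins. A clear win rate on the query types your change targets is the signal that a full A/B test is worth the effort.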

Your search evals need not involve a PhD. Start grug-brained. Don’t get ahead of your skis!

https://softwaredoug.com/blog/2025/06/22/grug-brained-search-eval

-Doug

This is part of Doug’s Daily Search Tips - subscribe here


Enjoy softwaredoug in training course form!

Starting May 18!

Sign up here - http://maven.com/softwaredoug/cheat-at-search

I hope you join me at Cheat at Search with Agents to learn to use agents in search, build better RAG, and use LLMs in query understanding.

Doug Turnbull

More from Doug
Twitter | LinkedIn | Newsletter | Bsky