Evaluating search?

Don’t jump to complex labeling systems. Just do simple side-by-sides.

When I worked at Reddit, I would socialize a spreadsheet of search results. On each side were search results for a test query. One side was “control” - autogenerated from prod. The other “test” - my new fancy algorithm.

But the reviewers didn’t know which was which. They were blind.

What they saw: one side labeled “Pepsi”, the other “Coke” (referencing the old commercial). They’d give me a preference over several dozen queries, limited to the types of queries my change affected. This gave me a good sense of whether continuing to deeper evaluation (i.e., an A/B test) was worth it.
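Here is a minimal sketch of how a blind sheet like that could be put together and scored. It is not the tooling I used at Reddit. The function names, the sample queries and results, and the per-query coin flip that decides which system shows up as “Pepsi” are all assumptions for illustration.

```python
import csv
import io
import random
from collections import Counter

# Hypothetical sample data: top results per query from each system.
queries = ["cat videos", "mechanical keyboards", "nyc pizza"]
control_results = {q: [f"control result {i} for {q}" for i in (1, 2, 3)] for q in queries}
test_results = {q: [f"test result {i} for {q}" for i in (1, 2, 3)] for q in queries}


def build_blind_rows(queries, control_results, test_results, seed=0):
    """Return (sheet_rows, answer_key) with systems hidden behind Pepsi/Coke."""
    rng = random.Random(seed)
    sheet_rows, answer_key = [], {}
    for query in queries:
        # Assumption: flip a coin per query so "Pepsi" is not always the same system.
        pepsi_is_test = rng.random() < 0.5
        pepsi = test_results[query] if pepsi_is_test else control_results[query]
        coke = control_results[query] if pepsi_is_test else test_results[query]
        # The answer key stays private; reviewers only ever see the sheet rows.
        answer_key[query] = {
            "Pepsi": "test" if pepsi_is_test else "control",
            "Coke": "control" if pepsi_is_test else "test",
        }
        sheet_rows.append({
            "query": query,
            "Pepsi": " | ".join(pepsi),
            "Coke": " | ".join(coke),
            "preference": "",  # reviewer fills in Pepsi, Coke, or tie
        })
    return sheet_rows, answer_key


def write_sheet(sheet_rows, out):
    """Write the blind rows as CSV, ready to import into a spreadsheet."""
    writer = csv.DictWriter(out, fieldnames=["query", "Pepsi", "Coke", "preference"])
    writer.writeheader()
    writer.writerows(sheet_rows)


def tally(preferences, answer_key):
    """Unblind reviewer picks ({query: 'Pepsi' | 'Coke' | 'tie'}) into win counts."""
    counts = Counter()
    for query, choice in preferences.items():
        # Anything that is not Pepsi or Coke is counted as a tie.
        counts[answer_key[query].get(choice, "tie")] += 1
    return counts


sheet_rows, answer_key = build_blind_rows(queries, control_results, test_results)
buffer = io.StringIO()
write_sheet(sheet_rows, buffer)
```

Share the CSV, keep `answer_key` to yourself, and run `tally` once the preferences come back. If “test” wins clearly on the queries you targeted, that is a reasonable signal to move on to an A/B test.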

Your search evals need not involve a PhD. Start grug-brained. Don’t get ahead of your skis!

https://softwaredoug.com/blog/2025/06/22/grug-brained-search-eval

-Doug

This is part of Doug’s Daily Search tips.


Doug Turnbull
