Consider pairwise evals instead of pointwise

If pointwise evals ask “How relevant is this from 1-5?”, pairwise evals ask “Which of these two results is more relevant - X or Y?”

Comparing two items at a time has some advantages:

Less chance of per-decision error - it's harder to screw up judging that one result is better than another
More precise results - they capture fine-grained details that can’t be shoved into a 1-5 scale
Faster decisions - comparisons can often be made more quickly

However, two major downsides remain:

Pairwise evals take more time - instead of rating 10 items 1-5, you need to compare each of the 10 items against the 9 others (45 unique pairs) to get a complete picture
Pairwise evals need to be transformed into pointwise - to use traditional search metrics or ranking data, we need a single score per document
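
To make the first downside concrete, here is a quick sketch that enumerates every unordered pair for a 10-item result list. The `all_pairs` helper and the `doc0`..`doc9` ids are made up for illustration, and it assumes each pair only needs to be judged once (X vs Y, not also Y vs X).

```python
from itertools import combinations

def all_pairs(doc_ids):
    """Every unordered pair of documents a complete pairwise eval must judge."""
    return list(combinations(doc_ids, 2))

# Hypothetical result list of 10 documents
docs = [f"doc{i}" for i in range(10)]
pairs = all_pairs(docs)

print(len(docs))   # 10 pointwise ratings
print(len(pairs))  # 45 pairwise comparisons: n * (n - 1) / 2
```

The count grows quadratically with the number of results, which is why a complete pairwise eval gets expensive fast.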

Luckily, these factors can be mitigated:

LLMs can do a lot of the simpler evals / comparisons - such as my approach to LLM as a judge
A system like Elo - used in competitions like chess - can be used to turn 1 vs 1 competitions (like pairwise comparisons) into a pointwise rating
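
As a rough illustration of the Elo idea, here is a minimal sketch that turns a list of pairwise judgments into one score per document. It uses the standard Elo expected-score formula; the starting rating of 1000, the K-factor of 32, the function names, and the sample judgments are all assumptions for the example, not necessarily the setup I use.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_ratings(comparisons, k=32.0, start=1000.0):
    """Turn (winner, loser) pairwise judgments into one rating per document.

    k and start are assumed defaults, not tuned values.
    """
    ratings = {}
    for winner, loser in comparisons:
        r_w = ratings.setdefault(winner, start)
        r_l = ratings.setdefault(loser, start)
        exp_w = expected_score(r_w, r_l)
        # Winner gains what the loser gives up, scaled by how surprising the win was
        ratings[winner] = r_w + k * (1.0 - exp_w)
        ratings[loser] = r_l - k * (1.0 - exp_w)
    return ratings

# Hypothetical judgments for one query: (more relevant, less relevant)
judgments = [("doc_a", "doc_b"), ("doc_a", "doc_c"), ("doc_b", "doc_c")]
ratings = elo_ratings(judgments)
ranked = sorted(ratings, key=ratings.get, reverse=True)
print(ranked)
```

One caveat: Elo updates depend on the order the comparisons are processed in, so with only a few judgments it can help to shuffle and average over several passes.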

-Doug

This is part of Doug’s Daily Search tips - subscribe here

I hope you join me at Cheat at Search with Agents to learn to use agents in search, build better RAG, and use LLMs in query understanding.