If pointwise evals ask “How relevant is this from 1-5?”, pairwise search evals ask “Which of these two results is more relevant - X or Y?”

Comparing two items at a time has some advantages:

  • Less chance of per-decision error - it’s harder to get “one is better than the other” wrong
  • More precise results - fine-grained distinctions that can’t be squeezed into a 1-5 scale
  • Faster decisions - comparisons can often be made more quickly

However, two major downsides remain:

  • Pairwise evals take more time - instead of rating 10 items 1-5, you need to compare each of the 10 items against the 9 others (45 unique pairs) to get a complete picture
  • Pairwise evals need to be transformed into pointwise - to use traditional search metrics or ranking data, we need a single score per document
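
To make the first downside concrete, here is a minimal sketch that counts the judgments needed for a complete picture. The document IDs are hypothetical, and it assumes each pair only needs to be judged once (X vs Y is the same comparison as Y vs X).

```python
from itertools import combinations

# Hypothetical document IDs for one query's top 10 results
docs = [f"doc{i}" for i in range(1, 11)]

# Pointwise: one judgment per document
pointwise_judgments = len(docs)

# Pairwise: one judgment per unordered pair, n * (n - 1) / 2
pairs = list(combinations(docs, 2))
pairwise_judgments = len(pairs)

print(pointwise_judgments)  # 10
print(pairwise_judgments)   # 45
```

The pair count grows quadratically with the number of results, which is why the labeling cost becomes the bottleneck as you go deeper into the result list.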

Luckily, these factors can be mitigated:

  • LLMs can do a lot of the simpler evals / comparisons - such as my approach to LLM as a judge
  • A system like Elo - used in competitions like chess - can be used to turn 1 vs 1 competitions (like pairwise comparisons) into a pointwise rating
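
As a rough illustration of the Elo idea, here is a minimal sketch, not necessarily the exact setup used in practice. The function names, document IDs, starting rating of 1000, and K-factor of 32 are all assumptions chosen for the example. Each pairwise judgment is treated as a "match" where the more relevant result wins.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(comparisons, k=32.0, start=1000.0):
    """Turn (winner, loser) pairs into one rating per document.

    k and start are assumed defaults, not tuned values.
    """
    ratings = {}
    for winner, loser in comparisons:
        r_w = ratings.setdefault(winner, start)
        r_l = ratings.setdefault(loser, start)
        # The less expected the win, the bigger the rating change
        surprise = 1.0 - expected_score(r_w, r_l)
        ratings[winner] = r_w + k * surprise
        ratings[loser] = r_l - k * surprise
    return ratings


# Hypothetical judgments for one query: (more relevant, less relevant)
judgments = [("doc_a", "doc_b"), ("doc_a", "doc_c"), ("doc_b", "doc_c")]
ratings = elo_ratings(judgments)

# One score per document, usable as a pointwise label
ranked = sorted(ratings, key=ratings.get, reverse=True)
print(ranked)
```

Note that Elo updates are order-dependent, so shuffling the judgments can shift the final scores a little. The resulting ratings would also still need to be mapped onto whatever grade scale your search metrics expect.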

-Doug

This is part of Doug’s Daily Search tips.



Doug Turnbull
