If pointwise evals ask “How relevant is this result, from 1-5?”, pairwise search evals ask “Which of these two results is more relevant: X or Y?”

Comparing two items at a time has some advantages:

  • Less chance of per-decision error - it’s harder to get “this one is better than that one” wrong than to pick an exact grade
  • More precise results - comparisons capture fine-grained differences that can’t be squeezed into a 1-5 scale
  • Faster decisions - a single comparison can often be made more quickly than assigning a grade

However, two major downsides remain:

  • Pairwise evals take more total time - even if each decision is faster, instead of grading 10 items from 1-5, you need to compare each item against the other 9, which is 45 unique comparisons for a complete picture
  • Pairwise evals need to be transformed into pointwise scores - to use traditional search metrics or ranking data, we need a single score per document
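To make the cost concrete: a full round-robin over n results needs n(n-1)/2 unique comparisons. Below is a minimal Python sketch that enumerates the pairs a judge would see for one query, assuming a judgment on (X, Y) also answers (Y, X). The document IDs are hypothetical placeholders.

```python
from itertools import combinations

def pairs_to_judge(doc_ids):
    """Every unique pair of results a judge must compare for one query.

    Assumes judging (a, b) also answers (b, a), so each pair appears once.
    """
    return list(combinations(doc_ids, 2))

def num_pairwise_judgments(n):
    """Size of a full round-robin over n results: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Hypothetical placeholder IDs for one query's top-10 results
results = [f"doc{i}" for i in range(10)]
print(len(pairs_to_judge(results)))   # 45
print(num_pairwise_judgments(50))     # 1225
```

The count grows quadratically: 10 results need 10 grades but 45 comparisons, and 50 results need 1,225 comparisons per query.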

Luckily, both downsides can be mitigated:

  • LLMs can handle many of the simpler evals / comparisons - such as in my approach to LLM-as-a-judge
  • A rating system like Elo, used in competitions like chess, can turn 1 vs 1 matchups (like pairwise comparisons) into a pointwise rating
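Here is a minimal Python sketch of the Elo idea applied to relevance judgments. It assumes the standard chess-style formulas: expected score 1 / (1 + 10^((R_b - R_a) / 400)), a starting rating of 1500, and K = 32. Those constants are conventional chess choices, not anything tuned for search. Each judgment is a hypothetical (doc_a, doc_b, score_a) tuple, where score_a is 1.0 if doc_a was judged more relevant, 0.0 if doc_b was, and 0.5 for a tie.

```python
from collections import defaultdict

def expected_score(rating_a, rating_b):
    """Probability Elo assigns to A being judged more relevant than B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_ratings(judgments, k=32.0, initial=1500.0):
    """Turn pairwise judgments into a single rating per document.

    judgments: iterable of (doc_a, doc_b, score_a) tuples (hypothetical format),
    score_a = 1.0 if doc_a won, 0.0 if doc_b won, 0.5 for a tie.
    k and initial are assumed chess-style defaults.
    """
    ratings = defaultdict(lambda: initial)
    for doc_a, doc_b, score_a in judgments:
        exp_a = expected_score(ratings[doc_a], ratings[doc_b])
        delta = k * (score_a - exp_a)
        # Zero-sum update: whatever A gains, B loses
        ratings[doc_a] += delta
        ratings[doc_b] -= delta
    return dict(ratings)

# Usage with hypothetical judgments for one query
judgments = [
    ("doc1", "doc2", 1.0),  # doc1 judged more relevant than doc2
    ("doc1", "doc3", 1.0),  # doc1 judged more relevant than doc3
    ("doc2", "doc3", 0.5),  # tie
]
ratings = elo_ratings(judgments)
ranked = sorted(ratings, key=ratings.get, reverse=True)
print(ranked[0])  # doc1
```

Elo updates are order-dependent, so feeding the same judgments in a different order can shift the ratings slightly. With only a few comparisons per document, treat the ratings as a rough ranking rather than precise scores.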

-Doug

This is part of Doug’s Daily Search tips - subscribe here


Enjoy softwaredoug in training course form!

Starting June 22!

I hope you join me at Cheat at Search with LLMs to learn how to apply LLMs to search applications. Check out this post for a sneak preview.

Doug Turnbull
