If pointwise evals ask “How relevant is this result, from 1 to 5?”, pairwise search evals ask “Which of these two results is more relevant - X or Y?”
Comparing two items at a time has some advantages:
- Less chance of per-decision error - it’s harder to get “this one is better than that one” wrong
- More precise results - they capture fine-grained differences that can’t be squeezed into a 1-5 scale
- Faster decisions - a comparison can often be made more quickly than a rating
However, two major downsides remain:
- Pairwise evals take more time - instead of rating 10 items 1-5, you need to compare each of the 10 items against the 9 others (45 distinct pairs) to get a complete picture
- Pairwise evals need to be transformed into pointwise - to use traditional search metrics or ranking data, we need a single score per document
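To make the cost concrete, here is a minimal sketch that counts the comparisons needed for one query. The document ids are made up for illustration, and it assumes each pair is judged once, in one order only.

```python
from itertools import combinations


def all_pairs(doc_ids):
    """Return every unordered pair of results a judge would need to compare."""
    return list(combinations(doc_ids, 2))


# 10 hypothetical results for a single query
results = [f"doc{i}" for i in range(1, 11)]
pairs = all_pairs(results)

# 10 pointwise ratings vs. n * (n - 1) / 2 pairwise comparisons
print(len(results), "ratings vs.", len(pairs), "comparisons")
```

The number of pairs grows quadratically with the number of results, which is why a complete pairwise picture gets expensive quickly.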
Luckily, these downsides can be mitigated:
- LLMs can do a lot of the simpler evals / comparisons - such as with my approach to LLM as a judge
- A system like Elo - used in competitions like chess - can be used to turn 1 vs 1 competitions (like pairwise comparisons) into a pointwise rating
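As a rough sketch of the Elo idea, the snippet below turns a list of (winner, loser) pairwise outcomes into one rating per document. It uses the standard Elo expected-score formula. The starting rating of 1000, the K-factor of 32, the single pass over the comparisons, and the document ids are all assumptions for illustration, not a description of any particular production setup.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo estimate of the probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(doc_ids, comparisons, k=32.0, start=1000.0):
    """Turn pairwise outcomes into one rating per document.

    comparisons: iterable of (winner, loser) document ids.
    k and start are assumed defaults, not tuned values.
    """
    ratings = {doc_id: start for doc_id in doc_ids}
    for winner, loser in comparisons:
        exp_win = expected_score(ratings[winner], ratings[loser])
        # The winner gains exactly what the loser gives up
        delta = k * (1.0 - exp_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


# Hypothetical judgments for one query: a beat b, a beat c, b beat c
docs = ["a", "b", "c"]
judgments = [("a", "b"), ("a", "c"), ("b", "c")]
ratings = elo_ratings(docs, judgments)

# Sort by rating to recover a pointwise ordering
ranking = sorted(docs, key=lambda d: ratings[d], reverse=True)
print(ranking)  # ['a', 'b', 'c']
```

The resulting ratings are relative, not a 1-5 grade, so they would still need to be bucketed or normalized before being fed into a metric that expects graded labels. Elo is also sensitive to the order in which comparisons are processed, so shuffling and repeating passes is a common refinement.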
-Doug
This is part of Doug’s Daily Search tips - subscribe here