As I’m sure you’re aware, NDCG is a statistic used to measure the relevance of a search algorithm’s results. It compares the returned results to a set of labels, aka a judgment list. These judgments might be explicit – a human being went through and labeled them – or they might be implicit – gathered from clickstream data. Or they might be LLM generated.

In any case, when compared against the returned search results, we can put a number on the relevance.
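To make that concrete, here's a minimal sketch of NDCG for a single query. It's not the notebook's code, just an illustration under a few assumptions of mine: judgments are a dict of doc id to grade, unjudged docs count as grade 0, gain is `2^grade - 1`, and the discount is `log2(rank + 1)`. The doc ids and grades are made up. Each of those choices is a decision you might make differently.

```python
import math


def dcg(grades, k=10):
    """Discounted cumulative gain over the top k grades, in ranked order."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg(ranked_doc_ids, judgments, k=10):
    """NDCG@k for one query.

    judgments maps doc id -> grade. Unjudged docs are treated as
    grade 0, which is an assumption, not the only reasonable choice.
    """
    grades = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # Ideal ordering: every judged doc, best grade first
    ideal = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant docs judged for this query
    return dcg(grades, k) / ideal_dcg


# Hypothetical judgment list and search results for one query
judgments = {"doc_a": 3, "doc_b": 2, "doc_c": 0, "doc_d": 1}
results = ["doc_b", "doc_a", "doc_x", "doc_d"]  # doc_x is unjudged

print(ndcg(results, judgments, k=4))
```

Here the ideal DCG comes from all judged docs for the query, not just the ones that were returned. Normalizing against only the returned docs is another option, and it gives a different number.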

I’ve argued, of course, this whole NDCG way of evaluating search is overrated in real-life circumstances. Yet it’s still useful 😎.

And, by God, I’m exhausted from writing this same boilerplate code over and over and over again at every job I go to.

So here’s a useful notebook to walk you through the major decisions of NDCG calculation that matter to real-life search evaluation.

Enjoy!



Enjoy softwaredoug in training course form!

Starting May 18!

Sign up here - http://maven.com/softwaredoug/cheat-at-search

I hope you join me at Cheat at Search with Agents to learn how to use agents in search, build better RAG, and use LLMs in query understanding.

Doug Turnbull
