As I’m sure you’re aware, NDCG is a statistic used to measure the relevance of a search algorithm’s results. It compares the returned results to a set of labels, aka a judgment list. These judgments might be explicit – a human being went through and labeled them – or they might be implicit – gathered from clickstream data. Or they might be LLM-generated.

In any case, when compared against the returned search results, we can put a number on the relevance.
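To make that concrete, here’s a minimal sketch of the calculation. It is not the notebook’s code, just one common way to do it, and it bakes in a few assumptions: gain is exponential (`2**grade - 1`), the discount is `log2(rank + 1)`, unjudged results count as grade 0, and the ideal ranking is built from every judged document for the query. The function names and the doc ids are made up for illustration.

```python
import math


def dcg(grades, k=10):
    """Discounted cumulative gain over the first k grades (rank starts at 1)."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg(results, judgments, k=10):
    """NDCG@k for one query.

    results   -- ranked list of doc ids returned by the search system
    judgments -- dict of doc id -> relevance grade (the judgment list)
    """
    # Assumption: a result with no judgment is treated as irrelevant (grade 0).
    grades = [judgments.get(doc_id, 0) for doc_id in results]

    # Assumption: the ideal ranking is all judged docs, best grade first.
    ideal_grades = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg(ideal_grades, k)

    # No relevant judgments at all: define NDCG as 0 rather than divide by zero.
    if ideal_dcg == 0:
        return 0.0
    return dcg(grades, k) / ideal_dcg


# Hypothetical judgment list for a single query.
judgments = {"doc_a": 3, "doc_b": 2, "doc_c": 0, "doc_d": 1}

perfect = ndcg(["doc_a", "doc_b", "doc_d", "doc_c"], judgments)
backwards = ndcg(["doc_c", "doc_d", "doc_b", "doc_a"], judgments)
print(f"perfect ordering:  {perfect:.3f}")
print(f"reversed ordering: {backwards:.3f}")
```

Each of those assumptions is a decision you could reasonably make differently, and changing any of them changes the number you get.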

I’ve argued, of course, this whole NDCG way of evaluating search is overrated in real-life circumstances. Yet it’s still useful 😎.

And, by God, I’m exhausted from writing this same boilerplate code over-and-over-and-over again at every job I go to.

So here’s a useful notebook to walk you through the major decisions of NDCG calculation that matter to real-life search evaluation.

Enjoy!



Doug Turnbull
