When I worked at Shopify, the gold standard was GMV (gross merchandise volume, the total dollar value of orders). Naturally, that’s what we wanted to move with search A/B tests.

Seems sane. BUT it doesn’t actually isolate search per se. A lot can happen:

  • Someone randomly buys a $50K Rolex in the control group, destroying an A/B test
  • User checkouts happen rarely, add-to-carts occasionally, and search clicks frequently
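To see how much a single order can distort the result, here is a minimal sketch with made-up numbers (10,000 users per arm, $40 orders, and a treatment that genuinely converts 10% better). None of these figures come from real data; they are assumptions chosen only for illustration.

```python
def gmv_per_user(order_values, n_users):
    """Average revenue per user in one arm of the test."""
    return sum(order_values) / n_users

# Hypothetical setup: 10,000 users per arm, every order worth $40.
n_users = 10_000
control_orders = [40.0] * 200     # 2.0% of control users check out
treatment_orders = [40.0] * 220   # 2.2% of treatment users check out

# Relative GMV lift of treatment over control without any outlier.
lift = gmv_per_user(treatment_orders, n_users) / gmv_per_user(control_orders, n_users) - 1

# Now one control user happens to buy a $50K watch.
control_with_outlier = control_orders + [50_000.0]
lift_with_outlier = (
    gmv_per_user(treatment_orders, n_users)
    / gmv_per_user(control_with_outlier, n_users)
    - 1
)

print(f"lift without outlier: {lift:+.1%}")
print(f"lift with one $50K order in control: {lift_with_outlier:+.1%}")
```

In this toy setup, one unrelated purchase turns a real 10% improvement into what looks like a massive loss for the treatment.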

For these experiments, you’d have to wait for very sparse confounders to even out. That might take months. Or it might never happen, as inventory and indices shift.
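A rough way to put numbers on the wait is the common rule of thumb that a test needs about 16 * variance / delta^2 users per arm (approximately 80% power at a two-sided 5% significance level). The sketch below applies it to three hypothetical base rates. The rates and the 5% relative lift are assumptions, not measured values.

```python
def users_per_arm(base_rate, relative_lift):
    """Rule-of-thumb sample size per arm for detecting a relative lift
    in a binary rate (roughly 80% power, two-sided alpha of 0.05)."""
    delta = base_rate * relative_lift        # absolute difference to detect
    variance = base_rate * (1 - base_rate)   # variance of a yes/no outcome
    return 16 * variance / delta ** 2

# Hypothetical per-user event rates, from frequent to rare.
rates = {"search click": 0.30, "add to cart": 0.05, "checkout": 0.02}

for name, rate in rates.items():
    n = users_per_arm(rate, relative_lift=0.05)
    print(f"{name}: ~{n:,.0f} users per arm")
```

Under these assumptions the checkout metric needs about 20 times as many users as the click metric. This only covers whether a checkout happened at all; revenue per user is typically noisier, so a GMV test would likely need even more traffic.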

A better way:

  • Focus on signals in the search interaction (clicks + dwell)
  • Focus on events that give users value - but adjacent to search (add to cart from the search UI, and clear signals of interest such as hovers, reading, scrolling, “read more”, subscribing, etc.)
  • Study the relationship between search value and lagging metrics: retention and conversion
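As a minimal sketch of the first two points, the snippet below computes search-local metrics per variant from an event log. The event schema (`variant`, `search_id`, `type`) and the event names are hypothetical; a real log will look different.

```python
from collections import defaultdict

# Hypothetical event log: each event is tied to the search that produced it.
events = [
    {"variant": "control", "search_id": "s1", "type": "search"},
    {"variant": "control", "search_id": "s1", "type": "click"},
    {"variant": "control", "search_id": "s2", "type": "search"},
    {"variant": "treatment", "search_id": "s3", "type": "search"},
    {"variant": "treatment", "search_id": "s3", "type": "click"},
    {"variant": "treatment", "search_id": "s3", "type": "add_to_cart_from_search"},
    {"variant": "treatment", "search_id": "s4", "type": "search"},
    {"variant": "treatment", "search_id": "s4", "type": "click"},
]

def search_metrics(events):
    """Per variant: share of searches with a click, and share of searches
    with an add to cart made directly from the search UI."""
    searches = defaultdict(set)
    clicked = defaultdict(set)
    carted = defaultdict(set)
    for e in events:
        variant, search_id = e["variant"], e["search_id"]
        if e["type"] == "search":
            searches[variant].add(search_id)
        elif e["type"] == "click":
            clicked[variant].add(search_id)
        elif e["type"] == "add_to_cart_from_search":
            carted[variant].add(search_id)
    return {
        variant: {
            "click_rate": len(clicked[variant] & ids) / len(ids),
            "add_to_cart_rate": len(carted[variant] & ids) / len(ids),
        }
        for variant, ids in searches.items()
    }

metrics = search_metrics(events)
for variant, values in metrics.items():
    print(variant, values)
```

Because every event is counted per search rather than per order, an unrelated big-ticket purchase elsewhere on the site never enters these numbers.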

As metrics move away from search, the danger of mistaking a final sale for a signal of good or poor search quality only increases.

-Doug

This is part of Doug’s Daily Search tips - subscribe here



Doug Turnbull
