What slows full-text search down? Too many unique terms.

Consider search with numerical values. It’s unlikely you care about the distinction between 3.1415927 and 3.14 when searching. Both are pi! 🥧

Instead of a postings list that looks like:

3.1415927 → [1, 5, 9]

3.14 → [1, 3, 9]

Collapse them to:

pi → [1, 3, 5, 9]
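Collapsing two terms amounts to a union of their sorted postings lists. Here is a minimal Python sketch, assuming postings are sorted lists of integer doc ids. `merge_postings` is a hypothetical helper for illustration, not any search engine's API:

```python
import heapq
from typing import Iterable, List


def merge_postings(*postings: Iterable[int]) -> List[int]:
    """Union several sorted postings lists into one sorted, de-duplicated list."""
    merged: List[int] = []
    # heapq.merge walks the already-sorted inputs in order without re-sorting.
    for doc_id in heapq.merge(*postings):
        # Skip doc ids that appear in more than one input list.
        if not merged or merged[-1] != doc_id:
            merged.append(doc_id)
    return merged


print(merge_postings([1, 5, 9], [1, 3, 9]))  # [1, 3, 5, 9]
```

Doing this once at index time means a query for the concept reads a single list, instead of merging several on every search.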

This requires paying attention to tokenization. Whether you actually have numbers or, more likely, you’re dealing with stemming or synonyms, collapsing terms into a single concept pays performance dividends: the term dictionary shrinks, and a query for the concept reads one postings list instead of merging several.

And it helps recall too: a search for one form now matches documents that used the other.
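Here is a minimal Python sketch of collapsing at analysis time. It rests on some loud assumptions. `normalize`, `build_index`, `search`, and the `SYNONYMS` table are hypothetical names. The rule "anything that rounds to 3.14 is pi" stands in for whatever stemming or synonym logic your engine actually uses. The sketch counts unique terms with and without normalization, then runs the same query against both indexes:

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical synonym table: each surface form maps to one canonical term.
SYNONYMS = {"π": "pi"}


def normalize(token: str) -> str:
    """Collapse a raw token into a canonical term (an illustrative analyzer)."""
    token = SYNONYMS.get(token.lower(), token.lower())
    try:
        value = float(token)
    except ValueError:
        return token  # not a number: keep the (possibly synonym-mapped) word
    # Assumption: any number that rounds to 3.14 means the concept "pi".
    if round(value, 2) == 3.14:
        return "pi"
    # Other numbers are bucketed to two decimals so near-duplicates share a term.
    return f"{value:.2f}"


def build_index(docs: Dict[int, str], analyze: Callable[[str], str]) -> Dict[str, List[int]]:
    """Build term -> sorted doc id postings using the given token analyzer."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[analyze(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index: Dict[str, List[int]], query: str, analyze: Callable[[str], str]) -> List[int]:
    """Look up one query token, analyzed the same way the documents were."""
    return index.get(analyze(query), [])


docs = {1: "3.14 3.1415927", 3: "3.14", 5: "3.1415927", 9: "3.1415927 3.14 π"}
raw = build_index(docs, lambda t: t)      # no normalization
collapsed = build_index(docs, normalize)  # terms collapsed to concepts

print(len(raw), len(collapsed))           # 3 1
print(collapsed)                          # {'pi': [1, 3, 5, 9]}
print(search(raw, "3.14", lambda t: t))   # [1, 3, 9]
print(search(collapsed, "3.14", normalize))  # [1, 3, 5, 9]
```

The query passes through the same `normalize` as the documents. If index-time and query-time analysis disagree, the collapsed term never matches. The rounding rule is deliberately crude, and it shows the trade: a little precision given up for fewer terms and better recall (doc 5 is now found by a search for 3.14).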

-Doug

This is part of Doug’s Daily Search tips - subscribe here


Enjoy softwaredoug in training course form!

Starting May 18!

Sign up here: http://maven.com/softwaredoug/cheat-at-search

I hope you join me at Cheat at Search with Agents to learn to use agents in search, build better RAG, and use LLMs in query understanding.

Doug Turnbull
