How to actually choose a retrieval engine

How do teams choose vector databases / search engines?

People rack their brains choosing between Elasticsearch/OpenSearch/Solr/Vespa/Pinecone/Turbopuffer/Weaviate/…

First things first - DO NOT start with a feature matrix. Start with the simple question:

What is my team most comfortable with? That’s the default. If everyone can go deep in one system, don’t overcomplicate the decision. It might be good enough to stop here.

NEXT - consider the high-level characteristics of the project. Use these as veto points for the original choice.

Pace of development - is the project actively maintained and improved? If not, consider something else.
Scale - what scale does the project target? Does it match what you need? At high scale, you want simple operations, executed predictably. Systems aimed at lower scale offer richer features, but those features won't perform predictably as you scale out. Choose the right fit.
Company capitalization - who builds the technology? Will they exist in one year? If they disappear, who takes over the project?

FINALLY - think about how to make it easy to migrate OFF the technology. Don’t over-couple to one system / company. Avoid the advanced features unless they’re really killer. Keep the dependency on the search backend behind a module of your own so you can swap backends as needed.
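
One way to keep that dependency swappable, as a minimal sketch rather than a recommended design: application code talks to a small interface, and each engine gets its own adapter behind it. Every name here (SearchBackend, Hit, InMemoryBackend, find_products) is made up for illustration, and the in-memory backend is a toy stand-in for a real engine, not how any of the systems above work.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float


class SearchBackend(Protocol):
    """Hypothetical interface: the only search surface app code may touch."""

    def index(self, doc_id: str, text: str) -> None: ...

    def search(self, query: str, k: int = 10) -> list[Hit]: ...


class InMemoryBackend:
    """Toy stand-in for a real engine: scores by number of shared terms."""

    def __init__(self) -> None:
        self._docs: dict[str, set[str]] = {}

    def index(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = set(text.lower().split())

    def search(self, query: str, k: int = 10) -> list[Hit]:
        terms = set(query.lower().split())
        hits = [
            Hit(doc_id, float(len(terms & words)))
            for doc_id, words in self._docs.items()
        ]
        hits = [h for h in hits if h.score > 0]
        # Highest score first; break ties by doc_id so results are stable.
        hits.sort(key=lambda h: (-h.score, h.doc_id))
        return hits[:k]


def find_products(backend: SearchBackend, query: str) -> list[str]:
    """Application code: knows the interface, not the engine."""
    return [hit.doc_id for hit in backend.search(query, k=5)]


backend = InMemoryBackend()
backend.index("a", "red running shoes")
backend.index("b", "blue running jacket")
backend.index("c", "red wine")
print(find_products(backend, "red shoes"))  # ['a', 'c']
```

Migrating then means writing one new adapter with the same two methods, not hunting engine-specific calls through the codebase. The narrower the interface, the cheaper the swap, which is another reason to skip the advanced features unless they're really killer.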

-Doug

This is part of Doug’s Daily Search tips - subscribe here

Doug Turnbull