Classic RAG expects a linear process:

  1. An LLM translates the user’s prompt into a query
  2. We search on behalf of the LLM
  3. That context goes to the LLM
  4. The LLM tries to answer the user’s question
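The four steps above can be sketched as straight-line code. The `FakeLLM` and `FakeSearch` stubs are hypothetical stand-ins for a real LLM and search backend, just to make the control flow concrete:

```python
class FakeLLM:
    def complete(self, prompt: str) -> str:
        # A real LLM call would go here; canned replies for illustration
        if prompt.startswith("Rewrite"):
            return "restaurants Paris"
        return "Try Le Hypothetical Bistro."

class FakeSearch:
    def search(self, query: str) -> list:
        # A real search backend call would go here
        return [{"snippet": f"Results for: {query}"}]

def classic_rag(user_prompt: str, llm, backend) -> str:
    # 1. An LLM translates the user's prompt into a query
    query = llm.complete(f"Rewrite as a search query: {user_prompt}")
    # 2. We search on behalf of the LLM
    results = backend.search(query)
    # 3. That context goes to the LLM
    context = "\n".join(r["snippet"] for r in results)
    # 4. The LLM tries to answer the user's question
    return llm.complete(f"Context:\n{context}\n\nQuestion: {user_prompt}")

answer = classic_rag("best restaurants in Paris", FakeLLM(), FakeSearch())
```

Note there's no branch anywhere: if step 2 retrieves the wrong Paris, step 4 answers confidently anyway.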

Every step is a point of failure, and the process has zero resilience.


Below, a user wants hip restaurants in Paris for their vacation. We seemingly parse the “location” correctly (Paris), yet we get back results for Paris, Texas (maybe the closest Paris to us!?).

The first thing we see here: RAG doesn’t just dump a lengthy natural language query (“best restaurants in Paris”) to a search backend and hope for the best. It decomposes the query into something more structured: (query:restaurants location:Paris).
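That decomposition step might look something like the sketch below. A real system would use an LLM for the extraction; the string heuristic here is a toy stand-in just to show the shape of the output:

```python
def decompose(nl_query: str) -> dict:
    # Toy heuristic: treat a trailing "in <Place>" as the location
    # (a real system would extract this with an LLM or NER model)
    words = nl_query.split()
    if "in" in words:
        idx = words.index("in")
        return {
            "query": " ".join(w for w in words[:idx] if w != "best"),
            "location": " ".join(words[idx + 1:]),
        }
    return {"query": nl_query, "location": None}

structured = decompose("best restaurants in Paris")
# structured == {"query": "restaurants", "location": "Paris"}
```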

Yet it can be tempting to layer filter after filter, boost after boost, onto this processing, with little measurement of whether any of it improves accuracy.

Then teams start to obsess (not incorrectly) over the accuracy of each of these steps. How accurately can an LLM select from a set of cities to filter? Should we enforce something more exact than just the name of a place? Postal code / lat long / bounding box?
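For instance, resolving a place name to a bounding box before filtering is one way to be more exact than a bare name. A minimal sketch; the gazetteer dict is a hypothetical stand-in for a real geocoder, and its coordinates are approximate:

```python
GAZETTEER = {
    # place -> (min_lat, min_lon, max_lat, max_lon), approximate
    "Paris, France": (48.81, 2.22, 48.90, 2.47),
    "Paris, Texas":  (33.60, -95.63, 33.70, -95.43),
}

def in_bbox(lat: float, lon: float, bbox: tuple) -> bool:
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

# A restaurant near the Eiffel Tower matches Paris, France, not Paris, Texas
assert in_bbox(48.858, 2.294, GAZETTEER["Paris, France"])
assert not in_bbox(48.858, 2.294, GAZETTEER["Paris, Texas"])
```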

Still, when we take a step back, we have a fundamental flaw:

  • Every step MUST work without failing
  • There’s no way for the LLM to look at the results and say “hey wait a minute!” and try a new query

That’s what makes agentic search useful. It is NOT a linear process, but a loop: a way for the process to reflect on tool results (here, search tool results) and reformulate. Like below:
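The loop can be sketched like this. `llm_judge` and `llm_reformulate` stand in for LLM calls (hypothetical names, stubbed with toy functions below so the flow is runnable):

```python
def agentic_search(query, search, llm_judge, llm_reformulate, max_attempts=3):
    for _ in range(max_attempts):
        results = search(query)
        # The agent looks at the results: do these actually answer the query?
        if llm_judge(query, results):
            return results
        # "Hey wait a minute!" -- reformulate and try again
        query = llm_reformulate(query, results)
    return results  # best effort after exhausting attempts

# Toy stubs: the first attempt returns Paris, Texas; the judge rejects it,
# the reformulation disambiguates, and the second attempt succeeds.
def toy_search(q):
    return ["Paris, France results"] if "France" in q else ["Paris, Texas results"]

def toy_judge(q, results):
    return "Texas" not in results[0]

def toy_reformulate(q, results):
    return q + " France"

results = agentic_search("restaurants Paris", toy_search, toy_judge, toy_reformulate)
# results == ["Paris, France results"]
```

The `max_attempts` cap matters: without it, a query the agent can never satisfy would loop forever.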

So 100% we should make every retrieval step as accurate and efficient as possible. But we also need an escape hatch for when that’s not working and the agent needs to try again. Agents put the Resilience in RAG :).

-Doug

My next Agentic Search course starts May 18, prices go up Apr 6. http://maven.com/softwaredoug/cheat-at-search

This is part of Doug’s Daily Search tips - subscribe here


Doug Turnbull
