Most RAG systems stuff retrieved content into a context window and hope for the best. It works, but the failure modes are familiar: noisy retrieval, context limits, inconsistent results.
Tool-using RAG takes a different approach. Instead of feeding documents directly to the model, you give agents tools to search and retrieve what they need. The agent decides when to search, what to look for, and how to use what it finds.
BiteSizeRAG explores one corner of this space: what happens when you break document collections into small, focused pieces instead of one big searchable pile? Each collection becomes its own MCP server with its own purpose-named tool. Not search_documents() but search_employee_benefits() or search_historical_recipes(). The hypothesis is that agents do better when tools have clear scope and purpose.
I built this to find out.
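The shape of the idea, as a minimal sketch: one collection, one MCP server, one purpose-named tool. This assumes the MCP Python SDK's FastMCP helper; the in-memory corpus and keyword match are stand-ins for real document processing and retrieval, not BiteSizeRAG's actual internals.

```python
# One collection exposed as its own MCP server with a purpose-named tool.
# Assumes the MCP Python SDK's FastMCP helper; retrieval here is a toy stand-in.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("employee-benefits")

# Stand-in corpus; in practice this would sit behind the collection's vector store.
_DOCS = [
    {"source": "pto-policy.md", "text": "Full-time employees accrue 15 PTO days per year."},
    {"source": "health-plan.md", "text": "Open enrollment for health coverage runs each November."},
]


@mcp.tool()
def search_employee_benefits(query: str, limit: int = 5) -> list[dict]:
    """Search the employee benefits collection (PTO, health coverage, retirement plans).

    Not for salary, compensation, or payroll questions.
    """
    # Naive keyword match as a placeholder for real retrieval.
    words = query.lower().split()
    hits = [d for d in _DOCS if any(w in d["text"].lower() for w in words)]
    return hits[:limit]


if __name__ == "__main__":
    mcp.run()
```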
Why I Started
At work, I watch teams build RAG systems regularly. When traditional approaches hit their limits (context windows filling up, retrieval getting noisy), the fixes tend to be elaborate: prompt engineering to compensate for data issues, tagging systems that need constant maintenance, removing documents that cause bad outputs.
Tool-using RAG seemed like it might help by giving agents more control. But I’d heard enough from my engineering teams to know that MCP goes from hello-world to production headache fast.
I also had personal collections (historical books, recipes, research papers) that I wanted to make available to agents without rebuilding infrastructure every time. And if I was going to keep recommending tool-using RAG to clients, I wanted to know firsthand what building one actually involves.
Where I Went Wrong
MCP’s origins made things harder than expected. Anthropic designed it local-first and single-user. Security, HTTP, multi-tenancy: all afterthoughts or absent. The frameworks that emerged made demos easy and my use case difficult.
I built components in parallel: uploads, document processing, configuration, routing. Get each piece working, connect them, then test with real agents. Reasonable enough. But it meant I didn’t put tools in front of actual agents until late in the process.
When I finally did, I discovered how much I’d gotten wrong. Descriptions that seemed clear to me confused agents. Metadata formats weren’t useful to them. Agents needed ways to get full documents, not just chunks. Every lesson required changes that rippled back through the stack.
I should have started by handing a barely-working prototype to a real agent and watching what happened. The infrastructure exists to support what you learn from that process. I had it backwards.
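One concrete example of those ripples: chunk search alone wasn't enough, so each collection also needed a companion tool that hands back the full source document. A rough sketch of that shape, with hypothetical names and an in-memory stand-in rather than BiteSizeRAG's actual API:

```python
# Companion to the search tool: let agents pull a whole document, not just chunks.
# Hypothetical tool name and storage; same FastMCP pattern as the earlier sketch.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("employee-benefits")

# Stand-in for the processed collection: document id -> full text.
_FULL_DOCS = {
    "pto-policy.md": "PTO Policy\n\nFull-time employees accrue 15 PTO days per year...",
}


@mcp.tool()
def get_benefits_document(doc_id: str) -> str:
    """Return the full text of one document from the employee benefits collection.

    Use after search_employee_benefits when a retrieved chunk is not enough context.
    """
    return _FULL_DOCS.get(doc_id, f"No document found with id {doc_id!r}")
```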
What I Found
Watching agents use my tools, and refuse to use them, taught me things I hadn’t expected.
The hypothesis was right: focused tools work better than generic ones. An agent with search_documents() often ignores it. An agent with search_employee_benefits() uses it, because the name signals when the tool applies.
But the name just needs to be distinct. search_benefits works as well as anything longer. The real work is in the description.
A tool description is prompt engineering. You’re teaching the agent what’s in scope, what queries work well, and what the tool should NOT be used for. “Use for PTO and health coverage questions. Not for salary or compensation.” Adding concrete examples and explicit boundaries made agents dramatically more reliable.
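Here is the kind of rewrite that made the difference, as illustrative text rather than BiteSizeRAG's exact wording:

```python
# Before: technically accurate, but gives an agent nothing to decide with.
VAGUE_DESCRIPTION = "Search the benefits documents."

# After: scope, example queries, and explicit exclusions, written for the agent.
SCOPED_DESCRIPTION = """Search the employee benefits collection.

Use for questions about PTO, sick leave, health coverage, dental and vision,
retirement plans, and open enrollment.

Good queries: "how many PTO days do new hires get", "when is open enrollment".

Do NOT use for salary, bonuses, equity, or payroll questions. That content is
not in this collection and this tool will return nothing useful for it.
"""
```

Everything the agent needs to decide whether the tool applies lives in that string; nothing about the retrieval backend changes.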
Curation matters more than I expected. The instinct is always to add more documents, more sources, broader coverage. But stale or conflicting content doesn’t improve with quantity. If the same document lives in multiple collections, that’s a sign your information architecture is off. Focused collections demand more maintenance, and that forces a useful question: is this content worth keeping?
Building With an Agent
I built most of BiteSizeRAG with Claude Code, which created a useful feedback loop. When I wasn’t sure if a description would make sense to an agent, I could ask. When I wanted to test how an agent might interpret metadata, I could try it immediately. Using an agent to build tooling for agents caught problems faster than I expected.
Where It Stands
BiteSizeRAG works. Documents go in, get processed, become queryable through MCP. Each collection gets its own tool with a purpose-specific name and description. Search results carry attribution so agents can cite sources.
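The attribution piece looks roughly like this: each hit carries enough metadata for the agent to cite its source and, if needed, pull the full document. Field names here are illustrative, not BiteSizeRAG's exact schema.

```python
# The shape of a search hit as the agent sees it: attribution travels with every chunk,
# so citations come for free. Illustrative field names, not the project's exact schema.
from typing import TypedDict


class SearchHit(TypedDict):
    collection: str  # which focused collection the hit came from
    source: str      # original document identifier, usable with a full-document tool
    title: str       # human-readable document title for citation
    chunk: str       # the retrieved passage itself
    score: float     # retrieval score, so agents can judge relevance


example_hit: SearchHit = {
    "collection": "employee_benefits",
    "source": "pto-policy.md",
    "title": "Paid Time Off Policy",
    "chunk": "Full-time employees accrue 15 PTO days per year, prorated by start date.",
    "score": 0.82,
}
```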
I’m refactoring the gateway now, replacing a framework that fought me with something that does what I need.
The core finding: focused, well-described tools do work better. But “focused” means more than a clever name. It means clear scope, detailed descriptions with examples and boundaries, and content that’s actually curated. The interface where agents meet your tools is where everything gets decided. I should have started there.