
How GitDB cut our AI coding bill by 95%

A walk-through of the actual token math — where the savings come from, what they don't help with, and how you'd reproduce them in your own codebase.

Here's the math behind the 95% number, since that's the part most people seem to want to verify.

The headline isn't novel; the industry has been documenting the same shape for a year. Augment Code's research and independent measurements on Medium both converge on the finding that 60–80% of an AI coding agent's tokens go to finding things, not answering the question. A carefully reported case study on a 200-file TypeScript project swapped grep-based search for AST-level subgraph retrieval and watched Claude Code's input tokens fall from 8,200 to 2,100 on the same task — a 74% reduction from one change to the retrieval substrate. Our 95% figure isn't a different phenomenon; it's the same wedge with more of the surface area covered.

Our internal sample codebase is about 850 files, averaging 1,100 lines each — a fairly typical mid-sized Rust + TypeScript project. We tracked a month of agent activity in April and zoomed into one 24-hour window where four coding agents had collectively done the following:

  • Read 38 unique files
  • Opened those files an average of 6.2 times each (the same file getting read by different agents in the swarm)
  • Used ~1.4M input tokens
  • Spent about $28 in raw LLM cost across the four agents

For one feature.

That was the baseline we measured against. 1.4M tokens, roughly $28, to ship one rate-limiting middleware.

Where 95% comes from — the first wedge

The first wedge is the simpler one: stop reading whole files.

Most coding tasks need one function, not a file. When an agent asks "where is process_payment?" the traditional path is:

  1. Grep the codebase — already a multi-tool dance.
  2. Read the entire file the function lives in — ~9,000 tokens for a typical implementation file.
  3. Decide whether the function found is actually the one called from auth_middleware.rs.

That last step is the killer. Because grep is lossy, agents usually re-read the same file from a different vantage point just to confirm. So one logical operation becomes two or three reads of the same file.

GitDB collapses that whole dance into one tool call:

gitdb_find_function("process_payment")
  → ~15 tokens in, ~50 tokens out
    location: src/payments/handler.rs
    line_range: 142..198
    ast_node_id: ...

Then the agent reads exactly the lines it wants:

gitdb_read_lines("src/payments/handler.rs", 142, 198)
  → ~420 tokens — the function plus signature, nothing else

Total: about 470 tokens. Compared to the ~9,000-token full file read, that's 95% less. Per call. We measured this across hundreds of agent operations and the ratio holds within a few points.
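
To make the pattern concrete in code rather than tool traces, here's a minimal TypeScript sketch of the same two-step lookup. The GitDbClient interface and every name other than the two tool calls above are hypothetical; the token figures are the ones just quoted.

// Sketch only: GitDbClient and its method names are hypothetical wrappers
// around the two tool calls shown above.
interface FunctionLocation {
  file: string;        // e.g. src/payments/handler.rs
  startLine: number;   // e.g. 142
  endLine: number;     // e.g. 198
}

interface GitDbClient {
  findFunction(name: string): Promise<FunctionLocation>;                  // wraps gitdb_find_function
  readLines(file: string, start: number, end: number): Promise<string>;   // wraps gitdb_read_lines
}

// Targeted retrieval: resolve the symbol, then read only its line range,
// instead of dumping the whole file into the prompt.
async function fetchFunctionSource(db: GitDbClient, name: string): Promise<string> {
  const loc = await db.findFunction(name);                      // ~15 tokens in, ~50 tokens out
  return db.readLines(loc.file, loc.startLine, loc.endLine);    // ~420 tokens for the function body
}

// Back-of-envelope savings for a single lookup, using the numbers above.
const fullFileTokens = 9_000;     // typical whole-file read
const targetedTokens = 50 + 420;  // find_function result + line-range read
console.log(`reduction: ${((1 - targetedTokens / fullFileTokens) * 100).toFixed(1)}%`); // ≈ 94.8%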

The second wedge — 83% from agent memory

The token savings stack.

When agents work in a loop without persistent memory, they re-derive the same conclusion over and over. The first time the swarm hit our auth module, the architect agent spent ~2,000 tokens reasoning about "why is rate limiting tricky in this codebase?" The next time the swarm touched auth, three weeks later, a different architect agent burned the same 2,000 tokens reasoning about the same problem.

GitDB stores agent memory as a first-class citizen, scoped to the codebase. When the second agent runs, it pulls the prior decision back in ~200 tokens and a single API call:

gitdb_context_recall("rate limiting in auth handler")
  → 200 tokens of recovered context, instead of 2,000 tokens of fresh reasoning

In our benchmark, this saves about 83% of the tokens an agent spends re-discovering things its predecessor already learned. SAGE — a 2025 paper out of MSR — measured 59% on a different workload; 83% is what we see when you stack our AST retrieval on top.
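
In code, the recall-then-derive pattern is nothing exotic. Here's a minimal sketch, with hypothetical stand-ins for everything except gitdb_context_recall itself:

// Sketch only: recallContext is a hypothetical wrapper around the
// gitdb_context_recall tool shown above; deriveFromScratch stands in for
// the agent's own ~2,000-token reasoning pass.
async function getDecisionContext(
  recallContext: (query: string) => Promise<string | null>,
  deriveFromScratch: () => Promise<string>,
  query: string,
): Promise<string> {
  // First agent to touch a module: nothing stored yet, so it pays the full reasoning cost.
  // Every later agent: the stored decision comes back in roughly 200 tokens instead.
  const remembered = await recallContext(query);
  return remembered ?? deriveFromScratch();
}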

One thing worth flagging: this only works if memory recall is actually exact. If you're recalling a prior decision at 87% accuracy, the agent silently re-derives the missing 13% and the savings shrink with it. The recent ICLR 2026 paper Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval walks through why approximate methods lose recall as dimensionality grows — and our memory layer sits on the same 100% recall vector engine we wrote about in a separate post for exactly this reason.

Combined math: the 1.4M-token feature run drops to roughly 140K tokens. Same output. $2.80 instead of $28.
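
If you want to sanity-check that drop, here's the arithmetic spelled out as a sketch. Only the 1.4M baseline, the $28, and the per-wedge 95% and 83% figures come from the measurements above; the split of the baseline into buckets is an assumption chosen for illustration (consistent with the 70%+ full-file-read share discussed later), not a measured breakdown.

// Illustrative only: the bucket shares below are assumed, not measured.
const baselineInputTokens = 1_400_000;
const baselineCostUsd = 28;

const fullFileReadShare = 0.87;   // assumed: whole-file dumps, cut ~95% by targeted retrieval
const rederivationShare = 0.09;   // assumed: context re-derived each run, cut ~83% by memory recall
const otherShare = 0.04;          // assumed: everything else, unchanged

const afterTokens =
  baselineInputTokens * fullFileReadShare * (1 - 0.95) +
  baselineInputTokens * rederivationShare * (1 - 0.83) +
  baselineInputTokens * otherShare;

console.log(Math.round(afterTokens));                                          // ≈ 138,000 tokens
console.log((baselineCostUsd * afterTokens / baselineInputTokens).toFixed(2)); // ≈ $2.77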

What this doesn't help with

A few things stay expensive after GitDB, and I'd rather call them out than leave them implied:

  • Output tokens. If your agent writes 800 lines of code, that's 800 lines of tokens regardless of how it read the codebase. GitDB shrinks the prompt, not the response.
  • Pure reasoning tasks. If the agent spends 10,000 tokens thinking about an algorithm design, that's chain-of-thought, not retrieval. We don't touch that.
  • Code generation against unfamiliar libraries. External docs aren't in GitDB yet (something we're working on). Agents still pull from search or docs sites.

"95% off your LLM bill" is a tempting headline, but the more honest framing is "95% off the retrieval layer." For coding-heavy workloads that layer is most of the bill, which is why we focused there. For chat-style agents, the wins will look different.

How to measure this on your own codebase

If you want to check whether the same shape would show up for you:

  1. Instrument your current agent runner to log per-call input token counts.
  2. Run a feature end-to-end. Record total in/out tokens and how many file reads happened.
  3. Bucket reads into "file dumps" vs "targeted line ranges" (a small sketch of this step follows the list). The full-file bucket is where most of the savings tend to live.
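
Here's a small sketch of that bucketing step, assuming you already have a per-read log with the file path, the line span pulled in, and the input tokens attributable to each read. Every name in it is made up for illustration, and "full file" is judged with a crude line-span heuristic.

// Sketch only: assumes a per-read log you've already collected; all field and
// function names are hypothetical.
interface ReadEvent {
  file: string;
  linesRequested: number;   // how many lines this read pulled into the prompt
  fileTotalLines: number;   // how long the file is
  inputTokens: number;      // prompt tokens attributable to this read
}

// Crude heuristic: a read that pulls in more than 80% of a file counts as a full-file dump.
function isFullFileRead(e: ReadEvent): boolean {
  return e.linesRequested / e.fileTotalLines > 0.8;
}

// Share of input tokens spent on full-file dumps: roughly the ceiling on what
// function-boundary retrieval can reclaim.
function fullFileTokenShare(log: ReadEvent[]): number {
  const total = log.reduce((sum, e) => sum + e.inputTokens, 0);
  const dumps = log.filter(isFullFileRead).reduce((sum, e) => sum + e.inputTokens, 0);
  return total === 0 ? 0 : dumps / total;
}

The number fullFileTokenShare returns is the rough lower bound the next paragraph talks about.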

A rough lower bound for any team is the percentage of input tokens that come from full-file reads. If that number sits at 70%+ — and on coding swarms we've seen, it usually does — there's probably a 60-90% input-token reduction available with retrieval that respects function boundaries. Your mileage will vary based on the codebase shape.

If you'd like to compare notes on the actual benchmark setup, please reach out. I'd be glad to share the raw numbers and the comparison script.

— Danny