6 min read · Danny Yau

Agent-to-Agent workflows in GitDB

How GitDB coordinates multi-agent code workflows without ever putting source on a developer machine — and why a database, not a filesystem, is the right substrate for it.

The most expensive moment in a multi-agent coding loop, at least in the runs we've measured, is the handoff.

Not the inference. Not the code generation. The handoff — when Agent A finishes its work and passes context to Agent B. In most frameworks we tried, that handoff is implemented as "stuff the relevant code into B's prompt," which means the swarm ends up paying the full token cost of the work again every time the baton moves.

The shape of the problem is well-documented at this point. A2A — Google's open Agent-to-Agent protocol — now has 150+ organizations supporting it and is integrated natively into Azure AI Foundry, Amazon Bedrock AgentCore, and Google Cloud. The fact that the major cloud platforms agreed on a wire format for agent handoffs in less than a year tells you how much pressure the multi-agent pattern is putting on people's bills.

The cost story underneath that pressure is real. In AutoGen's GroupChat model, every agent turn is a full LLM call with the entire accumulated conversation history — a 4-agent debate over 5 rounds is 20 LLM calls minimum, each one carrying the prior 19 in its prompt. Independent benchmarks show CrewAI carrying up to 3× the token footprint of LangGraph on simple single-tool-call workflows for the same reason. The cost isn't the work the agents do; it's the cost of telling each agent what the others did.
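The quadratic shape of that bill is easy to see with a toy model. This is an illustrative sketch of the accumulation pattern described above, not AutoGen's actual API: each turn re-sends the full transcript, so prompt tokens grow linearly per turn and quadratically over the run.

```python
# Illustrative cost model of a group-chat loop (not AutoGen's real API):
# every turn's prompt carries the entire accumulated history.

def group_chat_prompt_tokens(agents: int, rounds: int, tokens_per_turn: int) -> int:
    """Total prompt tokens paid across a full debate, assuming every
    turn re-reads all prior turns."""
    total = 0
    history = 0
    for _turn in range(agents * rounds):
        total += history            # re-read everything said so far
        history += tokens_per_turn  # this turn's output joins the history
    return total

# 4 agents x 5 rounds = 20 LLM calls. At 500 tokens per turn, the swarm
# re-reads 95,000 tokens of history on top of 10,000 tokens of new work.
print(group_chat_prompt_tokens(4, 5, 500))  # → 95000
```

The new work is `agents * rounds * tokens_per_turn`; the re-reading is roughly that squared over two. That asymmetry is the whole argument for handing off pointers instead of transcripts.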

I spent a year poking at that. This post is about what we ended up building, and why I've come to think the substrate matters at least as much as the framework on top of it.

Why A2A matters

Most engineering work isn't a single role. Even a solo human cycles between roles minute to minute — you write a function (coder), you re-read it (reviewer), you write a test (tester), you go back and adjust the spec because the test surfaced something (architect). Inside one head, those transitions are free. Across four agents, the transitions are where a lot of the cost lives.

A2A workflows make those role transitions explicit. Architect agent plans. Coder agent implements. Reviewer agent checks. Tester agent verifies. Each one stays good at one thing, and in our experience the swarm ships features more reliably than a single generalist agent because the specialists don't blow their context window on the wrong thing.

That story works in theory. The hard part, for us, was the substrate.

Where Git starts to bend, for A2A

Git is a beautiful tool. It just wasn't designed for software that runs as multiple agents handing work off to each other. Three places where we kept hitting friction:

1. There's no granularity below a file. Git tracks files. If Architect wants to tell Coder "change the function on line 42," Git's vocabulary is "here's the whole file." Coder gets the whole file dumped into its prompt, and you end up paying for 9,000 tokens to hand off 50 lines of intent.
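To make the granularity point concrete, here is a sketch of what a sub-file handoff artifact could look like. The type and field names are assumptions for illustration, not GitDB's actual wire format:

```python
# Hypothetical sketch of a sub-file handoff reference — a pointer to
# intent, versus the whole-file dump Git's vocabulary forces.
from dataclasses import dataclass

@dataclass
class LineRangeRef:
    """File, line span, and a one-line instruction."""
    path: str
    start_line: int
    end_line: int
    note: str

    def approx_tokens(self) -> int:
        # Rough heuristic: ~4 characters per token of the serialized ref.
        text = f"{self.path}:{self.start_line}-{self.end_line} {self.note}"
        return max(1, len(text) // 4)

ref = LineRangeRef("src/api/limits.py", 42, 58, "tighten the rate_limit window")
# The pointer costs a couple dozen tokens; the 9,000-token file costs 9,000.
print(ref.approx_tokens())
```

The payload stays in the store; only the reference moves.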

2. Identity gets blurry inside a clone. The moment you git clone, "who is reading this code" becomes a question that's hard to answer cleanly. Was it Architect? Coder? A compromised webhook? Was it 09:14:32 or 11:08:45? The repo on disk doesn't keep that record, so an audit of "what did this agent actually see?" tends to be a reconstruction from VPN logs and SSH timestamps.

3. Policy lives outside the data. You can give an agent a scoped SSH key, of course. But the moment the agent has the repo on disk, scope is essentially over — the whole codebase is sitting in the agent's process. If the agent decides to read secrets/, Git won't stop it, because Git already handed the keys over at clone time.
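When every read is a query, a rule like "Coder may not read secrets/" can be checked before any bytes leave the store. This is a minimal sketch of that idea; the rule format and agent names are assumptions, not GitDB's real policy engine:

```python
# Sketch of data-layer read policy (hypothetical rules): the check runs
# per query, which a clone-then-read model cannot do after clone time.
from fnmatch import fnmatch

POLICY = {
    "architect": {"allow": ["docs/*", "src/*"], "deny": []},
    "coder":     {"allow": ["src/*"],           "deny": ["secrets/*"]},
}

def can_read(agent: str, path: str) -> bool:
    # Unknown agents are denied everything by default.
    rules = POLICY.get(agent, {"allow": [], "deny": ["*"]})
    if any(fnmatch(path, pat) for pat in rules["deny"]):
        return False
    return any(fnmatch(path, pat) for pat in rules["allow"])

print(can_read("coder", "src/api/limits.py"))   # → True
print(can_read("coder", "secrets/prod.env"))    # → False
```

Note that `fnmatch`'s `*` matches across path separators, so `src/*` covers nested paths here; a production matcher would want stricter glob semantics.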

For one human, none of this is a real problem. For four agents working in parallel, we found we needed granularity, identity, and policy at the data layer, not the application layer.

What GitDB gives A2A that Git can't

I'll keep this concrete. Three properties that fall out of treating code as a query target instead of a filesystem.

Every tool call lands in the audit trail. Architect calls gitdb_find_function("rate_limit"). That's a row. Coder calls gitdb_write_lines(...). That's a row. Reviewer posts a review comment. That's a row. Each event carries an identity, a timestamp, and the exact arguments. When something goes wrong six weeks later — and something will go wrong — you can replay the swarm step by step.
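The shape such a row could take is roughly this. Field names here are assumptions for illustration, not GitDB's actual schema; only the tool names come from the examples above:

```python
# Sketch of an audit event: one row per tool call, carrying identity,
# timestamp, tool name, and the exact arguments, so a run can be replayed.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEvent:
    agent: str
    tool: str
    args: dict
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

trail: list[AuditEvent] = []

def record(agent: str, tool: str, **args) -> None:
    trail.append(AuditEvent(agent, tool, args))

record("architect", "gitdb_find_function", name="rate_limit")
record("coder", "gitdb_write_lines", path="src/api/limits.py", start=42)

# Replaying the swarm is just reading the rows back in order.
for ev in trail:
    print(ev.at.isoformat(), ev.agent, ev.tool, ev.args)
```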

Per-agent seats = per-agent identity. There's no shared bot token. Architect has its own API key. So does Coder. So does Reviewer. So does Tester. They can't impersonate each other. If Coder's key leaks, you revoke Coder. Architect keeps shipping.

Sub-millisecond MCP reads and writes. This one matters more than it sounds. When the handoff cost is a 4,000-token full-file dump, agents talk in slow paragraphs. When the handoff cost is ~15 tokens of "here's the branch I committed to," they can negotiate at machine speed. We've seen swarms drop a feature in 11 minutes that used to take an hour — and the time wasn't lost to inference, it was lost to all the re-reading of the same files.

A worked example, in 60 seconds

I wrote a full walkthrough of a four-agent feature ship — the 11-minute, $0.95 rate-limiting story. The short version of the A2A pieces:

  • Architect plans the change — 4 tool calls, 1,500 tokens of code retrieved.
  • Hands off to Coder via a2a_handoff(to="coder", artifact="docs/specs/...") — 15 tokens.
  • Coder edits the function and commits — no whole-file reads.
  • Hands off to Reviewer via a2a_handoff(to="reviewer", artifact="branch:...") — 18 tokens.
  • Reviewer flags an issue; Coder fixes it; Tester writes tests; Tester pings me.
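The pointer-passing pattern above can be sketched in a few lines. The artifact names and the ~4-characters-per-token heuristic are my illustrative assumptions; only the `a2a_handoff` tool name comes from the walkthrough:

```python
# Sketch of the handoff loop: a handoff is a reference, not a payload,
# so its token cost is the length of the pointer, not what it points at.

def handoff_tokens(to: str, artifact: str) -> int:
    payload = f'a2a_handoff(to="{to}", artifact="{artifact}")'
    return max(1, len(payload) // 4)  # rough ~4 chars/token heuristic

# Architect -> Coder: hand off a spec path, not the spec.
spec_cost = handoff_tokens("coder", "docs/specs/rate-limiting.md")
# Coder -> Reviewer: hand off a branch name, not a diff dump.
review_cost = handoff_tokens("reviewer", "branch:feature/rate-limit")
print(spec_cost, review_cost)
```

Both calls land in the same ~15-token neighborhood the walkthrough reports, regardless of how large the spec or the branch diff actually is.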

Every handoff above is a pointer, not a payload. Every event lands in the audit trail with the agent's identity, the timestamp, and the artifact reference. When I open the PR review, I can see exactly which agent did what — and exactly which functions each one read to make its decision.

The compliance angle, as a side effect

We didn't start building GitDB for compliance — we started because the AI bill was too high. But the compliance side has turned out to matter more than I expected to the security teams we've shown it to.

The question that comes up every time, in some form, is who did what, and can you prove it?

With Git on a laptop, the answer is usually "we have VPN logs, and commit authors, and we can probably reconstruct it." With GitDB, the answer can be "here is the row in the audit trail." Per-file. Per-line. Per-agent. Per-millisecond.
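Because the trail is rows in a database, "who touched this file?" is a query rather than a log reconstruction. Here is a sketch using an in-memory SQLite stand-in; the table and column names are assumptions, not GitDB's real schema, and `gitdb_read_lines` is a hypothetical tool name:

```python
# Sketch of the "who did what" query against a hypothetical audit table,
# using in-memory SQLite as a stand-in for the audit store.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE audit_trail (agent TEXT, tool TEXT, path TEXT, ts_ms INTEGER)"
)
db.executemany(
    "INSERT INTO audit_trail VALUES (?, ?, ?, ?)",
    [
        ("architect", "gitdb_find_function", "src/api/limits.py", 1000),
        ("coder",     "gitdb_write_lines",   "src/api/limits.py", 2400),
        ("reviewer",  "gitdb_read_lines",    "src/api/limits.py", 3100),
    ],
)

# Per-file, per-agent, per-millisecond: one ORDER BY, not a forensic project.
rows = db.execute(
    "SELECT agent, tool, ts_ms FROM audit_trail WHERE path = ? ORDER BY ts_ms",
    ("src/api/limits.py",),
).fetchall()
for agent, tool, ts_ms in rows:
    print(agent, tool, ts_ms)
```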

That's a meaningful difference if you're trying to get clearance to put AI agents near a real codebase. We built GitDB for the engineers trying to ship faster; the audit story is something that came along with the design, and we're happy it did.

What's next

If you'd like a concrete read, there's a full walkthrough of a four-agent feature ship — what each agent did, where they handed off, and what the bill looked like.

If you're running A2A workflows on your own codebase and either the cost or the audit story isn't quite working, please drop me a line. I'd genuinely like to hear what's breaking for you.

— Danny