5 min read · Danny Yau

A real day with an agent swarm on GitDB

A walk-through of four coding agents shipping a feature in parallel — what each one did, where they handed off, and what the bill looked like.

Here's what an A2A workflow looked like for us on our codebase last week. I've changed the feature name and the prompts slightly so I'm not putting anyone's hand-built spec on the public internet, but the structure and the numbers are real.

A bit of context on why the numbers look the way they do. The current industry baseline for AI coding agents is rough: independent measurements put 60–80% of an agent's tokens toward finding things. An agent like Claude Code routinely reads 25 files to answer a question about three functions, and a single grep against a real codebase returns 40+ hits — roughly 8,000 wasted tokens of context the model then has to filter. The run below is what the same kind of feature ship looks like when you take that overhead out of the loop.

Goal: add rate limiting to a webhook endpoint on a Node.js + Postgres service. Standard stuff. Three classes touched, one migration, one test file.

Four agents in the swarm: Architect, Coder, Reviewer, Tester. Each one has its own GitDB seat with scoped access — Architect can read everything, Coder can write to src/, Reviewer can read everything but write nothing, Tester can write to tests/. No agent can see the others' system prompts. Standard zero-trust setup.
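GitDB's actual seat-configuration format isn't shown in this post. As a purely hypothetical sketch — the read/write matrix is from the paragraph above, Architect's docs write scope is inferred from the spec commit later on, and the config shape and glob syntax are mine — the four scopes could look like:

```typescript
// Hypothetical sketch of the four seats' scopes. The access matrix
// comes from the post; the config shape and glob syntax are assumptions.
type SeatScope = { read: string[]; write: string[] };

const seats: Record<string, SeatScope> = {
  architect: { read: ["**"], write: ["docs/specs/**"] }, // writes only specs
  coder:     { read: ["**"], write: ["src/**"] },        // code changes only
  reviewer:  { read: ["**"], write: [] },                // read-only seat
  tester:    { read: ["**"], write: ["tests/**"] },      // test files only
};
```

The read-only Reviewer seat is the important one: it can flag anything but change nothing, which keeps the audit trail honest.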

09:14 — I prompt the swarm and walk away

I sent one message to the architect:

Add a 100 req/min rate limit per API key to the /v1/webhooks POST. Reuse the rate-limit middleware in src/middleware if it works for this case. If not, propose a new one. Hand off to the coder when the spec is clear.

That was it.

09:14 — Architect plans (38 seconds, ~4,200 tokens in)

Architect's first move was four GitDB tool calls:

gitdb_find_function("rate_limit")
  → found in src/middleware/throttle.ts:42, signature matches
gitdb_read_function("throttle.ts:42")
  → 380 tokens, exact function body
gitdb_find_callers("rate_limit")
  → 3 callers — health, search, billing endpoints
gitdb_read_lines("src/routes/webhooks.ts", 1, 60)
  → 540 tokens, current webhook handler

Total: ~1,500 tokens of code retrieved. Compare to the naive path of dumping throttle.ts (~4,200 tokens) plus webhooks.ts (~3,800 tokens) — 8,000 tokens for one decision.

Architect concluded the existing rate_limit would work with a minor extension to accept a key_extractor callback. It wrote a 14-line spec, committed it as docs/specs/webhook-rate-limit.md, and called the A2A handoff:

a2a_handoff(to="coder", artifact="docs/specs/webhook-rate-limit.md")
  → 15 tokens for the pointer; no code in the message
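The 14-line spec itself isn't reproduced here. As a sketch of the change it proposed — the existing rate_limit middleware gaining an optional key_extractor callback — where everything except the rate_limit and key_extractor names is an assumption:

```typescript
// Hypothetical sketch of the spec: rate_limit gains an optional
// key_extractor so each route can choose its own bucket key.
// rate_limit and key_extractor are named in the post; the rest is assumed.
type Req = { headers: Record<string, string | undefined> };

interface RateLimitOptions {
  limit: number;     // max requests per window, e.g. 100
  windowMs: number;  // window length, e.g. 60_000 for one minute
  key_extractor?: (req: Req) => string | undefined; // NEW in the spec
}

// The webhook route would key buckets by API key:
const webhookLimit: RateLimitOptions = {
  limit: 100,
  windowMs: 60_000,
  key_extractor: (req) => req.headers["x-api-key"],
};
```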

09:15 — Coder picks up (3 min 12 sec, ~6,800 tokens in)

Coder read the spec (380 tokens), then read the two functions Architect had identified (920 tokens combined). It did not re-read the whole files. It didn't need to.

Coder's edits:

  • Added key_extractor parameter to rate_limit — single write_function call.
  • Updated three call sites with the new signature — three write_lines calls.
  • Added the rate limit to the webhook route — one write_lines call.
  • Created a migration to add the rate_limit_buckets table — one create_file call.

Committed everything to a branch feat/webhook-rate-limit. Total Coder time: 3 minutes. Output: 47 lines of code, 8 lines of migration.
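The 8-line migration isn't shown. Assuming a fixed-window counter per API key, a sketch of the rate_limit_buckets table — the name is from the post; every column is a guess:

```typescript
// Hypothetical sketch of the rate_limit_buckets migration. Only the
// table name appears in the post; the columns are assumptions for a
// fixed-window counter keyed by API key.
const up = `
CREATE TABLE rate_limit_buckets (
  bucket_key   TEXT        NOT NULL,            -- API key owning the bucket
  window_start TIMESTAMPTZ NOT NULL,            -- start of the current window
  count        INTEGER     NOT NULL DEFAULT 0,  -- requests seen this window
  PRIMARY KEY (bucket_key, window_start)
);`;

const down = `DROP TABLE rate_limit_buckets;`;
```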

The handoff to Reviewer was again a pointer:

a2a_handoff(to="reviewer", artifact="branch:feat/webhook-rate-limit")
  → 18 tokens

09:18 — Reviewer reads the diff (1 min 40 sec, ~5,100 tokens in)

Reviewer's flow is different from Coder's. It doesn't read functions in isolation — it reads diffs and reasons about side effects.

gitdb_git_diff("main..feat/webhook-rate-limit")
  → 1,200 tokens — the actual diff
gitdb_find_callers("rate_limit")
  → confirms the 3 call sites Coder updated, finds no others
gitdb_read_function("rate_limit_buckets migration")
  → 280 tokens

Reviewer flagged one issue: the new key_extractor callback didn't handle the case where the API key was missing. It posted a review comment as a GitDB note attached to the branch:

gitdb_review_comment(
  branch="feat/webhook-rate-limit",
  line="src/middleware/throttle.ts:88",
  body="key_extractor returns undefined for unauthenticated requests —
        need a fallback bucket or this will throw"
)

A2A handoff back to Coder: 12 tokens.

09:20 — Coder fixes (45 seconds, ~1,200 tokens in)

Coder read the comment, ran gitdb_read_function("rate_limit") again (380 tokens, cached this time), patched the function to fall back to a global bucket when no key was present, and committed.
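The patch itself isn't shown; a minimal sketch of the behavior Reviewer asked for — an extractor that can return undefined, with the limiter falling back to a shared bucket — assuming a header-based key:

```typescript
// Hypothetical sketch of the fix. Before it, an undefined key_extractor
// result made the limiter throw; after, anonymous traffic shares one
// global bucket. The header name and bucket id are assumptions.
type Req = { headers: Record<string, string | undefined> };

const key_extractor = (req: Req): string | undefined =>
  req.headers["x-api-key"]; // undefined for unauthenticated requests

function bucketFor(req: Req): string {
  return key_extractor(req) ?? "global"; // the fallback Coder added
}
```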

Re-handoff to Reviewer. 18 tokens.

09:21 — Reviewer approves, hands to Tester (~3,800 tokens in)

Reviewer re-read the diff (1,400 tokens this time, slightly larger with the fix included), confirmed the fix, and handed off.

09:22 — Tester writes tests (2 min 18 sec, ~4,400 tokens in)

Tester read the changed function, read the existing test file pattern, wrote three new test cases:

gitdb_create_file("tests/webhook-rate-limit.test.ts", <test code>)
  → committed

Tester ran the tests via the CI hook — all green. Final A2A back to me with the PR link.
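The three test cases aren't reproduced in the post. As a sketch of what they plausibly covered — under the limit, over the limit, and the anonymous fallback — using a toy in-memory fixed-window limiter in place of the real Postgres-backed one:

```typescript
// Toy in-memory stand-in for the Postgres-backed limiter, just to make
// the three hypothetical test cases concrete. Returns true if allowed.
function makeLimiter(limit: number) {
  const counts = new Map<string, number>();
  return (apiKey: string | undefined): boolean => {
    const bucket = apiKey ?? "global"; // the fallback under test
    const n = (counts.get(bucket) ?? 0) + 1;
    counts.set(bucket, n);
    return n <= limit;
  };
}

// Case 1: 100 requests with a key all pass.
// Case 2: request 101 with the same key is rejected.
// Case 3: requests without a key share the global bucket.
```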

09:25 — I get a Slack notification

PR is ready. I review it the way I'd review any PR — eyes on the code, looking for things the swarm missed. I find nothing worth changing. I merge it.

Eleven minutes. Four agents. Three branches' worth of commits. One PR.

The bill

Agent        Input tokens   Output tokens    Cost
Architect           4,200           1,800   $0.18
Coder               8,000           2,400   $0.31
Reviewer            8,900             600   $0.29
Tester              4,400           1,400   $0.17
Total              25,500           6,200   $0.95

Compare that to the same feature six weeks ago, before we cut over to GitDB's AST retrieval. Run on raw filesystem reads, it cost us $14.20, took 38 minutes, and produced a PR that needed two rounds of human review because Reviewer had hallucinated a function that didn't exist (it had only seen the first half of a file).

What didn't work

A couple of things worth mentioning, since pretending they didn't happen would be dishonest.

The first version of Coder kept trying to clone the repo. Old habits from its base model. We had to add a system-prompt rule that explicitly said "you do not have filesystem access; please use GitDB tools." After that, no issues.

Reviewer over-flagged early on. The first three runs, Reviewer commented on style issues that weren't in our style guide. We gave it the linting config as a memory and the noise went away.

Both fixes took an afternoon. Neither was really about GitDB; both were about how to prompt agents that haven't worked this way before.

What you'd need to try this

Three things, in order:

  1. A coding agent that supports MCP tools (Claude Code, Codex, or your in-house equivalent).
  2. A GitDB instance with your codebase imported. About an hour.
  3. Per-agent API keys — don't share a single key across agents, or you lose the audit trail.

That's about it. No framework, no orchestration layer. The A2A handoffs are just GitDB tool calls.

If you'd like to try this on your codebase, I'm happy to help you set up a sandbox. We're in private beta and I'd like to learn from more teams running multi-agent coding loops in real conditions — what works for you, and especially what doesn't.

— Danny