A short follow-up to last week's post on legal-grade memory. A few readers from the trading side asked, in different words: "What does this look like for our use case?" It's a fair question and the answer turns out to be different enough from legal that it deserves its own write-up.
AI is becoming a real participant in trading
The clearest signal is the structural one. The Lumenai Innovation Fund, launching around June 2026, is the first publicly announced fully agentic AI hedge fund — autonomous AI agents are designed to generate, evaluate, and manage investment ideas continuously, while human oversight is retained for governance, risk management, and strategic supervision. It won't be the last. The pattern across the more thoughtful proposals — Captide, HedgeAgents on arXiv, several open-source experiments — is the same: a swarm of specialist agents (research, signal generation, risk, execution) collaborating on decisions, with memory of past work as a first-class part of the system.
For a memory layer underneath those agents, the requirements look different from anything else in AI. Two of them pull against each other almost directly:
- Latency. A trading agent that has to recall "what happened the last 12 times this pattern showed up" can't wait 9 seconds for its memory to load. The recall is on the critical path of the decision.
- Completeness, under audit. Every regulator-facing record has to be exact, retrievable, and reconstructable. "Mostly accurate" is a regulatory finding, not a feature.
Most memory services in production today have been optimized for one or the other. Trading-grade workloads tend to need both.
The completeness side, in regulator language
Trading has unusual rules. Under MiFID II, an investment firm has to keep every order, trade, and related communication for a minimum of 5 years in a write-once-read-many format, in a "readily accessible" medium, with the ability to perform full trade reconstruction within 72 hours of a regulator's request. The reconstruction has to bring together transaction details, conversations, emails, meeting notes — everything that touched the trade.
If your memory layer summarizes, extracts, or approximates, the regulator's question — "show me every order and message related to ISIN XYZ between dates A and B" — has only one acceptable answer: every record. "We've kept the facts our extraction LLM thought were important" is not a defense.
It's the same shape as the legal hallucination problem, just sharper. In legal AI, retrieval misses lead to a confident wrong answer. In trading, retrieval misses lead to either a missed signal (alpha left on the table) or a compliance gap (regulatory finding). Both have real costs.
The latency side, in trading language
Trading agents work in compressed time. The market data of the last 10 minutes is more useful than the market data of the last 10 days, but only if it's available now. The mem0 team's 2026 benchmark put it well: full-context memory approaches deliver the highest accuracy ceiling, but at a "median latency of 9.87 seconds and a p95 latency of 17.12 seconds," which makes them categorically unusable in production for any agent that needs to make real-time calls.
The right shape for trading is: the recall has to be exact, and it has to land in single-digit milliseconds, every time. Tail latency matters as much as median latency — a p99 spike during a market event is exactly when the agent most needs its memory back.
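If you want to hold your own memory layer to that standard, the percentiles are easy to instrument. A minimal sketch (the sample data is invented; nearest-rank percentiles, nothing fancy):

```python
def latency_percentiles(samples_ms):
    """Return (p50, p99) from a list of per-call latencies in milliseconds."""
    ordered = sorted(samples_ms)

    def pct(p):
        # Nearest-rank percentile: ceil(p * N / 100) as a one-based rank.
        idx = max(0, -(-p * len(ordered) // 100) - 1)
        return ordered[idx]

    return pct(50), pct(99)

# Synthetic example: 98 fast calls and two slow outliers.
samples = [5.0] * 98 + [40.0, 45.0]
p50, p99 = latency_percentiles(samples)
print(p50, p99)  # the median hides the outliers; the p99 does not
```

The point of printing both: a memory layer can look fine at p50 and still hand the agent a 40 ms stall exactly when a market event fans out many recalls at once.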
Where UQL sits
The memory service we built runs on UQL — Unified Query Language. Six composable levels (vector search → filtered search → SQL → vector join → analytics → pipeline plans), with a 100% recall floor at the vector layer. The shape that emerged isn't because we set out to build for trading; it's because the design pressure of "give me back exactly what I stored, in milliseconds, with structured filters" turned out to be a general shape.
Here are the end-to-end numbers across all six phases on the dbpedia-openai cosine benchmark — full dimensions, no truncation, no approximation.
Hardware: AMD Ryzen 9 · 64 GB RAM · NVMe Gen 4 2 TB · no GPU.
| Phase | What it tests | Recall@10 | p50 | p99 |
|---|---|---|---|---|
| L1 — INTUITION | Vector search | 100% | 4.70 ms | 4.93 ms |
| L2 — EPISODIC | Filtered vector search | 100% | 5.51 ms | 6.02 ms |
| L3 — RBAC STATE | SQL → Vector join | 100% | 7.73 ms | 7.96 ms |
| L4 — ANALYTICS | GROUP BY / SUM / AVG on results | 100% | 5.46 ms | 5.87 ms |
| L5 — CROSS-CHECK | Correctness across modes | 100% | — | — |
| L6 — PIPELINE | Multi-step pipeline plans | 100% | 0.24 ms | 8.69 ms |
What I want to flag from this table for the trading reader specifically: every phase is well under 10 ms at p99, and every phase is at 100% recall (which means the result set equals brute force, not "close to brute force"). The tail is tight too — the p99-to-p50 gap is small everywhere except L6, and even L6 stays under 9 ms.
The post on 100% recall on disk covers where the recall floor comes from at the vector layer.
Three trading-shaped query patterns
These are the kinds of queries the design is meant for. I'm going to write them in the language a trading desk would actually use, because the abstract version doesn't carry the same weight.
The pattern-match across history
"Find the 50 most similar market regimes to the current one, looking back five years. Show me what happened next in each case."
That's a vector search (semantic similarity over feature vectors) against five years of history at single-digit ms. On UQL that's an L1 query. On a memory layer with 92–97% recall, you're missing 1–4 of the 50 closest matches every time, and the agent reasons from an incomplete distribution.
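To make "exact recall" concrete without showing UQL internals: the ground truth for any vector index is a brute-force scan, and 100% recall means the returned set equals that scan's. A toy illustration (the regime names and vectors are invented):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def exact_top_k(query, regimes, k):
    """Brute-force scan: the result set IS the ground truth, by construction."""
    scored = sorted(regimes.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [regime_id for regime_id, _ in scored[:k]]

# Toy "market regime" feature vectors (illustrative only).
history = {
    "2021-03-low-vol":  [0.9, 0.1, 0.0],
    "2022-06-risk-off": [0.1, 0.9, 0.2],
    "2024-08-unwind":   [0.2, 0.8, 0.3],
}
print(exact_top_k([0.15, 0.85, 0.25], history, k=2))
# → ['2022-06-risk-off', '2024-08-unwind']
```

A 92–97% recall index returns *almost* this list; the agent never finds out which neighbors it lost.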
The composed query under audit
"For every trade in the equities book between dates A and B, with a notional over $X, where the trader used the order-management system v2.1+, surface the related communications mentioning 'volatility' or 'unwind' in the same 24-hour window."
That's structured predicates (book = "equities", date_range, notional > X, oms_version >= 2.1) combined with a vector match on communications. On UQL it's an L3 SQL → Vector join — one call, one network round-trip, exact recall over the eligible set. On extraction-based memory you'd be assembling this across the structured trade store, the unstructured comms store, and an application-side intersection, every time the compliance team wants a query run.
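The logical shape of that query, sketched in plain Python with sqlite3 standing in for the structured side and a keyword match standing in for the vector match over comms (table names and data are invented; this is not the UQL API):

```python
import sqlite3

# In-memory stand-ins for the trade store and comms archive (toy data).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE trades (id TEXT, book TEXT, ts TEXT, notional REAL, oms_version REAL);
    CREATE TABLE comms  (id TEXT, ts TEXT, body TEXT);
    INSERT INTO trades VALUES
        ('T1', 'equities', '2026-01-10T09:00', 12e6, 2.1),
        ('T2', 'equities', '2026-01-10T09:30', 3e5,  2.1),
        ('T3', 'rates',    '2026-01-10T10:00', 50e6, 2.2);
    INSERT INTO comms VALUES
        ('C1', '2026-01-10T11:00', 'thinking about the unwind on the equities book'),
        ('C2', '2026-01-12T11:00', 'lunch?');
""")

# Structured predicates first: book, date range, notional floor, OMS version.
trades = db.execute("""
    SELECT id, ts FROM trades
    WHERE book = 'equities' AND notional > 1e6 AND oms_version >= 2.1
      AND ts BETWEEN '2026-01-01' AND '2026-01-31'
""").fetchall()

# Then the semantic side, scoped to the 24-hour window around the hit;
# the LIKE clauses stand in for a vector match on the comms text.
related = db.execute("""
    SELECT id FROM comms
    WHERE (body LIKE '%volatility%' OR body LIKE '%unwind%')
      AND ts BETWEEN '2026-01-10T09:00' AND '2026-01-11T09:00'
""").fetchall()

print(trades)   # [('T1', '2026-01-10T09:00')]
print(related)  # [('C1',)]
```

The point of the L3 join is that both halves run as one call with exact recall over the eligible set, instead of the two-store-plus-application-side intersection sketched here.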
The aggregate over a strategy's lifetime
"For every trade the agent generated under strategy X, group by the regime the agent classified, and surface the realized PnL."
That's L4 analytics over a vector-retrieved subset. Same call. The trading agent doesn't need to round-trip back to the data warehouse to answer "did my classification correlate with PnL?" — the memory layer is the answer.
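The aggregation itself is a one-pass GROUP BY over the retrieved subset. A minimal sketch of what the L4 step computes (record shapes and numbers are invented):

```python
from collections import defaultdict

# Toy records the agent might have written to memory (illustrative).
trades = [
    {"strategy": "X", "regime": "risk-off", "pnl": -120.0},
    {"strategy": "X", "regime": "risk-off", "pnl":   80.0},
    {"strategy": "X", "regime": "low-vol",  "pnl":  300.0},
    {"strategy": "Y", "regime": "low-vol",  "pnl":   50.0},
]

# GROUP BY regime, SUM(pnl), restricted to the strategy-X subset.
pnl_by_regime = defaultdict(float)
for t in trades:
    if t["strategy"] == "X":
        pnl_by_regime[t["regime"]] += t["pnl"]

print(dict(pnl_by_regime))  # {'risk-off': -40.0, 'low-vol': 300.0}
```

The difference in the L4 case is where this runs: inside the memory layer, over a vector-retrieved subset, rather than as a second trip to a warehouse.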
What this means for trade reconstruction
If a regulator asks for the lifecycle of ISIN XYZ between dates A and B, the answer is a single L3 query: instrument = "XYZ" AND date_range (the SQL filter) combined with the full text of communications (the vector layer). The result is every order, every trade, every related message, exactly as recorded, in milliseconds. The 72-hour reconstruction window stops being a fire drill and becomes a query.
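Logically, the reconstruction is a filter plus a time-ordered merge across record types. A sketch in plain Python (record shapes and dates are invented for illustration):

```python
from datetime import date

# Toy stand-ins for the three record streams a reconstruction must unify.
orders = [{"kind": "order", "isin": "XYZ", "ts": date(2026, 1, 10)},
          {"kind": "order", "isin": "ABC", "ts": date(2026, 1, 11)}]
trades = [{"kind": "trade", "isin": "XYZ", "ts": date(2026, 1, 10)}]
comms  = [{"kind": "msg",   "isin": "XYZ", "ts": date(2026, 1, 12)}]

def reconstruct(instrument, start, end):
    """Every record touching the instrument in the window, in time order."""
    hits = [r for r in orders + trades + comms
            if r["isin"] == instrument and start <= r["ts"] <= end]
    return sorted(hits, key=lambda r: r["ts"])

lifecycle = reconstruct("XYZ", date(2026, 1, 1), date(2026, 1, 31))
print([r["kind"] for r in lifecycle])  # ['order', 'trade', 'msg']
```

The claim in the paragraph above is that this whole merge is one L3 query against records stored verbatim, which is what turns a 72-hour deadline into a milliseconds-scale lookup.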
Honest caveats
A few, because they matter:
- We're an infrastructure layer, not a trading platform. This post is about what becomes possible at the memory layer for a firm building an AI trading workflow — not about replacing an OMS, a market-data feed, or an execution venue. We sit underneath all of those.
- The benchmarks above are on dbpedia-openai cosine. That's a standard public benchmark with public query sets. For a firm-specific corpus — order history, comms archives, strategy notes — the numbers would need to be re-run on that data. We're confident in the engine's behavior; we don't want to overstate cross-domain transfer.
- Co-location and the truly-microsecond path. The genuinely latency-sensitive part of HFT (order placement, market-making at the exchange edge) runs on dedicated hardware in colocated racks and isn't a memory-layer problem. What we're describing is the memory layer behind the decisions, not the wire-level execution path.
- Maturity. We're in private beta. Smaller customer base, fewer integrations than the established memory services. That's real, and worth weighing.
If you're building in this space
If you're building an AI trading agent or an agentic research workflow, and either the recall floor or the audit/reconstruction requirements are part of what you're navigating, please reach out. I'd genuinely like to compare notes on what you're using today and where it's working.
— Danny