Danny Yau · 6 min read

Memory APIs in 2026 — where we fit beside mem0 and Supermemory

Two memory services lead the agent-memory category — mem0 and Supermemory. Both are well-engineered for conversational personalization. Here's where ours sits, and where each of them is the better choice.

A follow-up to the legal-AI memory post, since the question that keeps coming up is the direct comparison: "How does this stack up against mem0 or Supermemory specifically?"

The honest answer is that both of them are doing impressive work in a category that's only about two years old. I want to walk through each, be specific about what they're good at, and then describe where ours fits differently — because for some workloads they're the right answer and for others ours is.

mem0 — extraction-first, multi-signal retrieval

mem0 is the most prominent open-source memory layer for AI agents. The architecture is extraction-first: an LLM reads conversation text, pulls facts out in a single-pass ADD-only flow, and stores them in a structured store. Retrieval is then multi-signal — semantic, BM25, entity matching, scored in parallel and fused, with temporal reasoning on top.
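For orientation, here's roughly what that flow looks like with mem0's open-source Python client. Treat it as a sketch rather than a reference: exact signatures and defaults vary between releases, and the LLM and vector store behind it are whatever you've configured.

    from mem0 import Memory

    # Extraction-first: add() hands the conversation text to an LLM, which
    # decides which facts to keep. You store its interpretation, not the text.
    m = Memory()  # uses your configured LLM + vector store
    m.add("I prefer dark mode, and my daughter's name is Lin", user_id="alice")

    # Retrieval is scoped per user and returns the extracted facts, ranked --
    # not the original wording of the conversation they came from.
    results = m.search("What are Alice's UI preferences?", user_id="alice")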

The published benchmarks are real: in the 91–94% range on LongMemEval and LoCoMo, with retrieval averaging under ~7,000 tokens per call after their April 2026 token-efficient release. For conversational personalization — "remember that the user prefers dark mode and that their daughter's name is Lin" — this is genuinely the right shape, and the open-source community around it is large and active.

The limitations their own documentation candidly acknowledges: response times in the 7–10 second range under load; no document retrieval, data connectors, user profiles, or enterprise compliance features; and a team that describes the layer as covering "one slice of a full context stack."

Supermemory — vector graph + ontology, sub-300 ms

Supermemory takes the same general problem and solves it with a vector graph engine whose edges are ontology-aware. The system tracks relationships between memories, handles contradictions, and resolves knowledge updates without corrupting prior state. Production numbers are strong — sub-300 ms p50 recall while processing 100B+ tokens monthly, with automatic profile maintenance and contextual chunking baked into the API.

If your application benefits from automatic entity tracking — "every time someone mentions Acme Corp, merge it with the prior Acme Corp record, and roll up its relationships and roles over time" — this is the most mature option in the category. Their published benchmarks (85.4% on LongMemEval vs mid-60s for pure vector DBs, 59.7% P@1 on LoCoMo vs 34.4%) are real, and the team has shipped at scale.

The candid limitation is the same one that's true of any extraction-or-ontology approach: the service decides what counts as an entity, a fact, and a relationship at write time. If your domain doesn't quite fit the ontology, the abstraction is the part that pushes back hardest.

What both share, and where it matters

Both services are write-time-structuring approaches. An LLM reads your conversation, makes decisions about what to keep, and stores the result. Retrieval then operates over those decisions.

For chat-style use cases, that's exactly the right design choice. The cost — and this is the part worth being clear about — is that you don't get back what you stored; you get back the service's interpretation of what you stored. When a regulator, a partner, or a user later asks for "the exact wording of the conversation about indemnification on March 14," the answer is reconstructed from the extracted facts, not retrieved.

For most consumer-AI workloads, that's fine. For a small but growing set of workloads — legal, healthcare, regulated finance, anywhere wording is load-bearing — it's the wrong shape.

Where we fit

We took a different starting assumption: store what the user gives us, exactly as given, and put the structuring power in the query language instead of the write step. The query language is UQL, with five composable levels — vector search, filtered vector search, SQL → Vector join, analytics, pipeline plans — running on top of a 100% recall vector engine.

What that means concretely:

  • No write-time extraction at this layer. If you want to extract facts, you run an LLM pass yourself and store the structured output in our engine alongside the raw text. Several of our users do exactly this — they like the recall floor of 100% under whatever structure they put on top.
  • Composable recall. "Memories from premium-tier users in Q3 mentioning billing concerns" is one L3 SQL → Vector call (filter + semantic search), not a multi-step intersection in your application code. There's a sketch of this just after the list.
  • Exact recall over a stored corpus. When the application needs the original wording — every original wording — it can recall it. No summary stands between the data and the answer.
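To make the list concrete, here's a hypothetical sketch of that pattern. The client, method names, and query syntax below are illustrative assumptions (this post doesn't show the real UQL surface); what matters is the shape: store the verbatim text plus whatever fields you chose to attach, then ask one L3 SQL → Vector question.

    # Hypothetical client and syntax -- illustrative only, not the real API.
    from memory_client import Client  # assumed package name

    db = Client(api_key="...")

    # Store exactly what the user gave us. Any structure (tier, quarter) is
    # something we extracted in our own pass and chose to attach as fields.
    db.store(
        collection="conversations",
        text="...the customer said the Q3 invoice double-billed them...",  # verbatim
        fields={"user_tier": "premium", "quarter": "2026-Q3"},
    )

    # One L3 SQL -> Vector call: relational filter plus semantic search,
    # instead of a multi-step intersection in application code.
    hits = db.query("""
        SELECT text FROM conversations
        WHERE user_tier = 'premium' AND quarter = '2026-Q3'
        SEARCH 'billing concerns'
        LIMIT 20
    """)

The structuring lives in the query rather than in the write path, which is the whole difference from the extraction-first services above.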

The full benchmark table across all six UQL phases (100% recall, sub-10 ms p99 on each phase) is in the legal-AI post. The shape that emerged is closer to a queryable corpus than a conversational memory — and that's the use case it fits.

Where each of the others is honestly the better fit

I want to be specific about this, because "we win" isn't a useful answer for a buyer deciding what to deploy.

  • mem0 is the better fit if you're building consumer-grade conversational personalization, you want an open-source layer with an active community, and the LongMemEval / LoCoMo numbers are how you'll evaluate. Their April 2026 token-efficient release is the right shape for that workload, and the open-source angle is meaningful.
  • Supermemory is the better fit if you need automatic entity merging, ontology maintenance, contradiction resolution, and very high write throughput. Their benchmark scores and operational maturity are real, and the API surface is more polished for chat-style consumer products than ours.

We're a worse fit when:

  • The workload is conversational personalization where "the model interprets the conversation" is the right behavior.
  • Your benchmark is LongMemEval or LoCoMo — those measure end-to-end conversational memory including the extraction step, and we don't extract at this layer.
  • You need automatic entity / relationship maintenance baked in.
  • You don't have a corpus with wording that matters.

We're a better fit when:

  • The recall has to be exact. "What I stored is what comes back" is non-negotiable.
  • The query is multi-step — filter, then search, then aggregate.
  • You want your own structure (your tables, your filters) rather than a service-defined schema.
  • The workload is regulated and summarization is a compliance problem.

Honest caveats

  • Different layers in the stack. mem0 and Supermemory are more application-aligned (they make decisions about what to remember for you). We're more substrate-aligned (we keep what you tell us to keep). Comparing them directly is a little like comparing PostgreSQL to a CRM — both touch "your data" but at different levels of abstraction.
  • No LongMemEval / LoCoMo scores from us. Those benchmarks include the extraction step, which we don't do at this layer. Apples-to-apples there would require us to add an extraction pre-step or for them to disable theirs.
  • Maturity. We're in private beta, with a smaller customer base and fewer integrations than the established options. That's real.

If your application sits in the "every word counts" category and you're weighing memory layers, please reach out. I'd genuinely like to compare notes — and if mem0 or Supermemory is the better fit for your workload, I'll tell you that honestly.

— Danny