Danny Yau

3072-dim is the new hard mode for vector search

text-embedding-3-large produces 3,072-dim vectors, and most vector pipelines truncate them to stay performant. Here's what happened when we benchmarked the full dimension end-to-end.

A quick follow-up to last week's post on 100% recall on disk.

A few folks who read the first post asked, in different words, basically the same question: "OK, but does it still hold up at 3,072 dimensions?" It's a fair one, and the answer needed a benchmark and a bit of context, so here it is.

Why 3072-dim is harder than 1536-dim

OpenAI's text-embedding-3-large produces 3,072-dimensional vectors. It's one of the strongest general-purpose embedding models in production today — MTEB around 64.6, MIRACL around 54.9, both real jumps over text-embedding-ada-002 at 1536-dim. For teams running RAG against legal documents, medical records, or anywhere wording is load-bearing, the bigger model tends to pick up signal the smaller one misses.
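For reference, producing the full vectors is a single call with the official Python client; the input string below is just an illustration:

```python
# Fetch a full 3,072-dim vector from text-embedding-3-large.
# Assumes the openai Python client (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="Query wording that has to carry legal weight.",
)
vec = resp.data[0].embedding
print(len(vec))  # 3072 -- the full dimensionality, no truncation
```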

The catch a lot of teams run into: as dimensionality goes up, approximate nearest-neighbor algorithms tend to lose recall. A peer-reviewed ICLR 2026 paper, Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval, formalizes why — instability in higher dimensions causes approximate methods to lose accuracy, and the effect compounds in multi-vector and filtered settings.

The field's common workaround is to truncate the embedding. Matryoshka Representation Learning lets you drop text-embedding-3-large from 3072 → 1024 or 3072 → 256, and a fair number of teams report that 256-dim HNSW lands around 65% accuracy while 1024-dim sits closer to 85%. It's a reasonable trade: you accept a little less of the precision the larger model was trained to provide, in exchange for query throughput your index can keep up with. For a lot of workloads that's the right call.
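For concreteness, here's a minimal sketch of that truncation: keep the first k dimensions and re-normalize. (The embeddings API's `dimensions` parameter does the equivalent server-side; the random vector here is a stand-in for a real embedding.)

```python
# Matryoshka-style truncation: slice to the first k dims, then L2-normalize
# so cosine/dot-product scoring still behaves.
import numpy as np

def truncate_mrl(vec: np.ndarray, k: int) -> np.ndarray:
    v = vec[:k].astype(np.float32)
    return v / np.linalg.norm(v)

full = np.random.randn(3072).astype(np.float32)  # stand-in for a real embedding
v1024 = truncate_mrl(full, 1024)
v256 = truncate_mrl(full, 256)
print(v1024.shape, v256.shape)  # (1024,) (256,)
```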

We were curious how the engine would handle the full 3,072 dimensions without the trade, so we ran it. Here's dbpedia-openai-3-large-100k — the full 3,072-dim public benchmark on the same engine — in both run modes.

Hardware: AMD Ryzen 9 · 64 GB RAM · NVMe Gen 4 2 TB · no GPU.

mmap mode

Metric         Result
Dataset        dbpedia-openai-3-large-100k
Recall@10      100%
QPS            2,452
p50 latency    407 µs
p99 latency    411 µs

100% recall on the full 3,072 dimensions, no truncation, no Matryoshka shortcut. Sub-half-millisecond median, with a p99 only about 4 µs above it. The embedding keeps all of its dimensions and the retrieval keeps all of the matches it would otherwise have found.
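To be precise about the headline number: Recall@10 here means the engine's top-10 exactly covers the brute-force top-10 for every query. A sketch of how that's scored, with illustrative variable names:

```python
# Recall@k: average overlap between the engine's top-k ids and the exact
# (ground-truth) top-k ids, over all queries. 1.0 means no lost matches.
import numpy as np

def recall_at_k(engine_ids: np.ndarray, truth_ids: np.ndarray, k: int = 10) -> float:
    hits = sum(len(set(e[:k]) & set(t[:k])) for e, t in zip(engine_ids, truth_ids))
    return hits / (len(engine_ids) * k)
```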

Pure-disk mode

Metric         Result
Dataset        dbpedia-openai-3-large-100k
Recall@10      100%
QPS            953
p50 latency    1.04 ms
p99 latency    1.07 ms

Same 3,072 dimensions, this time without page-cache help. About 1 ms p50 with full recall. Slower than mmap by design — that's the trade for a smaller memory footprint, and it lets this mode share a box with other workloads.
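To make the mmap-vs-disk trade concrete, here's the mmap side in miniature; the file name and layout are stand-ins, not the engine's actual format:

```python
# Map the raw vector file and let the OS page cache decide what stays hot.
# First touch of a row faults it in from NVMe; repeat touches come from RAM,
# which is where mmap mode's latency edge lives. Pure-disk mode reads blocks
# explicitly instead, keeping the resident footprint small and predictable.
import numpy as np

N, D = 100_000, 3072
vectors = np.memmap("vectors.f32", dtype=np.float32, mode="r", shape=(N, D))
row = np.asarray(vectors[42])  # one 12 KB row (3072 x 4 bytes), paged in on demand
```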

Where this might fit

A text-embedding-3-large RAG pipeline today usually picks one of three shapes:

  1. Truncate to 256 or 1024 dimensions to keep recall high on an approximate index. You give back a little of the precision the larger model was trained to provide.
  2. Keep the full 3072 dimensions and accept 85–95% recall from an approximate index. You give back some results.
  3. Keep full dimensions and brute-force on a GPU. Works well; the bill is a real consideration at scale.

All three are reasonable for different workloads. The benchmark above is just a fourth shape to consider — full 3072 dimensions, 100% recall, on commodity disks — for cases where the answer set really has to be complete.
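Shape 3 is worth seeing in miniature, because it's the correctness baseline the other shapes are measured against. Exact top-k over unit-normalized vectors is a single matrix multiply; the full scan per query is what the GPU bill is paying to accelerate:

```python
# Exact (brute-force) top-k: correct by construction, O(N*D) per query.
import numpy as np

def exact_top_k(queries: np.ndarray, corpus: np.ndarray, k: int = 10) -> np.ndarray:
    sims = queries @ corpus.T  # cosine similarity if rows are unit-norm
    return np.argpartition(-sims, k, axis=1)[:, :k]  # exact top-k ids, unordered
```

The returned ids are the exact set; if rank order matters, sort that slice by score afterward.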

What's the same as last time

  • Same engine. Same disk path. Same algorithm. The only thing that changes is the dimensionality of the input vectors.
  • Same dbpedia-openai benchmark family — public dataset, public query set, reproducible.
  • Same trade between mmap and pure-disk: mmap is faster, pure-disk is smaller, both hit 100% recall.

What's different at 3072-dim

  • About 2× the vector storage cost (3072 vs 1536 floats; quick math after this list). Index files grow proportionally.
  • A bit more math per comparison, which shows up as roughly 1.9× the per-query latency vs the 1536-dim 100K benchmark (407 µs vs 216 µs in mmap mode).
  • The same disk-vs-mmap throughput ratio — pure-disk runs about 40% the QPS of mmap, regardless of dimensionality.
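The first bullet's ~2× is just raw float math (index overhead comes on top and varies by engine):

```python
# Back-of-envelope: raw float32 storage for 100K vectors at each width.
N = 100_000
gb_3072 = N * 3072 * 4 / 1e9  # ~1.23 GB
gb_1536 = N * 1536 * 4 / 1e9  # ~0.61 GB
print(gb_3072, gb_1536, gb_3072 / gb_1536)  # 1.2288 0.6144 2.0
```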

What you can verify

The benchmark is dbpedia-openai-3-large-100k. The dataset, the vectors, and the query set are all public, so the run is reproducible. If the numbers don't come out the same on your hardware, please tell me; that's on me to explain.

And if you're running text-embedding-3-large somewhere and the truncation step in your pipeline has been on your mind, I'd be glad to compare notes against whatever you're using today. No pressure either way.

— Danny