A quick follow-up to last week's post on 100% recall on disk.
A few folks who read the first post asked, in different words, basically the same question: "OK, but does it still hold up at 3,072 dimensions?" It's a fair one, and the answer needed a benchmark and a bit of context, so here it is.
Why 3072-dim is harder than 1536-dim
OpenAI's text-embedding-3-large produces 3,072-dimensional vectors. It's one of the strongest general-purpose embedding models in production today — MTEB around 64.6, MIRACL around 54.9, both real jumps over text-embedding-ada-002 at 1536-dim. For teams running RAG against legal documents, medical records, or anywhere wording is load-bearing, the bigger model tends to pick up signal the smaller one misses.
The catch a lot of teams run into: as dimensionality goes up, approximate nearest-neighbor algorithms tend to lose recall. A peer-reviewed ICLR 2026 paper, Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval, formalizes why — instability in higher dimensions causes approximate methods to lose accuracy, and the effect compounds in multi-vector and filtered settings.
The field's common workaround is to truncate the embedding. Matryoshka Representation Learning lets you drop text-embedding-3-large from 3072 → 1024 or 3072 → 256, and a fair number of teams report that 256-dim HNSW lands around 65% accuracy while 1024-dim sits closer to 85%. It's a reasonable trade — you accept a little less of the precision the larger model was trained to provide, in exchange for query throughput your index can keep up with. For a lot of workloads that's the right call.
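For readers who haven't used Matryoshka truncation: the whole trick is slicing off the leading coordinates and re-normalizing. Here's a minimal sketch with NumPy — the function name and the random stand-in vector are illustrative, not from any particular library.

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` coordinates and re-normalize to unit length,
    the standard way Matryoshka-trained embeddings are shortened."""
    head = vec[:dims]
    return head / np.linalg.norm(head)

# Hypothetical stand-in for a 3,072-dim text-embedding-3-large output.
full = np.random.default_rng(0).standard_normal(3072)
short = truncate_embedding(full, 256)
print(short.shape)  # (256,)
```

The re-normalization step matters: cosine similarity on a truncated-but-unnormalized slice gives slightly different rankings than the model's training objective intends.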
We were curious how the engine would handle the full 3,072 dimensions without the trade, so we ran it. Here's dbpedia-openai-3-large-100k — the full 3,072-dim public benchmark on the same engine — in both run modes.
Hardware: AMD Ryzen 9 · 64 GB RAM · NVMe Gen 4 2 TB · no GPU.
mmap mode
| Metric | Result |
|---|---|
| Dataset | dbpedia-openai-3-large-100k |
| Recall@10 | 100% |
| QPS | 2,452 |
| p50 latency | 407 µs |
| p99 latency | 411 µs |
100% recall on the full 3,072 dimensions, no truncation, no Matryoshka shortcut. Sub-half-millisecond median, with only about 4 µs separating p50 from p99. The embedding keeps all of its dimensions and the retrieval keeps all of the matches it would otherwise have found.
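If you want to check a recall number like this yourself, the measurement is simple: brute-force the exact top-10 as ground truth, then count the overlap with what the index returned. A small sketch — the corpus here is a toy 64-dim stand-in so it runs instantly, not the benchmark data itself:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy unit-normalized corpus standing in for the real 3,072-dim vectors.
corpus = rng.standard_normal((1000, 64)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query = corpus[0] + 0.05 * rng.standard_normal(64)

# Exact ground truth: brute-force inner product, top 10.
exact_top10 = np.argsort(-(corpus @ query))[:10]

def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k that the candidate list recovered."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# A retriever that returns the exact set scores 1.0 against it.
print(recall_at_k(exact_top10, exact_top10))  # 1.0
```

Recall@10 = 100% means every query's candidate set matched the brute-force set exactly; order within the top 10 isn't what this metric measures.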
Pure-disk mode
| Metric | Result |
|---|---|
| Dataset | dbpedia-openai-3-large-100k |
| Recall@10 | 100% |
| QPS | 953 |
| p50 latency | 1.04 ms |
| p99 latency | 1.07 ms |
Same 3,072 dimensions, this time without page-cache help. About 1 ms p50 with full recall. Slower than mmap by design — that's the trade for a smaller memory footprint, and it lets this mode share a box with other workloads.
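For completeness, here's how the summary statistics in both tables are conventionally computed from per-query timings. This is a generic sketch, not the benchmark harness itself; the synthetic latencies are shaped loosely like the pure-disk run:

```python
import numpy as np

def summarize(latencies_us):
    """p50/p99 plus a single-threaded QPS estimate from per-query latencies (µs)."""
    lat = np.asarray(latencies_us, dtype=float)
    return {
        "p50_us": float(np.percentile(lat, 50)),
        "p99_us": float(np.percentile(lat, 99)),
        "qps": 1e6 / float(lat.mean()),  # queries per second, one query in flight
    }

rng = np.random.default_rng(1)
stats = summarize(rng.normal(1040.0, 10.0, 10_000))
print(stats)
```

Note the QPS line assumes one query in flight at a time; with concurrent clients, measured QPS can exceed 1e6 / mean-latency, so treat the two numbers as related but independent measurements.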
Where this might fit
A text-embedding-3-large RAG pipeline today usually takes one of three shapes:
- Truncate to 256 or 1024 dimensions to keep recall high on an approximate index. You give back a little of the precision the larger model was trained to provide.
- Keep the full 3072 dimensions and accept 85–95% recall from an approximate index. You give back some results.
- Keep full dimensions and brute-force on a GPU. Works well; the bill is a real consideration at scale.
All three are reasonable for different workloads. The benchmark above is just a fourth shape to consider — full 3072 dimensions, 100% recall, on commodity disks — for cases where the answer set really has to be complete.
What's the same as last time
- Same engine. Same disk path. Same algorithm. The only thing that changes is the dimensionality of the input vectors.
- Same dbpedia-openai benchmark family — public dataset, public query set, reproducible.
- Same trade between mmap and pure-disk: mmap is faster, pure-disk is smaller, both hit 100% recall.
What's different at 3072-dim
- About 2× the vector storage cost (3072 vs 1536 floats). Index files grow proportionally.
- A bit more math per comparison, which shows up as roughly 1.9× the per-query latency vs the 1536-dim 100K benchmark (407 µs vs 216 µs in mmap mode).
- The same disk-vs-mmap throughput ratio — pure-disk runs about 40% the QPS of mmap, regardless of dimensionality.
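The 2× storage claim is just arithmetic on the raw float payload; a quick sanity check, ignoring whatever per-vector overhead the index format adds:

```python
def raw_vector_bytes(n_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Raw float32 payload only — excludes index structures and metadata."""
    return n_vectors * dims * bytes_per_float

for dims in (1536, 3072):
    gib = raw_vector_bytes(100_000, dims) / 2**30
    print(f"{dims}-dim, 100K vectors: {gib:.2f} GiB")
```

So the 100K benchmark carries roughly 1.14 GiB of raw vectors at 3,072-dim versus about 0.57 GiB at 1,536-dim, before index overhead.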
What you can verify
The benchmark is dbpedia-openai-3-large-100k. The dataset, the vectors, and the query set are all public, so the run is reproducible. If the numbers don't come out the same on your hardware, please tell me — that's on me to explain.
And if you're running text-embedding-3-large somewhere and the truncation step in your pipeline has been on your mind, I'd be glad to compare notes against whatever you're using today. No pressure either way.
— Danny