Danny Yau

3072-dim is the new hard mode for vector search

text-embedding-3-large produces 3,072-dim vectors, and most vector pipelines truncate them to stay performant. Here's what happened when we benchmarked the full dimension end-to-end.

A quick follow-up to last week's post on 100% recall on disk.

A few folks who read the first post asked, in different words, basically the same question: "OK, but does it still hold up at 3,072 dimensions?" It's a fair one, and the answer needed a benchmark and a bit of context, so here it is.

Why 3072-dim is harder than 1536-dim

OpenAI's text-embedding-3-large produces 3,072-dimensional vectors. It's one of the strongest general-purpose embedding models in production today — MTEB around 64.6, MIRACL around 54.9, both real jumps over text-embedding-ada-002 at 1536-dim. For teams running RAG against legal documents, medical records, or anywhere wording is load-bearing, the bigger model tends to pick up signal the smaller one misses.
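For reference, producing the full vectors is a single call with the official Python client; the input string below is just an illustration:

```python
# Fetch a full 3,072-dim vector from text-embedding-3-large.
# Assumes the openai Python client (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="Query wording that has to carry legal weight.",
)
vec = resp.data[0].embedding
print(len(vec))  # 3072 -- the full dimensionality, no truncation
```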

The catch a lot of teams run into: as dimensionality goes up, approximate nearest-neighbor algorithms tend to lose recall. A peer-reviewed ICLR 2026 paper, Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval, formalizes why — instability in higher dimensions causes approximate methods to lose accuracy, and the effect compounds in multi-vector and filtered settings.

The field's common workaround is to truncate the embedding. Matryoshka Representation Learning lets you drop text-embedding-3-large from 3072 → 1024 or 3072 → 256, and a fair number of teams report that 256-dim HNSW lands around 65% accuracy while 1024-dim sits closer to 85%. It's a reasonable trade: you accept a little less of the precision the larger model was trained to provide, in exchange for query throughput your index can keep up with. For a lot of workloads that's the right call.
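For concreteness, here's a minimal sketch of that truncation: keep the first k dimensions and re-normalize. (The embeddings API's `dimensions` parameter does the equivalent server-side; the random vector here is a stand-in for a real embedding.)

```python
# Matryoshka-style truncation: slice to the first k dims, then L2-normalize
# so cosine/dot-product scoring still behaves.
import numpy as np

def truncate_mrl(vec: np.ndarray, k: int) -> np.ndarray:
    v = vec[:k].astype(np.float32)
    return v / np.linalg.norm(v)

full = np.random.randn(3072).astype(np.float32)  # stand-in for a real embedding
v1024 = truncate_mrl(full, 1024)
v256 = truncate_mrl(full, 256)
print(v1024.shape, v256.shape)  # (1024,) (256,)
```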

We were curious how the engine would handle the full 3,072 dimensions without the trade, so we ran it. Here's dbpedia-openai-3-large-100k — the full 3,072-dim public benchmark on the same engine — in both run modes.

Hardware: AMD Ryzen 9 · 64 GB RAM · NVMe Gen 4 2 TB · no GPU.

mmap mode

Metric         Result
Dataset        dbpedia-openai-3-large-100k
Recall@10      100%
QPS            2,452
p50 latency    407 µs
p99 latency    411 µs

100% recall on the full 3,072 dimensions, no truncation, no Matryoshka shortcut. Sub-half-millisecond median, with a p99 only about 4 µs above it. The embedding keeps all of its dimensions and the retrieval keeps all of the matches it would otherwise have found.
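To be precise about the headline number: Recall@10 here means the engine's top-10 exactly covers the brute-force top-10 for every query. A sketch of how that's scored, with illustrative variable names:

```python
# Recall@k: average overlap between the engine's top-k ids and the exact
# (ground-truth) top-k ids, over all queries. 1.0 means no lost matches.
import numpy as np

def recall_at_k(engine_ids: np.ndarray, truth_ids: np.ndarray, k: int = 10) -> float:
    hits = sum(len(set(e[:k]) & set(t[:k])) for e, t in zip(engine_ids, truth_ids))
    return hits / (len(engine_ids) * k)
```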

Pure-disk mode

Metric         Result
Dataset        dbpedia-openai-3-large-100k
Recall@10      100%
QPS            953
p50 latency    1.04 ms
p99 latency    1.07 ms

Same 3,072 dimensions, this time without page-cache help. About 1 ms p50 with full recall. Slower than mmap by design — that's the trade for a smaller memory footprint, and it lets this mode share a box with other workloads.
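To make the mmap-vs-disk trade concrete, here's the mmap side in miniature; the file name and layout are stand-ins, not the engine's actual format:

```python
# Map the raw vector file and let the OS page cache decide what stays hot.
# First touch of a row faults it in from NVMe; repeat touches come from RAM,
# which is where mmap mode's latency edge lives. Pure-disk mode reads blocks
# explicitly instead, keeping the resident footprint small and predictable.
import numpy as np

N, D = 100_000, 3072
vectors = np.memmap("vectors.f32", dtype=np.float32, mode="r", shape=(N, D))
row = np.asarray(vectors[42])  # one 12 KB row (3072 x 4 bytes), paged in on demand
```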

Where this might fit

A text-embedding-3-large RAG pipeline today usually picks one of three shapes:

  1. Truncate to 256 or 1024 dimensions to keep recall high on an approximate index. You give back a little of the precision the larger model was trained to provide.
  2. Keep the full 3072 dimensions and accept 85–95% recall from an approximate index. You give back some results.
  3. Keep full dimensions and brute-force on a GPU. Works well; the bill is a real consideration at scale.

All three are reasonable for different workloads. The benchmark above is just a fourth shape to consider — full 3072 dimensions, 100% recall, on commodity disks — for cases where the answer set really has to be complete.
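Shape 3 is worth seeing in miniature, because it's the correctness baseline the other shapes are measured against. Exact top-k over unit-normalized vectors is a single matrix multiply; the full scan per query is what the GPU bill is paying to accelerate:

```python
# Exact (brute-force) top-k: correct by construction, O(N*D) per query.
import numpy as np

def exact_top_k(queries: np.ndarray, corpus: np.ndarray, k: int = 10) -> np.ndarray:
    sims = queries @ corpus.T  # cosine similarity if rows are unit-norm
    return np.argpartition(-sims, k, axis=1)[:, :k]  # exact top-k ids, unordered
```

The returned ids are the exact set; if rank order matters, sort that slice by score afterward.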

What's the same as last time

  • Same engine. Same disk path. Same algorithm. The only thing that changes is the dimensionality of the input vectors.
  • Same dbpedia-openai benchmark family — public dataset, public query set, reproducible.
  • Same trade between mmap and pure-disk: mmap is faster, pure-disk is smaller, both hit 100% recall.

What's different at 3072-dim

  • About 2× the vector storage cost (3072 vs 1536 floats; quick math after this list). Index files grow proportionally.
  • A bit more math per comparison, which shows up as roughly 1.9× the per-query latency vs the 1536-dim 100K benchmark (407 µs vs 216 µs in mmap mode).
  • The same disk-vs-mmap throughput ratio — pure-disk runs about 40% the QPS of mmap, regardless of dimensionality.
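The first bullet's ~2× is just raw float math (index overhead comes on top and varies by engine):

```python
# Back-of-envelope: raw float32 storage for 100K vectors at each width.
N = 100_000
gb_3072 = N * 3072 * 4 / 1e9  # ~1.23 GB
gb_1536 = N * 1536 * 4 / 1e9  # ~0.61 GB
print(gb_3072, gb_1536, gb_3072 / gb_1536)  # 1.2288 0.6144 2.0
```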

What you can verify

The benchmark is dbpedia-openai-3-large-100k. The dataset, the vectors, and the query set are all public, so the run is reproducible. If the numbers don't come out the same on your hardware, please tell me; that's on me to explain.

And if you're running text-embedding-3-large somewhere and the truncation step in your pipeline has been on your mind, I'd be glad to compare notes against whatever you're using today. No pressure either way.

— Danny