Scaling & performance
The numbers and trade-offs you actually hit as your index grows from 1M to 1B vectors.
Memory math
A float32 vector at 1536 dimensions costs 1536 × 4 = 6,144 bytes. Add roughly 1.5× for HNSW graph overhead. So one million vectors needs ≈ 9 GB RAM, ten million ≈ 90 GB, a hundred million ≈ 900 GB.
vectors | float32 HNSW | int8 HNSW | PQ-8x
---------+--------------+-----------+--------
1 M | ~9 GB | ~2.5 GB | ~80 MB
10 M | ~90 GB | ~25 GB | ~800 MB
100 M | ~900 GB | ~250 GB | ~8 GB
1 B | impractical | ~2.5 TB | ~80 GBQuantization
Scalar quantization to int8 typically loses < 1% recall and cuts memory 4×. Product quantization (PQ) compresses 30–100× with a larger but still acceptable recall hit when paired with a small re-rank pass over uncompressed candidates.
Sharding
Past ~50M vectors per node, shard horizontally by tenant or by a random hash. Query each shard in parallel and merge results — most managed services do this for you, but rolling your own with Qdrant or Milvus is straightforward.