Embeddings
The numerical fingerprints that make semantic search possible.
What is an embedding?
An embedding is a learned mapping from an input — a sentence, a code snippet, an image — to a dense vector of real numbers in a fixed-dimensional space (typically 384, 768, 1024 or 1536 dimensions). Inputs with similar meaning land near each other in that space.
The model that produces the embedding is responsible for compressing semantic information into geometry. Once you have the vector, you no longer care about the original modality — search reduces to "find the nearest points".
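To make "find the nearest points" concrete, here is a minimal brute-force sketch. The `Doc` type, the `nearest` helper, and the toy 3-dimensional vectors are hypothetical and exist only for illustration; real embeddings have hundreds of dimensions, and production systems usually replace the linear scan with an approximate index.

```typescript
// Hypothetical types and helpers for illustration; not from any specific library.
type Doc = { id: string; vector: number[] };

function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot product divided by the product of the two lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Brute-force search: score every document, sort by descending score, keep the top k.
function nearest(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors standing in for real model output.
const docs: Doc[] = [
  { id: "cat", vector: [0.9, 0.1, 0.0] },
  { id: "kitten", vector: [0.8, 0.2, 0.1] },
  { id: "invoice", vector: [0.0, 0.1, 0.9] },
];

const results = nearest([1.0, 0.1, 0.0], docs, 2);
console.log(results.map((r) => r.id)); // [ 'cat', 'kitten' ]
```

The scan is linear in the number of stored vectors, which is fine for small collections and a useful baseline for checking the results of a faster index.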
Typical dimensions
// Text
"text-embedding-3-small" → 1536 dims
"text-embedding-3-large" → 3072 dims
"bge-small-en-v1.5" → 384 dims
"all-MiniLM-L6-v2" → 384 dims
// Image / multimodal
"clip-vit-base-patch32" → 512 dims
Normalization matters
Many models output L2-normalized vectors (unit length). When they do, cosine similarity and dot product are numerically equal, so they give identical rankings, and dot product is cheaper because it skips the norm computation. If your model does not normalize, do it yourself before upserting:
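// Scale v to unit L2 length; an all-zero vector is returned unchanged.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  if (norm === 0) return v.slice(); // avoid dividing by zero (NaN components)
  return v.map((x) => x / norm);
}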
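As a quick sanity check of the claim that cosine similarity and dot product agree on unit-length vectors, the sketch below compares the two on a pair of small example vectors. The helper names (`l2Normalize`, `dotProduct`, `cosine`) and the sample values are illustrative assumptions, not part of any library.

```typescript
// Illustrative helpers; names and sample vectors are assumptions for this sketch.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  if (norm === 0) return v.slice(); // leave an all-zero vector unchanged
  return v.map((x) => x / norm);
}

function dotProduct(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Cosine similarity computed on the raw (unnormalized) vectors.
function cosine(a: number[], b: number[]): number {
  return dotProduct(a, b) / Math.sqrt(dotProduct(a, a) * dotProduct(b, b));
}

const u = [3, 4, 0]; // length 5
const w = [1, 2, 2]; // length 3

const viaCosine = cosine(u, w);
const viaDot = dotProduct(l2Normalize(u), l2Normalize(w));

// Both values equal 11/15 up to floating-point rounding.
console.log(Math.abs(viaCosine - viaDot) < 1e-12);
```

Normalizing once at write time moves the square roots out of the query path, so each comparison at search time is a plain dot product.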