Embeddings are the heart of every RAG pipeline. They transform text into numerical vectors that capture semantic similarity. Two sentences with the same meaning but different words are close together in vector space.
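An embedding model converts text into a high-dimensional vector (typically 768 to 3,072 dimensions). The similarity of two texts is usually measured by the cosine similarity of their vectors: values near 1 mean the texts are semantically close, values near 0 mean they are unrelated.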
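
The cosine measure mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the three-dimensional vectors are made up by hand (real embeddings come from a model and have hundreds or thousands of dimensions), and the function name `cosine_similarity` is our own.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", invented for illustration only
car = [0.9, 0.1, 0.0]
automobile = [0.8, 0.2, 0.1]
banana = [0.0, 0.1, 0.9]

print(cosine_similarity(car, automobile))  # close to 1: similar meaning
print(cosine_similarity(car, banana))      # close to 0: unrelated
```

Cosine distance, where a library reports it, is simply `1 - cosine_similarity`, so a small distance and a high similarity express the same thing.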
Examples of widely used embedding models:
| Model | Provider | Dimensions | Specialty |
|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | Best all-rounder |
| voyage-3 | Voyage AI | 1024 | Strong for code + text |
| BGE-M3 | BAAI | 1024 | Open source, multilingual |
| Cohere Embed v4 | Cohere | 1536 (default) | Multimodal (text + image) |
A vector database stores embeddings and enables fast similarity search, usually through approximate nearest neighbor (ANN) indexes that trade a small amount of recall for much lower query latency.
| Solution | Type | Scaling | Best For |
|---|---|---|---|
| Pinecone | Managed Cloud | Automatic | Quick start, production |
| Weaviate | Self-hosted / Cloud | Horizontal | Hybrid search (vector + keyword) |
| pgvector | PostgreSQL extension | Vertical | Existing Postgres infra |
| Qdrant | Self-hosted / Cloud | Horizontal | Performance, filtering |
| ChromaDB | Embedded | Local | Prototyping, small datasets |
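
To make concrete what a vector database computes, here is a sketch of exact (brute-force) nearest-neighbor search in plain Python. It scores every stored vector against the query, which is the baseline that ANN indexes approximate at larger scale. The document IDs, the toy vectors, and the helper `top_k` are assumptions made up for this example and do not correspond to any product's API.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Exact search: score every stored vector, return the k best IDs."""
    scored = ((cosine_similarity(query, vec), doc_id)
              for doc_id, vec in index.items())
    # nlargest returns the highest-scoring pairs in descending order
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

# Hypothetical in-memory "index" mapping document IDs to toy embeddings
index = {
    "doc-cars": [0.9, 0.1, 0.0],
    "doc-trucks": [0.7, 0.3, 0.1],
    "doc-fruit": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]

results = top_k(query, index, k=2)
print(results)
```

This linear scan costs one similarity computation per stored vector, which is fine for a prototype but is the reason dedicated indexes become attractive as the collection grows.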
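Hybrid approach: Many production systems combine vector search with keyword search (such as BM25), so that exact terms like product codes or names are still matched. Weaviate and Elasticsearch offer native hybrid search.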
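
One common way to merge a vector ranking and a keyword ranking is reciprocal rank fusion (RRF). The sketch below is an assumption about how such a merge could look, not a description of how Weaviate or Elasticsearch implement hybrid search internally. The function name, the document IDs, and both result lists are hypothetical; `k=60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) per list it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from two separate retrievers
vector_hits = ["doc-a", "doc-b", "doc-c"]   # from similarity search
keyword_hits = ["doc-c", "doc-a", "doc-d"]  # from keyword search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because RRF only looks at ranks, it sidesteps the problem that cosine scores and keyword scores live on different scales and cannot be added directly.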
Practical tip: Start with pgvector if you already use PostgreSQL. ChromaDB is enough for prototypes. Switch to Pinecone or Qdrant when you need to scale beyond 1 million vectors.