Embeddings and Vector Databases

Embeddings are the heart of every RAG pipeline. They transform text into numerical vectors whose positions capture semantic similarity: two sentences with the same meaning but different words end up close together in vector space.

How Do Embeddings Work?

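An embedding model converts text into a high-dimensional vector (typically 768–3,072 dimensions). The similarity of two texts is measured by the cosine similarity of their vectors: a value near 1 means the vectors point in almost the same direction, and a value near 0 means the texts are unrelated.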

Example:

"How do I cancel my subscription?" → Vector A
"I want to end my membership" → Vector B
Cosine similarity(A, B) ≈ 0.94 (very similar)
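
As a minimal sketch of how that score is computed, the snippet below implements cosine similarity in plain Python. The three-dimensional vectors are invented for illustration only; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values, not output from any real model).
vector_a = [0.9, 0.1, 0.3]    # stands in for "How do I cancel my subscription?"
vector_b = [0.8, 0.2, 0.4]    # stands in for "I want to end my membership"
vector_c = [-0.2, 0.9, -0.1]  # stands in for an unrelated sentence

print(cosine_similarity(vector_a, vector_b))  # high: similar direction
print(cosine_similarity(vector_a, vector_c))  # low: different direction
```

Because the score depends only on the angle between the vectors, it ignores their length, which is why it works well for comparing texts of different sizes.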

Popular Embedding Models (2026)

Model	Provider	Dimensions	Specialty
text-embedding-3-large	OpenAI	3072	Best all-rounder
voyage-3	Voyage AI	1024	Strong for code + text
BGE-M3	BAAI	1024	Open source, multilingual
Cohere Embed v4	Cohere	1536 (default; smaller sizes configurable)	Multimodal (text + image)

Vector Databases

A vector database stores embeddings and enables fast similarity search, typically through approximate nearest neighbor (ANN) indexes that trade a small amount of accuracy for a large gain in speed.
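
To make similarity search concrete, here is a sketch of the exact (brute-force) search that ANN indexes approximate: score every stored vector against the query and keep the best k. The in-memory `store` dictionary, the document IDs, and the toy vectors are all hypothetical; a real vector database replaces this linear scan with an index so it does not have to compare against every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=2):
    """Exact nearest-neighbor search: compare the query with every vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical document IDs with invented 3-dimensional embeddings.
store = {
    "cancel-subscription": [0.9, 0.1, 0.3],
    "end-membership": [0.8, 0.2, 0.4],
    "reset-password": [-0.2, 0.9, -0.1],
}

query = [0.85, 0.15, 0.35]
results = top_k(query, store, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The scan costs one comparison per stored vector, which is fine for a few thousand documents but becomes the bottleneck at larger scale. That cost is the motivation for ANN indexes.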

Options Compared

Solution	Type	Scaling	Best For
Pinecone	Managed Cloud	Automatic	Quick start, production
Weaviate	Self-hosted / Cloud	Horizontal	Hybrid search (vector + keyword)
pgvector	PostgreSQL extension	Vertical	Existing Postgres infra
Qdrant	Self-hosted / Cloud	Horizontal	Performance, filtering
ChromaDB	Embedded	Local	Prototyping, small datasets

Semantic Search vs. Keyword Search

Keyword: "cancel subscription" only finds documents with those exact words
Semantic: Also finds "end contract," "stop membership," "terminate plan"

Hybrid approach: Most production systems combine both. Weaviate and Elasticsearch offer native hybrid search.
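
One common way to combine the two result lists is reciprocal rank fusion (RRF), sketched below. This illustrates the idea and does not describe how any particular engine implements hybrid search; the document IDs are hypothetical, and the constant k=60 is a conventional default rather than a requirement.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs.

    Each list contributes 1 / (k + rank) to a document's score, so documents
    that rank well in several lists rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for the query "cancel subscription".
keyword_hits = ["doc-cancel-subscription", "doc-billing-faq"]
semantic_hits = ["doc-end-membership", "doc-cancel-subscription", "doc-terminate-plan"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

RRF uses only rank positions, so it avoids having to put keyword scores and vector similarity scores on a common scale.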

Practical tip: Start with pgvector if you already use PostgreSQL. ChromaDB is enough for prototypes. As a rule of thumb, consider switching to Pinecone or Qdrant when you need to scale beyond roughly 1 million vectors.