Lesson 2 of 6 · 12 min read

Embeddings and Vector Databases

Embeddings are the heart of every RAG pipeline. They transform text into numerical vectors that capture semantic similarity: two sentences with the same meaning but different words end up close together in vector space.

How Do Embeddings Work?

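An embedding model converts text into a high-dimensional vector (typically 768–3,072 dimensions). The similarity of two texts is usually measured by the cosine similarity of their vectors: values near 1 mean the texts are close in meaning, and values near 0 mean they are unrelated.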

Example:

  • "How do I cancel my subscription?" → Vector A
  • "I want to end my membership" → Vector B
  • Cosine similarity(A, B) ≈ 0.94 (very similar)
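
To make the measurement concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made up for illustration and are not output from any real embedding model; the function name is likewise just a choice for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (assumed values); real embeddings come from a model and
# have hundreds or thousands of dimensions.
vector_a = [0.9, 0.1, 0.3]
vector_b = [0.8, 0.2, 0.4]    # points in nearly the same direction as vector_a
vector_c = [-0.1, 0.9, -0.2]  # points in a very different direction

print(cosine_similarity(vector_a, vector_b))  # close to 1
print(cosine_similarity(vector_a, vector_c))  # close to 0
```

Some libraries and databases report cosine distance instead, which is simply 1 minus the cosine similarity, so a smaller distance means a closer match.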

Popular Embedding Models (2026)

Model | Provider | Dimensions | Specialty
text-embedding-3-large | OpenAI | 3072 | Best all-rounder
voyage-3 | Voyage AI | 1024 | Strong for code + text
BGE-M3 | BAAI | 1024 | Open source, multilingual
Cohere Embed v4 | Cohere | 1024 | Multimodal (text + image)

Vector Databases

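A vector database stores embeddings and enables fast similarity search. To stay fast on large collections, it uses Approximate Nearest Neighbor (ANN) search, which trades a small amount of accuracy for much lower query latency.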
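
As a point of reference, the sketch below shows the exact (brute-force) search that ANN indexes approximate: score the query against every stored vector and keep the top k. This is not how any particular vector database is implemented; the document ids, vectors, and function names are hypothetical.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, store, k=2):
    """Return the k (doc_id, score) pairs most similar to query, best first."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical in-memory "database" of toy 3-dimensional embeddings.
store = {
    "doc-cancel": [0.9, 0.1, 0.3],
    "doc-billing": [0.6, 0.5, 0.1],
    "doc-recipes": [-0.1, 0.9, -0.2],
}
query = [0.8, 0.2, 0.4]

results = exact_top_k(query, store, k=2)
print([doc_id for doc_id, _ in results])  # ['doc-cancel', 'doc-billing']
```

Every query here touches every vector, so the cost grows linearly with the collection. ANN indexes avoid that full scan, at the price of occasionally missing a true nearest neighbor.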

Options Compared

Solution | Type | Scaling | Best For
Pinecone | Managed Cloud | Automatic | Quick start, production
Weaviate | Self-hosted / Cloud | Horizontal | Hybrid search (vector + keyword)
pgvector | PostgreSQL extension | Vertical | Existing Postgres infra
Qdrant | Self-hosted / Cloud | Horizontal | Performance, filtering
ChromaDB | Embedded | Local | Prototyping, small datasets

Semantic Search vs. Keyword Search

  • Keyword: "cancel subscription" only finds documents with those exact words
  • Semantic: Also finds "end contract," "stop membership," "terminate plan"
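
The keyword limitation can be seen in a deliberately strict sketch: a document matches only if it contains every query word. Real keyword engines are more forgiving (stemming, ranking functions such as BM25), so treat this as a simplification; the sample documents are invented for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_match(query, document):
    """True only if every query word appears in the document."""
    return tokenize(query) <= tokenize(document)

documents = [
    "How to cancel your subscription in three steps",
    "End contract early: what you need to know",
    "Stop membership renewals from the account page",
]

hits = [doc for doc in documents if keyword_match("cancel subscription", doc)]
print(hits)  # only the first document matches
```

The second and third documents answer the same question but share no words with the query, which is exactly the gap that embedding-based search closes.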

Hybrid approach: Most production systems combine both. Weaviate and Elasticsearch offer native hybrid search.
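
One common way to combine a keyword ranking with a semantic ranking is Reciprocal Rank Fusion (RRF). The sketch below is an assumption about how such a merge can work, not a description of what Weaviate or Elasticsearch do internally; the document ids are hypothetical, and k=60 is a commonly used default for the smoothing constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids (each ordered best first) into one."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher ranks add more.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

keyword_ranking = ["doc-a", "doc-b", "doc-c"]   # e.g. from a keyword engine
semantic_ranking = ["doc-b", "doc-d", "doc-a"]  # e.g. from a vector search

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)  # ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

RRF uses only rank positions, so it sidesteps the problem that keyword scores and cosine similarities live on different scales. Documents found by both searches rise to the top.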

Practical tip: Start with pgvector if you already use PostgreSQL. ChromaDB is enough for prototypes. Consider switching to Pinecone or Qdrant once you need to scale beyond roughly 1 million vectors.