Lesson 2 of 6 · 11 min read

Embedding Strategies

The quality of your RAG system stands or falls with the quality of your embeddings. The right model, the right chunking strategy, and thoughtful metadata enrichment make the difference between "sometimes finds something" and "reliably finds the right thing."

Embedding Models

Commercial Models

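Model                     Provider    Dimensions       Strengths
text-embedding-3-large    OpenAI      3072             Highest quality, expensive
text-embedding-3-small    OpenAI      1536             Good price-performance ratio
embed-v4.0                Cohere      1536 (default)   Multilingual, compressible (256, 512, or 1024 dimensions selectable)
voyage-3                  Voyage AI   1024             General-purpose; sibling models specialize in code (voyage-code-3) and legal (voyage-law-2)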

Open-Source Models

Model                     Dimensions   Strengths
BGE-large-en-v1.5         1024         Strong MTEB scores for English
E5-mistral-7b-instruct    4096         Instruction-based
GTE-large                 1024         From Alibaba; English, with multilingual variants in the GTE family
nomic-embed-text-v1.5     768          Compact, efficient

Model Selection

Criteria:
1. Language → Multilingual model for DE/EN?
2. Domain → Specialized model (code, legal, medical)?
3. Budget → Commercial vs. open-source?
4. Latency → Smaller models = faster
5. Quality → Check benchmark results (MTEB)

Chunking Strategies

Fixed-Size Chunking

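# Simple but not optimal: fixed-size windows ignore sentence and section boundaries.
# split_text is a placeholder helper here, not a library function.
chunks = split_text(text, chunk_size=1000, overlap=200)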
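One way the `split_text` helper called above could be written is sketched below. The name and parameters mirror that call, but the body is an assumption: it measures sizes in characters rather than tokens and cuts without regard to word boundaries.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence cut at a window edge still appears whole in at least one chunk as long as it is shorter than the overlap.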

Semantic Chunking

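# Splits at semantic boundaries: a new chunk starts where the embedding
# distance between neighboring sentences is unusually large
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings  # requires an OpenAI API key

chunker = SemanticChunker(
    embeddings=OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95,  # split above the 95th percentile of distances
)
chunks = chunker.split_text(document)  # document: the full text as a string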
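To make the percentile breakpoint idea concrete, here is a simplified sketch in plain Python. It is not LangChain's implementation: the function names are hypothetical, the sentence vectors are assumed to be precomputed and non-zero, and sentences are compared one to one without any buffering of neighbors.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values, pct):
    """Percentile of a non-empty list using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def semantic_chunks(sentences, vectors, pct=95):
    """Start a new chunk wherever the distance between neighboring
    sentences exceeds the pct-th percentile of all such distances."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []

    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)

    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > threshold:  # unusually large topic jump: close the chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

A lower percentile produces more, smaller chunks; a higher one splits only at the sharpest topic changes.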

Document-Structure-Aware Chunking

Strategy          Description                               When to use
Markdown header   Splits at headers, preserves hierarchy    Documentation, wiki
HTML sections     Splits at HTML elements                   Web content
Paragraph-based   Splits at paragraphs                      Prose texts
Code-aware        Splits at functions/classes               Source code
Sliding window    Fixed size with overlap                   Fallback / default
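As an illustration of the Markdown header strategy, the sketch below splits a document at `#`-style headers and stores the header path with each chunk. The function name and the output shape are assumptions made for this example, and the sketch does not handle `#` lines inside fenced code blocks.

```python
import re

HEADER = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_markdown_headers(markdown: str) -> list[dict]:
    """Split at Markdown headers and keep each chunk's header hierarchy."""
    chunks = []
    path = []   # stack of (level, title) for the headers currently in scope
    lines = []  # body lines collected since the last header

    def flush():
        body = "\n".join(lines).strip()
        if body:
            chunks.append({"headers": [title for _, title in path], "text": body})
        lines.clear()

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Leave the scope of headers at the same or a deeper level
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            lines.append(line)
    flush()
    return chunks
```

Keeping the header path alongside the text means a chunk like "Install it" can still be embedded or displayed together with "Guide > Setup", which restores context that the split would otherwise remove.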

Chunk Size Optimization

Too small (< 200 tokens):
  ✗ Context loss — individual sentences without connection
  ✗ More chunks = higher retrieval costs

Too large (> 2000 tokens):
  ✗ Noise — irrelevant information in the chunk
  ✗ Lower retrieval precision

Optimal (300-800 tokens):
  ✓ Enough context for comprehensibility
  ✓ Focused enough for precise retrieval
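A quick way to check an existing set of chunks against these thresholds is sketched below. Both function names are hypothetical, and the token count is only a rough estimate that assumes about 4 characters per token for English text; use your embedding model's tokenizer when you need exact numbers.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate, assuming about 4 characters per token."""
    return max(1, round(len(text) / 4))


def size_report(chunks, too_small=200, too_large=2000):
    """Count chunks below, within, and above the given token limits."""
    report = {"too_small": 0, "ok": 0, "too_large": 0}
    for chunk in chunks:
        tokens = approx_tokens(chunk)
        if tokens < too_small:
            report["too_small"] += 1
        elif tokens > too_large:
            report["too_large"] += 1
        else:
            report["ok"] += 1
    return report
```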

Metadata Enrichment

Chunks without metadata are like books without a table of contents. Metadata lets you narrow a search by source, date, or category, and it tells the reader where a result came from:

chunk = {
    "text": "The new GDPR amendment affects...",
    "metadata": {
        "source": "compliance/gdpr-update-2026.pdf",
        "page": 12,
        "section": "Changes 2026",
        "category": "compliance",
        "date": "2026-01-15",
        "author": "Legal Team",
        "language": "en",
        "keywords": ["GDPR", "data protection", "compliance"]
    }
}

Metadata Filtering in Search

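# vectorstore: an existing vector store object (e.g. a LangChain vector store).
# The filter syntax is store-specific; operators such as $gte and comparisons
# on date strings are not supported by every backend.
results = vectorstore.similarity_search(
    query="GDPR amendments",
    filter={"category": "compliance", "date": {"$gte": "2026-01-01"}},
    k=5
)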
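What such a filter does can be shown with a small in-memory sketch. The `matches` function and its operator set are hypothetical and do not reproduce any particular vector store's API; it supports plain equality plus `$gte` and `$lte`, and it relies on ISO dates (`YYYY-MM-DD`) sorting correctly as strings.

```python
def matches(metadata: dict, flt: dict) -> bool:
    """Return True if metadata satisfies every condition in the filter."""
    for key, condition in flt.items():
        if key not in metadata:
            return False
        value = metadata[key]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gte": "2026-01-01"}
            for op, operand in condition.items():
                if op == "$gte":
                    if not value >= operand:
                        return False
                elif op == "$lte":
                    if not value <= operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != condition:
            # Plain form means equality, e.g. {"category": "compliance"}
            return False
    return True


def filter_chunks(chunks, flt):
    """Keep only the chunks whose metadata passes the filter."""
    return [c for c in chunks if matches(c["metadata"], flt)]
```

Filtering before the similarity ranking shrinks the candidate set, so the top `k` results are drawn only from chunks that are already in scope.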

Hybrid Search

Hybrid search combines vector search (semantic) with keyword search (BM25), so a query benefits from both meaning-based and exact-term matching:

Query: "GDPR Article 15 right of access"

Vector search: Finds semantically similar texts
BM25 search:   Finds exact keyword matches

Hybrid (RRF):  Merges both rankings with Reciprocal Rank Fusion → documents ranked high by both searches rise to the top
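Reciprocal Rank Fusion itself is only a few lines. The sketch below assumes each search returns a list of document IDs ordered from best to worst; the function name is hypothetical, and `k=60` is a commonly used default for the smoothing constant rather than a value taken from this lesson.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1; the scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]  # from semantic search
    bm25_hits = ["b", "d", "a"]    # from keyword search
    print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

Because RRF uses only rank positions, it needs no normalization between cosine similarities and BM25 scores, which live on different scales.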

Practical tip: Always test at least three chunking strategies on your real data, because the "right" strategy depends heavily on your document type. Invest in metadata enrichment as well: after the chunking strategy, it is the biggest lever for retrieval quality.
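A comparison like this needs a number to compare. One simple option is the hit rate at k, sketched below under stated assumptions: `retrieve` is a hypothetical callable that wraps whichever chunking strategy and index you are testing, results follow the chunk format with a `metadata["source"]` field, and each test query is labeled with the source document that should be found.

```python
def hit_rate_at_k(retrieve, labeled_queries, k=5):
    """Share of queries whose expected source appears in the top k results.

    retrieve(query, k) must return a list of chunks shaped like
    {"text": ..., "metadata": {"source": ...}}.
    labeled_queries is a list of (query, expected_source) pairs.
    """
    if not labeled_queries:
        raise ValueError("labeled_queries must not be empty")
    hits = 0
    for query, expected_source in labeled_queries:
        results = retrieve(query, k)
        if any(r["metadata"]["source"] == expected_source for r in results):
            hits += 1
    return hits / len(labeled_queries)
```

Run the same labeled queries against each chunking strategy and keep the one with the highest hit rate; a few dozen realistic queries are usually more informative than a generic benchmark score.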