Standard RAG (Retrieve → Generate) works for simple cases. More complex scenarios call for advanced patterns: HyDE, parent-child chunking, agentic RAG, graph RAG, and multi-index RAG.
The problem: A short query such as "What are best practices for API security?" is phrased very differently from a document that describes those practices, so the two embeddings can end up far apart even though they cover the same topic.
HyDE (Hypothetical Document Embeddings) narrows this gap by generating a hypothetical answer and searching with its embedding instead:
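```python
def hyde_retrieve(question: str, retriever, llm):
    """Retrieve documents using a hypothetical answer instead of the raw question."""
    # 1. Generate hypothetical answer (may be factually wrong; only its wording matters)
    hypothetical = llm.invoke(
        f"Write a detailed answer to: {question}"
    )
    # 2. Use hypothetical answer's embedding for retrieval
    #    (.content assumes a chat model that returns a message object)
    results = retriever.invoke(hypothetical.content)
    return results
```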
| Scenario | HyDE helpful? |
|---|---|
| Factual questions | No — direct search suffices |
| Conceptual questions | Yes — better semantic matching |
| Questions with technical terms | Yes — hypothetical answer contains related terms |
| Short, vague questions | Yes — HyDE expands query context |
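To make the effect concrete, here is a minimal, self-contained sketch. It assumes a toy bag-of-words "embedding" in place of a real embedding model, and the hypothetical answer is hard-coded rather than written by an LLM; `embed`, `cosine`, and the sample texts are illustrative names, not part of any library. A real embedding model captures semantics, so the direct score would not drop to zero there, but the direction of the effect is the same.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

document = "Use OAuth tokens, enforce rate limiting, and validate all input on every endpoint."
question = "What are best practices for API security?"
# Hard-coded here; in hyde_retrieve this text would come from the LLM.
hypothetical = "Secure an API with OAuth tokens, rate limiting, and input validation on every endpoint."

direct_score = cosine(embed(question), embed(document))
hyde_score = cosine(embed(hypothetical), embed(document))
print(f"direct: {direct_score:.2f}, HyDE: {hyde_score:.2f}")
```

The question shares no vocabulary with the document, while the hypothetical answer shares most of it, which is the mismatch HyDE is meant to bridge.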
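The dilemma: Small chunks deliver precise retrieval results but too little context. Large chunks deliver context but imprecise results. Parent-child chunking combines both: search over small child chunks, then hand the enclosing parent chunk to the LLM.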
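```python
import uuid

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Parent chunks: large sections.
# Note: chunk_size counts characters by default, not tokens.
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
parent_chunks = parent_splitter.split_documents(docs)  # docs: previously loaded documents

# Child chunks: small sections
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
child_chunks = []
for parent in parent_chunks:
    # Split chunks only inherit the source document's metadata,
    # so give every parent chunk its own unique ID.
    parent.metadata["id"] = str(uuid.uuid4())
    children = child_splitter.split_documents([parent])
    for child in children:
        child.metadata["parent_id"] = parent.metadata["id"]
    child_chunks.extend(children)

# Retrieval: search over child chunks
# Context: deliver the matching parent chunk to the LLM
```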
```text
Search over:     [Child 1] [Child 2] [Child 3]   ← Precise matches
                             ↓
Deliver to LLM:  [────── Parent Chunk ──────]    ← Full context
```
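The lookup step from child hits back to parent chunks can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary as the parent store; `Chunk`, `parents_for_hits`, and the sample data are hypothetical names, not part of any library.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict

def parents_for_hits(child_hits: list[Chunk], parent_store: dict[str, Chunk]) -> list[Chunk]:
    """Map child hits to parent chunks, keeping rank order and dropping duplicates."""
    seen: set[str] = set()
    parents: list[Chunk] = []
    for child in child_hits:
        parent_id = child.metadata["parent_id"]
        if parent_id not in seen:  # several children may share one parent
            seen.add(parent_id)
            parents.append(parent_store[parent_id])
    return parents

# Hypothetical sample data: two parents, three child hits in rank order
parent_store = {
    "p1": Chunk("Full section on authentication ...", {"id": "p1"}),
    "p2": Chunk("Full section on rate limiting ...", {"id": "p2"}),
}
child_hits = [
    Chunk("Rotate API keys regularly.", {"parent_id": "p1"}),
    Chunk("Prefer short-lived tokens.", {"parent_id": "p1"}),
    Chunk("Throttle requests per client.", {"parent_id": "p2"}),
]
context = parents_for_hits(child_hits, parent_store)
```

Deduplicating matters because neighbouring child chunks often match the same query, and sending the same parent to the LLM twice only wastes context window.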
An agent dynamically decides on the retrieval strategy:
```python
# agent, the retrievers, and the helper functions are placeholders
# for components defined elsewhere in the application.
def agentic_rag(question: str):
    # Agent decides: Which data source? How many hops? Filters?
    plan = agent.plan(question)

    if plan.needs_structured_data:
        results = sql_retriever.invoke(plan.sql_query)
    elif plan.needs_multiple_sources:
        results = multi_source_retrieve(plan.sources, question)
    else:
        results = vector_retriever.invoke(question)

    # Optional second pass: drop results that do not support the question
    if plan.needs_verification:
        results = verify_and_filter(results, question)

    return generate_answer(question, results)
```
```text
Question → Agent (Planner)
              │
              ├── "Simple fact question" → Vector Retrieval → Answer
              ├── "SQL needed"           → Text-to-SQL → DB Query → Answer
              ├── "Multiple sources"     → Multi-Source Retrieval → Merge → Answer
              └── "Not enough info"      → Web Search → Answer
```
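What the planner returns can be sketched as a small data structure. The version below is a hypothetical, keyword-based stand-in: a real agent would typically ask an LLM to produce the plan. The `Plan` fields mirror the attributes used in `agentic_rag`; the keyword lists, the source names, and the rule for `needs_verification` are assumptions made only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    needs_structured_data: bool = False
    needs_multiple_sources: bool = False
    needs_verification: bool = False
    sources: list[str] = field(default_factory=list)

def plan(question: str) -> Plan:
    """Hypothetical rule-based planner; a real agent would delegate this to an LLM."""
    q = question.lower()
    result = Plan()
    if any(marker in q for marker in ("how many", "average", "total")):
        # Aggregations are usually better answered from a database
        result.needs_structured_data = True
    elif any(marker in q for marker in ("compare", "versus")):
        result.needs_multiple_sources = True
        result.sources = ["docs", "tickets"]  # assumed source names
    # Assumption: verify whenever the simple vector path is not taken
    result.needs_verification = (
        result.needs_structured_data or result.needs_multiple_sources
    )
    return result

print(plan("How many tickets were opened last week?"))
```

Keeping the plan as plain data makes the routing decision easy to log and to test separately from the retrieval itself.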
Graph RAG combines vector search with knowledge graphs for structured knowledge:
```text
Documents → Entity Extraction → Knowledge Graph
                                       ↕
                                Vector Database

Query → Graph Traversal + Vector Search → Merged Context → LLM
```
| Aspect | Standard RAG | Graph RAG |
|---|---|---|
| Relationships | Implicit in chunks | Explicit in graph |
| Multi-hop | Difficult | Natural (graph traversal) |
| Aggregation | LLM must summarize | Graph queries aggregate |
| Transparency | Chunks as source | Entities and relations as source |
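The multi-hop row is the easiest to illustrate in code. The sketch below assumes a tiny in-memory knowledge graph stored as an adjacency dictionary; the entities, relations, and function names are invented for illustration, and the vector hits are passed in as plain strings rather than fetched from a real vector database.

```python
from collections import deque

# Hypothetical knowledge graph: entity -> list of (relation, target entity)
graph = {
    "AuthService": [("depends_on", "TokenStore")],
    "TokenStore": [("runs_on", "Redis")],
    "Redis": [],
}

def traverse(start: str, max_hops: int) -> list[str]:
    """Breadth-first traversal returning the facts reachable within max_hops."""
    facts: list[str] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in graph.get(node, []):
            facts.append(f"{node} {relation} {target}")
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return facts

def merged_context(entity: str, vector_hits: list[str], max_hops: int = 2) -> list[str]:
    """Combine explicit graph facts with chunks found by vector search."""
    return traverse(entity, max_hops) + vector_hits

print(merged_context("AuthService", ["Chunk: AuthService issues short-lived tokens."]))
```

With two hops, the traversal connects `AuthService` to `Redis` even though no single chunk may mention both, which is the kind of question plain vector search tends to struggle with.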
Multi-index RAG keeps different data types in separate indices and routes each question only to the relevant ones:
```python
indices = {
    "docs": vectorstore_docs,        # Documentation
    "code": vectorstore_code,        # Source code
    "tickets": vectorstore_tickets,  # Support tickets
    "faq": vectorstore_faq,          # FAQ
}

def smart_retrieve(question: str):
    # Router decides which indices are relevant
    relevant_indices = route_to_indices(question)

    results = []
    for index_name in relevant_indices:
        results.extend(indices[index_name].similarity_search(question, k=3))

    return rerank(results, question)
```
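`route_to_indices` and `rerank` are left undefined above. One possible shape for them is sketched below, assuming simple keyword matching for routing and word overlap for reranking. Both are deliberately naive stand-ins: in practice the router is often an LLM or a classifier, and the reranker a cross-encoder. The keyword lists are invented, and the results are plain strings here rather than document objects.

```python
import re

# Hypothetical keyword lists; the keys match the `indices` dict
ROUTES = {
    "docs": {"configure", "install", "setup"},
    "code": {"function", "class", "exception"},
    "tickets": {"bug", "outage", "customer"},
    "faq": {"pricing", "account", "password"},
}

def words(text: str) -> set[str]:
    """Lowercased set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def route_to_indices(question: str, default: str = "docs") -> list[str]:
    """Return every index whose keywords appear in the question."""
    q_words = words(question)
    matches = [name for name, keywords in ROUTES.items() if q_words & keywords]
    return matches or [default]  # fall back to one index if nothing matches

def rerank(results: list[str], question: str, top_k: int = 3) -> list[str]:
    """Order results by word overlap with the question and keep the best top_k."""
    q_words = words(question)
    return sorted(results, key=lambda text: len(q_words & words(text)), reverse=True)[:top_k]

print(route_to_indices("Which function throws this exception after the outage?"))
```

Reranking after the merge matters because similarity scores from different indices are not directly comparable.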
Practical tip: Only implement advanced patterns when standard RAG demonstrably isn't sufficient. Measure retrieval quality with metrics (see lesson 5) before introducing HyDE, Graph RAG, or agentic RAG. Each pattern increases complexity and cost.
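As a rough sketch of what such a measurement can look like, the snippet below compares two retrieval setups using hit rate at k and mean reciprocal rank (MRR). The document IDs and rankings are invented purely to show the calculation; they are not measured results, and the function names are illustrative.

```python
def hit_rate_at_k(ranked_ids: list[list[str]], relevant_ids: list[str], k: int) -> float:
    """Share of questions whose relevant document appears in the top k results."""
    hits = sum(rel in ranked[:k] for ranked, rel in zip(ranked_ids, relevant_ids))
    return hits / len(relevant_ids)

def mrr(ranked_ids: list[list[str]], relevant_ids: list[str]) -> float:
    """Mean reciprocal rank of the relevant document (0 if it was not retrieved)."""
    total = 0.0
    for ranked, rel in zip(ranked_ids, relevant_ids):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(relevant_ids)

# Invented example: one relevant document per question, four questions
relevant = ["d1", "d2", "d3", "d4"]
baseline = [["d1", "d9"], ["d8", "d2"], ["d7", "d6"], ["d5", "d4"]]
candidate = [["d1", "d9"], ["d2", "d8"], ["d6", "d3"], ["d4", "d5"]]

for name, run in (("baseline", baseline), ("candidate", candidate)):
    print(name, hit_rate_at_k(run, relevant, k=1), mrr(run, relevant))
```

Running the same evaluation set against standard RAG first gives a baseline that any added pattern has to beat to justify its extra complexity and cost.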