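Retrieval-Augmented Generation (RAG) connects LLMs to your own data, such as knowledge bases, documents, and databases, by retrieving relevant passages and adding them to the prompt. The AI SDK provides native functions for generating embeddings (`embed` and `embedMany`). Storing and searching the vectors is left to a vector store of your choice.
Embeddings are numerical representations of text as vectors in a high-dimensional space. Texts with similar meaning map to nearby vectors, which enables semantic search: finding documents by meaning rather than by exact keyword matches.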
```typescript
import { embed, embedMany } from 'ai'
import { openai } from '@ai-sdk/openai'

// Single embedding
const { embedding } = await embed({
  model: openai.embedding('text-embedding-3-large'),
  value: 'What is Kubernetes?',
})

// Batch embeddings (returned in the same order as `values`)
const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-large'),
  values: ['Document 1...', 'Document 2...', 'Document 3...'],
})
```
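Semantic search compares these vectors, most often with cosine similarity: the dot product of two vectors divided by the product of their lengths. The result lies between -1 and 1, and higher means more similar. The AI SDK exports a `cosineSimilarity` helper from `ai`. The standalone sketch below only illustrates the arithmetic, using made-up three-dimensional vectors in place of real embeddings.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1]
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // Avoid division by zero for all-zero vectors
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Toy vectors (real embeddings have 1,024 to 3,072 dimensions)
const query = [0.9, 0.1, 0.0]
const similarDoc = [0.8, 0.2, 0.1]
const unrelatedDoc = [0.0, 0.1, 0.9]

console.log(cosineSimilarity(query, similarDoc).toFixed(3)) // → 0.984
console.log(cosineSimilarity(query, unrelatedDoc).toFixed(3)) // → 0.012
```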
Common embedding models:

| Model | Dimensions | Strength |
|---|---|---|
| text-embedding-3-large (OpenAI) | 3,072 | Best quality, versatile |
| text-embedding-3-small (OpenAI) | 1,536 | Good price-performance ratio |
| voyage-3-large (Voyage AI) | 1,024 (default) | Strong for code and technical texts |
| multilingual-e5-large (open source) | 1,024 | Multilingual, can be self-hosted |
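The Dimensions column also drives storage cost. Assuming each component is stored as a 4-byte float32 (as pgvector's `vector` type does), the raw size of one vector is dimensions × 4 bytes, before per-row and index overhead. `rawVectorBytes` is a hypothetical helper for this arithmetic:

```typescript
// Raw storage for float32 embeddings: 4 bytes per dimension
// (ignores per-row headers and index overhead)
const BYTES_PER_FLOAT32 = 4

function rawVectorBytes(dimensions: number, count = 1): number {
  return dimensions * BYTES_PER_FLOAT32 * count
}

const models: Array<[string, number]> = [
  ['text-embedding-3-large', 3072],
  ['text-embedding-3-small', 1536],
  ['voyage-3-large', 1024],
]

for (const [name, dims] of models) {
  const perMillionGB = rawVectorBytes(dims, 1_000_000) / 1e9
  console.log(`${name}: ${rawVectorBytes(dims)} bytes per vector, ~${perMillionGB.toFixed(1)} GB per million vectors`)
}
```

For text-embedding-3-large this gives 12,288 bytes per vector (about 12.3 GB per million vectors), three times as much as a 1,024-dimension model.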
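Because embeddings are plain arrays of numbers, the AI SDK works with any vector database, for example Pinecone, Qdrant, Weaviate, or Postgres with pgvector. The following example uses Supabase (Postgres with pgvector) in three steps: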
1. Index documents:
```typescript
import { embedMany } from 'ai'
import { openai } from '@ai-sdk/openai'
import { supabase } from '@/lib/supabase'

async function indexDocuments(documents: string[]) {
  // Embed all documents in one call; embeddings[i] belongs to documents[i]
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-large'),
    values: documents,
  })

  // Insert all rows in a single request instead of one round trip per document
  const { error } = await supabase.from('documents').insert(
    documents.map((content, i) => ({ content, embedding: embeddings[i] })),
  )
  if (error) throw error
}
```
2. Retrieve relevant documents:
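```typescript
import { embed } from 'ai'
import { openai } from '@ai-sdk/openai'
import { supabase } from '@/lib/supabase'

async function findRelevantDocs(query: string) {
  // Use the same model as for indexing: vectors from different models are not comparable
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-large'),
    value: query,
  })

  // match_documents is a Postgres function you define yourself in Supabase
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: embedding,
    match_threshold: 0.7,
    match_count: 5,
  })
  if (error) throw error
  return data ?? []
}
```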
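`match_documents` is not built into Supabase; you create it as a Postgres function (the Supabase pgvector guide shows an example). To make its assumed semantics concrete, here is a hypothetical in-memory equivalent: score every stored document by cosine similarity, keep those above `match_threshold`, and return the best `match_count`. The `StoredDoc` type and `matchDocuments` name are illustrative only, and a real vector database would use an approximate index such as HNSW instead of scanning every row.

```typescript
// Hypothetical in-memory stand-in for the match_documents RPC
interface StoredDoc {
  content: string
  embedding: number[]
}

interface Match {
  content: string
  similarity: number
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

function matchDocuments(
  queryEmbedding: number[],
  docs: StoredDoc[],
  matchThreshold: number,
  matchCount: number,
): Match[] {
  return docs
    .map((doc) => ({
      content: doc.content,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .filter((m) => m.similarity > matchThreshold) // drop weak matches
    .sort((a, b) => b.similarity - a.similarity) // most similar first
    .slice(0, matchCount) // keep the top matchCount
}

// Toy 3-dimensional embeddings for illustration
const docs: StoredDoc[] = [
  { content: 'Kubernetes orchestrates containers', embedding: [0.9, 0.1, 0.0] },
  { content: 'Docker builds container images', embedding: [0.7, 0.3, 0.1] },
  { content: 'Sourdough needs a starter', embedding: [0.0, 0.1, 0.9] },
]

console.log(matchDocuments([1, 0, 0], docs, 0.7, 5))
```

Raising the threshold trades recall for precision: with 0.7, the unrelated sourdough document is filtered out before the top-K cut.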
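3. Pass the context to the LLM:
```typescript
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'

export async function POST(req: Request) {
  const { messages } = await req.json()

  // Use the latest user message as the retrieval query (findRelevantDocs from step 2)
  const lastMessage = messages[messages.length - 1]
  const relevantDocs = await findRelevantDocs(lastMessage.content)
  const context = relevantDocs
    .map((d: { content: string }) => d.content)
    .join('\n\n')

  const result = streamText({
    model: openai('gpt-4.1'),
    system: `Answer questions based on this context:\n\n${context}`,
    messages,
  })

  // AI SDK 4.x API; AI SDK 5 uses toUIMessageStreamResponse() instead
  return result.toDataStreamResponse()
}
```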
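Joining raw chunks works, but two small refinements are common: label each chunk so the model can cite it, and tell the model what to do when retrieval returns nothing (no document cleared the similarity threshold). A minimal sketch, where `buildSystemPrompt` is a hypothetical helper that would replace the inline template string:

```typescript
// Hypothetical helper: builds a system prompt from retrieved chunks
interface RetrievedDoc {
  content: string
}

function buildSystemPrompt(docs: RetrievedDoc[]): string {
  if (docs.length === 0) {
    // Nothing relevant was retrieved: ask the model not to guess
    return 'No relevant context was found. Say that you do not know instead of guessing.'
  }
  // Number each chunk as [1], [2], ... so answers can cite their sources
  const context = docs.map((doc, i) => `[${i + 1}] ${doc.content}`).join('\n\n')
  return [
    'Answer questions based only on the context below.',
    'Cite sources as [n]. If the context does not contain the answer, say so.',
    '',
    context,
  ].join('\n')
}

console.log(buildSystemPrompt([{ content: 'Pods are the smallest deployable units.' }]))
```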
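LLMs have a limited context window (roughly 128K to 2M tokens, depending on the model). Effective RAG packs the most relevant information into this window without overloading it: irrelevant passages increase cost and latency and can distract the model from the passages that matter.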
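One way to avoid overloading the window is to add retrieved chunks in relevance order until a token budget is used up. The sketch below approximates tokens as characters divided by 4, a rough heuristic for English text; a real implementation would count with the model's own tokenizer. `packContext`, `estimateTokens`, and the budget values are hypothetical.

```typescript
// Rough token estimate: about 4 characters per token for English text (heuristic)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Greedily add chunks (already sorted by relevance) until the budget is spent
function packContext(chunks: string[], maxTokens: number): string[] {
  const packed: string[] = []
  let used = 0
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk)
    if (used + cost > maxTokens) break // stop at the first chunk that no longer fits
    packed.push(chunk)
    used += cost
  }
  return packed
}

const ranked = ['a'.repeat(400), 'b'.repeat(400), 'c'.repeat(400)] // ~100 tokens each
console.log(packContext(ranked, 250).length) // → 2
```

Stopping at the first chunk that does not fit preserves strict relevance order; skipping it and trying smaller, lower-ranked chunks would fill the budget more fully at the cost of that ordering.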
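| Strategy | Description | When to use |
|---|---|---|
| Top-K retrieval | Return the K most similar chunks | Default starting point |
| Reranking | Re-sort the retrieved results with a reranker model | When higher precision is needed |
| Chunking | Split documents into smaller pieces before embedding | Long documents |
| Hierarchical | Search coarse units (e.g. document summaries) first, then the detailed chunks within the best matches | Large knowledge bases |
| Hybrid search | Combine vector search, keyword search, and metadata filters | Complex queries, exact terms such as IDs or product names |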
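Hybrid search has to merge a vector ranking and a keyword ranking whose raw scores are not comparable. One widely used technique for this is Reciprocal Rank Fusion (RRF): each document scores the sum of 1 / (k + rank) over every list it appears in, with k = 60 as the customary constant. A minimal sketch with hypothetical document IDs:

```typescript
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank is 1-based and k dampens the advantage of the very top positions
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank))
    })
  }
  // Highest fused score first
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id)
}

const vectorResults = ['doc-a', 'doc-b', 'doc-c']
const keywordResults = ['doc-c', 'doc-a', 'doc-d']
console.log(reciprocalRankFusion([vectorResults, keywordResults]))
// → [ 'doc-a', 'doc-c', 'doc-b', 'doc-d' ]
```

Documents that rank well in both lists (here `doc-a` and `doc-c`) rise to the top, without any need to normalize the underlying similarity or keyword scores.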
RAG reality: a common rule of thumb is that RAG quality depends about 80% on data preparation (chunking, cleaning, metadata) and only about 20% on the model. Invest in your data pipeline.
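Chunking is usually the first data-preparation step. A minimal sketch of fixed-size chunking with overlap, assuming sizes measured in characters (production pipelines often split on tokens, sentences, or headings instead); `chunkText` and its default sizes are hypothetical:

```typescript
// Split text into fixed-size character chunks; consecutive chunks share
// `overlap` characters so text cut at a boundary keeps some surrounding context
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize')
  }
  const chunks: string[] = []
  const step = chunkSize - overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize))
    // This chunk already reaches the end of the text
    if (start + chunkSize >= text.length) break
  }
  return chunks
}

console.log(chunkText('abcdefghij', 4, 1)) // returns ['abcd', 'defg', 'ghij']
```

The resulting chunks, rather than whole documents, would then be passed to `indexDocuments`, so each stored vector represents a focused passage.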