Retrieval-Augmented Generation (RAG) combines the language capabilities of Large Language Models with external knowledge. Instead of training the model on all information, relevant documents are retrieved at runtime and provided as context.
LLMs have three fundamental limitations:
RAG addresses all three by retrieving relevant documents at runtime and passing them to the model as context.
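To make the retrieve-then-generate idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production pipeline: word-count cosine similarity stands in for real embeddings, the documents are invented sample data, the names `retrieve` and `build_prompt` are hypothetical, and the actual LLM call is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query.

    Assumption: word overlap is a stand-in for embedding similarity.
    Documents with zero overlap are dropped.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Assemble the prompt that would be sent to the LLM (call not shown)."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical sample knowledge base, invented for this sketch.
DOCS = [
    "The travel policy allows economy class flights for trips under six hours.",
    "Vacation requests must be submitted two weeks in advance.",
    "The office kitchen is cleaned every Friday.",
]

if __name__ == "__main__":
    print(build_prompt("How far in advance must I submit vacation requests?", DOCS))
```

The knowledge lives in `DOCS`, not in the model: updating a document changes the next answer without any retraining, and the prompt shows exactly which sources the answer is based on.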
| Criterion | RAG | Fine-Tuning |
|---|---|---|
| Freshness | Real-time updates possible | Retraining needed |
| Cost | Low (infrastructure) | High (GPU training) |
| Hallucinations | Significantly reduced (source-based) | Still possible |
| Setup effort | Medium (build pipeline) | High (prepare data, train) |
| Best for | Factual knowledge, documents | Style, format, domain language |
Practical tip: RAG is usually the fastest way to make company knowledge accessible to an AI system. As a rule of thumb, in around 80% of enterprise use cases RAG is a better choice than fine-tuning: cheaper, more current, and more controllable.
The following lessons dive deep into every building block of the RAG architecture, from embeddings through chunking to the finished pipeline.
What does RAG solve compared to a plain LLM?