Lesson 1 of 6 · 10 min read

What is RAG?

Retrieval-Augmented Generation (RAG) combines the language capabilities of Large Language Models with external knowledge. Instead of training the model on all information, relevant documents are retrieved at runtime and provided as context.
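To make "provided as context" concrete, here is a minimal sketch of how retrieved text might be placed in front of a user question before it reaches the model. The function name `build_prompt`, the prompt wording, and the sample HR policy are illustrative assumptions, not a fixed RAG standard.

```python
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble an augmented prompt: numbered context passages, then the question."""
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical example document; in a real system this comes from retrieval.
prompt = build_prompt(
    "How many vacation days do new employees get?",
    ["HR policy (hypothetical): New employees receive 25 vacation days per year."],
)
print(prompt)
```

The model itself is unchanged; only the input grows by the retrieved passages.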

Why RAG?

LLMs have three fundamental limitations:

  1. Knowledge cutoff: The model only knows data up to training time. It's blind to current information.
  2. Hallucinations: LLMs can generate convincing-sounding but incorrect information, especially for niche knowledge.
  3. No company knowledge: Internal documents, processes, and policies are unknown to the model.

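RAG addresses all three by supplying relevant documents at runtime: information newer than the training cutoff can be indexed at any time, answers are grounded in retrieved sources (which reduces, but does not eliminate, hallucinations), and internal documents become available as context.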

RAG vs. Fine-Tuning

Criterion      | RAG                                  | Fine-Tuning
---------------|--------------------------------------|-------------------------------
Freshness      | Real-time updates possible           | Retraining needed
Cost           | Low (infrastructure)                 | High (GPU training)
Hallucinations | Significantly reduced (source-based) | Still possible
Setup effort   | Medium (build pipeline)              | High (prepare data, train)
Best for       | Factual knowledge, documents         | Style, format, domain language

The RAG Architecture Overview

Ingestion Phase (Offline)

  1. Collect documents (PDFs, wikis, emails, databases)
  2. Extract and clean text
  3. Split into chunks
  4. Generate embeddings
  5. Store in vector database
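The five ingestion steps can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, a plain list stands in for the vector database, and the chunk size, overlap, and sample document are arbitrary illustrative choices.

```python
import hashlib
import math
import re


def clean(text: str) -> str:
    """Step 2: collapse whitespace left over from text extraction."""
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Step 3: fixed-size word windows that overlap by a few words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Step 4 (stand-in): hashed bag-of-words vector, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(documents: dict[str, str]) -> list[dict]:
    """Steps 1-5: returns an in-memory list standing in for a vector database."""
    store = []
    for source, raw in documents.items():
        for i, chunk in enumerate(split_into_chunks(clean(raw))):
            store.append({
                "source": source,
                "chunk_id": i,
                "text": chunk,
                "embedding": toy_embed(chunk),
            })
    return store


# Step 1: a hypothetical collected document.
docs = {
    "handbook.txt": "New employees   receive 25 vacation days.\n\n"
                    "Requests go through the HR portal at least two weeks in advance."
}
store = ingest(docs)
for entry in store:
    print(entry["source"], entry["chunk_id"], entry["text"])
```

Keeping the source and chunk ID next to each embedding is a deliberate choice: it lets the query phase cite where an answer came from.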

Query Phase (Online)

  1. User asks a question
  2. Question is converted to an embedding
  3. Most similar chunks are retrieved from vector DB
  4. Chunks are passed as context to the LLM
  5. LLM generates an answer based on context
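The query phase can be sketched the same way. The example below assumes a tiny pre-built index, uses word counts as a stand-in for real embeddings, and takes the model as a plain callable (`echo_llm` is a placeholder that returns the prompt it receives); all names and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Step 2 (stand-in): word counts instead of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical chunks, as they might come out of the ingestion phase.
CHUNKS = [
    "New employees receive 25 vacation days per year.",
    "Expense reports must be submitted within 30 days.",
    "The office is closed on public holidays.",
]
INDEX = [(chunk, toy_embed(chunk)) for chunk in CHUNKS]


def retrieve(question: str, k: int = 2) -> list[str]:
    """Step 3: return the k chunks most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def answer(question: str, llm: Callable[[str], str], k: int = 2) -> str:
    """Steps 4-5: pass the retrieved chunks as context and call the model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, k))
    prompt = f"Use only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns the prompt unchanged."""
    return prompt


print(answer("How many vacation days do new employees get?", echo_llm, k=1))
```

A production system would replace `toy_embed` with the same embedding model used during ingestion, since question and chunk vectors are only comparable when they come from the same model.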

Practical tip: RAG is the fastest way to make company knowledge AI-accessible. As a rough rule of thumb, RAG is the better choice over fine-tuning in about 80% of enterprise use cases: it is cheaper, more current, and more controllable.

The following lessons dive deep into every building block of the RAG architecture, from embeddings through chunking to the finished pipeline.

Quiz

Question 1 of 3

What does RAG solve compared to a plain LLM?