Lesson 2 of 6 · 11 min read

Chains & Retrieval

Chains are the core abstraction in LangChain. A chain connects several processing steps into one pipeline: the input flows through retrieval and prompt preparation to the LLM response. Combined with a retriever, chains form the basis of retrieval-augmented generation (RAG) systems.

Sequential Chains

A sequential chain executes steps one after another. One step's output becomes the next step's input:

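from langchain_core.runnables import RunnablePassthrough

# retriever, prompt, model, and parser are assumed to be defined elsewhere.
# The dict step sends the input to the retriever and, via RunnablePassthrough,
# forwards it unchanged as the question.
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)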
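To make the pipe operator less magical, here is a minimal library-free sketch of the same idea. The Step class and the three stand-in steps are hypothetical and are not LangChain's implementation; they only show how each step's output becomes the next step's input.

```python
class Step:
    """Wraps a function so that steps can be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for a prompt template, a model, and an output parser.
prompt = Step(lambda question: f"Answer briefly: {question}")
model = Step(lambda text: {"content": text.upper()})
parser = Step(lambda message: message["content"])

chain = prompt | model | parser
print(chain.invoke("what is a chain?"))
```

In a real chain the model step would call an LLM; the composition pattern stays the same.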

Parallel Chains

With the LangChain Expression Language (LCEL) you can run several chains on the same input in parallel and collect their results in a single dictionary:

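from langchain_core.runnables import RunnableParallel

# summary_chain, keyword_chain, and sentiment_chain are assumed to be defined elsewhere.
parallel = RunnableParallel(
    summary=summary_chain,
    keywords=keyword_chain,
    sentiment=sentiment_chain,
)
# Each branch receives the same input; the result is a dict keyed by branch name.
result = parallel.invoke({"text": document})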
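The fan-out-and-merge behavior can be sketched without LangChain. The run_parallel helper and the three toy branches below are hypothetical; the sketch only assumes that every branch gets the same input and that the outputs are collected under the branch names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(branches, value):
    """Run every branch on the same input and merge the outputs into one dict."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(func, value) for name, func in branches.items()}
        return {name: future.result() for name, future in futures.items()}


# Toy branches standing in for real chains.
branches = {
    "summary": lambda data: data["text"][:20],
    "keywords": lambda data: sorted(set(data["text"].lower().split()))[:3],
    "word_count": lambda data: len(data["text"].split()),
}

result = run_parallel(branches, {"text": "LangChain composes steps into chains"})
print(result)
```

Threads pay off when the branches wait on network calls such as LLM requests; for pure CPU work in Python the gain is small.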

Document Loaders

LangChain provides loaders for various data sources:

Loader            Data Source
PyPDFLoader       PDF files
CSVLoader         CSV files
WebBaseLoader     Web pages
NotionDBLoader    Notion databases
S3FileLoader      AWS S3 buckets
GitLoader         Git repositories
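Whatever the source, a loader's job is to turn raw data into documents that carry text plus metadata. The following library-free sketch illustrates that shape with a hypothetical Doc class and load_csv_rows function. It produces one document per CSV row, which is roughly the shape a CSV loader yields, but it is an illustration and not LangChain's code.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_rows(csv_text, source):
    """Turn each CSV row into one Doc with 'column: value' lines as content."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Doc(
            page_content="\n".join(f"{key}: {value}" for key, value in row.items()),
            metadata={"source": source, "row": index},
        )
        for index, row in enumerate(reader)
    ]


docs = load_csv_rows("name,role\nAda,engineer\nGrace,admiral\n", source="team.csv")
print(docs[0].page_content)
print(docs[1].metadata)
```

Keeping the source and row position in the metadata makes it possible to cite where a retrieved chunk came from later on.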

Text Splitters

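Long documents should be split into chunks before they are embedded and stored in a vector database, so that each retrieved piece is small enough for the prompt and stays focused on one topic: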

RecursiveCharacterTextSplitter

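The usual general-purpose choice. It tries the separators in order (paragraphs, lines, words, and finally single characters) and only falls back to a finer separator when a piece is still larger than chunk_size: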

from langchain_text_splitters import RecursiveCharacterTextSplitter

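splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # maximum chunk length, in characters by default
    chunk_overlap=200,    # characters shared between neighboring chunks
    separators=["\n\n", "\n", " ", ""],  # tried in order, coarsest first
)
# documents is assumed to be a list of Document objects from a loader.
chunks = splitter.split_documents(documents)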
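The recursive strategy itself fits in a few lines. The sketch below is a simplified, library-free version written for illustration: the recursive_split function is hypothetical, it drops the separators it splits on, and it leaves out chunk overlap, so it is not a replacement for RecursiveCharacterTextSplitter.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        if not part:
            continue
        if len(part) > chunk_size:
            # Too long for this separator: flush, then retry with finer ones.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(part, chunk_size, finer))
            continue
        # Greedily merge neighboring parts while they still fit.
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
print(recursive_split(sample, chunk_size=12))
# ['alpha beta', 'gamma delta', 'epsilon', 'zeta']
```

The second paragraph does not fit into 12 characters, so only that paragraph is broken at word boundaries while the others stay whole.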

Specialized Splitters

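  • MarkdownHeaderTextSplitter: Splits at Markdown headers and records the header hierarchy in each chunk's metadata
  • TokenTextSplitter: Splits by token count, which maps more directly to LLM context limits than a character count
  • SemanticChunker: Splits where the embedding similarity between neighboring sentences drops (provided by the langchain_experimental package)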
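As an illustration of header-based splitting, here is a small library-free sketch. The split_by_headers function and its h1/h2 metadata keys are hypothetical choices for this example, and the sketch ignores edge cases such as '#' lines inside code blocks.

```python
def split_by_headers(markdown):
    """Split Markdown into sections and keep the header path as metadata."""
    sections, path, body = [], {}, []

    def flush():
        text = "\n".join(body).strip()
        if text:
            metadata = {f"h{level}": title for level, title in sorted(path.items())}
            sections.append({"content": text, "metadata": metadata})
        body.clear()

    for line in markdown.splitlines():
        stripped = line.lstrip()
        level = len(stripped) - len(stripped.lstrip("#"))
        if 1 <= level <= 6 and stripped[level:level + 1] == " ":
            flush()
            # A new header replaces every header at the same or a deeper level.
            for deeper in [k for k in path if k >= level]:
                del path[deeper]
            path[level] = stripped[level:].strip()
        else:
            body.append(line)
    flush()
    return sections


md = "# Guide\nIntro\n## Setup\nInstall it\n## Usage\nRun it"
for section in split_by_headers(md):
    print(section)
```

Each section carries its full header path, so a chunk retrieved from "Setup" still records that it belongs to "Guide".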

Vector Store Integration

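LangChain ships integrations for many vector stores, including Chroma, FAISS, Pinecone, and Qdrant. The example below uses Chroma with OpenAI embeddings: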

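# Newer releases also ship this class in the separate langchain_chroma package.
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Embed the chunks and index them (OpenAIEmbeddings needs an OpenAI API key).
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
)
# Return the 5 most similar chunks per query.
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})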
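Conceptually, a retriever with k=5 embeds the query, scores it against every stored chunk, and returns the five best matches. The sketch below shows that idea with a toy bag-of-words "embedding" and cosine similarity. The embed, cosine, and top_k functions are hypothetical, and real vector stores use learned embeddings and approximate indexes instead.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=5):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


stored = [
    "cats purr when happy",
    "dogs bark at strangers",
    "python is a programming language",
]
print(top_k("why do cats purr", stored, k=1))
# ['cats purr when happy']
```

A larger k gives the model more context to work with but also more noise, which is why k is worth tuning alongside chunk size.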

Assembling a RAG Chain

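from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):  # assumed helper: joins retrieved chunks into one string
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)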
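The first step of the chain is the least obvious one: the question is sent to the retriever to build the context and is also passed through unchanged. The library-free sketch below spells out that mapping. The fake_retriever, TEMPLATE, and build_prompt names are hypothetical, and documents are plain dicts here rather than LangChain Document objects.

```python
def format_docs(docs):
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(doc["page_content"] for doc in docs)


def fake_retriever(question):
    # Stand-in for a vector store retriever; always returns the same chunks.
    return [
        {"page_content": "Chains connect steps."},
        {"page_content": "Retrievers fetch chunks."},
    ]


TEMPLATE = "Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


def build_prompt(question, retriever=fake_retriever):
    # Same input, two uses: retrieval for the context, pass-through for the question.
    inputs = {"context": format_docs(retriever(question)), "question": question}
    return TEMPLATE.format(**inputs)


print(build_prompt("What is a chain?"))
```

The resulting string is what the model receives; the output parser then reduces the model's reply to plain text.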

Practical tip: Chunk size and overlap are among the most influential RAG parameters. A reasonable starting point is chunk_size=1000 and chunk_overlap=200. Vary them systematically against a fixed set of test questions, since even small changes can noticeably affect retrieval quality.
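One way to test systematically is to fix a small set of known answer passages and measure how many of them survive chunking intact for each parameter combination. The sketch below uses a plain sliding-window chunker and a hypothetical intact_rate metric; both are simplifications for illustration, and a real evaluation would score end-to-end retrieval on your own documents.

```python
def window_chunks(text, chunk_size, chunk_overlap):
    """Fixed-size character chunks where neighbors share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def intact_rate(text, answers, chunk_size, chunk_overlap):
    """Share of answer passages that appear whole inside at least one chunk."""
    chunks = window_chunks(text, chunk_size, chunk_overlap)
    hits = sum(any(answer in chunk for chunk in chunks) for answer in answers)
    return hits / len(answers)


text = "abcdefghij"
answers = ["def", "efg"]
for overlap in (0, 2):
    rate = intact_rate(text, answers, chunk_size=4, chunk_overlap=overlap)
    print(f"chunk_size=4 chunk_overlap={overlap}: {rate:.2f}")
```

Without overlap, "def" is cut at a chunk boundary and only half of the answers stay intact; with an overlap of 2, both do.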