Chains & Retrieval

Chains are the core abstraction in LangChain. They connect multiple processing steps into a pipeline: input, retrieval, prompt preparation, and finally the LLM response. Combined with retrieval, they form the basis of retrieval-augmented generation (RAG) systems.

Sequential Chains

A sequential chain executes steps one after another. One step's output becomes the next step's input:

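from langchain_core.runnables import RunnablePassthrough

# Assumes retriever, prompt, model, and parser are already defined.
chain = (
    # The dict fans the input out: the retriever fetches context,
    # while RunnablePassthrough forwards the question unchanged.
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt   # fills the prompt template with context and question
    | model    # sends the prompt to the LLM
    | parser   # converts the model output into the final format
)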
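To make the data flow concrete, here is a minimal pure-Python sketch of pipe-style composition. It is not LangChain's implementation: the Step class and the three toy steps are hypothetical stand-ins for prompt, model, and parser, and they only illustrate how each step's output becomes the next step's input.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a plain function."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # step_a | step_b builds a new step that runs a, then feeds b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Toy steps standing in for prompt, model, and parser (assumptions).
to_prompt = Step(lambda question: f"Question: {question}")
fake_model = Step(lambda prompt: {"text": prompt.upper()})
to_text = Step(lambda message: message["text"])

pipeline = to_prompt | fake_model | to_text
answer = pipeline.invoke("what is a chain?")
print(answer)
```

Each use of | wraps the two sides in a new step, so the finished pipeline is itself a step that can be composed further.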

Parallel Chains

With the LangChain Expression Language (LCEL) you can run several chains on the same input in parallel and collect their results in a single dictionary:

from langchain_core.runnables import RunnableParallel

parallel = RunnableParallel(
    summary=summary_chain,
    keywords=keyword_chain,
    sentiment=sentiment_chain
)
result = parallel.invoke({"text": document})
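The fan-out-and-merge behavior can be sketched with the standard library. The snippet below is an illustration under assumptions, not LangChain code: run_parallel and the three branch functions are hypothetical, and the thread pool is this sketch's choice for running the branches concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(branches, payload):
    """Run every branch on the same payload and merge results by name."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(func, payload) for name, func in branches.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical branches standing in for real chains.
def first_words(payload):
    return " ".join(payload["text"].split()[:3])


def long_words(payload):
    words = payload["text"].lower().rstrip(".").split()
    return [word for word in words if len(word) > 8]


def word_count(payload):
    return len(payload["text"].split())


text = "LangChain pipelines connect retrieval and generation."
merged = run_parallel(
    {"summary": first_words, "keywords": long_words, "word_count": word_count},
    {"text": text},
)
print(merged)
```

Every branch receives the identical payload, and the merged result has one key per branch, which matches the shape of the result dictionary in the example above.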

Document Loaders

LangChain provides loaders for various data sources:

Loader	Data Source
PyPDFLoader	PDF files
CSVLoader	CSV files
WebBaseLoader	Web pages
NotionDBLoader	Notion databases
S3FileLoader	AWS S3 buckets
GitLoader	Git repositories
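Whatever the source, a loader's job is to turn raw data into documents that carry text plus metadata. The sketch below shows that idea for CSV input using only the standard library. The function load_csv_rows, the sample data, and the exact content layout are assumptions made for illustration; real loaders return Document objects rather than plain dicts.

```python
import csv
import io


def load_csv_rows(csv_text, source):
    """Turn each CSV row into a document-like dict (sketch only)."""
    documents = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # One document per row, with "column: value" lines as its text.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"source": source, "row": row_number}}
        )
    return documents


sample = "name,role\nAda,engineer\nGrace,admiral\n"
docs = load_csv_rows(sample, source="people.csv")
for doc in docs:
    print(doc["metadata"], repr(doc["page_content"]))
```

Keeping the source and row in the metadata makes it possible to trace a retrieved chunk back to where it came from.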

Text Splitters

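Long documents should be split into chunks before they are embedded and stored in a vector database, so that each chunk fits the embedding model's input limit and retrieval returns focused passages.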

RecursiveCharacterTextSplitter

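The usual general-purpose choice. It tries the separators in order and splits at natural boundaries (paragraphs, then lines, then words), falling back to individual characters only when a piece is still too long: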

from langchain_text_splitters import RecursiveCharacterTextSplitter

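splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,     # maximum chunk length, in characters by default
    chunk_overlap=200,   # characters shared between neighboring chunks
    separators=["\n\n", "\n", " ", ""]  # tried in order, coarsest first
)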
chunks = splitter.split_documents(documents)
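The recursive idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's algorithm: the function recursive_split is hypothetical, it ignores chunk overlap entirely, and it only shows how coarse separators are tried first and finer ones are used for pieces that are still too long.

```python
def recursive_split(text, chunk_size, separators):
    """Split text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits into the open chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa bbbb\n\ncccc dddd eeee"
pieces = recursive_split(sample, chunk_size=10, separators=["\n\n", "\n", " ", ""])
print(pieces)
```

In the sample, the first paragraph fits as one chunk, while the second is too long and is split again at word boundaries.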

Specialized Splitters

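MarkdownHeaderTextSplitter: Splits at Markdown headers and keeps the header hierarchy as metadata on each chunk
TokenTextSplitter: Splits by token count, which maps more directly onto model context limits than character counts
SemanticChunker (in the langchain_experimental package): Splits where the semantic similarity between neighboring sentences drops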
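As an illustration of header-based splitting, the following sketch cuts a Markdown string at its headers and records the active header path for each section. It is a simplified stand-in, not the library's implementation: split_by_headers and the h1/h2 metadata keys are assumptions, and the sketch does not handle headers that appear inside fenced code.

```python
import re


def split_by_headers(markdown):
    """Split Markdown at headers, attaching the header path as metadata."""
    sections = []
    path = {}    # header level -> title of the currently open header
    buffer = []  # body lines collected since the last header

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            metadata = {f"h{level}": title for level, title in sorted(path.items())}
            sections.append({"content": body, "metadata": metadata})
        buffer.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()  # close the section that belongs to the previous header
            level = len(match.group(1))
            # A new header closes all headers at the same or a deeper level.
            for stale in [lvl for lvl in path if lvl >= level]:
                del path[stale]
            path[level] = match.group(2).strip()
        else:
            buffer.append(line)
    flush()
    return sections


sample_md = "# Guide\nIntro text.\n## Setup\nInstall it.\n## Usage\nRun it."
for section in split_by_headers(sample_md):
    print(section)
```

Because each section carries its full header path, a retrieved chunk still shows which part of the document it came from.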

Vector Store Integration

LangChain integrates with many vector databases. The example below uses Chroma with OpenAI embeddings:

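# Newer LangChain releases ship Chroma in the separate langchain-chroma
# package (from langchain_chroma import Chroma).
from langchain_community.vectorstores import Chroma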
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings()
)
# k=5: return the five most similar chunks for each query
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
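What the k parameter controls can be shown with a toy similarity search. The sketch below is an assumption-laden illustration: embed is a hypothetical bag-of-words stand-in for a real embedding model, and retrieve ranks a small in-memory list instead of querying a vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts instead of a learned vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k):
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:k]


corpus = [
    "chains connect steps into a pipeline",
    "text splitters cut documents into chunks",
    "vector stores index embeddings",
]
top = retrieve("how do splitters cut documents", corpus, k=2)
print(top)
```

A larger k gives the model more context to work with but also more loosely related text, so it is worth tuning together with the chunk size.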

Assembling a RAG Chain

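from langchain_core.output_parsers import StrOutputParser

def format_docs(docs):
    # One common definition: join the retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)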
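It can help to see the text the model actually receives. The sketch below rebuilds the first two stages of such a chain in plain Python. Everything in it is a stand-in chosen for illustration: fake_retriever, PROMPT_TEMPLATE, and build_prompt are hypothetical, and documents are plain dicts here rather than objects with a page_content attribute.

```python
def format_docs(docs):
    """Join retrieved chunks into one context string."""
    return "\n\n".join(doc["page_content"] for doc in docs)


def fake_retriever(question):
    # Hypothetical retriever: always returns the same two chunks.
    return [
        {"page_content": "Chains connect steps into a pipeline."},
        {"page_content": "Retrievers return the most relevant chunks."},
    ]


# Hypothetical prompt wording, not a template from the library.
PROMPT_TEMPLATE = "Answer using only this context:\n{context}\n\nQuestion: {question}"


def build_prompt(question):
    # Mirrors the dict step: one input feeds both the context and question slots.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    return PROMPT_TEMPLATE.format(**inputs)


prompt_text = build_prompt("What does a retriever return?")
print(prompt_text)
```

Printing the assembled prompt like this is a quick way to check that the retrieved context is complete and readable before involving a model.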

Practical tip: Chunk size and overlap are among the most influential RAG parameters. Start with chunk_size=1000 and chunk_overlap=200, then test systematically against a fixed set of questions, because small changes can noticeably change retrieval quality.
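A rough feel for how these two parameters interact comes from simple arithmetic. The sketch below estimates how many chunks a text produces if it is cut into fixed-size windows that advance by chunk_size minus chunk_overlap. The function estimate_chunk_count is a hypothetical helper, and real splitters cut at separators, so actual counts will differ somewhat.

```python
import math


def estimate_chunk_count(text_length, chunk_size, chunk_overlap):
    """Estimate chunks for fixed-size windows that advance by size - overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if text_length <= chunk_size:
        return 1 if text_length > 0 else 0
    stride = chunk_size - chunk_overlap
    return math.ceil((text_length - chunk_size) / stride) + 1


# Compare a few settings for a 10,000-character document.
for size, overlap in [(500, 100), (1000, 200), (2000, 400)]:
    count = estimate_chunk_count(10_000, size, overlap)
    print(f"chunk_size={size}, chunk_overlap={overlap}: about {count} chunks")
```

Smaller chunks mean more embeddings to store and search, while larger overlap increases duplication between neighbors, which is one reason to vary one parameter at a time when testing.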