Chains are the core abstraction in LangChain. They connect multiple processing steps into a pipeline, from input through retrieval and prompt preparation to the LLM response. Combined with a retriever, they form the basis of retrieval-augmented generation (RAG) systems.
A sequential chain executes steps one after another. One step's output becomes the next step's input:
from langchain_core.runnables import RunnablePassthrough

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
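To make the data flow concrete, here is a minimal pure-Python sketch of the same shape. It does not use LangChain: `Step`, `fan_out`, and the fake retriever, prompt, model, and parser are hypothetical stand-ins. The only point is to show how the dictionary feeds one input to two branches and how `|` hands each output to the next step.

```python
from typing import Any, Callable, Dict


class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


def fan_out(branches: Dict[str, Callable[[Any], Any]]) -> Step:
    """Give every branch the same input and collect the results by key."""
    return Step(lambda value: {key: fn(value) for key, fn in branches.items()})


def fake_retriever(question: str) -> str:
    # Hypothetical: a real retriever would search a vector store.
    return "Paris is the capital of France."


def passthrough(question: str) -> str:
    # Plays the role of RunnablePassthrough: returns the input unchanged.
    return question


fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_model = Step(lambda text: {"content": f"ANSWER based on [{text}]"})
fake_parser = Step(lambda message: message["content"])

toy_chain = (
    fan_out({"context": fake_retriever, "question": passthrough})
    | fake_prompt
    | fake_model
    | fake_parser
)

toy_answer = toy_chain.invoke("What is the capital of France?")
```

In the real chain, LangChain coerces the plain dictionary into a parallel step for you, which is why no explicit `fan_out` call appears there.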
With the LangChain Expression Language (LCEL) you can also run several chains in parallel on the same input and merge their results into one dictionary:
from langchain_core.runnables import RunnableParallel

parallel = RunnableParallel(
    summary=summary_chain,
    keywords=keyword_chain,
    sentiment=sentiment_chain,
)

result = parallel.invoke({"text": document})
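The result is a dictionary with one entry per branch (`summary`, `keywords`, `sentiment`). As a rough illustration of that behavior, the following sketch runs three plain functions on the same input with a thread pool and merges the outputs by key. `run_parallel` and the three toy functions are hypothetical stand-ins, and the thread pool is just one way to get the concurrency, not a description of LangChain's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_parallel(branches: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    """Run every branch on the same input and merge results into one dict."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {key: pool.submit(fn, value) for key, fn in branches.items()}
        return {key: future.result() for key, future in futures.items()}


def toy_summary(inputs: Dict[str, str]) -> str:
    # Hypothetical "summary": just the first sentence.
    return inputs["text"].split(".")[0] + "."


def toy_keywords(inputs: Dict[str, str]) -> List[str]:
    # Hypothetical "keywords": distinct words longer than six characters.
    words = (w.strip(".").lower() for w in inputs["text"].split())
    return sorted({w for w in words if len(w) > 6})


def toy_sentiment(inputs: Dict[str, str]) -> str:
    # Hypothetical "sentiment": a single keyword check.
    return "positive" if "great" in inputs["text"].lower() else "neutral"


toy_result = run_parallel(
    {"summary": toy_summary, "keywords": toy_keywords, "sentiment": toy_sentiment},
    {"text": "LangChain is great. It composes pipelines easily."},
)
```

Because the branches do not depend on each other, total latency is closer to the slowest branch than to the sum of all three.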
LangChain provides loaders for various data sources:
| Loader | Data Source |
|---|---|
| PyPDFLoader | PDF files |
| CSVLoader | CSV files |
| WebBaseLoader | Web pages |
| NotionDBLoader | Notion databases |
| S3FileLoader | AWS S3 buckets |
| GitLoader | Git repositories |
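Whatever the source, a loader hands back documents that pair text with metadata about where it came from. The sketch below imitates that idea for CSV data using only the standard library. `ToyDocument` and `load_csv_rows` are hypothetical names, and the one-document-per-row layout is modeled on how `CSVLoader` behaves rather than copied from it.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a LangChain document: text plus metadata."""

    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def load_csv_rows(csv_text: str, source: str) -> List[ToyDocument]:
    """Turn each CSV row into one document and record where it came from."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row_index, row in enumerate(reader):
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        docs.append(ToyDocument(content, {"source": source, "row": row_index}))
    return docs


sample_csv = "name,role\nAda,engineer\nGrace,admiral\n"
toy_docs = load_csv_rows(sample_csv, source="team.csv")
```

Keeping the source and row in the metadata is what later lets a RAG answer cite where a retrieved chunk came from.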
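Documents must be split into chunks before they are embedded and loaded into a vector database.
RecursiveCharacterTextSplitter is the usual default splitter. It tries to split at natural boundaries first (paragraphs, then lines, then words) and falls back to individual characters only when a piece is still too long: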
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""],
)

chunks = splitter.split_documents(documents)
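To see what `chunk_size` and `chunk_overlap` mean in isolation, here is a deliberately simpler technique: a fixed-size sliding window over characters. This is not the recursive algorithm the splitter uses, since it ignores separators entirely, and `sliding_chunks` is a hypothetical helper written for this illustration.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


windows = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
```

Each window repeats the last two characters of the previous one, which is the role overlap plays in the real splitter: content near a boundary stays available in both neighboring chunks.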
LangChain integrates with many vector databases. The example below uses Chroma with OpenAI embeddings:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
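The `k` setting controls how many of the most similar chunks the retriever returns for each query. The sketch below shows the underlying idea with hand-written three-dimensional vectors and cosine similarity. `top_k` and the toy index are hypothetical; a real vector store uses learned embeddings, an approximate index, and whatever distance metric it is configured with.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical hand-made "embeddings" for three chunks.
toy_index = [
    ("chunk about cats", [1.0, 0.0, 0.0]),
    ("chunk about dogs", [0.9, 0.1, 0.0]),
    ("chunk about taxes", [0.0, 0.0, 1.0]),
]

nearest = top_k([1.0, 0.05, 0.0], toy_index, k=2)
```

A larger `k` gives the model more context to work with but also more irrelevant text and a longer prompt, so it is worth tuning together with the chunk size.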
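The retriever then slots into a RAG chain. Here format_docs is a user-defined helper (not part of LangChain) that joins the retrieved documents into a single context string, and rag_prompt and model are assumed to be defined elsewhere:
from langchain_core.output_parsers import StrOutputParser

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)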
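The chain relies on a `format_docs` function that is never defined in the text. A common minimal version simply joins the text of the retrieved documents with blank lines, as sketched here. `ToyDocument` is a hypothetical stand-in so the example runs without LangChain; the same function works on real documents because they also expose `page_content`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ToyDocument:
    """Hypothetical stand-in exposing the one attribute format_docs needs."""

    page_content: str


def format_docs(docs: Iterable[ToyDocument]) -> str:
    """Join retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


context = format_docs([ToyDocument("First chunk."), ToyDocument("Second chunk.")])
```

If the prompt should cite sources, this helper is the natural place to prepend each chunk's metadata before joining.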
Practical tip: Chunk size and overlap are among the most influential RAG parameters. A reasonable starting point is chunk_size=1000 and chunk_overlap=200 (both measured in characters by default). Test systematically, because small changes can noticeably affect retrieval quality.
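One way to test systematically is to sweep a small grid of settings against a fixed set of expected answers. The sketch below uses a crude proxy: for each configuration it checks what fraction of known answer strings survive intact inside at least one chunk. It does not measure real retrieval quality, which would require embeddings and a retriever, and `naive_chunks`, `intact_rate`, and the sample text are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def naive_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Fixed-size character windows; assumes chunk_overlap < chunk_size."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def intact_rate(text: str, answers: List[str], chunk_size: int, chunk_overlap: int) -> float:
    """Fraction of answers that appear whole inside at least one chunk."""
    chunks = naive_chunks(text, chunk_size, chunk_overlap)
    hits = sum(1 for answer in answers if any(answer in chunk for chunk in chunks))
    return hits / len(answers)


sample_text = (
    "The launch date is 12 March. The budget is 40000 euros. The owner is Dana."
)
expected = ["launch date is 12 March", "budget is 40000 euros", "owner is Dana"]

# Evaluate every (chunk_size, chunk_overlap) pair in a small grid.
sweep: Dict[Tuple[int, int], float] = {}
for size, overlap in product([5, 30, 200], [0, 4]):
    sweep[(size, overlap)] = intact_rate(sample_text, expected, size, overlap)
```

In a real project the same loop would rebuild the vector store for each configuration and score retrieved chunks against a labeled set of questions, but the structure of the experiment stays the same.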