A CrewAI prototype in a notebook is the first step. But for production you need error handling, retries, cost control, testing, and professional deployment. This lesson shows how to make CrewAI production-ready.
from crewai import Crew

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    max_rpm=10,        # Rate limiting: max 10 requests/minute
    max_tokens=50000,  # Token limit per crew run (check that your CrewAI version accepts this argument)
    verbose=True,
)
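import logging

logger = logging.getLogger(__name__)
inputs = {"topic": "AI Trends 2026"}

try:
    result = crew.kickoff(inputs=inputs)
except Exception as e:
    # Log the failure, then fall back to a simpler chain defined elsewhere
    logger.error(f"Crew error: {e}")
    fallback_result = simple_chain.invoke(inputs)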
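The try/except above gives up after the first failure. A minimal sketch of retrying with exponential backoff before falling back might look like the following. The helper name `run_with_retries` and its parameters are assumptions made for illustration; they are not part of CrewAI.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def run_with_retries(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call primary up to max_attempts times, then use fallback (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return primary()
        except Exception as exc:  # narrow this to transient errors in real code
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt < max_attempts:
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    logger.error("All %d attempts failed, using fallback", max_attempts)
    return fallback()
```

With the objects from the snippet above, a call could look like `run_with_retries(lambda: crew.kickoff(inputs=inputs), lambda: simple_chain.invoke(inputs))`. The injectable `sleep` argument exists only so tests do not have to wait.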
from crewai import Agent

resilient_agent = Agent(
    role="Resilient Researcher",
    goal="Complete research even with temporary failures",
    backstory="...",
    max_retry_limit=3,            # Maximum 3 retries
    max_iter=15,                  # Maximum 15 iterations
    respect_context_window=True,  # Keep the conversation within the model's context window
)
LLM costs can escalate quickly with CrewAI because every agent makes its own LLM calls, often over several iterations per task:
| Strategy | Description | Impact |
|---|---|---|
| Model mixing | Agents with different models | High |
| Token limits | max_tokens per crew/agent | Medium |
| RPM limits | Rate limiting for API calls | Medium |
| Iteration limits | Limit max_iter per agent | High |
| Caching | Cache repeated queries | Medium |
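To make the impact of model mixing concrete, here is a small cost estimate as a sketch. The model names, prices, and token counts are hypothetical placeholders, not real provider rates; substitute your provider's current pricing and your own measured usage.

```python
# Hypothetical prices in USD per 1M tokens (assumption, not real rates)
PRICE_PER_MILLION = {
    "large-model": {"input": 2.50, "output": 10.00},
    "small-model": {"input": 0.15, "output": 0.60},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one agent's token usage."""
    price = PRICE_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def total_cost(run: list) -> float:
    """Sum the estimated cost over all agents in a crew run."""
    return sum(estimate_cost(*usage) for usage in run)


# One entry per agent: (model, input tokens, output tokens), made-up numbers
all_large = [("large-model", 40_000, 8_000), ("large-model", 20_000, 4_000)]
mixed = [("large-model", 40_000, 8_000), ("small-model", 20_000, 4_000)]

print(f"all large: ${total_cost(all_large):.4f}")
print(f"mixed:     ${total_cost(mixed):.4f}")
```

Under these assumed numbers, moving the second agent to the smaller model removes almost all of that agent's share of the bill, which is why the table rates model mixing as high impact.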
from crewai import Agent
from langchain_openai import ChatOpenAI

# Expensive model only for complex tasks
analyst = Agent(
    role="Senior Analyst",
    llm=ChatOpenAI(model="gpt-4o"),  # Complex → expensive model
    # ...
)

# Cheaper model for simpler tasks
formatter = Agent(
    role="Report Formatter",
    llm=ChatOpenAI(model="gpt-4o-mini"),  # Simple → cheap model
    # ...
)
# Unit tests for individual tools (web_search and db_query are defined elsewhere)
def test_search_tool():
    result = web_search.run("test query")
    assert isinstance(result, str)
    assert len(result) > 0


def test_database_tool():
    result = db_query.run("SELECT COUNT(*) FROM users")
    assert "count" in result.lower()
from crewai import Crew, Process


# Integration test: runs the whole crew, so it makes real LLM calls
def test_research_crew():
    crew = Crew(
        agents=[test_researcher],
        tasks=[test_task],
        process=Process.sequential,
    )
    result = crew.kickoff(inputs={"topic": "Test Topic"})
    assert result is not None
    assert len(result.raw) > 100
    assert result.token_usage.total_tokens < 10000
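The integration test above calls a real model, which is slow and costs money on every run. One way to test the code around a crew without any LLM call is to replace the crew with a stand-in, roughly as sketched below. `summarize_run` is a hypothetical piece of application code invented for this example; the only assumption about the crew is that `kickoff()` returns an object with `raw` and `token_usage.total_tokens`, as used in the test above.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def summarize_run(crew, topic: str) -> dict:
    """Hypothetical application code that wraps crew.kickoff()."""
    result = crew.kickoff(inputs={"topic": topic})
    return {
        "text": result.raw.strip(),
        "tokens": result.token_usage.total_tokens,
    }


def test_summarize_run_without_llm():
    # Stand-in for a Crew: kickoff() returns a canned result, no API call is made
    fake_crew = MagicMock()
    fake_crew.kickoff.return_value = SimpleNamespace(
        raw="  Canned research report.  ",
        token_usage=SimpleNamespace(total_tokens=1234),
    )

    summary = summarize_run(fake_crew, "Test Topic")

    # The wrapper passed the topic through and cleaned up the output
    fake_crew.kickoff.assert_called_once_with(inputs={"topic": "Test Topic"})
    assert summary == {"text": "Canned research report.", "tokens": 1234}
```

Tests like this can run on every commit, while the full crew test is better reserved for a scheduled or pre-release run.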
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
from uuid import uuid4

from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel

app = FastAPI()


class CrewRequest(BaseModel):
    topic: str
    output_format: str = "markdown"


@app.post("/run-crew")
async def run_crew(request: CrewRequest, background_tasks: BackgroundTasks):
    job_id = str(uuid4())
    # execute_crew (defined elsewhere) runs after the response has been sent
    background_tasks.add_task(execute_crew, job_id, request)
    return {"job_id": job_id, "status": "started"}
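The endpoint returns a `job_id` right away, so the background function has to record its outcome somewhere the client can look it up later. The sketch below shows one possible shape for `execute_crew` with an in-memory job store. The `JOBS` dictionary, the `get_job` helper, and the extra `run` parameter are assumptions for illustration; the lesson does not show the real implementation.

```python
from typing import Any, Callable, Dict

# In-memory job store (assumption); it is lost on restart and not shared
# between worker processes, so production would need Redis or a database
JOBS: Dict[str, Dict[str, Any]] = {}


def execute_crew(job_id: str, request: Any, run: Callable[[str], str]) -> None:
    """Run the crew for request.topic and record the outcome under job_id."""
    JOBS[job_id] = {"status": "running", "result": None, "error": None}
    try:
        # run is injected so the crew call can be swapped out in tests
        JOBS[job_id].update(status="done", result=run(request.topic))
    except Exception as exc:
        # Never let a failure vanish silently inside a background task
        JOBS[job_id].update(status="failed", error=str(exc))


def get_job(job_id: str) -> Dict[str, Any]:
    """Return the stored job state, or an 'unknown' marker for missing ids."""
    return JOBS.get(job_id, {"status": "unknown", "result": None, "error": None})
```

A status route such as `GET /jobs/{job_id}` could simply return `get_job(job_id)`, and `run` could be a small function that calls `crew.kickoff(inputs={"topic": topic}).raw`.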
Practical tip: Set a token budget per crew run from the start. Without a budget, a crew with delegation loops can cost hundreds of dollars per run. Combine max_tokens, max_iter, and max_rpm for comprehensive cost control.
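A budget can also be enforced in your own code, independent of the framework's settings. The following is a minimal sketch of such a guard; `TokenBudget` and `TokenBudgetExceeded` are hypothetical names, not CrewAI classes. Note that it only checks usage after a run has finished, so it cannot interrupt a run that is already in progress.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when recorded usage goes over the configured limit."""


class TokenBudget:
    """Tracks cumulative token usage against a fixed limit (hypothetical helper)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> int:
        """Record usage and return the remaining budget; raise if over the limit."""
        self.used += tokens
        if self.used > self.limit:
            raise TokenBudgetExceeded(
                f"Used {self.used} tokens, budget is {self.limit}"
            )
        return self.limit - self.used


budget = TokenBudget(limit=50_000)
remaining = budget.charge(12_000)  # e.g. result.token_usage.total_tokens after a run
```

Calling `budget.charge(...)` after every `crew.kickoff()` in a batch job stops the batch as soon as the total goes over the limit, instead of letting the remaining runs add to the bill.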
Which strategy is most effective for controlling costs with CrewAI?