Production CrewAI

A CrewAI prototype in a notebook is only the first step. Production use also requires error handling, retries, cost control, testing, and a robust deployment. This lesson shows how to make a CrewAI application production-ready.

Error Handling

Crew-Level Error Handling

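import logging

from crewai import Crew

logger = logging.getLogger(__name__)
inputs = {"topic": "AI Trends 2026"}

# researcher, writer, research_task and writing_task are defined elsewhere
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    max_rpm=10,           # Rate limiting: max 10 requests/minute
    max_tokens=50000,     # Token cap per crew run (verify your CrewAI version accepts this on Crew)
    verbose=True
)

try:
    result = crew.kickoff(inputs=inputs)
except Exception as e:
    logger.exception(f"Crew error: {e}")
    # Fall back to a simpler pipeline (simple_chain is defined elsewhere)
    result = simple_chain.invoke(inputs)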
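Switching straight to the fallback on the first exception gives up on transient failures such as rate limits or timeouts. A common refinement is to retry a few times with exponential backoff before falling back. The sketch below is plain Python and assumes nothing about CrewAI beyond a callable; `run_with_retries` is a hypothetical helper, and its `sleep` parameter exists only so the delays can be tested without waiting.

```python
import logging
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def run_with_retries(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `primary` up to `attempts` times with exponential backoff, then use `fallback`."""
    for attempt in range(1, attempts + 1):
        try:
            return primary()
        except Exception as exc:  # in production, catch only transient error types
            logger.warning("Attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                # Delays of 1x, 2x, 4x base_delay, plus small jitter to avoid synchronized retries
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay / 10)
                sleep(delay)
    logger.error("All %d attempts failed; using fallback", attempts)
    return fallback()
```

With the crew above, a call could look like `run_with_retries(lambda: crew.kickoff(inputs={"topic": "AI Trends 2026"}), lambda: simple_chain.invoke({"topic": "AI Trends 2026"}))`. Retrying a whole crew repeats every LLM call it makes, so keep `attempts` small.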

Agent-Level Retries

from crewai import Agent

resilient_agent = Agent(
    role="Resilient Researcher",
    goal="Complete research even with temporary failures",
    backstory="...",
    max_retry_limit=3,           # Retry up to 3 times when an error occurs
    max_iter=15,                 # Stop after at most 15 reasoning iterations
    respect_context_window=True  # Summarize history instead of failing when the context window is exceeded
)

Cost Control

LLM costs can escalate quickly with CrewAI: every agent makes several LLM calls per task, and iteration or delegation loops multiply that number.

Cost Strategies

Strategy	Description	Impact
Model mixing	Agents with different models	High
Token limits	max_tokens per crew/agent	Medium
RPM limits	Rate limiting for API calls	Medium
Iteration limits	Limit max_iter per agent	High
Caching	Cache repeated queries	Medium
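The caching strategy applies to any repeated, deterministic lookup, such as the same search or database query issued by several agents in one run. A minimal sketch of a time-limited memoizing decorator, assuming positional, hashable arguments; `ttl_cache` is a hypothetical helper, not a CrewAI API, and the injectable `clock` is there only for testing.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def ttl_cache(ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
    """Memoize results for `ttl_seconds` (positional, hashable arguments only)."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        store: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        @wraps(fn)
        def wrapper(*args: Any) -> Any:
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]          # fresh cached value: no repeated API call
            value = fn(*args)          # miss or expired entry: call through
            store[args] = (now, value)
            return value

        return wrapper
    return decorator
```

The expiry matters for tools backed by live data: a cached search result should not outlive the freshness the task needs.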

Model Mixing Example

from crewai import Agent
from langchain_openai import ChatOpenAI

# Expensive model only for complex tasks
analyst = Agent(
    role="Senior Analyst",
    goal="...",
    backstory="...",
    llm=ChatOpenAI(model="gpt-4o"),  # Complex → expensive model
)

# Cheaper model for simpler tasks
formatter = Agent(
    role="Report Formatter",
    goal="...",
    backstory="...",
    llm=ChatOpenAI(model="gpt-4o-mini"),  # Simple → cheap model
)
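Putting numbers on model mixing shows why it ranks high in the table. The sketch compares a run where both agents use the large model with one where only the formatter is moved to the small model. The prices and token counts are placeholders chosen for illustration, not current list prices or measured usage; substitute your provider's price sheet and the token counts your monitoring reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens (placeholder value)
    output_per_m: float  # USD per 1M output tokens (placeholder value)


def call_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for the given token counts at per-million-token prices."""
    return (input_tokens * price.input_per_m + output_tokens * price.output_per_m) / 1_000_000


# Placeholder prices: look up your provider's current pricing
LARGE = Price(input_per_m=2.50, output_per_m=10.00)
SMALL = Price(input_per_m=0.15, output_per_m=0.60)

# Hypothetical per-run token usage per agent: (input, output)
analyst_usage = (20_000, 4_000)
formatter_usage = (30_000, 6_000)

all_large = call_cost(LARGE, *analyst_usage) + call_cost(LARGE, *formatter_usage)
mixed = call_cost(LARGE, *analyst_usage) + call_cost(SMALL, *formatter_usage)
print(f"all large: ${all_large:.4f}  mixed: ${mixed:.4f}")
# → all large: $0.2250  mixed: $0.0981
```

In this made-up example, moving only the formatter to the cheaper model cuts the run cost by more than half, because the formatter handles the larger share of tokens.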

Testing Strategies

Unit Tests for Tools

# web_search and db_query are your project's CrewAI tool instances
def test_search_tool():
    result = web_search.run("test query")
    assert isinstance(result, str)
    assert len(result) > 0

def test_database_tool():
    result = db_query.run("SELECT COUNT(*) FROM users")
    assert "count" in result.lower()  # assumes the tool's output mentions "count"
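These tests call the real tools, so they depend on network access and live data. A common complement is to keep a tool's logic in plain functions with the external call injected, and test that logic against a fake backend. A minimal sketch; `format_count_result` and `count_users` are hypothetical helpers, not CrewAI APIs.

```python
from typing import Callable, List, Tuple

Row = Tuple[int]


def format_count_result(rows: List[Row]) -> str:
    """Turn a COUNT(*) result set into the text an agent will read."""
    if not rows:
        return "count: 0 (no rows returned)"
    return f"count: {rows[0][0]}"


def count_users(execute: Callable[[str], List[Row]]) -> str:
    """Tool body with the database call injected, so tests can pass a fake."""
    return format_count_result(execute("SELECT COUNT(*) FROM users"))


def test_count_users_with_fake_db():
    seen = []

    def fake_execute(sql: str) -> List[Row]:
        seen.append(sql)      # record the query instead of hitting a database
        return [(42,)]

    assert count_users(fake_execute) == "count: 42"
    assert seen == ["SELECT COUNT(*) FROM users"]


def test_count_users_empty_result():
    assert count_users(lambda sql: []) == "count: 0 (no rows returned)"
```

The real tool then only wires `count_users` to an actual database connection, and the fast tests cover everything else.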

Integration Tests for Crews

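from crewai import Crew, Process

def test_research_crew():
    # test_researcher and test_task are small fixtures; this test calls the real LLM
    crew = Crew(
        agents=[test_researcher],
        tasks=[test_task],
        process=Process.sequential
    )
    result = crew.kickoff(inputs={"topic": "Test Topic"})
    assert result is not None
    assert len(result.raw) > 100                     # non-trivial output
    assert result.token_usage.total_tokens < 10000   # guards against runaway cost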

Deployment with Docker

FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]

FastAPI Wrapper

from uuid import uuid4

from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel

app = FastAPI()

class CrewRequest(BaseModel):
    topic: str
    output_format: str = "markdown"

@app.post("/run-crew")
async def run_crew(request: CrewRequest, background_tasks: BackgroundTasks):
    job_id = str(uuid4())
    # execute_crew (defined elsewhere) runs the crew and stores its result under job_id
    background_tasks.add_task(execute_crew, job_id, request)
    return {"job_id": job_id, "status": "started"}
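Because the endpoint returns a job_id immediately, clients need somewhere to look up the outcome later. A minimal sketch of what `execute_crew` could write to, assuming a single-process deployment; `Job`, `JobStore`, and `execute_job` are hypothetical names, and `run` stands in for something like `lambda: crew.kickoff(inputs={"topic": request.topic}).raw`.

```python
import threading
import time
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, Optional


@dataclass
class Job:
    status: str = "started"          # started -> running -> finished | failed
    result: Optional[str] = None
    error: Optional[str] = None
    started_at: float = field(default_factory=time.time)


class JobStore:
    """In-memory, thread-safe job registry (single process only)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        self._lock = threading.Lock()

    def create(self, job_id: str) -> None:
        with self._lock:
            self._jobs[job_id] = Job()

    def update(self, job_id: str, **changes: Any) -> None:
        with self._lock:
            job = self._jobs[job_id]
            for key, value in changes.items():
                setattr(job, key, value)

    def get(self, job_id: str) -> Optional[Job]:
        with self._lock:
            job = self._jobs.get(job_id)
            return replace(job) if job is not None else None  # copy, not a live reference


def execute_job(store: JobStore, job_id: str, run: Callable[[], str]) -> None:
    """Run the work, then record either the result or the error message."""
    store.update(job_id, status="running")
    try:
        store.update(job_id, status="finished", result=run())
    except Exception as exc:
        store.update(job_id, status="failed", error=str(exc))
```

In the FastAPI app, `run_crew` would call `store.create(job_id)` before scheduling the task, and a status route (for example a hypothetical `GET /jobs/{job_id}`) would return `store.get(job_id)`. The store lives in process memory, so it is lost on restart and not shared between uvicorn workers; a database or Redis is the usual next step.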

Monitoring

Token usage: Track per agent, per task, per crew run
Latency: Measure duration of each task and identify bottlenecks
Success rate: How often does the crew deliver the expected result?
Costs: Monitor daily/monthly costs per crew
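These metrics can be collected with little code before adopting a dedicated observability tool. A minimal sketch, assuming you wrap each task (or the whole kickoff) yourself; `RunMetrics` is a hypothetical class. Per-run token totals could come from `result.token_usage.total_tokens`, which the integration test above already reads, and costs follow from multiplying token counts by your model's prices.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


class RunMetrics:
    """Collects per-task latency, token counts, and success/failure counts."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.latencies: Dict[str, List[float]] = defaultdict(list)
        self.tokens: Dict[str, int] = defaultdict(int)
        self.successes = 0
        self.failures = 0

    @contextmanager
    def track(self, task_name: str) -> Iterator[None]:
        """Time the wrapped block and count it as a success or failure."""
        start = self._clock()
        try:
            yield
        except Exception:
            self.failures += 1
            raise                      # record the failure, but do not swallow it
        else:
            self.successes += 1
        finally:
            self.latencies[task_name].append(self._clock() - start)

    def add_tokens(self, task_name: str, count: int) -> None:
        self.tokens[task_name] += count

    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.0

    def slowest_task(self) -> str:
        """Name of the task with the longest single run (requires at least one run)."""
        return max(self.latencies, key=lambda name: max(self.latencies[name]))
```

Usage: `with metrics.track("research"): ...` around each step, then export the counters to whatever dashboard you already run.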

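Practical tip: Set a token budget per crew run from the start. Without one, a crew caught in a delegation loop can cost hundreds of dollars in a single run. Combine max_tokens, max_iter, and max_rpm: iteration limits bound how long an agent can loop, token limits cap how much text is generated, and rate limits throttle how fast calls are made.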
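A budget can also be enforced in application code, independent of any single CrewAI parameter. A minimal sketch of a per-run token budget; `TokenBudget` and `TokenBudgetExceeded` are hypothetical names. How you feed it token counts during a run depends on the callback hooks your CrewAI version offers, so that wiring is an assumption; at minimum you can charge `result.token_usage.total_tokens` after `kickoff` returns and alert when the cap is crossed.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a run has used more tokens than its budget allows."""


class TokenBudget:
    """Tracks tokens spent in one crew run and raises once the cap is crossed."""

    def __init__(self, max_tokens: int) -> None:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        """Add usage to the running total; raise if the total exceeds the cap."""
        self.used += tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(f"used {self.used} of {self.max_tokens} tokens")

    @property
    def remaining(self) -> int:
        return max(self.max_tokens - self.used, 0)
```

Raising an exception (rather than only logging) lets the surrounding error handling treat an overspent run like any other failure, including falling back to a cheaper path.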