Lesson 5 of 6·10 min read

Monitoring & Observability

A multi-agent system without monitoring is like a car without a dashboard — you don't know if it's working until it's too late. Observability goes beyond simple logging: you need to understand what each agent does, how long it takes, and what it costs.

The Three Pillars of Observability

PillarWhat Is CapturedTools
LogsWhat happened? (Textual records)n8n Execution Log, Loki
MetricsHow much? How fast? (Numbers over time)Prometheus, Grafana
TracesWhat path did the request take? (End-to-end path)OpenTelemetry, Jaeger

Execution Logging in n8n

Structured Logging per Agent

Implement a consistent log format for all agents:

{
  "timestamp": "2026-02-20T14:30:00Z",
  "pipeline_id": "abc-123",
  "agent": "researcher",
  "action": "execute",
  "status": "completed",
  "duration_ms": 4523,
  "input_tokens": 250,
  "output_tokens": 1200,
  "model": "gpt-4o",
  "cost_usd": 0.0185,
  "metadata": { "sources_found": 5, "confidence": 87 }
}

Log Levels for Multi-Agent Systems

LevelUsageExample
DEBUGAgent input/output (development only)Full prompt and response
INFOSuccessful agent execution"Researcher completed in 4.5s"
WARNRetry or fallback triggered"Writer retry 2/3 after timeout"
ERRORAgent failure, DLQ entry"Reviewer failed: invalid JSON"
FATALPipeline aborted"Circuit breaker open for all agents"

Performance Metrics

Key Performance Indicators (KPIs)

MetricDescriptionTarget
Agent latency (p50/p95/p99)How long does an agent take?p95 < 10s
Pipeline latencyEnd-to-end duration of entire pipeline< 30s
Success rateProportion of successful executions> 99%
Retry rateHow often are retries needed?< 5%
Fallback rateHow often does the fallback kick in?< 1%
Token consumptionInput + output tokens per pipelineBudget-dependent

Prometheus Metrics (Example)

# Agent latency
agent_execution_duration_seconds{agent="researcher", status="success"} 4.523

# Token consumption
agent_tokens_total{agent="writer", type="input"} 250
agent_tokens_total{agent="writer", type="output"} 1200

# Error counter
agent_errors_total{agent="reviewer", error_type="timeout"} 3

Cost Tracking per Agent

Cost transparency is critical in multi-agent systems — each agent consumes tokens.

Cost Dashboard

AgentModelAvg. Tokens/RunCost/RunRuns/DayCost/Day
ResearcherGPT-4o1,500$0.023500$11.50
WriterGPT-4o2,000$0.030500$15.00
ReviewerGPT-4o-mini800$0.002500$1.00
Total$27.50

Cost Optimization

  • Model tiering: Simple agents use cheaper models (GPT-4o-mini, Claude Haiku)
  • Caching: Cache identical requests (Redis, 5 min TTL)
  • Token limits: Cap maximum tokens per agent
  • Batch processing: Bundle requests instead of processing individually

OpenTelemetry Integration

For end-to-end tracing across all agents:

Pipeline Start
  └── Orchestrator (span: 28.5s)
       ├── Researcher Agent (span: 4.5s)
       │    ├── LLM Call (span: 3.8s) [model: gpt-4o, tokens: 1500]
       │    └── DB Write (span: 0.2s)
       ├── Writer Agent (span: 8.2s)
       │    ├── DB Read (span: 0.1s)
       │    ├── LLM Call (span: 7.5s) [model: gpt-4o, tokens: 2000]
       │    └── DB Write (span: 0.3s)
       └── Reviewer Agent (span: 3.1s)
            ├── DB Read (span: 0.1s)
            └── LLM Call (span: 2.8s) [model: gpt-4o-mini, tokens: 800]

Practical tip: Start with three metrics: agent latency, success rate, and cost per pipeline. These three alone reveal 80% of problems. Add OpenTelemetry tracing when you have more than 5 agents and cross-pipeline debugging becomes necessary.