Monitoring & Observability

A multi-agent system without monitoring is like a car without a dashboard. You only find out that something is wrong when it is already too late. Observability goes beyond simple logging: you need to understand what each agent does, how long it takes, and what it costs.

The Three Pillars of Observability

Pillar	What Is Captured	Tools
Logs	What happened? (Textual records)	n8n Execution Log, Loki
Metrics	How much? How fast? (Numbers over time)	Prometheus, Grafana
Traces	What path did the request take? (End-to-end path)	OpenTelemetry, Jaeger

Execution Logging in n8n

Structured Logging per Agent

Implement a consistent log format for all agents:

{
  "timestamp": "2026-02-20T14:30:00Z",
  "pipeline_id": "abc-123",
  "agent": "researcher",
  "action": "execute",
  "status": "completed",
  "duration_ms": 4523,
  "input_tokens": 250,
  "output_tokens": 1200,
  "model": "gpt-4o",
  "cost_usd": 0.0185,
  "metadata": { "sources_found": 5, "confidence": 87 }
}
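A minimal sketch of an emitter for this format, assuming the agent step can run Python (for example in a Code node or an external worker). The helper name `log_agent_event` and its parameters are hypothetical. It writes one JSON object per line, a shape that line-oriented log collectors handle easily.

```python
import json
import sys
from datetime import datetime, timezone
from typing import Any, Optional, TextIO


def log_agent_event(
    pipeline_id: str,
    agent: str,
    action: str,
    status: str,
    duration_ms: int,
    input_tokens: int,
    output_tokens: int,
    model: str,
    cost_usd: float,
    metadata: Optional[dict[str, Any]] = None,
    stream: Optional[TextIO] = None,
) -> str:
    """Write one agent event as a single JSON line and return that line (hypothetical helper)."""
    record = {
        # UTC timestamp in ISO 8601 form with a trailing "Z"
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "pipeline_id": pipeline_id,
        "agent": agent,
        "action": action,
        "status": status,
        "duration_ms": duration_ms,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "model": model,
        "cost_usd": cost_usd,
        "metadata": metadata or {},
    }
    line = json.dumps(record)
    # One record per line (JSON Lines) keeps logs greppable and easy to parse line by line
    (stream or sys.stdout).write(line + "\n")
    return line


if __name__ == "__main__":
    log_agent_event(
        "abc-123", "researcher", "execute", "completed", 4523,
        250, 1200, "gpt-4o", 0.0185,
        metadata={"sources_found": 5, "confidence": 87},
    )
```

Keeping every field at the top level (rather than nesting per agent) lets the same query filter by `agent`, `model`, or `status` across all agents.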

Log Levels for Multi-Agent Systems

Level	Usage	Example
DEBUG	Agent input/output (development only)	Full prompt and response
INFO	Successful agent execution	"Researcher completed in 4.5s"
WARN	Retry or fallback triggered	"Writer retry 2/3 after timeout"
ERROR	Agent failure, entry written to the dead-letter queue (DLQ)	"Reviewer failed: invalid JSON"
FATAL	Pipeline aborted	"Circuit breaker open for all agents"
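If the agents log through Python's standard `logging` module, note that it names the WARN and FATAL levels WARNING and CRITICAL (`logging.WARN` and `logging.FATAL` are aliases). The sketch below maps an event status to a level following the table; the status strings (`retry`, `fallback`, `failed`, `aborted`) and the `DEBUG_PAYLOADS` switch are hypothetical conventions, not n8n features.

```python
import logging

# Hypothetical switch: log full prompts and responses (DEBUG) only during development
DEBUG_PAYLOADS = False

# Event status -> log level (status names are assumptions)
STATUS_LEVELS = {
    "completed": logging.INFO,
    "retry": logging.WARNING,      # WARN in the table
    "fallback": logging.WARNING,
    "failed": logging.ERROR,       # agent failed, event goes to the dead-letter queue
    "aborted": logging.CRITICAL,   # FATAL in the table: pipeline aborted
}


def level_for(status: str) -> int:
    """Map an event status to a logging level; unknown statuses are logged as errors."""
    return STATUS_LEVELS.get(status, logging.ERROR)


def log_event(logger: logging.Logger, agent: str, status: str, message: str,
              prompt: str = "", response: str = "") -> None:
    logger.log(level_for(status), "[%s] %s", agent, message)
    # Full payloads only when explicitly enabled, since they can be large and sensitive
    if DEBUG_PAYLOADS:
        logger.debug("[%s] prompt=%r response=%r", agent, prompt, response)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log = logging.getLogger("pipeline")
    log_event(log, "researcher", "completed", "completed in 4.5s")
    log_event(log, "writer", "retry", "retry 2/3 after timeout")
    log_event(log, "reviewer", "failed", "invalid JSON")
```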

Performance Metrics

Key Performance Indicators (KPIs)

Metric	Description	Target
Agent latency (p50/p95/p99)	How long does an agent take?	p95 < 10s
Pipeline latency	End-to-end duration of the entire pipeline	< 30s
Success rate	Proportion of successful executions	> 99%
Retry rate	How often are retries needed?	< 5%
Fallback rate	How often does the fallback kick in?	< 1%
Token consumption	Input + output tokens per pipeline	Budget-dependent
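As a worked example of these KPIs, the sketch below computes latency percentiles and success, retry, and fallback rates from a list of run records. It uses the nearest-rank percentile definition, which is one of several common definitions (Prometheus, for instance, estimates quantiles from histogram buckets instead). The `Run` record and its fields are assumptions, e.g. built from the structured logs.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical per-execution record
    duration_s: float
    succeeded: bool
    retries: int = 0
    used_fallback: bool = False


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def kpis(runs: list[Run]) -> dict[str, float]:
    durations = [r.duration_s for r in runs]
    n = len(runs)
    return {
        "p50_s": percentile(durations, 50),
        "p95_s": percentile(durations, 95),
        "p99_s": percentile(durations, 99),
        "success_rate": sum(r.succeeded for r in runs) / n,
        # Share of runs that needed at least one retry
        "retry_rate": sum(r.retries > 0 for r in runs) / n,
        "fallback_rate": sum(r.used_fallback for r in runs) / n,
    }


if __name__ == "__main__":
    sample = [Run(4.5, True), Run(8.2, True, retries=1), Run(3.1, True), Run(12.0, False)]
    result = kpis(sample)
    print(result)
    print("p95 target (< 10s) met:", result["p95_s"] < 10)
```

With only a handful of runs, p95 and p99 collapse onto the slowest sample, so these percentiles become meaningful only over a reasonably large window of executions.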

Prometheus Metrics (Example)

# Agent latency
agent_execution_duration_seconds{agent="researcher", status="success"} 4.523

# Token consumption
agent_tokens_total{agent="writer", type="input"} 250
agent_tokens_total{agent="writer", type="output"} 1200

# Error counter
agent_errors_total{agent="reviewer", error_type="timeout"} 3
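These samples follow the Prometheus text exposition format: metric name, labels in braces, then the value. The sketch below aggregates structured log records (field names as in the log format shown earlier) into counters and renders them in that format. It is a dependency-free illustration; a production exporter would more likely use the official `prometheus_client` library, and latency would normally be exported as a histogram so that p95/p99 can be derived with PromQL's `histogram_quantile`. The `error_type` log field is an assumption.

```python
from collections import defaultdict


def escape_label(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline in label values
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class CounterSet:
    """Minimal in-memory counters keyed by metric name and label set (illustrative only)."""

    def __init__(self) -> None:
        self._values: dict[tuple[str, tuple[tuple[str, str], ...]], float] = defaultdict(float)

    def inc(self, metric: str, amount: float = 1, **labels: str) -> None:
        key = (metric, tuple(sorted(labels.items())))
        self._values[key] += amount

    def render(self) -> str:
        lines = []
        for (metric, labels), value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels)
            # Print whole numbers without a trailing ".0"
            number = int(value) if value == int(value) else value
            lines.append(f"{metric}{{{label_str}}} {number}")
        return "\n".join(lines) + "\n"


def record_log(counters: CounterSet, record: dict) -> None:
    """Update counters from one structured log record."""
    agent = record["agent"]
    counters.inc("agent_tokens_total", record["input_tokens"], agent=agent, type="input")
    counters.inc("agent_tokens_total", record["output_tokens"], agent=agent, type="output")
    if record["status"] != "completed":
        # Assumption: any non-completed status counts as an error, labeled by an error_type field
        counters.inc("agent_errors_total", agent=agent,
                     error_type=record.get("error_type", "unknown"))


if __name__ == "__main__":
    c = CounterSet()
    record_log(c, {"agent": "writer", "status": "completed",
                   "input_tokens": 250, "output_tokens": 1200})
    print(c.render(), end="")
```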

Cost Tracking per Agent

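Cost transparency is critical in multi-agent systems: every agent call consumes tokens, and the cost of each run is its input and output token counts multiplied by the model's respective per-token prices.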

Cost Dashboard

Agent	Model	Avg. Tokens/Run	Cost/Run	Runs/Day	Cost/Day
Researcher	GPT-4o	1,500	$0.023	500	$11.50
Writer	GPT-4o	2,000	$0.030	500	$15.00
Reviewer	GPT-4o-mini	800	$0.002	500	$1.00
Total					$27.50
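The Cost/Day column is Cost/Run × Runs/Day, and the total is the sum over all agents. The sketch below reproduces the table's arithmetic with `Decimal`, which avoids binary floating-point rounding in currency sums. `cost_per_run` takes per-million-token prices as caller-supplied parameters, since actual prices depend on the provider's current rates; both function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

ONE_MILLION = Decimal(1_000_000)


def cost_per_run(input_tokens: int, output_tokens: int,
                 usd_per_1m_input: Decimal, usd_per_1m_output: Decimal) -> Decimal:
    """Cost of one run from token counts and per-million-token prices."""
    return (input_tokens * usd_per_1m_input + output_tokens * usd_per_1m_output) / ONE_MILLION


def daily_costs(agents: dict[str, tuple[Decimal, int]]) -> tuple[dict[str, Decimal], Decimal]:
    """Map agent -> (cost per run, runs per day) to per-agent daily cost and the daily total."""
    per_agent = {name: cost * runs for name, (cost, runs) in agents.items()}
    total = sum(per_agent.values(), Decimal(0))
    return per_agent, total


if __name__ == "__main__":
    # Cost/Run and Runs/Day figures taken from the dashboard table
    table = {
        "Researcher": (Decimal("0.023"), 500),
        "Writer": (Decimal("0.030"), 500),
        "Reviewer": (Decimal("0.002"), 500),
    }
    per_agent, total = daily_costs(table)
    cents = Decimal("0.01")
    for name, cost in per_agent.items():
        print(f"{name}: ${cost.quantize(cents, rounding=ROUND_HALF_UP)}")
    print(f"Total: ${total.quantize(cents, rounding=ROUND_HALF_UP)}")  # prints Total: $27.50
```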

Cost Optimization

Model tiering: Simple agents use cheaper models (GPT-4o-mini, Claude Haiku)
Caching: Cache responses to identical requests (e.g. in Redis with a 5-minute TTL)
Token limits: Cap maximum tokens per agent
Batch processing: Bundle requests instead of sending them one at a time
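A minimal sketch of response caching, using an in-process dictionary as a stand-in for Redis (with Redis you would typically store the value with an expiry, e.g. `SET key value EX 300`). The cache key is a SHA-256 hash of the model name and prompt, so only exactly identical requests hit the cache. `TTLCache`, `cached_completion`, and the `call_llm` callback are hypothetical names; `call_llm` stands in for the real model call.

```python
import hashlib
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory response cache with a fixed time-to-live (stand-in for Redis)."""

    def __init__(self, ttl_seconds: float = 300, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (expires_at, response)

    @staticmethod
    def key(model: str, prompt: str) -> str:
        # Same model + same prompt -> same key; the NUL separator avoids ambiguous concatenation
        return hashlib.sha256(f"{model}\0{prompt}".encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: evict lazily on read
            return None
        return response

    def set(self, key: str, response: str) -> None:
        self._entries[key] = (self.clock() + self.ttl, response)


def cached_completion(cache: TTLCache, model: str, prompt: str,
                      call_llm: Callable[[str, str], str]) -> str:
    """Return a cached response if still fresh, otherwise call the model and cache the result."""
    k = TTLCache.key(model, prompt)
    hit = cache.get(k)
    if hit is not None:
        return hit
    response = call_llm(model, prompt)
    cache.set(k, response)
    return response
```

The injectable `clock` makes expiry testable without waiting; a shared store such as Redis is needed once several workers should see the same cache.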

OpenTelemetry Integration

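For end-to-end tracing across all agents, each step is recorded as a span nested under its parent span, so a single trace shows the full path of one pipeline run: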

Pipeline Start
  └── Orchestrator (span: 28.5s)
       ├── Researcher Agent (span: 4.5s)
       │    ├── LLM Call (span: 3.8s) [model: gpt-4o, tokens: 1500]
       │    └── DB Write (span: 0.2s)
       ├── Writer Agent (span: 8.2s)
       │    ├── DB Read (span: 0.1s)
       │    ├── LLM Call (span: 7.5s) [model: gpt-4o, tokens: 2000]
       │    └── DB Write (span: 0.3s)
       └── Reviewer Agent (span: 3.1s)
            ├── DB Read (span: 0.1s)
            └── LLM Call (span: 2.8s) [model: gpt-4o-mini, tokens: 800]

Practical tip: Start with three metrics: agent latency, success rate, and cost per pipeline. Together they surface most problems. Add OpenTelemetry tracing once you have more than five agents and cross-pipeline debugging becomes necessary.