Multi-Agent Monitoring

Monitoring individual agents is one thing, but what happens when 10, 20, or 50 agents collaborate? Multi-agent systems produce emergent behavior that cannot be predicted by observing each agent in isolation. OpenClaw was built for precisely this challenge.

Challenges of Multi-Agent Observability

Aspect	Single Agent	Multi-Agent
Tracing	Linear, single thread	Branching, parallel threads
Causality	Directly traceable	Indirect causal chains across agents
Errors	Localizable	Cascading errors across the system
Performance	Single measurement	System-wide latency chains
Costs	Per agent	Interaction costs between agents
Compliance	Per agent	System-wide compliance assessment

Core problem: In a multi-agent system, Agent A can make a decision that causes Agent B to take an action that puts Agent C into an error state. The error surfaces at Agent C, but the cause lies two hops upstream at Agent A. Without system-wide tracing, finding that root cause is largely guesswork.
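To make the A, B, C chain concrete, here is a minimal sketch of walking causal links back from a failing event to its origin. It is plain Python, not OpenClaw code: the `AgentEvent` record, its `caused_by` field, and the sample events are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentEvent:
    """Hypothetical event record; `caused_by` points at the triggering event."""
    event_id: str
    agent: str
    description: str
    caused_by: Optional[str] = None


def causal_chain(events: list[AgentEvent], failing_event_id: str) -> list[AgentEvent]:
    """Follow caused_by links from a failing event back to the first cause."""
    by_id = {e.event_id: e for e in events}
    chain: list[AgentEvent] = []
    seen: set[str] = set()  # guards against accidental cycles in the links
    current = by_id.get(failing_event_id)
    while current is not None and current.event_id not in seen:
        seen.add(current.event_id)
        chain.append(current)
        current = by_id.get(current.caused_by) if current.caused_by else None
    chain.reverse()  # root cause first, failing event last
    return chain


# Illustrative data only
events = [
    AgentEvent("e1", "agent-a", "chose an outdated data source"),
    AgentEvent("e2", "agent-b", "wrote a record from that source", caused_by="e1"),
    AgentEvent("e3", "agent-c", "failed validation on the record", caused_by="e2"),
]

chain = causal_chain(events, "e3")
print(" -> ".join(f"{e.agent}: {e.description}" for e in chain))
```

The walk only works if every event records what triggered it, which is exactly the information that is lost when each agent is monitored in isolation.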

Distributed Tracing for Agents

OpenClaw extends the OpenTelemetry model with agent-specific concepts:

Trace Propagation

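# Orchestrator agent.
# Assumes `oc` is an already initialized OpenClaw client and that the three
# agent objects exist. The awaits must run inside an async function.
async def run_content_pipeline(user_query):
    with oc.trace("orchestrator") as parent_trace:
        # Delegate to the research agent; passing the parent context
        # attaches its spans to the same trace
        research_result = await research_agent.run(
            query=user_query,
            trace_context=parent_trace.context,
        )

        # Delegate to the writing agent, which consumes the research output
        draft = await writing_agent.run(
            input=research_result,
            trace_context=parent_trace.context,
        )

        # Delegate to the review agent, which checks the draft
        final = await review_agent.run(
            draft=draft,
            trace_context=parent_trace.context,
        )
        return final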
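What "propagating the trace" means can be sketched without any tracing library. The example below is an assumption-laden illustration, not OpenClaw's or OpenTelemetry's implementation: `TraceContext`, `new_root`, `child`, and `run_agent` are hypothetical names. The ID lengths (32 hex characters for the trace, 16 for a span) follow the W3C Trace Context convention.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceContext:
    """Hypothetical context object handed from one agent to the next."""
    trace_id: str                      # shared by every span in the trace
    span_id: str                       # unique per span
    parent_span_id: Optional[str] = None

    @classmethod
    def new_root(cls) -> "TraceContext":
        # 32 hex chars for the trace ID, 16 for the span ID
        return cls(trace_id=uuid.uuid4().hex, span_id=uuid.uuid4().hex[:16])

    def child(self) -> "TraceContext":
        # Same trace, new span, parent set to the caller's span
        return TraceContext(self.trace_id, uuid.uuid4().hex[:16], self.span_id)


def run_agent(name: str, trace_context: TraceContext) -> dict:
    """Stand-in for an agent call: opens a child span and reports its IDs."""
    span = trace_context.child()
    return {
        "agent": name,
        "trace_id": span.trace_id,
        "span_id": span.span_id,
        "parent_span_id": span.parent_span_id,
    }


root = TraceContext.new_root()  # the orchestrator's span
spans = [run_agent(n, root) for n in ("research-agent", "writing-agent", "review-agent")]

for s in spans:
    print(s["agent"], "trace:", s["trace_id"][:8], "parent:", s["parent_span_id"])
```

Because every child keeps the orchestrator's `trace_id` and records the orchestrator's `span_id` as its parent, a backend can later reassemble the spans into a single tree.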

Resulting Trace Structure

Trace: content-pipeline (tr_multi_001)
├── Span: orchestrator (total: 8,240ms)
│   ├── Span: research-agent (3,120ms)
│   │   ├── Span: web-search (1,800ms)
│   │   ├── Span: summarization (980ms)
│   │   └── Span: fact-check (340ms)
│   ├── Span: writing-agent (3,450ms)
│   │   ├── Span: outline-generation (450ms)
│   │   ├── Span: draft-writing (2,600ms)
│   │   └── Span: formatting (400ms)
│   └── Span: review-agent (1,670ms)
│       ├── Span: quality-check (890ms)
│       ├── Span: tone-check (380ms)
│       └── Span: compliance-check (400ms)
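The trace above can be treated as a plain tree and checked programmatically. The following sketch encodes the same span names and durations in a hypothetical nested-dict format (not an OpenClaw export format), verifies that each parent's duration is fully explained by its children, and finds the slowest leaf span.

```python
def span(name, ms, *children):
    """Build a span node; the dict layout is an assumption for this sketch."""
    return {"name": name, "ms": ms, "children": list(children)}


trace = span(
    "orchestrator", 8240,
    span("research-agent", 3120,
         span("web-search", 1800),
         span("summarization", 980),
         span("fact-check", 340)),
    span("writing-agent", 3450,
         span("outline-generation", 450),
         span("draft-writing", 2600),
         span("formatting", 400)),
    span("review-agent", 1670,
         span("quality-check", 890),
         span("tone-check", 380),
         span("compliance-check", 400)),
)


def unaccounted_ms(node):
    """For each parent span, time not covered by the sum of its children."""
    gaps = {}
    if node["children"]:
        gaps[node["name"]] = node["ms"] - sum(c["ms"] for c in node["children"])
        for child in node["children"]:
            gaps.update(unaccounted_ms(child))
    return gaps


def leaves(node):
    """Yield every span that has no children."""
    if not node["children"]:
        yield node
    for child in node["children"]:
        yield from leaves(child)


gaps = unaccounted_ms(trace)
slowest = max(leaves(trace), key=lambda s: s["ms"])
print(gaps)
print(slowest["name"], slowest["ms"])
```

In this example every gap is zero because the steps run strictly one after another. In a real trace, a non-zero gap would point at untraced work or overhead inside the parent span.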

Agent Interaction Graphs

OpenClaw visualizes the communication structure between agents:

┌────────────────┐    query     ┌────────────────┐
│  Orchestrator  ├─────────────→│ Research Agent │
│                │←─────────────┤                │
└───────┬────────┘   results    └────────────────┘
        │
        │ research + instructions
        ▼
┌────────────────┐    draft     ┌────────────────┐
│ Writing Agent  ├─────────────→│  Review Agent  │
│                │←─────────────┤                │
└────────────────┘   feedback   └───────┬────────┘
                                        │
                                ┌───────▼────────┐
                                │   Compliance   │
                                │     Agent      │
                                └────────────────┘

The interaction graph dashboard shows:

Communication frequency: How often does each pair of agents communicate?
Data volume: How many tokens flow between agents?
Latency edges: How long does each exchange take?
Error edges: Where do communication failures occur?
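The four dashboard metrics are all per-edge aggregates over a message log. As a rough sketch of that aggregation, the code below folds a list of messages into edge statistics. The tuple layout, the `EdgeStats` class, and the sample numbers are invented for illustration and do not reflect OpenClaw's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Aggregated metrics for one directed edge (sender -> receiver)."""
    messages: int = 0          # communication frequency
    tokens: int = 0            # data volume
    total_latency_ms: int = 0  # used to derive the latency edge
    errors: int = 0            # error edge

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.messages if self.messages else 0.0


def build_interaction_graph(messages):
    """Fold (sender, receiver, tokens, latency_ms, ok) tuples into edge stats."""
    edges = defaultdict(EdgeStats)
    for sender, receiver, tokens, latency_ms, ok in messages:
        stats = edges[(sender, receiver)]
        stats.messages += 1
        stats.tokens += tokens
        stats.total_latency_ms += latency_ms
        if not ok:
            stats.errors += 1
    return dict(edges)


# Illustrative message log
messages = [
    ("orchestrator", "research-agent", 120, 40, True),
    ("research-agent", "orchestrator", 1800, 55, True),
    ("writing-agent", "review-agent", 2400, 60, True),
    ("writing-agent", "review-agent", 2500, 80, False),
]

graph = build_interaction_graph(messages)
for (sender, receiver), stats in graph.items():
    print(f"{sender} -> {receiver}: {stats.messages} msgs, {stats.tokens} tokens, "
          f"{stats.avg_latency_ms:.0f} ms avg, {stats.errors} errors")
```

Keeping edges directed matters here: requests and responses between the same two agents often differ greatly in token volume.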

Bottleneck Identification

OpenClaw automatically detects bottlenecks in multi-agent systems:

Bottleneck Types

Latency bottleneck: One agent slows down the entire pipeline
Throughput bottleneck: One agent cannot handle the request load
Data bottleneck: Payloads passed between agents are too large
Dependency bottleneck: Steps run sequentially although they could run in parallel
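The dependency bottleneck is the easiest of the four to quantify: sequential steps cost the sum of their durations, while independent steps run in parallel cost only the longest one. The sketch below uses the three review-agent checks from the trace and assumes, purely for illustration, that they do not depend on each other. `run_check` is a hypothetical stand-in that sleeps for a scaled-down duration.

```python
import asyncio

# Durations taken from the review-agent spans in the trace (milliseconds)
CHECKS_MS = {"quality-check": 890, "tone-check": 380, "compliance-check": 400}


async def run_check(name: str, duration_ms: int) -> str:
    """Stand-in for a real check; sleeps 1/1000 of the traced duration."""
    await asyncio.sleep(duration_ms / 1_000_000)
    return name


async def run_sequential() -> list:
    # Each check waits for the previous one to finish
    return [await run_check(name, ms) for name, ms in CHECKS_MS.items()]


async def run_parallel() -> list:
    # All checks start together; gather returns results in input order
    return await asyncio.gather(*(run_check(n, ms) for n, ms in CHECKS_MS.items()))


sequential_ms = sum(CHECKS_MS.values())  # cost when run one after another
parallel_ms = max(CHECKS_MS.values())    # cost when run concurrently
saved_ms = sequential_ms - parallel_ms

results = asyncio.run(run_parallel())
print(results)
print(f"sequential: {sequential_ms} ms, parallel: {parallel_ms} ms, saved: {saved_ms} ms")
```

Under that independence assumption, the review stage would drop from 1,670 ms to 890 ms. If one check needs another's output, the saving does not apply.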

Automatic Recommendations

Bottleneck detected: writing-agent
──────────────────────────────────
Type:           Latency bottleneck
Impact:         Accounts for 42% of end-to-end latency
Cause:          GPT-4o with 2,600ms avg. response time
Recommendation: Parallelize outline and draft phases
                or switch to faster model for outline
Savings:        ~1,200ms end-to-end (-15%)
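The percentages in this report can be reproduced from the span durations in the trace shown earlier. The short calculation below is a worked check, not OpenClaw's detection logic. The 1,200 ms saving is taken from the report as given.

```python
# Per-agent durations from the content-pipeline trace (milliseconds)
agent_ms = {"research-agent": 3120, "writing-agent": 3450, "review-agent": 1670}

total_ms = sum(agent_ms.values())              # end-to-end latency of the pipeline
slowest = max(agent_ms, key=agent_ms.get)      # agent with the largest duration
share_pct = round(agent_ms[slowest] / total_ms * 100)

estimated_saving_ms = 1200                     # figure quoted in the report
saving_pct = round(estimated_saving_ms / total_ms * 100)

print(f"{slowest}: {agent_ms[slowest]} of {total_ms} ms ({share_pct}%)")
print(f"saving: {estimated_saving_ms} ms ({saving_pct}% of end-to-end latency)")
```

So the writing agent's 3,450 ms is about 42% of the 8,240 ms total, and a 1,200 ms reduction corresponds to roughly 15% of end-to-end latency.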

Key takeaway: Multi-agent monitoring is not optional; it is the prerequisite for reliable multi-agent systems. Without system-wide observability, you are operating a complex system blind.