Lesson 5 of 6 · 9 min read

Log Analysis & Debugging

When an agent exhibits unexpected behavior, you need to find the root cause fast. OpenClaw provides a Trace Explorer with step-by-step replay — you see exactly what the agent thought and decided at every step.

Trace Explorer

The Trace Explorer is the centerpiece of the debugging workflow. It offers three views: a waterfall of spans, prompt/response inspection, and step-by-step replay.

Waterfall View

The waterfall view lists each span in chronological order with its start time, duration, and status:

[12:04:01.000] Trace Start: order-processing-agent
[12:04:01.012] ├── intent-classification       12ms   ✅
[12:04:01.024] ├── order-lookup                 89ms   ✅
[12:04:01.113] ├── inventory-check              45ms   ✅
[12:04:01.158] ├── price-calculation            23ms   ✅
[12:04:01.181] ├── llm-response-generation    1,203ms  ⚠️ (slow)
[12:04:02.384] ├── guardrail-check             140ms   ❌ (blocked)
[12:04:02.524] └── fallback-response             8ms   ✅
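
If you copy a waterfall out of the UI as text, a few lines of Python are enough to find the slowest and the failed span. The sketch below is not an OpenClaw API: the regular expression simply assumes the line shape of the example above, and `Span` and `parse_waterfall` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass

# Assumed line shape, taken from the example waterfall above.
# This is not an official export format.
SPAN_RE = re.compile(
    r"\[(?P<start>[\d:.]+)\]\s+[├└]──\s+(?P<name>\S+)\s+(?P<ms>[\d,]+)ms\s+(?P<status>\S+)"
)


@dataclass
class Span:
    start: str
    name: str
    duration_ms: int
    status: str


def parse_waterfall(text: str) -> list[Span]:
    """Extract one Span per matching line; non-span lines are skipped."""
    spans = []
    for line in text.splitlines():
        m = SPAN_RE.search(line)
        if m:
            duration = int(m["ms"].replace(",", ""))  # "1,203" -> 1203
            spans.append(Span(m["start"], m["name"], duration, m["status"]))
    return spans


WATERFALL = """\
[12:04:01.000] Trace Start: order-processing-agent
[12:04:01.012] ├── intent-classification       12ms   ✅
[12:04:01.024] ├── order-lookup                 89ms   ✅
[12:04:01.113] ├── inventory-check              45ms   ✅
[12:04:01.158] ├── price-calculation            23ms   ✅
[12:04:01.181] ├── llm-response-generation    1,203ms  ⚠️ (slow)
[12:04:02.384] ├── guardrail-check             140ms   ❌ (blocked)
[12:04:02.524] └── fallback-response             8ms   ✅
"""

spans = parse_waterfall(WATERFALL)
slowest = max(spans, key=lambda s: s.duration_ms)
failed = [s.name for s in spans if s.status.startswith("❌")]
total_ms = sum(s.duration_ms for s in spans)

print(f"slowest: {slowest.name} ({slowest.duration_ms}ms of {total_ms}ms)")
print(f"failed:  {failed}")
```

In this trace, the LLM call accounts for 1,203 ms of the 1,520 ms total span time, and the guardrail check is the only failed span, so those two spans are the natural starting points.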

Prompt/Response Inspection

For each LLM call, you can inspect:

  • System Prompt — What instructions did the agent have?
  • User Input — What was the input?
  • Context — What documents/data were in the context?
  • Raw Response — What did the LLM respond?
  • Parsed Output — How did the agent interpret the response?
  • Token Count — Input/output/total with costs
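
To make the inspected fields concrete, here is a minimal sketch of a record that holds them, with the token total and cost derived from the input and output counts. The class name, field names, and the per-1,000-token prices are assumptions made for illustration; they are not OpenClaw's data model or real pricing.

```python
from dataclasses import dataclass


@dataclass
class LLMCallRecord:
    """Hypothetical container for the fields listed above."""
    system_prompt: str
    user_input: str
    context: list[str]
    raw_response: str
    parsed_output: dict
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def cost_usd(self, input_price_per_1k: float, output_price_per_1k: float) -> float:
        """Cost of this call, given prices per 1,000 tokens."""
        return (
            self.input_tokens * input_price_per_1k
            + self.output_tokens * output_price_per_1k
        ) / 1000


call = LLMCallRecord(
    system_prompt="You are an order-processing assistant.",
    user_input="Where is my order?",
    context=["order record", "shipping policy"],
    raw_response='{"intent": "order_status"}',
    parsed_output={"intent": "order_status"},
    input_tokens=1000,
    output_tokens=500,
)

# Placeholder prices, chosen only to make the arithmetic easy to follow.
cost = call.cost_usd(input_price_per_1k=0.5, output_price_per_1k=1.5)
print(call.total_tokens, cost)
```

Comparing `raw_response` with `parsed_output` side by side is often the quickest way to tell whether a problem lies in the model's answer or in how the agent parsed it.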

Step-by-Step Replay

The replay function lets you trace an agent interaction step by step:

  1. Click a trace in the Explorer
  2. Select "Replay" in the toolbar
  3. Navigate forward/backward through each span
  4. See the agent's state at each point in time (memory, context, decision)
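
Conceptually, replay is a cursor over a list of state snapshots, one per span. The following sketch illustrates that idea only; `Snapshot` and `ReplayCursor` are hypothetical names, and the snapshot contents are invented sample data, not what OpenClaw stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Agent state captured at one span (hypothetical shape)."""
    span: str
    memory: dict
    decision: str


class ReplayCursor:
    """Move forward and backward through snapshots without running off either end."""

    def __init__(self, snapshots: list[Snapshot]):
        if not snapshots:
            raise ValueError("a replay needs at least one snapshot")
        self._snapshots = list(snapshots)
        self._index = 0

    @property
    def current(self) -> Snapshot:
        return self._snapshots[self._index]

    def forward(self) -> Snapshot:
        self._index = min(self._index + 1, len(self._snapshots) - 1)
        return self.current

    def backward(self) -> Snapshot:
        self._index = max(self._index - 1, 0)
        return self.current


cursor = ReplayCursor([
    Snapshot("intent-classification", {}, "intent=order_status"),
    Snapshot("order-lookup", {"order_found": True}, "continue"),
    Snapshot("guardrail-check", {"order_found": True}, "blocked"),
])

cursor.forward()
cursor.forward()
print(cursor.current.span, cursor.current.decision)
cursor.backward()
print(cursor.current.span, cursor.current.memory)
```

Stepping backward from the failing span to the last snapshot that still looks correct narrows the fault to a single transition.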

Error Root Cause Analysis

OpenClaw categorizes errors automatically:

Error Type        Description                  Common Cause
LLM Timeout       API response took too long   Overload, large prompts
Rate Limit        API limit reached            Too many parallel requests
Hallucination     Fact-check failed            Insufficient context
Guardrail Block   Output blocked by policy     Toxic/unsafe content
Tool Failure      External tool call failed    API down, wrong parameters
Loop Detected     Agent in infinite loop       Missing exit condition
Alignment Drift   Score below threshold        Prompt degradation over time
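
When you triage many errors at once, it helps to tally them by type and keep the common cause next to each count. The sketch below encodes the error types and common causes listed above as a lookup table; the enum, the dictionary, and `summarize` are hypothetical helpers, not part of OpenClaw.

```python
from collections import Counter
from enum import Enum


class ErrorType(Enum):
    LLM_TIMEOUT = "LLM Timeout"
    RATE_LIMIT = "Rate Limit"
    HALLUCINATION = "Hallucination"
    GUARDRAIL_BLOCK = "Guardrail Block"
    TOOL_FAILURE = "Tool Failure"
    LOOP_DETECTED = "Loop Detected"
    ALIGNMENT_DRIFT = "Alignment Drift"


# Common causes copied from the error overview in this lesson.
COMMON_CAUSES = {
    ErrorType.LLM_TIMEOUT: "Overload, large prompts",
    ErrorType.RATE_LIMIT: "Too many parallel requests",
    ErrorType.HALLUCINATION: "Insufficient context",
    ErrorType.GUARDRAIL_BLOCK: "Toxic/unsafe content",
    ErrorType.TOOL_FAILURE: "API down, wrong parameters",
    ErrorType.LOOP_DETECTED: "Missing exit condition",
    ErrorType.ALIGNMENT_DRIFT: "Prompt degradation over time",
}


def summarize(error_types):
    """Return (error type, count, common cause) rows, most frequent first."""
    counts = Counter(error_types)
    return [(t.value, n, COMMON_CAUSES[t]) for t, n in counts.most_common()]


# Invented sample of observed errors.
observed = [ErrorType.RATE_LIMIT, ErrorType.TOOL_FAILURE, ErrorType.RATE_LIMIT]
rows = summarize(observed)
for name, count, cause in rows:
    print(f"{name:<16} {count:>3}  check first: {cause}")
```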

Automatic Correlation

OpenClaw also correlates errors automatically along three dimensions:

  • Temporally: Which errors occur in clusters?
  • Causally: Which span triggered the error?
  • Cross-agent: Does the error affect multiple agents?
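
Temporal correlation can be approximated with a simple gap-based grouping: sort the errors by time and start a new cluster whenever the pause since the previous error exceeds a threshold. This is a minimal sketch of that one technique, not a description of how OpenClaw correlates errors internally. The 30-second gap, the `ErrorEvent` shape, and the `billing-agent` name are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ErrorEvent:
    timestamp: datetime
    agent: str
    error_type: str


def cluster_errors(events, max_gap=timedelta(seconds=30)):
    """Group events so that consecutive events in a cluster are at most max_gap apart."""
    clusters: list[list[ErrorEvent]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if clusters and event.timestamp - clusters[-1][-1].timestamp <= max_gap:
            clusters[-1].append(event)
        else:
            clusters.append([event])
    return clusters


def at(time_str: str) -> datetime:
    # Fixed, arbitrary date; only the time of day matters in this example.
    return datetime.fromisoformat(f"2024-01-01T{time_str}")


events = [
    ErrorEvent(at("12:04:02"), "order-processing-agent", "Guardrail Block"),
    ErrorEvent(at("12:04:10"), "order-processing-agent", "Rate Limit"),
    ErrorEvent(at("12:04:15"), "billing-agent", "Rate Limit"),
    ErrorEvent(at("12:10:00"), "order-processing-agent", "Tool Failure"),
]

clusters = cluster_errors(events)
# A cluster is cross-agent when it contains errors from more than one agent.
cross_agent = [c for c in clusters if len({e.agent for e in c}) > 1]

for c in clusters:
    agents = sorted({e.agent for e in c})
    print(f"{c[0].timestamp:%H:%M:%S}  {len(c)} error(s)  agents={agents}")
```

A cluster that spans several agents usually points at something shared, such as a common API quota or tool, rather than at one agent's prompt.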

Debugging Workflow

The recommended debugging process:

  1. Receive alert — OpenClaw reports anomalous behavior
  2. Identify trace — Find affected traces via filters
  3. Analyze waterfall — Where in the flow does the problem occur?
  4. Inspect prompt — What does the agent see? What does the LLM respond?
  5. Determine root cause — Context issue? Prompt issue? Tool issue?
  6. Deploy fix — Adjust prompt, fix tool, update guardrail
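
Step 2 of the workflow, finding the affected traces, comes down to combining a few filters. The sketch below shows that logic on in-memory data, assuming hypothetical trace summaries with an agent name, a start time, and an optional error type. It does not use or represent the OpenClaw filter API, and the trace IDs are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TraceSummary:
    trace_id: str
    agent: str
    started_at: datetime
    error_type: Optional[str]  # None when the trace finished without an error


def filter_traces(traces, agent=None, error_type=None, since=None):
    """Keep traces that match every filter that is set; unset filters match everything."""
    matches = []
    for t in traces:
        if agent is not None and t.agent != agent:
            continue
        if error_type is not None and t.error_type != error_type:
            continue
        if since is not None and t.started_at < since:
            continue
        matches.append(t)
    return matches


traces = [
    TraceSummary("t-001", "order-processing-agent", datetime(2024, 1, 1, 11, 50), "Tool Failure"),
    TraceSummary("t-002", "order-processing-agent", datetime(2024, 1, 1, 12, 4), "Guardrail Block"),
    TraceSummary("t-003", "order-processing-agent", datetime(2024, 1, 1, 12, 5), None),
]

affected = filter_traces(
    traces,
    agent="order-processing-agent",
    error_type="Guardrail Block",
    since=datetime(2024, 1, 1, 12, 0),
)
print([t.trace_id for t in affected])
```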

Practical Tip: Use the bookmark function to save interesting traces. Over time, you'll build a library of typical failure patterns that helps new team members during onboarding.