Metrics & Alerting

Raw data alone doesn't help — you need actionable metrics and intelligent alerting that notifies you before a trend becomes a problem. OpenClaw provides a four-tier system for this.

The Four Core Metrics for AI Agents

1. Latency (Response Time)

P50 / P95 / P99 — Percentile-based latency per agent
Time-to-First-Token — How quickly does the agent respond?
End-to-End Latency — Total duration including tool calls and retrieval

2. Token Usage (Consumption)

Input Tokens — Prompt size per request
Output Tokens — Response length per request
Context Window Utilization — How full is the context window?
Cost per Interaction — Cost per agent interaction in EUR

3. Error Rate

LLM Errors — Rate limits, timeouts, API errors
Agent Errors — Wrong tool calls, loop detection, stuck states
Guardrail Violations — Outputs caught by the guardrail system
Hallucination Rate — Hallucinations detected through fact-checking

4. Quality Score

User Satisfaction — Feedback-based (thumbs up/down, NPS)
Task Completion Rate — Successfully completed tasks
Alignment Score — Conformity with defined guidelines

Configuring Threshold Alerts

# openclaw-alerts.yml
alerts:
  - name: high-latency
    metric: p95_latency
    threshold: 5000ms
    window: 5m
    severity: warning
    channels: [slack, email]

  - name: error-spike
    metric: error_rate
    threshold: 5%
    window: 10m
    severity: critical
    channels: [slack, pagerduty]

  - name: cost-overrun
    metric: daily_cost
    threshold: 500 EUR
    window: 24h
    severity: warning
    channels: [email]

  - name: alignment-drift
    metric: alignment_score
    threshold: "<0.85"
    window: 1h
    severity: critical
    channels: [slack, pagerduty, email]

Notification Channels

Channel	Use Case	Latency
Slack	Team notifications	Seconds
Email	Management reports, summaries	Minutes
PagerDuty	Critical alerts, on-call	Seconds
Webhook	Custom integrations	Seconds
Microsoft Teams	Enterprise environments	Seconds

Defining Custom Metrics

Use the SDK to track your own metrics:

oc.metric("customer_sentiment", value=0.82, tags={"agent": "support-v2"})
oc.metric("retrieval_relevance", value=0.91, tags={"index": "knowledge-base"})
oc.metric("compliance_check_passed", value=1, tags={"check": "pii-scan"})

Alert Escalation

OpenClaw supports multi-level escalation:

Warning — Slack message to the team (5 minutes)
Critical — PagerDuty + Slack + email (immediately)
Emergency — Auto-shutdown of affected agent + escalation to management

Best Practice: Start with loose thresholds and tighten them gradually. Too many alerts lead to alert fatigue — then even critical warnings get ignored.