Production Multi-Agent System

Theory matters, but now let's build a complete multi-agent system. In this chapter, you'll create a research-analyze-report pipeline with quality gates, human review, and the error handling, monitoring, and security a production deployment needs.

The Pipeline Overview

Trigger (Webhook/Schedule)
    │
    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Research     │────▶│  Analyze     │────▶│  Report      │
│  Agent        │     │  Agent       │     │  Agent       │
└──────────────┘     └──────────────┘     └──────────────┘
    │                     │                     │
    ▼                     ▼                     ▼
Quality Gate 1       Quality Gate 2       Quality Gate 3
(≥ 3 sources?)       (Confidence ≥ 80?)    (Score ≥ 85?)
    │                     │                     │
    ▼                     ▼                     ▼
  Pass/Retry           Pass/Retry         Pass/Human Review
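The flow in the diagram can be sketched in a few lines of Python. This is only an illustration of the control flow, not the n8n implementation: `Stage`, `run_pipeline`, and the stub agents are hypothetical names, and real agents would call an LLM instead of returning fixed data. On a failed gate the sketch simply reports which stage failed, so the caller can decide whether to escalate (for example, to human review after the Report Agent).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]   # agent: takes upstream data, returns its output
    gate: Callable[[dict], bool]  # quality gate: True means pass
    max_retries: int = 2          # assumed retry budget per stage


def run_pipeline(stages: list[Stage], payload: dict) -> dict:
    """Run the stages in order, re-running a stage while its gate fails."""
    data = payload
    for stage in stages:
        for _attempt in range(stage.max_retries + 1):
            output = stage.run(data)
            if stage.gate(output):
                data = output
                break
        else:
            # Retries exhausted: stop and report which gate failed.
            return {"status": "failed", "stage": stage.name, "last_output": output}
    return {"status": "ok", "result": data}


# Stub agents standing in for the LLM-backed nodes.
def research(data: dict) -> dict:
    return {**data, "sources": ["a", "b", "c"]}


def analyze(data: dict) -> dict:
    return {**data, "confidence": 82}


def report(data: dict) -> dict:
    return {**data, "score": 90}


stages = [
    Stage("research", research, lambda o: len(o["sources"]) >= 3),
    Stage("analyze", analyze, lambda o: o["confidence"] >= 80),
    Stage("report", report, lambda o: o["score"] >= 85),
]
result = run_pipeline(stages, {"topic": "example"})
```

Keeping each agent and its gate together in one `Stage` makes it easy to add stages one at a time and test each in isolation.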

Step 1: Research Agent

The Research Agent collects information from multiple sources.

n8n Workflow Configuration

Node	Type	Configuration
Trigger	Webhook	POST /pipeline/start
Research Prompt	Set Node	System prompt + topic from trigger
LLM Call	OpenAI / Anthropic	model: gpt-4o, max_tokens: 2000
Parse Output	Function	JSON validation + schema check
Quality Gate	IF Node	findings_count ≥ 3 AND confidence ≥ 70 AND at most 2 knowledge gaps
Save State	PostgreSQL	INSERT INTO agent_state
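The "Parse Output" step (JSON validation plus schema check) might look roughly like the following. It is written in Python for illustration; in an n8n code node the same logic would usually be JavaScript. The schema is an assumption: the text does not specify the Research Agent's output fields, so `findings`, `confidence`, and `knowledge_gaps` are inferred from the quality gate, and `parse_research_output` is a hypothetical helper name.

```python
import json

# Assumed schema for the Research Agent's JSON output (not given in the text).
REQUIRED_FIELDS = {
    "findings": list,
    "confidence": (int, float),
    "knowledge_gaps": list,
}


def parse_research_output(raw: str) -> dict:
    """Parse the LLM reply and check it against the assumed schema.

    Raises ValueError with a readable message so the workflow can retry.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    # Derived value that the quality gate can compare against its threshold.
    data["findings_count"] = len(data["findings"])
    return data
```

Failing with a specific message (rather than passing malformed data downstream) gives the retry step something concrete to feed back into the next prompt.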

Quality Gate 1: Research Completeness

{
  "conditions": {
    "all": [
      { "field": "findings_count", "operator": "gte", "value": 3 },
      { "field": "confidence", "operator": "gte", "value": 70 },
      { "field": "knowledge_gaps", "operator": "lte_length", "value": 2 }
    ]
  },
  "on_fail": "retry_with_expanded_scope",
  "max_retries": 2
}
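To make the gate configuration concrete, here is a small evaluator for it. Treat it as a sketch under assumptions: the text does not define the operators, so `gte` is read as "greater than or equal" and `lte_length` as "list length at most the value". The `give_up` outcome and the function names are hypothetical.

```python
GATE = {
    "conditions": {
        "all": [
            {"field": "findings_count", "operator": "gte", "value": 3},
            {"field": "confidence", "operator": "gte", "value": 70},
            {"field": "knowledge_gaps", "operator": "lte_length", "value": 2},
        ]
    },
    "on_fail": "retry_with_expanded_scope",
    "max_retries": 2,
}

# Assumed operator semantics (not defined in the text).
OPERATORS = {
    "gte": lambda actual, expected: actual >= expected,
    "lte_length": lambda actual, expected: len(actual) <= expected,
}


def evaluate_gate(gate: dict, output: dict) -> bool:
    """Return True only if every condition under "all" holds."""
    for cond in gate["conditions"]["all"]:
        check = OPERATORS[cond["operator"]]
        if not check(output[cond["field"]], cond["value"]):
            return False
    return True


def next_action(gate: dict, output: dict, attempt: int) -> str:
    """Decide what happens after the given attempt (0-based)."""
    if evaluate_gate(gate, output):
        return "pass"
    if attempt < gate["max_retries"]:
        return gate["on_fail"]
    return "give_up"  # hypothetical outcome once retries are exhausted


thin = {"findings_count": 2, "confidence": 75, "knowledge_gaps": ["pricing"]}
print(next_action(GATE, thin, attempt=0))  # retry_with_expanded_scope
```

Because the thresholds live in data rather than code, they can be tuned per topic without touching the evaluator.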

Step 2: Analyze Agent

The Analyze Agent processes research results and extracts key insights.

Prompt Structure

System: You are an analysis specialist. Based on the research data:
1. Identify the 3-5 most important findings
2. Evaluate trends and patterns
3. Create a SWOT analysis if applicable
4. Provide action recommendations with priority (high/medium/low)

Input: {{ $json.research_findings }}
Output format: JSON with { insights: [], trends: [], recommendations: [], confidence: 0-100 }

Quality Gate 2: Analysis Depth

Criterion	Threshold	Action on Failure
Insights found	≥ 3	Retry with hint
Confidence score	≥ 80	Retry with more context
Recommendations	≥ 1 per insight	Retry with explicit instruction
JSON validation	Valid schema	Immediate retry
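The criteria in the table translate into a short check that returns the retry actions to take. This is a sketch, assuming the analysis output is already valid JSON with `insights`, `recommendations`, and a numeric `confidence` field; the action names are hypothetical labels for the table's "Action on Failure" column, and "≥ 1 per insight" is approximated by comparing list lengths.

```python
def check_analysis(output: dict) -> list[str]:
    """Return the retry actions triggered by failed criteria (empty list = pass)."""
    actions = []
    insights = output.get("insights", [])
    recommendations = output.get("recommendations", [])

    if len(insights) < 3:
        actions.append("retry_with_hint")
    if output.get("confidence", 0) < 80:
        actions.append("retry_with_more_context")
    # Approximation: at least as many recommendations as insights.
    if len(recommendations) < len(insights):
        actions.append("retry_with_explicit_instruction")
    return actions


shallow = {
    "insights": ["i1", "i2"],
    "recommendations": ["r1"],
    "confidence": 60,
}
print(check_analysis(shallow))
# ['retry_with_hint', 'retry_with_more_context', 'retry_with_explicit_instruction']
```

Returning all failed criteria at once lets a single retry prompt address every gap instead of fixing them one round at a time.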

Step 3: Report Agent

The Report Agent creates the final report from research and analysis.

Template Integration

# {{ topic }} — Analysis Report

**Created:** {{ date }}
**Confidence:** {{ overall_confidence }}%
**Sources:** {{ sources_count }}

## Executive Summary
{{ executive_summary }}

## Key Insights
{{ insights_formatted }}

## Action Recommendations
{{ recommendations_table }}

## Appendix: Source List
{{ sources_list }}
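Filling the template is plain placeholder substitution. The following is a minimal sketch, assuming every `{{ name }}` maps to a key in a values dictionary; a real deployment might use n8n expressions or a template engine instead, and `render` is a hypothetical helper.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace each {{ name }} with values[name]; fail loudly on missing keys."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder: {key}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)


header = render(
    "**Confidence:** {{ overall_confidence }}%\n**Sources:** {{ sources_count }}",
    {"overall_confidence": 87, "sources_count": 5},
)
print(header)
```

Raising on a missing key is deliberate: a report with a literal `{{ executive_summary }}` left in it should never reach the publish step.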

Human Review Integration

Not every report should be auto-published. Reports that score below 85 at the final quality gate go to a human reviewer via Slack:

Report Agent Output
    │
    ▼
┌──────────────┐
│ Quality Gate  │
│ Score ≥ 85?   │
└──────┬───────┘
       │
  ┌────┴────┐
  ▼         ▼
 YES        NO
  │         │
  ▼         ▼
Auto-      Slack message
Publish    to reviewer
           │
           ▼
        Human Review
        (Approve/Edit/Reject)

Slack Integration for Review

Action	Workflow
Approve	Report is published
Edit	Report goes back to Report Agent with feedback
Reject	Pipeline is stopped and the run is written to the dead-letter queue (DLQ)
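The routing decision and the three reviewer actions can be expressed as two small functions. This is a sketch with hypothetical names and return shapes; the Slack message itself and the actual publish or dead-letter steps are left out.

```python
AUTO_PUBLISH_THRESHOLD = 85  # score threshold of the final quality gate


def route_report(score: float) -> str:
    """Send high-scoring reports straight to publishing, the rest to a reviewer."""
    return "auto_publish" if score >= AUTO_PUBLISH_THRESHOLD else "human_review"


def handle_review(action: str, feedback: str = "") -> dict:
    """Map the reviewer's action to the next pipeline step."""
    if action == "approve":
        return {"next": "publish"}
    if action == "edit":
        # The report goes back to the Report Agent together with the feedback.
        return {"next": "report_agent", "feedback": feedback}
    if action == "reject":
        return {"next": "stop", "dead_letter": True}
    raise ValueError(f"unknown review action: {action}")


print(route_report(91))  # auto_publish
print(route_report(72))  # human_review
print(handle_review("edit", "Tighten the executive summary"))
```

Rejecting unknown actions explicitly guards against a mistyped button value silently publishing or dropping a report.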

Production Checklist

Before taking your multi-agent pipeline live:

Area	Checklist
Error Handling	Retries configured, Fallback agents defined, Circuit breaker active
Monitoring	Execution logging, Latency metrics, Cost tracking
Quality Gates	At least 1 gate per agent, Human review for final output
Security	API keys stored in n8n credentials (not hard-coded in the workflow), Rate limits set
Scaling	Concurrency limits per agent, Queue for load spikes
Documentation	Agent roles documented, Input/output contracts defined
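Of the checklist items, the circuit breaker is the least self-explanatory, so here is one possible shape for it. This is a simplified sketch, not a production library: the class name, the defaults (3 failures, 60 seconds), and the injectable `clock` parameter are all assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; allow a trial call
    once `reset_after` seconds have passed."""

    def __init__(self, max_failures=3, reset_after=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock      # injectable so tests can control time
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call skipped")
            # Cool-down elapsed: let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each agent's LLM call in `breaker.call(...)` means that a failing provider stops consuming retries and budget until the cool-down has passed.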

Practical tip: Build the pipeline incrementally. Start with only the Research Agent and its quality gate, then add the Analyze Agent, then the Report Agent. Test and verify each stage on its own before adding the next. Plan 2 weeks for a robust production deployment.