Lesson 4 of 5 · 11 min read

Telephony Integration

Voice agents become far more useful once they can place and answer phone calls. Integrating with Twilio, SIP providers, and traditional telephony connects AI to the telephone network.

Twilio Integration

Why Twilio?

Twilio is the de facto standard for programmable telephony:

  • Global phone numbers in 100+ countries
  • Programmable Voice API
  • WebSocket support for real-time audio
  • Reliability: 99.95% uptime SLA

Architecture: Twilio + ElevenLabs

Phone Network → Twilio → WebSocket → Your Server → ElevenLabs
  (Caller)     (SIP)   (Audio Stream)  (Logic)     (ASR + TTS)

Implementation

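import express from 'express'
import twilio from 'twilio'
import { WebSocketServer } from 'ws'

const app = express()
app.use(express.urlencoded({ extended: false })) // Twilio webhooks are form-encoded
// Example port; a reverse proxy must route the wss:// URL below to this server
const wss = new WebSocketServer({ port: 8080 })

// 1. Twilio webhook receives incoming call
app.post('/api/twilio/incoming', (req, res) => {
  const twiml = new twilio.twiml.VoiceResponse()

  // WebSocket connection for real-time audio
  const connect = twiml.connect()
  const stream = connect.stream({
    url: 'wss://your-server.com/api/voice-agent/stream',
  })
  // Custom parameters are attached as <Parameter> elements
  stream.parameter({ name: 'callSid', value: req.body.CallSid })
  stream.parameter({ name: 'callerNumber', value: req.body.From })

  res.type('text/xml').send(twiml.toString())
})

// 2. WebSocket handler for audio streaming
wss.on('connection', (ws) => {
  let agent // created once Twilio sends the 'start' event

  ws.on('message', async (data) => {
    const message = JSON.parse(data.toString())

    if (message.event === 'start') {
      // Custom parameters arrive in the 'start' message, not on the HTTP request
      agent = await initVoiceAgent(message.start.customParameters)
    } else if (message.event === 'media' && agent) {
      // Audio from caller → ASR → LLM → TTS → Audio back
      const audioChunk = Buffer.from(message.media.payload, 'base64')
      // processAudio must return 8 kHz mu-law audio, the format Twilio expects
      const responseAudio = await agent.processAudio(audioChunk)

      ws.send(JSON.stringify({
        event: 'media',
        streamSid: message.streamSid,
        media: { payload: responseAudio.toString('base64') },
      }))
    }
  })
})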
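
Twilio Media Streams deliver caller audio as base64-encoded 8 kHz mu-law (G.711) bytes. Whether conversion is needed depends on what the ASR side accepts; if it expects 16-bit linear PCM, a decoder along the following lines could sit between the WebSocket handler and the agent. The function names `decodeMulawSample` and `decodeMulaw` are illustrative and not part of any SDK.

```typescript
// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
function decodeMulawSample(byte: number): number {
  const u = ~byte & 0xff            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80
  const exponent = (u >> 4) & 0x07
  const mantissa = u & 0x0f
  // Rebuild the magnitude, then remove the encoder bias of 0x84 (132).
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
  return sign ? -magnitude : magnitude
}

// Decode a whole chunk (e.g. one decoded media payload) into PCM16 samples.
function decodeMulaw(payload: Uint8Array): Int16Array {
  const pcm = new Int16Array(payload.length)
  for (let i = 0; i < payload.length; i++) {
    pcm[i] = decodeMulawSample(payload[i])
  }
  return pcm
}
```

In the handler above, the call would look like `decodeMulaw(Buffer.from(message.media.payload, 'base64'))`, since a Node.js `Buffer` is a `Uint8Array`.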

SIP Integration

For Enterprise Telephony

Large companies often connect their own SIP (Session Initiation Protocol) trunks directly instead of going through a platform such as Twilio:

Aspect       Twilio            SIP Direct
Setup        Minutes           Hours–days
Cost         Higher (markup)   Lower (direct)
Control      Limited           Full
Scaling      Automatic         Manual
Compliance   Twilio cloud      Own infrastructure

SIP Trunk Configuration

SIP Trunk (e.g., sipgate, Plivo, Deutsche Telekom)
  ↓
SIP Server (Asterisk / FreeSWITCH / Kamailio)
  ↓
Media Server (process audio streams)
  ↓
Voice Agent (ASR → LLM → TTS)

Inbound & Outbound Call Handling

Inbound Calls

Customer calls → Greeting → Intent recognition → Processing → Conclusion

Best Practices:

  • Immediate answer: < 1 second to greeting
  • Context recognition: Phone number → load customer profile
  • Queue management: When overloaded, offer a callback ("I'll call you back in 5 minutes")
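
The context recognition step can be sketched as a lookup from the caller's number to a stored profile. This is a minimal illustration: the `CustomerProfile` shape, the in-memory `profiles` map, and the sample number are assumptions, and a real system would query a CRM instead.

```typescript
interface CustomerProfile {
  name: string
  openTickets: number
}

// Hypothetical in-memory store keyed by E.164 phone number.
const profiles = new Map<string, CustomerProfile>([
  ['+4915112345678', { name: 'Mrs. Mueller', openTickets: 1 }],
])

// Strip spaces and punctuation; turn a leading "00" into "+".
function normalizeNumber(raw: string): string {
  const compact = raw.replace(/[^\d+]/g, '')
  return compact.startsWith('00') ? '+' + compact.slice(2) : compact
}

// Pick a personalized greeting when the caller is known.
function greetingFor(callerNumber: string): string {
  const profile = profiles.get(normalizeNumber(callerNumber))
  return profile
    ? `Hello ${profile.name}, how can I help you today?`
    : 'Hello, how can I help you today?'
}
```

Doing the lookup while the call is still being set up helps keep the greeting within the one-second target.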

Outbound Calls

Trigger (CRM, schedule) → Voice agent calls customer → Conversation → Log result

Use Cases:

  • Appointment confirmation: "Hello Mrs. Mueller, your appointment tomorrow at 2 PM is set. Does that work?"
  • Payment reminder: "Your invoice No. 4711 has been open for 7 days..."
  • Survey: "How satisfied were you with our service? 1 to 5?"
  • Callback: "You had contacted us. How can I help you?"

Compliance for Outbound Calls

  • Consent: Prior customer agreement (opt-in)
  • Hours: Only weekdays 8 AM–8 PM (in Germany)
  • Identification: "This is an automated call from Company XY"
  • Opt-out: Honor "Please don't call me again" at any time
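
A pre-dial guard is one way to enforce the consent, opt-out, and calling-hours rules before any outbound call is placed. The sketch below assumes the weekday 8 AM–8 PM window from the list above and a hypothetical `OutboundCandidate` record; the names are illustrative.

```typescript
// Hypothetical record; the fields mirror the checklist, not a real CRM schema.
interface OutboundCandidate {
  hasOptIn: boolean     // prior consent on record
  hasOptedOut: boolean  // customer asked not to be called again
}

const WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']

// Resolve weekday and hour in the customer's time zone, not the server's.
function localWeekdayAndHour(at: Date, timeZone: string): { weekday: string; hour: number } {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    weekday: 'short',
    hour: 'numeric',
    hour12: false,
  }).formatToParts(at)
  const weekday = parts.find((p) => p.type === 'weekday')?.value ?? ''
  const hourText = parts.find((p) => p.type === 'hour')?.value ?? '0'
  // Some runtimes render midnight as "24" when hour12 is false.
  return { weekday, hour: Number(hourText) % 24 }
}

// True only if consent exists, no opt-out is recorded, and the time is allowed.
function mayPlaceCall(c: OutboundCandidate, at: Date, timeZone = 'Europe/Berlin'): boolean {
  if (!c.hasOptIn || c.hasOptedOut) return false
  const { weekday, hour } = localWeekdayAndHour(at, timeZone)
  return WEEKDAYS.includes(weekday) && hour >= 8 && hour < 20
}
```

Running the check immediately before dialing, rather than when the call is queued, also covers opt-outs that arrive in between.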

Call Recording

Why Record?

  • Quality assurance: Analyze conversations after the fact
  • Training: Successful conversations as training material
  • Compliance: Regulatory requirements (financial sector)
  • Dispute resolution: Proof in case of disputes

Privacy in Recording

  • Announcement: "This call is being recorded" (GDPR requirement)
  • Consent: Offer an opt-out, e.g. "Press 1 if you don't want to be recorded"
  • Retention period: Defined deletion deadlines (e.g., 90 days)
  • Access control: Only authorized employees
  • Encryption: Store audio files encrypted
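
The retention rule lends itself to a small scheduled check. As a sketch, assuming the 90-day example deadline and a hypothetical helper name `isDueForDeletion`:

```typescript
const DAY_MS = 24 * 60 * 60 * 1000

// True once a recording has reached the retention limit and should be deleted.
function isDueForDeletion(recordedAt: Date, now: Date, retentionDays = 90): boolean {
  return now.getTime() - recordedAt.getTime() >= retentionDays * DAY_MS
}
```

A cleanup job could run this over the stored recording metadata once a day and delete both the audio file and any transcript derived from it.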

DTMF Handling

Touch-Tone Input

Despite voice AI, DTMF (Dual-Tone Multi-Frequency) remains relevant:

  • PIN entry: Credit card number, security code
  • Menu selection: Fallback when ASR doesn't work
  • Authentication: Enter numeric codes

Implementation

// DTMF detection in Twilio (sent as 'dtmf' messages on the media stream)
ws.on('message', (data) => {
  const message = JSON.parse(data.toString())

  if (message.event === 'dtmf') {
    const digit = message.dtmf.digit // '0'-'9', '*', '#'
    handleDtmfInput(digit)
  }
})

// Map single key presses to menu actions
function handleDtmfInput(digit: string) {
  switch (digit) {
    case '1': transferToSales(); break
    case '2': transferToSupport(); break
    case '0': transferToHuman(); break
    case '*': repeatLastMessage(); break
    default: break // ignore keys without a menu entry
  }
}
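
The handler above reacts to single digits. PIN entry and numeric codes need several digits collected in sequence, which could be sketched with a small collector like the one below. The class name `DtmfCollector` and the conventions that `#` submits and `*` restarts are assumptions for illustration.

```typescript
type CollectResult =
  | { done: false }
  | { done: true; digits: string }

// Accumulates digits until '#' is pressed or maxDigits is reached.
class DtmfCollector {
  private buffer = ''
  private readonly maxDigits: number

  constructor(maxDigits: number) {
    this.maxDigits = maxDigits
  }

  push(digit: string): CollectResult {
    if (digit === '*') {
      this.buffer = '' // let the caller start over
      return { done: false }
    }
    if (digit === '#') return this.finish()
    if (/^[0-9]$/.test(digit)) this.buffer += digit
    return this.buffer.length >= this.maxDigits ? this.finish() : { done: false }
  }

  // Return the collected digits and reset for the next entry.
  private finish(): CollectResult {
    const digits = this.buffer
    this.buffer = ''
    return { done: true, digits }
  }
}
```

Each `dtmf` message would feed `collector.push(message.dtmf.digit)`, and the agent continues once the result reports `done: true`. For sensitive input such as card numbers, the collected digits should stay out of logs and recordings.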

Practical tip: Start with Twilio, where setup takes minutes rather than hours or days. For enterprise customers with existing SIP infrastructure, offer SIP integration as a second phase. Most customers start with Twilio and migrate to SIP only once they exceed 100,000 calls per month.