Voice agents only become truly powerful when they can make and receive phone calls. Integrating with Twilio, SIP providers, and traditional telephony connects AI systems to the telephone network.
Twilio is a widely used platform for programmable telephony. A call flows through it like this:
```text
Phone Network  →  Twilio  →  WebSocket       →  Your Server  →  ElevenLabs
(Caller)          (SIP)      (Audio Stream)     (Logic)         (ASR + TTS)
```
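The WebSocket leg of this pipeline is a small JSON protocol. Twilio sends one message per event, tagged by an `event` field (`connected`, `start`, `media`, `dtmf`, `mark`, `stop`). The sketch below shows one way to type those messages in TypeScript.

The field names follow Twilio's Media Streams message format, but only the fields used in this section are modeled. The type `TwilioStreamMessage` and the helpers `parseStreamMessage` and `summarize` are hypothetical names. The parser only checks the `event` tag; it does not validate every field.

```typescript
// Subset of the JSON messages Twilio sends over a Media Stream WebSocket
// (only the fields used in this section are modeled)
type TwilioStreamMessage =
  | { event: 'connected'; protocol: string; version: string }
  | {
      event: 'start'
      streamSid: string
      start: { streamSid: string; callSid: string; customParameters: Record<string, string> }
    }
  | { event: 'media'; streamSid: string; media: { payload: string } }
  | { event: 'dtmf'; streamSid: string; dtmf: { digit: string } }
  | { event: 'mark'; streamSid: string; mark: { name: string } }
  | { event: 'stop'; streamSid: string }

const KNOWN_EVENTS = new Set(['connected', 'start', 'media', 'dtmf', 'mark', 'stop'])

// Parse a raw WebSocket frame; returns null for invalid JSON or unknown events.
// Only the `event` tag is checked, the remaining fields are trusted as-is.
function parseStreamMessage(raw: string): TwilioStreamMessage | null {
  let value: unknown
  try {
    value = JSON.parse(raw)
  } catch {
    return null
  }
  if (typeof value !== 'object' || value === null) return null
  const event = (value as { event?: unknown }).event
  if (typeof event !== 'string' || !KNOWN_EVENTS.has(event)) return null
  return value as TwilioStreamMessage
}

// Switching on the discriminant lets TypeScript narrow each branch
function summarize(msg: TwilioStreamMessage): string {
  switch (msg.event) {
    case 'start':
      return `call ${msg.start.callSid} started`
    case 'media':
      return `audio chunk (${msg.media.payload.length} base64 chars)`
    case 'dtmf':
      return `key ${msg.dtmf.digit}`
    default:
      return msg.event
  }
}
```

Typing the messages this way means a typo such as `message.media.paylod` becomes a compile error instead of a silent `undefined` at runtime.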
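```typescript
import express from 'express'
import http from 'http'
import twilio from 'twilio'
import { WebSocketServer } from 'ws'

// Application-specific agent factory (ASR → LLM → TTS); not shown here
declare function initVoiceAgent(params: Record<string, string>): Promise<{
  processAudio(chunk: Buffer): Promise<Buffer | null>
}>

const app = express()
app.use(express.urlencoded({ extended: false })) // Twilio posts form-encoded webhooks

// 1. Twilio webhook receives incoming call
app.post('/api/twilio/incoming', (req, res) => {
  const twiml = new twilio.twiml.VoiceResponse()
  // <Connect><Stream> opens a bidirectional WebSocket for real-time audio
  const stream = twiml.connect().stream({
    url: 'wss://your-server.com/api/voice-agent/stream',
  })
  // Custom parameters become <Parameter> elements inside <Stream>
  stream.parameter({ name: 'callSid', value: req.body.CallSid })
  stream.parameter({ name: 'callerNumber', value: req.body.From })
  res.type('text/xml').send(twiml.toString())
})

// 2. WebSocket handler for audio streaming
const server = http.createServer(app)
const wss = new WebSocketServer({ server, path: '/api/voice-agent/stream' })

wss.on('connection', (ws) => {
  let agent: Awaited<ReturnType<typeof initVoiceAgent>> | null = null

  ws.on('message', async (data) => {
    const message = JSON.parse(data.toString())

    if (message.event === 'start') {
      // The <Parameter> values arrive in the 'start' message, not in the upgrade request
      agent = await initVoiceAgent(message.start.customParameters)
    } else if (message.event === 'media' && agent) {
      // Audio from caller (base64, 8 kHz μ-law) → ASR → LLM → TTS → audio back
      const audioChunk = Buffer.from(message.media.payload, 'base64')
      const responseAudio = await agent.processAudio(audioChunk)
      if (!responseAudio) return // nothing to say yet
      ws.send(JSON.stringify({
        event: 'media',
        streamSid: message.streamSid,
        media: { payload: responseAudio.toString('base64') },
      }))
    }
  })
})

server.listen(3000)
```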
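`processAudio` receives raw bytes decoded from the `media.payload` field. Twilio Media Streams carry 8 kHz mono μ-law (G.711) audio, and audio sent back over the socket must use the same encoding. If your ASR provider expects 16-bit linear PCM instead, a conversion step would sit inside `processAudio`.

The sketch below implements the standard G.711 μ-law expansion. The names `muLawToLinear` and `decodeTwilioPayload` are hypothetical.

```typescript
// G.711 μ-law → 16-bit linear PCM (standard expansion, no lookup table)
const BIAS = 0x84 // 132, the bias added during μ-law encoding

function muLawToLinear(uLawByte: number): number {
  const u = ~uLawByte & 0xff // μ-law bytes are stored bit-inverted
  const sign = u & 0x80
  const exponent = (u >> 4) & 0x07
  const mantissa = u & 0x0f
  const magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
  return sign ? -magnitude : magnitude
}

// Decode a Twilio media payload (base64-encoded μ-law bytes) into PCM samples
function decodeTwilioPayload(base64Payload: string): Int16Array {
  const bytes = Buffer.from(base64Payload, 'base64')
  const pcm = new Int16Array(bytes.length)
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = muLawToLinear(bytes[i])
  }
  return pcm
}
```

Each μ-law byte expands to one 16-bit sample, so a payload of N bytes yields N samples at the same 8 kHz rate. Resampling to a higher rate, if the ASR requires it, would be a separate step.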
Larger companies often connect over SIP (Session Initiation Protocol) directly instead of going through Twilio:
| Aspect | Twilio | SIP Direct |
|---|---|---|
| Setup | Minutes | Hours–days |
| Cost | Higher (markup) | Lower (direct) |
| Control | Limited | Full |
| Scaling | Automatic | Manual |
| Compliance | Twilio cloud | Own infrastructure |
```text
SIP Trunk (e.g., sipgate, Plivo, Deutsche Telekom)
        ↓
SIP Server (Asterisk / FreeSWITCH / Kamailio)
        ↓
Media Server (process audio streams)
        ↓
Voice Agent (ASR → LLM → TTS)
```
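How the media server hands audio to the voice agent depends on the setup. One common pattern, assumed here, is that it forwards the call audio as RTP packets over UDP; Asterisk's ARI external media channel can work this way, for example. The agent then has to strip the RTP header before the audio reaches ASR.

The sketch below parses the fixed RFC 3550 header plus the optional CSRC list, header extension, and padding. `parseRtp` is a hypothetical helper, and it deliberately ignores jitter buffering and packet reordering.

```typescript
// Fields of an RTP packet (RFC 3550) relevant to a voice agent
interface RtpPacket {
  payloadType: number // static table: 0 = PCMU (μ-law), 8 = PCMA (A-law)
  marker: boolean
  sequenceNumber: number
  timestamp: number
  ssrc: number
  payload: Uint8Array
}

// Parse one RTP packet; returns null for malformed or non-v2 packets
function parseRtp(packet: Uint8Array): RtpPacket | null {
  if (packet.length < 12) return null // fixed header is 12 bytes
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength)
  const b0 = packet[0]
  if (b0 >> 6 !== 2) return null // RTP version must be 2
  const hasPadding = (b0 & 0x20) !== 0
  const hasExtension = (b0 & 0x10) !== 0
  const csrcCount = b0 & 0x0f

  let offset = 12 + 4 * csrcCount // skip CSRC identifiers
  if (hasExtension) {
    if (packet.length < offset + 4) return null
    const extWords = view.getUint16(offset + 2) // extension length in 32-bit words
    offset += 4 + 4 * extWords
  }
  let end = packet.length
  if (hasPadding) end -= packet[packet.length - 1] // last byte = padding byte count
  if (offset > end) return null

  return {
    payloadType: packet[1] & 0x7f,
    marker: (packet[1] & 0x80) !== 0,
    sequenceNumber: view.getUint16(2), // DataView reads big-endian (network order)
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    payload: packet.subarray(offset, end),
  }
}
```

With Node's `dgram` module, each incoming UDP message is a `Buffer` (a `Uint8Array` subclass), so it can be passed to `parseRtp` directly. A payload type of 0 then indicates PCMU (μ-law) audio.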
A typical inbound call runs through these stages:

Customer calls → Greeting → Intent recognition → Processing → Conclusion
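The inbound stages map naturally onto a linear async function. The sketch below assumes a hypothetical `CallIO` interface that hides four services:

- TTS (`say`)
- ASR (`listen`)
- the LLM or NLU step (`classifyIntent`)
- business logic (`process`)

The retry limit of two and the hand-off to a human are illustrative choices, not part of the flow above.

```typescript
// Hypothetical I/O boundary between conversation logic and telephony/AI services
interface CallIO {
  say(text: string): Promise<void> // TTS
  listen(): Promise<string> // ASR: next caller utterance as text
  classifyIntent(text: string): Promise<string | null> // LLM/NLU, null = not understood
  process(intent: string): Promise<string> // business logic, returns the answer
}

const MAX_ATTEMPTS = 2 // assumption: escalate after two unrecognized requests

// Greeting → Intent recognition → Processing → Conclusion.
// Returns the recognized intent, or 'handoff' if a human should take over.
async function handleInboundCall(io: CallIO): Promise<string> {
  // 1. Greeting
  await io.say('Hello, thanks for calling. How can I help you?')

  // 2. Intent recognition with a bounded number of attempts
  let intent: string | null = null
  for (let attempt = 1; attempt <= MAX_ATTEMPTS && intent === null; attempt++) {
    intent = await io.classifyIntent(await io.listen())
    if (intent === null && attempt < MAX_ATTEMPTS) {
      await io.say('Sorry, I did not catch that. Could you rephrase?')
    }
  }
  if (intent === null) {
    await io.say('Let me connect you with a colleague.')
    return 'handoff'
  }

  // 3. Processing
  await io.say(await io.process(intent))

  // 4. Conclusion
  await io.say('Thank you for calling. Goodbye!')
  return intent
}
```

Keeping the telephony and AI services behind an interface like this lets you test the conversation logic with scripted inputs, without placing a real call.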
Best Practices:
Outbound calls run the other way: your own system triggers the call.

Trigger (CRM, schedule) → Voice agent calls customer → Conversation → Log result
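For outbound calls, the trigger is often a batch job over CRM records. This minimal sketch makes the following assumptions:

- Each CRM contact carries a hypothetical `callbackDue` timestamp.
- Dialing is abstracted behind a `Dialer` function. With Twilio, such a function could wrap `client.calls.create({ to, from, url })` from the `twilio` package and resolve with the final call status reported to your `statusCallback` URL.

`CrmContact`, `Dialer`, and `runOutboundBatch` are hypothetical names. The outcome values are a subset of Twilio's call statuses.

```typescript
// Minimal CRM record for outbound campaigns (hypothetical shape)
interface CrmContact {
  id: string
  phone: string // E.164 format
  callbackDue: Date
}

type CallOutcome = 'completed' | 'no-answer' | 'busy' | 'failed'

interface CallLogEntry {
  contactId: string
  outcome: CallOutcome
  at: Date
}

// Places one call and resolves with its final outcome (hypothetical abstraction)
type Dialer = (phone: string) => Promise<CallOutcome>

// Trigger → call → log: dial every contact whose callback is due and
// record the outcome of each call (e.g. to write back to the CRM).
async function runOutboundBatch(contacts: CrmContact[], dial: Dialer, now: Date): Promise<CallLogEntry[]> {
  const due = contacts.filter((c) => c.callbackDue.getTime() <= now.getTime())
  const log: CallLogEntry[] = []
  for (const contact of due) {
    let outcome: CallOutcome
    try {
      outcome = await dial(contact.phone)
    } catch {
      outcome = 'failed' // dialer errors are logged, not rethrown
    }
    log.push({ contactId: contact.id, outcome, at: new Date() })
  }
  return log
}
```

Dialing sequentially keeps the sketch simple. A real campaign would typically run several calls concurrently, up to a fixed limit.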
Use Cases:
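Even with voice AI, DTMF (Dual-Tone Multi-Frequency) keypad input remains relevant. Examples include entering numbers such as PINs and serving as a fallback when speech recognition struggles: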
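```typescript
// DTMF detection in Twilio: with a bidirectional <Connect><Stream>, each key
// press arrives as a 'dtmf' message. Register this listener inside the same
// wss.on('connection', (ws) => { ... }) handler as the media listener.
ws.on('message', (data) => {
  const message = JSON.parse(data.toString())
  if (message.event === 'dtmf') {
    const digit: string = message.dtmf.digit // '0'-'9', '*', '#'
    handleDtmfInput(digit)
  }
})

// Map keys to actions; the transfer/repeat helpers are application-specific
function handleDtmfInput(digit: string) {
  switch (digit) {
    case '1': transferToSales(); break
    case '2': transferToSupport(); break
    case '0': transferToHuman(); break
    case '*': repeatLastMessage(); break
    default: break // other keys are ignored
  }
}
```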
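Each `dtmf` message carries a single key press, so multi-digit input such as a customer number or PIN has to be buffered on your side. This minimal sketch assumes `#` terminates the input and that a fixed maximum length applies. `DtmfCollector` is a hypothetical class.

The inactivity timeout a production IVR would also need is left out; `flush()` is where it would hook in.

```typescript
// Buffers single DTMF digits until a terminator key or a maximum length
class DtmfCollector {
  private digits = ''
  private readonly maxLength: number
  private readonly terminator: string
  private readonly onComplete: (value: string) => void

  constructor(maxLength: number, onComplete: (value: string) => void, terminator = '#') {
    this.maxLength = maxLength
    this.onComplete = onComplete
    this.terminator = terminator
  }

  // Feed the digit from one 'dtmf' message
  push(digit: string): void {
    if (digit === this.terminator) {
      this.flush()
      return
    }
    this.digits += digit
    if (this.digits.length >= this.maxLength) this.flush()
  }

  // Deliver whatever has been collected (on terminator, max length, or a timeout)
  flush(): void {
    if (this.digits.length === 0) return
    const value = this.digits
    this.digits = ''
    this.onComplete(value)
  }
}
```

One collector per connection, fed with `message.dtmf.digit` in the `dtmf` branch, would replace `handleDtmfInput` during steps that expect a number.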
Practical tip: Start with Twilio, since setup takes minutes rather than days. For enterprise customers with existing SIP infrastructure, offer SIP integration as a second phase. Most customers start with Twilio and only migrate to SIP above roughly 100,000 calls/month. At that volume, Twilio's per-minute markup starts to outweigh the cost of running your own infrastructure.
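Where the switch to SIP pays off depends on your own rates and average call length. The arithmetic is a simple break-even: SIP becomes cheaper once the per-call saving, multiplied by the number of calls, covers the extra fixed cost of your own infrastructure. In other words, N = (fixed_SIP − fixed_Twilio) / ((perMinute_Twilio − perMinute_SIP) × avgMinutes).

The sketch below encodes that. `CostModel`, `monthlyCost`, and `breakEvenCalls` are hypothetical names, and no real prices are assumed, so plug in your own quotes.

```typescript
// Simplified monthly cost model; all values come from your own quotes
interface CostModel {
  perMinute: number // usage price per call minute
  fixedMonthly: number // fixed costs: servers, operations time, trunk fees
}

function monthlyCost(model: CostModel, calls: number, avgMinutes: number): number {
  return model.fixedMonthly + model.perMinute * calls * avgMinutes
}

// Calls per month above which SIP is cheaper than Twilio (Infinity if never)
function breakEvenCalls(twilio: CostModel, sip: CostModel, avgMinutes: number): number {
  const savingPerCall = (twilio.perMinute - sip.perMinute) * avgMinutes
  if (savingPerCall <= 0) return Infinity // SIP never catches up on usage cost
  const extraFixed = sip.fixedMonthly - twilio.fixedMonthly
  return Math.max(0, extraFixed / savingPerCall)
}
```

Longer average calls lower the break-even point, because the per-minute saving accrues on every minute while the fixed costs stay constant.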