Lesson 5 of 5 · 9 min read

Integration into Existing Systems

Integrating ElevenLabs into your existing infrastructure takes more than a few API calls. This lesson covers best practices for production-ready integrations: the SDKs, WebSocket streaming, webhook callbacks, and caching strategies.

Node.js SDK

Installation and Setup

npm install elevenlabs

import { ElevenLabsClient } from 'elevenlabs'

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
})

// Text-to-Speech
const audioStream = await client.textToSpeech.convert('voice-id', {
  text: 'Hello from the Node.js SDK!',
  model_id: 'eleven_multilingual_v2',
})

// List voices
const voices = await client.voices.getAll()

Python SDK

pip install elevenlabs

import os

from elevenlabs.client import ElevenLabs

# Read the key from the environment instead of hard-coding it
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Text-to-Speech
audio = client.text_to_speech.convert(
    voice_id="21m00Tcm4TlvDq8ikWAM",
    text="Hello from the Python SDK!",
    model_id="eleven_multilingual_v2"
)

# Save audio
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

WebSocket Streaming

Bidirectional Streaming

For real-time applications, connect via WebSocket for input streaming (send text chunk by chunk) and output streaming (receive audio chunk by chunk):

const ws = new WebSocket(
  `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=eleven_turbo_v2_5`
)

ws.onopen = () => {
  // Initialization
  ws.send(JSON.stringify({
    text: ' ',
    xi_api_key: process.env.ELEVENLABS_API_KEY,
    voice_settings: { stability: 0.5, similarity_boost: 0.75 },
  }))

  // Send text in chunks
  ws.send(JSON.stringify({ text: 'This is the first sentence. ' }))
  ws.send(JSON.stringify({ text: 'And here comes the second. ' }))

  // End stream
  ws.send(JSON.stringify({ text: '' }))
}

ws.onmessage = (event) => {
  const data = JSON.parse(event.data)
  if (data.audio) {
    // Process base64-encoded audio chunk
    const audioChunk = Buffer.from(data.audio, 'base64')
    // Forward to audio player or file
  }
}
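
The `onmessage` handler above leaves the chunk handling open. One way to fill it in, sketched under the assumption that you want the whole clip in memory before writing it out, is to decode each base64 chunk and concatenate the results. `collectAudio` is a hypothetical helper name, not part of the SDK.

```javascript
// Hypothetical helper: turns raw WebSocket message strings into one audio Buffer.
// Messages without a usable `audio` field (for example metadata-only frames) are skipped.
function collectAudio(rawMessages) {
  const chunks = []
  for (const raw of rawMessages) {
    const data = JSON.parse(raw)
    if (typeof data.audio === 'string' && data.audio.length > 0) {
      chunks.push(Buffer.from(data.audio, 'base64'))
    }
  }
  return Buffer.concat(chunks)
}

// Usage with fake messages standing in for server frames
const collectedDemo = collectAudio([
  JSON.stringify({ audio: Buffer.from('abc').toString('base64') }),
  JSON.stringify({ audio: null }),
  JSON.stringify({ audio: Buffer.from('def').toString('base64') }),
])
console.log(collectedDemo.length)
```

For long clips you would more likely write each chunk to a file or player as it arrives rather than hold everything in memory.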

Benefits of Input Streaming

  • LLM + TTS pipeline: The LLM generates token by token, and each completed sentence is sent to TTS immediately
  • Latency: The first audio bytes arrive while the LLM is still generating
  • Naturalness: Pauses between sentences feel more natural
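
The LLM + TTS pipeline described above needs something that turns a token stream into sentences. The following is a minimal sketch, assuming a sentence ends at `.`, `!` or `?` followed by whitespace (abbreviations such as "e.g." would be split too early). `createSentenceBuffer` is a hypothetical helper, not part of any SDK.

```javascript
// Hypothetical helper: buffers LLM tokens and emits text once a sentence ends.
function createSentenceBuffer(onSentence) {
  let pending = ''
  return {
    push(token) {
      pending += token
      // Emit every complete sentence currently in the buffer.
      // A sentence ends at ., ! or ? followed by whitespace.
      let match
      while ((match = pending.match(/^[\s\S]*?[.!?]\s+/)) !== null) {
        onSentence(match[0])
        pending = pending.slice(match[0].length)
      }
    },
    flush() {
      // Send whatever is left once the LLM has finished.
      if (pending.trim().length > 0) onSentence(pending)
      pending = ''
    },
  }
}

// Usage: collect sentences from a simulated token stream
const emittedSentences = []
const sentenceBuffer = createSentenceBuffer((s) => emittedSentences.push(s))
for (const token of ['This is', ' the first', ' sentence.', ' And here', ' comes the second.']) {
  sentenceBuffer.push(token)
}
sentenceBuffer.flush()
console.log(emittedSentences)
```

In the WebSocket example, the callback would call `ws.send(JSON.stringify({ text: sentence }))` instead of pushing to an array.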

Webhook Callbacks

Asynchronous Processing

For long texts (books, articles), use webhooks instead of polling:

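import express from 'express'

const app = express()
app.use(express.json()) // parse JSON bodies so req.body is populated

// Start job
// Confirm that the endpoint you call accepts webhook_url, and check the
// payload field names below against the current API reference.
const job = await client.textToSpeech.convertAsStream('voice-id', {
  text: longArticleText,
  model_id: 'eleven_multilingual_v2',
  webhook_url: 'https://your-server.com/api/elevenlabs-webhook',
})

// Receive webhook
app.post('/api/elevenlabs-webhook', (req, res) => {
  const { status, audio_url, duration } = req.body
  if (status === 'completed') {
    // Download and process audio
    downloadAndStore(audio_url)
  }
  // Acknowledge receipt so the sender does not retry
  res.status(200).send('OK')
})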
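
The handler above acts on `req.body` without checking it. A minimal sketch of validating the payload first could look like this. The field names (`status`, `audio_url`) follow the handler above, and `parseWebhookPayload` is a hypothetical helper, so adjust both to the payload your endpoint actually receives.

```javascript
// Hypothetical helper: checks a webhook body before the handler acts on it.
function parseWebhookPayload(body) {
  if (body === null || typeof body !== 'object') {
    return { ok: false, reason: 'body is not an object' }
  }
  if (body.status !== 'completed') {
    return { ok: false, reason: `unhandled status: ${String(body.status)}` }
  }
  let url
  try {
    url = new URL(body.audio_url)
  } catch {
    return { ok: false, reason: 'audio_url is not a valid URL' }
  }
  // Only download from https URLs.
  if (url.protocol !== 'https:') {
    return { ok: false, reason: 'audio_url must use https' }
  }
  return { ok: true, audioUrl: url.href }
}
```

Inside the route, you would call `downloadAndStore(result.audioUrl)` only when `result.ok` is true, and still answer with a 2xx status so the sender does not keep retrying.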

CDN Audio Delivery

Caching Strategy for Generated Audio

Request → Cache Check → [Hit] → CDN delivers audio
                      → [Miss] → ElevenLabs API → Generate audio
                                → Store in CDN → Deliver audio

Implementation with Cloudflare R2

import { createHash } from 'node:crypto'

// Assumes `r2` is an R2 bucket binding and `elevenLabsClient` is an
// ElevenLabsClient instance created elsewhere.
async function getOrCreateAudio(text: string, voiceId: string): Promise<string> {
  const cacheKey = createHash('sha256')
    .update(`${voiceId}:${text}`)
    .digest('hex')
  const objectKey = `audio/${cacheKey}.mp3`
  const publicUrl = `https://cdn.example.com/${objectKey}`

  // 1. Check cache (head() fetches metadata only, not the audio body)
  const cached = await r2.head(objectKey)
  if (cached) return publicUrl

  // 2. Generate new
  const audio = await elevenLabsClient.textToSpeech.convert(voiceId, {
    text,
    model_id: 'eleven_multilingual_v2',
  })

  // 3. Store in CDN
  await r2.put(objectKey, audio, {
    httpMetadata: { contentType: 'audio/mpeg' },
  })

  return publicUrl
}
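
The cache key above is derived from the voice and the text only, so changing the model or the voice settings would still return the old audio. A possible refinement, sketched here with a hypothetical `buildCacheKeyInput` helper, is to include those values in the string that gets hashed. Collapsing whitespace is an assumption about your content; drop that step if line breaks matter for your audio.

```javascript
// Hypothetical helper: builds the string that gets hashed into the cache key.
function buildCacheKeyInput({ voiceId, modelId, text, voiceSettings = {} }) {
  // Sort setting names so { a, b } and { b, a } produce the same key.
  const settings = Object.keys(voiceSettings)
    .sort()
    .map((name) => `${name}=${voiceSettings[name]}`)
    .join(',')
  // Collapse whitespace so trivially different inputs share one cache entry.
  const normalizedText = text.trim().replace(/\s+/g, ' ')
  // JSON encoding keeps the fields unambiguous even if the text contains separators.
  return JSON.stringify([voiceId, modelId, settings, normalizedText])
}

const keyInputA = buildCacheKeyInput({
  voiceId: 'v1',
  modelId: 'eleven_multilingual_v2',
  text: '  Hello   world ',
  voiceSettings: { stability: 0.5, similarity_boost: 0.75 },
})
console.log(keyInputA)
```

The returned string would replace `${voiceId}:${text}` as the input to the SHA-256 hash.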

When Caching Makes Sense

  • Static content: Greetings, menu announcements, FAQ answers
  • E-commerce: Product descriptions (change rarely)
  • E-learning: Generate course audio once, play often
  • IVR systems: Cache standard announcements, generate dynamic parts live
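
The IVR case mixes both worlds: fixed announcements come from the cache, while caller-specific parts are generated live. A rough sketch of that decision follows, assuming each prompt is already split into segments marked as dynamic or not. `planIvrPrompt` and the segment shape are hypothetical.

```javascript
// Hypothetical helper: decides per segment whether audio comes from cache or the API.
// `segments` is a list of { text, dynamic } objects; `cachedTexts` is a Set of
// texts whose audio already exists in the cache.
function planIvrPrompt(segments, cachedTexts) {
  return segments.map((segment) => ({
    text: segment.text,
    source: !segment.dynamic && cachedTexts.has(segment.text) ? 'cache' : 'api',
  }))
}

const ivrPlan = planIvrPrompt(
  [
    { text: 'Welcome to Example Bank.', dynamic: false },
    { text: 'Your balance is 42 euros.', dynamic: true },
    { text: 'Goodbye.', dynamic: false },
  ],
  new Set(['Welcome to Example Bank.'])
)
console.log(ivrPlan.map((part) => part.source))
```

Static segments that miss the cache go to the API once and can be stored afterwards, so later calls are served from the cache.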

Cost Savings Through Caching

Scenario                      | Without Cache     | With Cache       | Savings
10,000 identical requests/day | 10,000 API calls  | 1 API call + CDN | 99.99%
100 products × 1,000 views    | 100,000 API calls | 100 API calls    | 99.9%
IVR with 5 standard prompts   | 5 × call volume   | 5 API calls      | ~100%

Practical tip: Implement a caching layer from the start. Without caching, your ElevenLabs costs grow with every request, even when the text is identical. The rule of thumb: any audio played twice should be cached.
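
The savings figures in the table follow from a simple ratio: with a cache, only the first request for each unique text reaches the API. A small sketch of that arithmetic, using a hypothetical `cacheSavingsPercent` helper and ignoring CDN and storage costs:

```javascript
// Percentage of API calls avoided when identical requests are served from cache.
function cacheSavingsPercent(totalRequests, uniqueTexts) {
  if (totalRequests <= 0) return 0
  // Each unique text is generated once; it can never exceed the request count.
  const apiCalls = Math.min(uniqueTexts, totalRequests)
  return ((totalRequests - apiCalls) / totalRequests) * 100
}

console.log(cacheSavingsPercent(10000, 1).toFixed(2))    // "99.99"
console.log(cacheSavingsPercent(100000, 100).toFixed(1)) // "99.9"
console.log(cacheSavingsPercent(2, 1))                   // 50
```

The last line illustrates the rule of thumb: even audio that is played only twice already halves the number of API calls.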

Quiz

Question 1 of 3

What advantage does WebSocket input streaming offer over REST calls?