Integrating ElevenLabs into your existing infrastructure requires more than just API calls. This section covers best practices for production-ready integrations, from the SDKs through WebSocket streaming to caching strategies.
npm install elevenlabs
import { ElevenLabsClient } from 'elevenlabs'
const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
})
// Text-to-Speech
const audioStream = await client.textToSpeech.convert('voice-id', {
  text: 'Hello from the Node.js SDK!',
  model_id: 'eleven_multilingual_v2',
})
// List voices
const voices = await client.voices.getAll()
pip install elevenlabs
from elevenlabs.client import ElevenLabs
client = ElevenLabs(api_key="sk_...")
# Text-to-Speech
audio = client.text_to_speech.convert(
    voice_id="21m00Tcm4TlvDq8ikWAM",
    text="Hello from the Python SDK!",
    model_id="eleven_multilingual_v2",
)
# Save audio
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
For real-time applications, connect via WebSocket for input streaming (send text chunk by chunk) and output streaming (receive audio chunk by chunk):
const ws = new WebSocket(
  `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=eleven_turbo_v2_5`
)
ws.onopen = () => {
  // Initialization
  ws.send(JSON.stringify({
    text: ' ',
    xi_api_key: process.env.ELEVENLABS_API_KEY,
    voice_settings: { stability: 0.5, similarity_boost: 0.75 },
  }))
  // Send text in chunks
  ws.send(JSON.stringify({ text: 'This is the first sentence. ' }))
  ws.send(JSON.stringify({ text: 'And here comes the second. ' }))
  // End stream
  ws.send(JSON.stringify({ text: '' }))
}
ws.onmessage = (event) => {
  const data = JSON.parse(event.data)
  if (data.audio) {
    // Process base64-encoded audio chunk
    const audioChunk = Buffer.from(data.audio, 'base64')
    // Forward to audio player or file
  }
}
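The hard-coded sentences above stand in for text that usually arrives as one long string. A minimal sketch of splitting such text into sentence-sized chunks before sending them over the socket follows. `splitIntoChunks` and `toMessages` are hypothetical helpers, not part of the SDK, and the trailing space on each chunk simply mirrors the messages in the example above.

```typescript
// Hypothetical helper: split text into sentence-sized chunks for input streaming.
// Each chunk ends with a single space, matching the messages in the example above.
function splitIntoChunks(text: string): string[] {
  // Match text up to and including sentence-ending punctuation,
  // or a trailing remainder that has no punctuation.
  const sentences: string[] = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return sentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => `${s} `);
}

// Builds the JSON messages to send: one per chunk, then the empty
// string that ends the stream.
function toMessages(text: string): string[] {
  const messages = splitIntoChunks(text).map((chunk) =>
    JSON.stringify({ text: chunk })
  );
  messages.push(JSON.stringify({ text: "" }));
  return messages;
}
```

Each returned message could be passed to `ws.send(...)` after the initialization message. The regular expression is deliberately naive: it also splits on decimals and abbreviations, so a production splitter would need more care.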
For long texts (books, articles), use webhooks instead of polling:
// Start job (assumes the endpoint accepts a webhook_url parameter; confirm in the API reference)
const job = await client.textToSpeech.convertAsStream('voice-id', {
  text: longArticleText,
  model_id: 'eleven_multilingual_v2',
  webhook_url: 'https://your-server.com/api/elevenlabs-webhook',
})
// Receive webhook (Express; req.body requires a JSON body parser such as app.use(express.json()))
app.post('/api/elevenlabs-webhook', (req, res) => {
  const { status, audio_url, duration } = req.body
  if (status === 'completed') {
    // Download and process audio
    downloadAndStore(audio_url)
  }
  res.status(200).send('OK')
})
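Because `req.body` is untrusted input, it can help to validate the payload before acting on it. The following is a minimal sketch, assuming the payload carries the `status`, `audio_url`, and `duration` fields destructured above. `parseCompletedJob` and the `CompletedJob` type are hypothetical names, and the HTTPS check is an added precaution rather than a documented requirement.

```typescript
// Shape assumed from the handler above; the real payload may differ.
interface CompletedJob {
  audioUrl: string;
  duration: number | null;
}

// Returns the job details when the payload describes a completed job,
// otherwise null.
function parseCompletedJob(body: unknown): CompletedJob | null {
  if (typeof body !== "object" || body === null) return null;
  const { status, audio_url, duration } = body as Record<string, unknown>;
  if (status !== "completed") return null;
  // Only accept a string URL served over HTTPS.
  if (typeof audio_url !== "string" || !audio_url.startsWith("https://")) {
    return null;
  }
  return {
    audioUrl: audio_url,
    duration: typeof duration === "number" ? duration : null,
  };
}
```

Inside the handler, `downloadAndStore` would then only run when `parseCompletedJob(req.body)` returns a non-null result.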
Request → Cache Check → [Hit]  → CDN delivers audio
                      → [Miss] → ElevenLabs API → Generate audio
                                 → Store in CDN → Deliver audio
import { createHash } from 'node:crypto'
// Assumes `r2` is a Cloudflare R2 bucket binding and `elevenLabsClient` an initialized ElevenLabsClient
async function getOrCreateAudio(text: string, voiceId: string): Promise<string> {
  const cacheKey = createHash('sha256')
    .update(`${voiceId}:${text}`)
    .digest('hex')
  const objectKey = `audio/${cacheKey}.mp3`
  const url = `https://cdn.example.com/${objectKey}`
  // 1. Check cache (head() fetches metadata only, not the audio body)
  const cached = await r2.head(objectKey)
  if (cached) return url
  // 2. Generate new
  const audio = await elevenLabsClient.textToSpeech.convert(voiceId, {
    text,
    model_id: 'eleven_multilingual_v2',
  })
  // 3. Store in CDN
  await r2.put(objectKey, audio, {
    httpMetadata: { contentType: 'audio/mpeg' },
  })
  return url
}
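The cache key in the function above covers only the voice and the text. If the model or voice settings ever vary, two different audio files would collide on one key. A minimal sketch of a canonical key input that includes those parameters follows. `SpeechRequest` and `canonicalCacheInput` are hypothetical names, and collapsing whitespace assumes that whitespace differences do not change the generated audio.

```typescript
// Hypothetical request shape; field names are illustrative.
interface SpeechRequest {
  voiceId: string;
  modelId: string;
  text: string;
  stability?: number;
  similarityBoost?: number;
}

// Builds a canonical string covering every input that changes the audio.
// Hash this string (for example with SHA-256) to get the storage key.
function canonicalCacheInput(req: SpeechRequest): string {
  // Collapse whitespace so trivially different inputs share one cache entry.
  const text = req.text.trim().replace(/\s+/g, " ");
  // A JSON array avoids delimiter ambiguity, e.g. an ID containing ":".
  return JSON.stringify([
    req.voiceId,
    req.modelId,
    req.stability ?? null,
    req.similarityBoost ?? null,
    text,
  ]);
}
```

The returned string can be fed to `createHash('sha256').update(...)` in place of the `${voiceId}:${text}` template.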
| Scenario | Without Cache | With Cache | Savings |
|---|---|---|---|
| 10,000 identical requests/day | 10,000 API calls | 1 API call + CDN | 99.99% |
| 100 products x 1,000 views | 100,000 API calls | 100 API calls | 99.9% |
| IVR with 5 standard prompts | 5 x call volume | 5 API calls | ~100% |
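The savings column follows from one formula: with caching, each unique clip costs one API call, so the share of calls avoided is 1 − unique / total. A small sketch that reproduces the first two rows follows; `cacheSavingsPercent` is a hypothetical helper name.

```typescript
// Percentage of API calls avoided when each unique clip is generated only once.
function cacheSavingsPercent(totalRequests: number, uniqueClips: number): number {
  if (totalRequests <= 0) return 0;
  // With caching, API calls never exceed the number of requests.
  const apiCalls = Math.min(uniqueClips, totalRequests);
  return (1 - apiCalls / totalRequests) * 100;
}

console.log(cacheSavingsPercent(10000, 1).toFixed(2));    // prints 99.99
console.log(cacheSavingsPercent(100000, 100).toFixed(1)); // prints 99.9
```

For the IVR row, the total is 5 prompts times the call volume, so the savings approach 100% as call volume grows.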
Practical tip: Implement a caching layer from the start. Without caching, every repeated request is billed again, so your ElevenLabs costs grow in step with traffic. Rule of thumb: any audio played more than once should be cached.
What advantage does WebSocket input streaming offer over REST calls?