Getting an AI application running locally is the easy part. Running it reliably, cost-efficiently, and at scale in production requires specific strategies for Vercel and the AI SDK.
AI applications on Vercel require special attention:
// next.config.ts
export default {
  experimental: {
    serverActions: {
      bodySizeLimit: '4mb', // For image uploads
    },
  },
}
| Variable | Description | Required |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | Yes (with OpenAI) |
| ANTHROPIC_API_KEY | Anthropic API key | Yes (with Anthropic) |
| AI_PROVIDER | Default provider | Optional |
| AI_MAX_TOKENS | Token limit per request | Recommended |
| AI_RATE_LIMIT | Requests per minute | Recommended |
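The variables in the table can be validated once at startup so that a missing key fails fast instead of surfacing on the first AI call. The following is a minimal sketch, not part of the AI SDK: `loadAiConfig`, the `AiConfig` shape, the default provider, and the fallback values (1024 tokens, 10 requests per minute) are all assumptions made for illustration.

```typescript
// Hypothetical config shape; field names and defaults are assumptions for this sketch.
type AiProvider = 'openai' | 'anthropic'

interface AiConfig {
  provider: AiProvider
  apiKey: string
  maxTokens: number
  rateLimitPerMinute: number
}

function isProvider(value: string): value is AiProvider {
  return value === 'openai' || value === 'anthropic'
}

// Parse a positive integer, falling back to a default when the variable is unset.
function parsePositiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === '') return fallback
  const value = Number(raw)
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`)
  }
  return value
}

// Build the config from an env-like object (pass process.env in a real app).
function loadAiConfig(env: Record<string, string | undefined>): AiConfig {
  const provider = env.AI_PROVIDER ?? 'openai' // assumed default provider
  if (!isProvider(provider)) {
    throw new Error(`Unsupported AI_PROVIDER: ${provider}`)
  }
  // Only the key for the selected provider is required.
  const keyName = provider === 'openai' ? 'OPENAI_API_KEY' : 'ANTHROPIC_API_KEY'
  const apiKey = env[keyName]
  if (!apiKey) {
    throw new Error(`${keyName} is required when AI_PROVIDER is "${provider}"`)
  }
  return {
    provider,
    apiKey,
    maxTokens: parsePositiveInt('AI_MAX_TOKENS', env.AI_MAX_TOKENS, 1024),
    rateLimitPerMinute: parsePositiveInt('AI_RATE_LIMIT', env.AI_RATE_LIMIT, 10),
  }
}
```

Calling `loadAiConfig(process.env)` once in a shared module keeps the checks out of individual route handlers.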
Vercel offers an AI Gateway that acts as a proxy between your application and AI providers.
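The Vercel Edge Runtime executes code at globally distributed edge locations, which keeps latency minimal:
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

// Run this route on the Edge Runtime
export const runtime = 'edge'

export async function POST(req: Request) {
  const { messages } = await req.json()

  // Stream the model output back to the client as it is generated
  const result = streamText({
    model: openai('gpt-4.1'),
    messages,
  })
  return result.toDataStreamResponse()
}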
Edge Runtime is suitable for:
Prefer Node.js Runtime for:
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

// Sliding-window limiter backed by Upstash Redis (credentials are read from env vars)
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '1 m'), // 10 requests/minute
})

export async function POST(req: Request) {
  // getUserId is an app-specific helper (e.g. based on the session); it is not defined here
  const userId = getUserId(req)
  const { success } = await ratelimit.limit(userId)
  if (!success) {
    return new Response('Rate limit exceeded', { status: 429 })
  }
  // ... AI SDK logic
}
| Level | Limit | Purpose |
|---|---|---|
| Free tier | 20 requests/day | Testing and onboarding |
| Pro tier | 100 requests/hour | Normal usage |
| Enterprise | Custom | By agreement |
| Per-model | Variable | Limit expensive models more |
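The tiers in the table can be expressed as a lookup from tier to request budget and window. The sketch below is illustrative only: the `Tier` names, the `InMemoryTierLimiter` class, and the Enterprise value are assumptions (the table only says "Custom"), and it uses a simple fixed window rather than the sliding window of the Upstash limiter. In-memory state is not shared between serverless instances, so a real deployment would keep the counters in a shared store such as Redis. The per-model row is not covered here.

```typescript
// Tier names and limits mirror the table; the Enterprise value is a placeholder assumption.
type Tier = 'free' | 'pro' | 'enterprise'

interface TierLimit {
  maxRequests: number
  windowMs: number
}

const HOUR_MS = 60 * 60 * 1000
const DAY_MS = 24 * HOUR_MS

const TIER_LIMITS: Record<Tier, TierLimit> = {
  free: { maxRequests: 20, windowMs: DAY_MS },   // 20 requests/day
  pro: { maxRequests: 100, windowMs: HOUR_MS },  // 100 requests/hour
  enterprise: { maxRequests: 1000, windowMs: HOUR_MS }, // placeholder, set by agreement
}

// Fixed-window counter kept in memory, for illustration only.
class InMemoryTierLimiter {
  private windows = new Map<string, { start: number; count: number }>()

  // Returns true when the request is allowed, false when the tier budget is used up.
  check(userId: string, tier: Tier, now: number = Date.now()): boolean {
    const { maxRequests, windowMs } = TIER_LIMITS[tier]
    const current = this.windows.get(userId)
    // Start a fresh window on first use or once the previous window has expired.
    if (!current || now - current.start >= windowMs) {
      this.windows.set(userId, { start: now, count: 1 })
      return true
    }
    if (current.count >= maxRequests) return false
    current.count += 1
    return true
  }
}
```

A route handler would call `limiter.check(userId, tier)` before the AI SDK logic and return a 429 response when it yields `false`.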
Every AI call consumes tokens, and tokens cost money:
import { openai } from '@ai-sdk/openai'
import { generateText } from 'ai'

const result = await generateText({
  model: openai('gpt-4.1'),
  prompt: '...',
})

// Log token consumption
// calculateCost is an app-specific helper that maps usage to a price; it is not defined here
console.log({
  inputTokens: result.usage.promptTokens,
  outputTokens: result.usage.completionTokens,
  totalTokens: result.usage.totalTokens,
  estimatedCost: calculateCost(result.usage),
})
| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|---|---|---|
| GPT-4.1 | $2.00 | $8.00 |
| GPT-4.1 mini | $0.40 | $1.60 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Gemini 2.5 Flash | $0.15 | $0.60 |
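The logging example calls a `calculateCost` helper without defining it. One possible version is sketched here, assuming the table's prices are USD per 1 million tokens. The model keys and the two-argument signature are choices made for this sketch (the original call passes only the usage object), and the usage field names follow the `promptTokens` / `completionTokens` fields used in the logging example.

```typescript
// Prices in USD per 1 million tokens, copied from the table.
// The model keys are illustrative identifiers chosen for this sketch.
interface Price {
  inputPerMillion: number
  outputPerMillion: number
}

const PRICES: Record<string, Price> = {
  'gpt-4.1': { inputPerMillion: 2.0, outputPerMillion: 8.0 },
  'gpt-4.1-mini': { inputPerMillion: 0.4, outputPerMillion: 1.6 },
  'claude-sonnet-4': { inputPerMillion: 3.0, outputPerMillion: 15.0 },
  'gemini-2.5-flash': { inputPerMillion: 0.15, outputPerMillion: 0.6 },
}

interface Usage {
  promptTokens: number
  completionTokens: number
}

// Estimated cost in USD for a single call.
function calculateCost(model: string, usage: Usage): number {
  const price = PRICES[model]
  if (!price) {
    throw new Error(`No price configured for model "${model}"`)
  }
  const inputCost = usage.promptTokens * price.inputPerMillion
  const outputCost = usage.completionTokens * price.outputPerMillion
  return (inputCost + outputCost) / 1_000_000
}
```

As a worked example, 1,000,000 input tokens and 500,000 output tokens on GPT-4.1 come to $2.00 + $4.00 = $6.00. Provider prices change, so the table values should be treated as a snapshot and kept in configuration rather than hard-coded.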
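Set maxTokens to avoid endless responses.
import { anthropic } from '@ai-sdk/anthropic'
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

const model = abTest('ai-model-test', userId) === 'variant_a'
  ? openai('gpt-4.1-mini')
  : anthropic('claude-sonnet-4-20250514')
const result = streamText({ model, messages })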
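The `abTest` helper is not defined in the snippet. A common approach is deterministic bucketing: hash the experiment name together with the user ID so that the same user always lands in the same variant without storing any assignment. The sketch below is a hypothetical stand-in under that assumption, using an FNV-1a hash over the string's UTF-16 code units and a 50/50 split. A dedicated experimentation service would normally replace it.

```typescript
type Variant = 'variant_a' | 'variant_b'

// FNV-1a 32-bit hash: small, deterministic, and adequate for bucketing.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5 // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) // FNV prime, 32-bit multiplication
  }
  return hash >>> 0 // convert to an unsigned 32-bit integer
}

// Hypothetical stand-in for abTest: stable 50/50 split per experiment and user.
function abTest(experiment: string, userId: string): Variant {
  const bucket = fnv1a(`${experiment}:${userId}`) % 100
  return bucket < 50 ? 'variant_a' : 'variant_b'
}
```

Including the experiment name in the hash input keeps assignments independent across experiments, so a user in `variant_a` of one test is not systematically in `variant_a` of the next.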
What to A/B test:
Track these metrics in production:
Production rule: Deploy AI features behind feature flags. Start with 5% of users, measure costs and quality, and scale gradually to 100%. An uncontrolled rollout can blow up your API bill in hours.
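A gradual rollout like this can be sketched as a stable percentage gate: each user is mapped to a fixed bucket from 0 to 99, and the feature is on when the bucket is below the current rollout percentage. Because the bucket never changes, users enabled at 5% stay enabled as the percentage grows. The function names and the simple polynomial string hash are assumptions for illustration. A feature-flag service would typically provide this for you.

```typescript
// Map a user to a stable bucket in [0, 100) for a given feature.
function rolloutBucket(feature: string, userId: string): number {
  const key = `${feature}:${userId}`
  let hash = 0
  for (let i = 0; i < key.length; i++) {
    // Polynomial rolling hash, kept within unsigned 32-bit range.
    hash = (Math.imul(hash, 31) + key.charCodeAt(i)) >>> 0
  }
  return hash % 100
}

// True when the user falls inside the current rollout percentage (0 to 100).
function isFeatureEnabled(feature: string, userId: string, rolloutPercent: number): boolean {
  return rolloutBucket(feature, userId) < rolloutPercent
}
```

Raising `rolloutPercent` from 5 to 25 to 100 then widens the audience step by step, while cost and quality metrics are checked at each stage.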
When should you use the Edge Runtime instead of the Node.js Runtime for AI endpoints?