Cost-Benefit Analysis

Lesson 5 of 5 · 8 min read

Fine-tuning can pay off, or it can remain an expensive experiment. This guide helps you calculate Total Cost of Ownership (TCO) realistically and make an informed decision.

When Fine-Tuning Pays Off

The Break-Even Formula

Monthly savings = (Cost_BaseModel × Requests) - (Cost_FT_Model × Requests)
Payback = One-time_FT_Cost / Monthly_Savings

Example:

GPT-4o: $5/1M output tokens × 10M tokens/month = $50/month
Fine-tuned GPT-4o-mini: $2/1M output tokens × 10M tokens/month = $20/month, plus $100 one-time training
Savings: $30/month → Break-even after 3.3 months ✅
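
The break-even formula can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the function names are hypothetical, prices are taken per 1M tokens as in the example above, and the volume is expressed in millions of tokens per month rather than in requests.

```python
def monthly_savings(base_price_per_m: float, ft_price_per_m: float,
                    tokens_m_per_month: float) -> float:
    """Monthly savings in dollars; prices are per 1M tokens, volume in millions of tokens."""
    return (base_price_per_m - ft_price_per_m) * tokens_m_per_month


def payback_months(one_time_cost: float, savings_per_month: float) -> float:
    """Months until the one-time cost is recovered; infinite if nothing is saved."""
    if savings_per_month <= 0:
        return float("inf")
    return one_time_cost / savings_per_month


# Figures from the example above: $5 vs. $2 per 1M tokens, 10M tokens/month, $100 training.
savings = monthly_savings(5.0, 2.0, 10.0)
months = payback_months(100.0, savings)
print(f"Savings: ${savings:.2f}/month, payback after {months:.1f} months")
```

Note that the example counts only the $100 training fee. Calling `payback_months(2000.0, savings)` with the low end of the one-time total from the TCO section gives roughly 67 months at the same volume, which is why the labor items matter.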

When It Does NOT Pay Off

Fewer than 100,000 requests per month (too little volume)
Use case changes frequently (constant retraining)
Prompting/RAG already delivers 90%+ of the desired quality
No internal ML know-how available

Total Cost of Ownership (TCO)

One-Time Costs

Item	Managed (OpenAI)	Open source (self-hosted)
Data preparation	20–40 hours	20–40 hours
Training	$10–500	$5–200 (GPU)
Evaluation	10–20 hours	10–20 hours
Infrastructure setup	—	10–30 hours
Total (labor included)	~$2,000–5,000	~$3,000–8,000

Ongoing Costs

Item	Managed	Self-hosted
Inference	$2–15/1M tokens	$500–3,000/month (GPU)
Monitoring	Included	5–10 hours/month
Retraining	$10–500/quarter	$5–200/quarter
Maintenance	—	10–20 hours/month
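
To compare the two columns over a year, the hours have to be converted into money. The sketch below does that under loudly stated assumptions: `HOURLY_RATE` is a hypothetical blended labor rate (not from this lesson), the inputs are the low end of each range in the tables above, and managed inference assumes the 10M tokens/month volume from the earlier example.

```python
from dataclasses import dataclass

HOURLY_RATE = 75.0  # assumption: dollars per labor hour, replace with your own rate


@dataclass
class Option:
    name: str
    one_time_dollars: float     # training fee
    one_time_hours: float       # data preparation, evaluation, infrastructure setup
    monthly_dollars: float      # inference fees or GPU rental
    monthly_hours: float        # monitoring and maintenance
    retrain_per_quarter: float  # retraining fee per quarter


def tco(option: Option, months: int = 12, hourly_rate: float = HOURLY_RATE) -> float:
    """Total cost over `months`, pricing labor hours at `hourly_rate`."""
    one_time = option.one_time_dollars + option.one_time_hours * hourly_rate
    monthly = option.monthly_dollars + option.monthly_hours * hourly_rate
    retraining = option.retrain_per_quarter * (months // 3)
    return one_time + monthly * months + retraining


# Low-end figures from the tables; $20/month = $2 per 1M tokens x 10M tokens.
managed = Option("Managed", 10.0, 30.0, 20.0, 0.0, 10.0)
self_hosted = Option("Self-hosted", 5.0, 40.0, 500.0, 15.0, 5.0)

for opt in (managed, self_hosted):
    print(f"{opt.name}: ${tco(opt):,.0f} over 12 months")
```

With these inputs the recurring hours dominate the self-hosted total, so the result is sensitive to the labor rate you assume.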

Check Alternatives First

Before starting fine-tuning, check cheaper alternatives:

1. Better Prompting

Cost: $0 (just time)
Potential: Often 80% of the desired improvement
Time: 1–2 days

2. RAG

Cost: $50–500/month (vector DB + embeddings)
Potential: Ideal for factual knowledge
Time: 1–2 weeks

3. Model Switch

Cost: Potentially lower
Potential: Newer base models often outperform fine-tuned older ones
Time: 1 day

4. Prompt Caching

Cost: Cached input tokens are typically billed 50–90% below the standard API rate, depending on the provider
Potential: Enormous for repetitive system prompts
Time: 1 hour
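
How much prompt caching saves depends on how much of the input is a repeated prefix. The following rough sketch estimates the monthly input cost; the function name, the $2.50 per 1M input tokens price, and the 80% cached share are hypothetical values for illustration, and any cache-write surcharge a provider may apply is ignored.

```python
def monthly_input_cost(tokens_m: float, price_per_m: float,
                       cached_fraction: float = 0.0,
                       cache_discount: float = 0.0) -> float:
    """Input cost in dollars for `tokens_m` million tokens per month.

    cached_fraction: share of input tokens served from the cache (0 to 1).
    cache_discount:  discount on cached tokens (0.5 means 50% cheaper).
    """
    if not 0.0 <= cached_fraction <= 1.0 or not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cached_fraction and cache_discount must be between 0 and 1")
    cached = tokens_m * cached_fraction
    uncached = tokens_m - cached
    return uncached * price_per_m + cached * price_per_m * (1.0 - cache_discount)


# Hypothetical workload: 100M input tokens/month, 80% of them a shared system prompt.
baseline = monthly_input_cost(100.0, 2.50)
with_cache = monthly_input_cost(100.0, 2.50, cached_fraction=0.8, cache_discount=0.5)
print(f"Without caching: ${baseline:.2f}, with caching: ${with_cache:.2f}")
```

The overall saving is smaller than the headline discount because only the cached share of the input benefits from it.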

Decision Checklist

✅ Start fine-tuning when:

Prompting and RAG have been tested and aren't sufficient
At least 100 quality training examples available
Use case is stable (rarely changes)
Volume justifies the investment (> 100K requests/month)
Internal know-how or budget for external support available
Evaluation plan ready (metrics, baselines, test set)

❌ Avoid fine-tuning when:

Prompting delivers > 90% of desired quality
Use case changes frequently
Little training data available
No budget for ongoing maintenance
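
The checklist can also be written down as a small function, which makes it easy to see which criteria a project still misses. This is only a sketch: the `UseCase` fields and the `unmet_criteria` helper are hypothetical names, and the thresholds (100 examples, 100K requests/month) are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    prompting_and_rag_insufficient: bool
    quality_examples: int
    use_case_stable: bool
    requests_per_month: int
    expertise_or_budget: bool
    evaluation_plan_ready: bool


def unmet_criteria(uc: UseCase) -> list[str]:
    """Return the checklist items that are not yet satisfied (empty list = go)."""
    checks = [
        (uc.prompting_and_rag_insufficient, "prompting and RAG not yet shown to be insufficient"),
        (uc.quality_examples >= 100, "fewer than 100 quality training examples"),
        (uc.use_case_stable, "use case changes frequently"),
        (uc.requests_per_month > 100_000, "volume at or below 100K requests/month"),
        (uc.expertise_or_budget, "no internal know-how or budget for external support"),
        (uc.evaluation_plan_ready, "no evaluation plan"),
    ]
    return [reason for passed, reason in checks if not passed]


candidate = UseCase(True, 50, True, 20_000, True, True)
gaps = unmet_criteria(candidate)
print("Start fine-tuning" if not gaps else "Hold off: " + "; ".join(gaps))
```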

Practical tip: Create a simple table that compares the 12-month cost of each option (fine-tuning vs. prompting vs. RAG) with its expected quality improvement. The decision usually becomes clear once you see the numbers.
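
Such a table takes only a few lines to generate. In the sketch below every number is a hypothetical placeholder chosen for illustration, not a measurement or a figure from this lesson; replace the rows with your own cost estimates and evaluation results.

```python
# Hypothetical inputs for illustration only; replace with your own estimates.
# (name, one-time cost in $, monthly cost in $, expected quality gain in points)
options = [
    ("Prompting", 0, 0, 5),
    ("RAG", 1000, 300, 8),
    ("Fine-tuning", 3000, 50, 10),
]


def twelve_month_cost(one_time: float, monthly: float, months: int = 12) -> float:
    """One-time cost plus recurring cost over the given number of months."""
    return one_time + monthly * months


rows = [(name, twelve_month_cost(one_time, monthly), gain)
        for name, one_time, monthly, gain in options]

print(f"{'Option':<12}{'12-mo cost':>12}{'Gain':>6}{'$/point':>10}")
for name, cost, gain in rows:
    # Cost per quality point makes options with different gains comparable.
    print(f"{name:<12}{cost:>12,.0f}{gain:>6}{cost / gain:>10,.0f}")
```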

📝

Quiz

Question 1 of 3

At what monthly request volume does fine-tuning typically pay off?
