"Should we fine-tune the model?" is one of the most expensive questions a founder can ask. In 2026, the answer is almost always "no, build retrieval first." Here's the framework.
The default: retrieval-augmented generation (RAG)
For roughly 90% of business AI use cases (support, search, knowledge bases, document Q&A), a well-built RAG pipeline beats a fine-tuned model on accuracy, freshness, and cost. Fine-tuning teaches a model style; retrieval gives it facts. Most of the time, you need facts.
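To make "retrieval gives it facts" concrete, here is a minimal RAG sketch, assuming the OpenAI Python SDK. The in-memory `DOCS` list and its contents are invented placeholders for a real document store; in production you would swap the dot-product search for pgvector or Pinecone.

```python
# Minimal RAG: embed the question, find the nearest chunks, and make
# the model answer from them. DOCS is a hypothetical placeholder.
import numpy as np
from openai import OpenAI

client = OpenAI()

DOCS = [
    "Refunds are processed within 5 business days.",
    "Enterprise plans include SSO and audit-log export.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(DOCS)  # precompute once; re-embed only when docs change

def answer(question, k=2):
    q_vec = embed([question])[0]
    # These embeddings are unit-length, so a dot product is cosine similarity.
    scores = doc_vecs @ q_vec
    context = "\n".join(DOCS[i] for i in np.argsort(scores)[::-1][:k])
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```

Note that the model never memorizes the refund policy: update `DOCS` and the next answer is current. That is the freshness win a fine-tune can't match.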
When fine-tuning actually helps
- You need a very specific output format (legal clauses, code in a niche DSL).
- You need to compress an expensive prompt into a cheaper, faster small model.
- You have 10K+ high-quality examples and the workflow is stable (a sketch of what one example looks like follows this list).
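If you do clear that bar, the training data is the whole game. Here is a hedged sketch of one chat-format example, using the JSONL layout OpenAI's fine-tuning API accepts; the legal-clause content is invented for illustration.

```python
# Build a fine-tuning file: one JSON object per line, each a complete
# chat exchange demonstrating the exact output format you want learned.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "Draft clauses in our house legal style."},
            {"role": "user", "content": "Confidentiality clause for a vendor MSA."},
            {"role": "assistant", "content": "Confidentiality. Each party shall..."},
        ]
    },
    # ...10K+ more like this; fine-tuning rewards volume and consistency.
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```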
The hidden cost of fine-tuning
Every model upgrade invalidates your fine-tune, and a major one ships every 4–6 months: tuned weights are tied to a single base model and can't be carried forward. You're now on a treadmill of re-training, re-evaluating, and re-deploying. Most teams don't budget for this.
Our default stack
GPT-4o or Claude as the reasoning model, pgvector or Pinecone for retrieval, a small reranker, and a careful evaluation harness. Fine-tune only when retrieval has demonstrably hit its ceiling.
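As a sketch of what "demonstrably hit its ceiling" means: keep a labeled eval set and measure retrieval hit-rate before blaming the model. The `retrieve` function below stands in for your pgvector or Pinecone query, and the eval cases are invented.

```python
# Hypothetical eval harness: did the right chunk even reach the model?
EVAL_SET = [
    {"question": "How long do refunds take?", "must_contain": "5 business days"},
    {"question": "Does enterprise include SSO?", "must_contain": "SSO"},
]

def hit_rate(retrieve, k=5):
    """retrieve(question, k) -> list of retrieved text chunks (assumed)."""
    hits = sum(
        any(case["must_contain"] in chunk
            for chunk in retrieve(case["question"], k=k))
        for case in EVAL_SET
    )
    return hits / len(EVAL_SET)
```

If hit-rate is low, fix chunking, embeddings, or the reranker first; no fine-tune can recover a fact that retrieval never surfaced.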
