When a default LLM doesn't do what you want, you have three real options: RAG (give it relevant context at runtime), prompt engineering with examples (better instructions in the prompt), or fine-tuning (modify the model's weights). LoRA is a specific kind of fine-tuning that's much cheaper than full fine-tuning. Most teams reach for fine-tuning when RAG would have been the right answer. Most teams reach for full fine-tuning when LoRA would have sufficed.
The three approaches at a glance
RAG (Retrieval-Augmented Generation) — fetch relevant documents at runtime, include them in the prompt, model answers based on them. The model itself doesn't change. Good for: knowledge questions, dynamic information, citing sources.
Fine-tuning (full or LoRA) — adjust model weights so it behaves differently. The model's behavior changes permanently. Good for: style, format, specific task patterns.
Prompt engineering — better instructions and examples in the prompt. No weight changes, no retrieval. Good for: most things, surprisingly.
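The difference between the three is easiest to see at the prompt-assembly level. A minimal sketch (the instructions and helper names here are illustrative, not any real API):

```python
def prompt_only(question: str) -> str:
    # Prompt engineering: better instructions in the prompt; nothing else changes.
    return (
        "You are a support agent. Answer concisely in a friendly tone.\n"
        f"Question: {question}"
    )

def rag_prompt(question: str, retrieved_docs: list[str]) -> str:
    # RAG: same model, but relevant documents are fetched at runtime
    # and pasted into the prompt as context.
    context = "\n---\n".join(retrieved_docs)
    return (
        "Answer using ONLY the context below. Cite the passage you used.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Fine-tuning has no prompt-side trick: the weights themselves change,
# so the plain question comes back in the trained style or format.
```

Both prompt-side approaches leave the model untouched; only fine-tuning changes what the model is.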
What problem does each solve
"The model doesn't know about my company's products" → RAG. Fetch the product docs at runtime; the model uses them to answer.
"The model uses formal language; I want casual" → Prompt engineering first. "Respond in a casual, friendly tone" works most of the time. Fine-tuning if prompt-only doesn't get there.
"The model doesn't follow my exact JSON schema" → Prompt engineering with examples first. Most modern models do well with schema instructions. Fine-tuning if you need 99% reliability and prompt-only gets you 95%.
"The model doesn't know specialized medical terminology" → RAG with medical sources. Fine-tuning is overkill unless you're operating at very large scale.
"The model is too slow for our use case" → Smaller model + fine-tuning to make it match larger model's quality on your specific tasks. This is a real fine-tuning use case.
"The model says things we don't want it to" → Prompt engineering and output filtering first. Fine-tuning to internalize your brand voice and policies if your product is high-volume.
What is LoRA
LoRA (Low-Rank Adaptation) is a fine-tuning technique that updates only a small number of parameters by inserting trainable "adapter" matrices into the model. It produces a small file (~10-100MB) that modifies the base model's behavior when applied.
Advantages over full fine-tuning:
- 10-100x cheaper to train
- Much faster to train (hours vs days)
- Smaller artifacts (megabytes vs gigabytes)
- Multiple LoRAs can be swapped in/out on the same base model
- Less likely to catastrophically forget the original capabilities
Disadvantages:
- Slightly less powerful than full fine-tuning
- Requires the base model to be available (you can't ship just the LoRA)
- Sometimes doesn't capture deep behavioral changes
For 90% of fine-tuning needs in 2026, LoRA is the right choice.
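The cost gap comes straight from parameter counts. For a weight matrix of shape d x k, LoRA trains two small factors A (d x r) and B (r x k) instead of all d * k entries. A back-of-envelope check, using a 4096 x 4096 attention projection (typical of a 7B-class model) and rank 8:

```python
def lora_trainable(d: int, k: int, r: int) -> int:
    # LoRA replaces the update to a d x k matrix with A (d x r) @ B (r x k),
    # so only d*r + r*k parameters are trained.
    return d * r + r * k

d = k = 4096                         # one attention projection matrix
full = d * k                         # 16,777,216 params for full fine-tuning
lora = lora_trainable(d, k, r=8)     # 65,536 params at rank 8
print(f"LoRA trains {lora / full:.2%} of this matrix")  # → 0.39%
```

Training well under 1% of the parameters is where the 10-100x savings in compute, time, and artifact size come from.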
When RAG beats fine-tuning
- Information is in documents you already have or can collect
- Information changes (current events, prices, policies update over time)
- You need citations / verifiability
- You need to add new information without retraining
- You don't have thousands of training examples
RAG is also faster to build (days, not weeks) and easier to update.
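A first RAG pass really can be built in days, because naive retrieval is simple. A toy keyword-overlap retriever (production systems use embeddings, but the shape of the pipeline is the same):

```python
def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank docs by how many query words they share (toy scoring)."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

docs = [
    "Our refund policy allows returns within 30 days.",
    "The Model X ships with a 2-year warranty.",
    "Offices are closed on public holidays.",
]
print(retrieve("what is the refund policy", docs, top_k=1))
```

Swapping this scoring function for an embedding similarity search upgrades quality without changing the surrounding architecture, which is part of why RAG systems are easy to iterate on.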
When fine-tuning beats RAG
- You need consistent output format (always strict JSON, always with these tags)
- You need a specific style or voice that's hard to describe in a prompt
- You need to internalize implicit knowledge from many examples
- Latency matters and adding RAG context is too slow
- Cost matters and the system prompt plus retrieved chunks are too expensive per query
Classic fine-tuning use case: tone-matching for a brand. "Write like Apple's marketing copy" — RAG can supply examples but fine-tuning bakes the style in.
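Baking a style in means showing the trainer many paired examples. Training data is typically a JSONL file of chat transcripts, one example per line; the field names below follow the common OpenAI-style chat format, but check your provider's docs (the copy itself is invented for illustration):

```python
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are our brand copywriter."},
        {"role": "user", "content": "Describe the new headphones."},
        {"role": "assistant", "content": "Sound, distilled. Nothing between you and the music."},
    ]},
]

# One JSON object per line -- the JSONL file the trainer consumes.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

The "assistant" turns are what the model learns to imitate, which is why hundreds of on-voice examples matter more than clever prompt wording here.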
When you actually need both
Many production systems use both. Example: a customer support agent fine-tuned for your brand voice + RAG over your help docs. The fine-tuning ensures consistent voice; the RAG ensures fresh and accurate information.
The classic mistake
Teams hear "fine-tune" and immediately want to do it. Fine-tuning sounds technical and impressive. The reality:
- 80% of teams who attempt fine-tuning could have solved their problem with better prompting and RAG
- 15% genuinely need fine-tuning but should use LoRA
- 5% need full fine-tuning
Before fine-tuning, ask: have I really exhausted prompt engineering? Have I tried RAG? Do I have hundreds of high-quality training examples? (If not, you can't fine-tune well.)
Cost reality
- RAG: cheap to build (days). Inference cost is API + retrieval (usually under $5/1k queries).
- LoRA fine-tuning: $100-1000 for a one-time training run. Much cheaper inference if self-hosted.
- Full fine-tuning: $5,000-50,000+ per training run on a meaningful model. Plus serving infrastructure.
- Prompt engineering: free.
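The per-query numbers above are easy to sanity-check with token arithmetic. A sketch with illustrative prices (real rates vary by provider and model; the figures here are assumptions):

```python
def cost_per_1k_queries(prompt_tokens: int, output_tokens: int,
                        in_price: float, out_price: float) -> float:
    """Cost of 1,000 queries, with prices in $ per 1M tokens (illustrative)."""
    per_query = (prompt_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_query * 1_000

# RAG pastes retrieved chunks into every prompt: say 2,000 extra input tokens.
base = cost_per_1k_queries(500, 300, in_price=1.0, out_price=4.0)
rag = cost_per_1k_queries(2500, 300, in_price=1.0, out_price=4.0)
print(f"${base:.2f} vs ${rag:.2f} per 1k queries")
```

Under these assumptions both land comfortably below the $5/1k figure above; the point is that retrieved context multiplies input-token spend, which is the "cost matters" case where fine-tuning can pay off.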
For most products, prompt engineering then RAG covers needs. Save fine-tuning for proven cases where the alternatives demonstrably fall short.
When NOT to use fine-tuning
- You don't have evaluation data showing prompt + RAG isn't enough
- You have under 500 high-quality examples (not enough to fine-tune well)
- The information you want the model to know changes regularly
- You don't have engineering capacity to maintain fine-tuned models
- Your provider is going to release a new base model in 6 months that's better than your fine-tune
When NOT to use RAG
- The information is small enough to fit in context (< 100k tokens total)
- The information is constant and could be in the system prompt
- Retrieval would add too much latency
- You actually need the model to change behavior, not just have new info
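The "< 100k tokens" cutoff is cheap to check up front using the rough rule of thumb of ~4 characters per English token (exact counts require the model's tokenizer):

```python
def fits_in_context(docs: list[str], budget_tokens: int = 100_000) -> bool:
    """Rough check: ~4 characters per English token (rule of thumb)."""
    est_tokens = sum(len(d) for d in docs) // 4
    return est_tokens < budget_tokens

small_kb = ["policy doc " * 100] * 20   # roughly a few thousand tokens
print(fits_in_context(small_kb))        # a corpus this small can skip RAG
```

If the whole corpus fits with room to spare, stuffing it into the prompt (or system prompt) beats building a retrieval pipeline.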
Decision tree
- New information / dynamic data: RAG
- New style / behavior / format: prompt engineering first, fine-tune if needed
- Style + specific format + at scale: LoRA fine-tuning
- Need to update knowledge base often: RAG
- Brand voice for high-volume product: LoRA fine-tuning + RAG for facts
- Compliance / specific terminology: RAG with curated sources
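The tree above can be encoded directly. A sketch, with the document's categories as return values (the boolean inputs are a simplification, not an exhaustive taxonomy):

```python
def choose_approach(needs_fresh_info: bool, needs_style_change: bool,
                    prompt_alone_works: bool, high_volume: bool) -> str:
    """Encode the decision tree: info problems -> RAG; behavior problems ->
    prompt engineering first, LoRA if prompts fall short; both at scale -> combine."""
    if needs_fresh_info and needs_style_change and high_volume:
        return "LoRA fine-tuning + RAG"
    if needs_fresh_info:
        return "RAG"
    if needs_style_change and not prompt_alone_works:
        return "LoRA fine-tuning"
    return "prompt engineering"

print(choose_approach(needs_fresh_info=True, needs_style_change=False,
                      prompt_alone_works=True, high_volume=False))  # → RAG
```

Note that "prompt engineering" is the default branch: you only fall through to fine-tuning after the cheaper options fail.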
Next steps
- Identify your actual problem: missing info, wrong style, inconsistent format?
- Try prompt engineering first; measure how far you get
- For info problems, build basic RAG; measure improvement
- For behavior problems, only fine-tune if prompt + RAG don't suffice
- If fine-tuning, start with LoRA; consider full fine-tuning only as upgrade