When a default LLM doesn't do what you want, you have three real options: RAG (give it relevant context at runtime), prompt engineering with examples (better instructions in the prompt), or fine-tuning (modify the model's weights). LoRA is a specific kind of fine-tuning that's much cheaper than full fine-tuning. Most teams reach for fine-tuning when RAG would have been the right answer. Most teams reach for full fine-tuning when LoRA would have sufficed.
The three approaches at a glance
RAG (Retrieval-Augmented Generation) — fetch relevant documents at runtime, include them in the prompt, model answers based on them. The model itself doesn't change. Good for: knowledge questions, dynamic information, citing sources.
Fine-tuning (full or LoRA) — adjust model weights so it behaves differently. The model's behavior changes permanently. Good for: style, format, specific task patterns.
Prompt engineering — better instructions and examples in the prompt. No weight changes, no retrieval. Good for: most things, surprisingly.
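The difference between the three is easiest to see at the prompt-assembly level. A minimal sketch (the instructions and helper names here are illustrative, not any real API):

```python
def prompt_only(question: str) -> str:
    # Prompt engineering: better instructions in the prompt; nothing else changes.
    return (
        "You are a support agent. Answer concisely in a friendly tone.\n"
        f"Question: {question}"
    )

def rag_prompt(question: str, retrieved_docs: list[str]) -> str:
    # RAG: same model, but relevant documents are fetched at runtime
    # and pasted into the prompt as context.
    context = "\n---\n".join(retrieved_docs)
    return (
        "Answer using ONLY the context below. Cite the passage you used.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Fine-tuning has no prompt-side trick: the weights themselves change,
# so the plain question comes back in the trained style or format.
```

Both prompt-side approaches leave the model untouched; only fine-tuning changes what the model is.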
What problem does each solve
"The model doesn't know about my company's products" → RAG. Fetch the product docs at runtime; the model uses them to answer.
"The model uses formal language; I want casual" → Prompt engineering first. "Respond in a casual, friendly tone" works most of the time. Fine-tuning if prompt-only doesn't get there.
"The model doesn't follow my exact JSON schema" → Prompt engineering with examples first. Most modern models do well with schema instructions. Fine-tuning if you need 99% reliability and prompt-only gets you 95%.
"The model doesn't know specialized medical terminology" → RAG with medical sources. Fine-tuning is overkill unless you're operating at very large scale.
"The model is too slow for our use case" → Smaller model + fine-tuning to make it match larger model's quality on your specific tasks. This is a real fine-tuning use case.
"The model says things we don't want it to" → Prompt engineering and output filtering first. Fine-tuning to internalize your brand voice and policies if your product is high-volume.
What is LoRA
LoRA (Low-Rank Adaptation) is a fine-tuning technique that updates only a small number of parameters by inserting trainable "adapter" matrices into the model. It produces a small file (~10-100MB) that modifies the base model's behavior when applied.
Advantages over full fine-tuning:
- 10-100x cheaper to train
- Much faster to train (hours vs days)
- Smaller artifacts (megabytes vs gigabytes)
- Multiple LoRAs can be swapped in/out on the same base model
- Less likely to catastrophically forget the original capabilities
Disadvantages:
- Slightly less powerful than full fine-tuning
- Requires the base model to be available (you can't ship just the LoRA)
- Sometimes doesn't capture deep behavioral changes
For 90% of fine-tuning needs in 2026, LoRA is the right choice.
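The cost gap comes straight from parameter counts. For a weight matrix of shape d x k, LoRA trains two small factors A (d x r) and B (r x k) instead of all d * k entries. A back-of-envelope check, using a 4096 x 4096 attention projection (typical of a 7B-class model) and rank 8:

```python
def lora_trainable(d: int, k: int, r: int) -> int:
    # LoRA replaces the update to a d x k matrix with A (d x r) @ B (r x k),
    # so only d*r + r*k parameters are trained.
    return d * r + r * k

d = k = 4096                         # one attention projection matrix
full = d * k                         # 16,777,216 params for full fine-tuning
lora = lora_trainable(d, k, r=8)     # 65,536 params at rank 8
print(f"LoRA trains {lora / full:.2%} of this matrix")  # → 0.39%
```

Training well under 1% of the parameters is where the 10-100x savings in compute, time, and artifact size come from.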
When RAG beats fine-tuning
- Information is in documents you already have or can collect
- Information changes (current events, prices, policies update over time)
- You need citations / verifiability
- You need to add new information without retraining
- You don't have thousands of training examples
RAG is also faster to build (days, not weeks) and easier to update.
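A first RAG pass really can be built in days, because naive retrieval is simple. A toy keyword-overlap retriever (production systems use embeddings, but the shape of the pipeline is the same):

```python
def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank docs by how many query words they share (toy scoring)."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

docs = [
    "Our refund policy allows returns within 30 days.",
    "The Model X ships with a 2-year warranty.",
    "Offices are closed on public holidays.",
]
print(retrieve("what is the refund policy", docs, top_k=1))
```

Swapping this scoring function for an embedding similarity search upgrades quality without changing the surrounding architecture, which is part of why RAG systems are easy to iterate on.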
When fine-tuning beats RAG
- You need consistent output format (always strict JSON, always with these tags)
- You need a specific style or voice that's hard to describe in a prompt
- You need to internalize implicit knowledge from many examples
- Latency matters and adding RAG context is too slow
- Cost matters and the system prompt plus retrieved chunks are too expensive per query
Classic fine-tuning use case: tone-matching for a brand. "Write like Apple's marketing copy" — RAG can supply examples but fine-tuning bakes the style in.
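Baking a style in means showing the trainer many paired examples. Training data is typically a JSONL file of chat transcripts, one example per line; the field names below follow the common OpenAI-style chat format, but check your provider's docs (the copy itself is invented for illustration):

```python
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are our brand copywriter."},
        {"role": "user", "content": "Describe the new headphones."},
        {"role": "assistant", "content": "Sound, distilled. Nothing between you and the music."},
    ]},
]

# One JSON object per line -- the JSONL file the trainer consumes.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

The "assistant" turns are what the model learns to imitate, which is why hundreds of on-voice examples matter more than clever prompt wording here.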
When you actually need both
Many production systems use both. Example: a customer support agent fine-tuned for your brand voice + RAG over your help docs. The fine-tuning ensures consistent voice; the RAG ensures fresh and accurate information.
The classic mistake
Teams hear "fine-tune" and immediately want to do it. Fine-tuning sounds technical and impressive. The reality:
- 80% of teams who attempt fine-tuning could have solved their problem with better prompting and RAG
- 15% genuinely need fine-tuning but should use LoRA
- 5% need full fine-tuning
Before fine-tuning, ask: have I really exhausted prompt engineering? Have I tried RAG? Do I have hundreds of high-quality training examples? (If not, you can't fine-tune well.)
Cost reality
- RAG: cheap to build (days). Inference cost is API + retrieval (usually under $5/1k queries).
- LoRA fine-tuning: $100-1000 for a one-time training run. Much cheaper inference if self-hosted.
- Full fine-tuning: $5,000-50,000+ per training run on a meaningful model. Plus serving infrastructure.
- Prompt engineering: free.
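The per-query numbers above are easy to sanity-check with token arithmetic. A sketch with illustrative prices (real rates vary by provider and model; the figures here are assumptions):

```python
def cost_per_1k_queries(prompt_tokens: int, output_tokens: int,
                        in_price: float, out_price: float) -> float:
    """Cost of 1,000 queries, with prices in $ per 1M tokens (illustrative)."""
    per_query = (prompt_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_query * 1_000

# RAG pastes retrieved chunks into every prompt: say 2,000 extra input tokens.
base = cost_per_1k_queries(500, 300, in_price=1.0, out_price=4.0)
rag = cost_per_1k_queries(2500, 300, in_price=1.0, out_price=4.0)
print(f"${base:.2f} vs ${rag:.2f} per 1k queries")
```

Under these assumptions both land comfortably below the $5/1k figure above; the point is that retrieved context multiplies input-token spend, which is the "cost matters" case where fine-tuning can pay off.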
For most products, prompt engineering then RAG covers needs. Save fine-tuning for proven cases where the alternatives demonstrably fall short.
When NOT to use fine-tuning
- You don't have evaluation data showing prompt + RAG isn't enough
- You have under 500 high-quality examples (not enough to fine-tune well)
- The information you want the model to know changes regularly
- You don't have engineering capacity to maintain fine-tuned models
- Your provider is going to release a new base model in 6 months that's better than your fine-tune
When NOT to use RAG
- The information is small enough to fit in context (< 100k tokens total)
- The information is constant and could be in the system prompt
- Retrieval would add too much latency
- You actually need the model to change behavior, not just have new info
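The "< 100k tokens" cutoff is cheap to check up front using the rough rule of thumb of ~4 characters per English token (exact counts require the model's tokenizer):

```python
def fits_in_context(docs: list[str], budget_tokens: int = 100_000) -> bool:
    """Rough check: ~4 characters per English token (rule of thumb)."""
    est_tokens = sum(len(d) for d in docs) // 4
    return est_tokens < budget_tokens

small_kb = ["policy doc " * 100] * 20   # roughly a few thousand tokens
print(fits_in_context(small_kb))        # a corpus this small can skip RAG
```

If the whole corpus fits with room to spare, stuffing it into the prompt (or system prompt) beats building a retrieval pipeline.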
Decision tree
- New information / dynamic data: RAG
- New style / behavior / format: prompt engineering first, fine-tune if needed
- Style + specific format + at scale: LoRA fine-tuning
- Need to update knowledge base often: RAG
- Brand voice for high-volume product: LoRA fine-tuning + RAG for facts
- Compliance / specific terminology: RAG with curated sources
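The tree above can be encoded directly. A sketch, with the document's categories as return values (the boolean inputs are a simplification, not an exhaustive taxonomy):

```python
def choose_approach(needs_fresh_info: bool, needs_style_change: bool,
                    prompt_alone_works: bool, high_volume: bool) -> str:
    """Encode the decision tree: info problems -> RAG; behavior problems ->
    prompt engineering first, LoRA if prompts fall short; both at scale -> combine."""
    if needs_fresh_info and needs_style_change and high_volume:
        return "LoRA fine-tuning + RAG"
    if needs_fresh_info:
        return "RAG"
    if needs_style_change and not prompt_alone_works:
        return "LoRA fine-tuning"
    return "prompt engineering"

print(choose_approach(needs_fresh_info=True, needs_style_change=False,
                      prompt_alone_works=True, high_volume=False))  # → RAG
```

Note that "prompt engineering" is the default branch: you only fall through to fine-tuning after the cheaper options fail.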
Next steps
- Identify your actual problem: missing info, wrong style, inconsistent format?
- Try prompt engineering first; measure how far you get
- For info problems, build basic RAG; measure improvement
- For behavior problems, only fine-tune if prompt + RAG don't suffice
- If fine-tuning, start with LoRA; consider full fine-tuning only as upgrade