Technique

Supervised fine-tuning (SFT)

Fine-tuning a base LLM on a dataset of (input, ideal-output) pairs so it learns to produce that style of response — the first step of post-training.

Supervised fine-tuning (SFT) takes a base LLM and trains it further on labeled examples: pairs of (prompt, ideal response) where humans wrote or curated the responses. The model adjusts its weights to make that style of response more likely. SFT is what turns a raw next-token predictor into something that follows instructions, holds a conversation, or writes in a specific format.

It matters because base models trained on internet text complete text; they don't naturally answer questions, follow instructions, or refuse harmful requests. SFT is the first step of "post-training": taking a base model like Llama 3 base and turning it into Llama 3 Instruct. After SFT typically comes preference optimization (RLHF or DPO) for further refinement.

A concrete example: you want a customer-support model. You collect 10,000 examples of (customer question, ideal support reply) written by your support team. Fine-tune on these for 1-3 epochs and the model learns your tone, your products, and your refund policy. The same recipe scales from "build me a domain expert" to "create a model that emits exactly the JSON shape my pipeline needs".

SFT can be full fine-tuning (update every weight, expensive) or parameter-efficient (LoRA, QLoRA: update tiny adapters, cheap). For most builders, LoRA-based SFT on 1-10k examples is the practical sweet spot.

Related: fine-tuning, LoRA, RLHF, DPO, instruction tuning.
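To make the recipe concrete, here is a minimal sketch of LoRA-based SFT using the Hugging Face trl and peft libraries. The dataset file name (support_pairs.jsonl), the model choice, and the hyperparameters are illustrative assumptions, and exact argument names can vary across trl versions:

```python
# A minimal LoRA-based SFT sketch with Hugging Face trl + peft.
# Assumed input: a JSONL file of chat-format pairs, one example per line:
#   {"messages": [{"role": "user", "content": "How do I get a refund?"},
#                 {"role": "assistant", "content": "Sure, here is how..."}]}
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="support_pairs.jsonl", split="train")

peft_config = LoraConfig(                 # train small adapters, not every weight
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",   # base model, not the Instruct variant
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="sft-out",
        num_train_epochs=2,               # 1-3 epochs is typical for SFT
        per_device_train_batch_size=4,
        learning_rate=2e-4,               # LoRA tolerates higher LRs than full FT
    ),
)
trainer.train()
```

After training, the adapter weights saved in sft-out can be merged into the base model or loaded alongside it at inference time, which is what makes the LoRA variant so much cheaper than full fine-tuning.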

Last updated: 2026-04-29
