
Instruction Tuning

A fine-tuning technique that trains a language model on (instruction, response) pairs so it learns to follow natural-language commands instead of just predicting the next token.

Instruction tuning is a fine-tuning step in which a pre-trained language model is further trained on a dataset of (instruction, desired response) pairs. The goal is to teach the model to behave like an assistant: to interpret a user's request and produce a useful answer, rather than merely continue text statistically.

This matters because raw pre-trained models (like the base GPT or LLaMA models) are good at predicting the next word but bad at following commands. If you ask a base model "Write a haiku about cats," it might respond with another writing prompt instead of an actual haiku. Instruction tuning is what turns a "text completer" into something that feels like ChatGPT or Claude. It is typically done before RLHF and is one of the cheaper, higher-leverage stages in the modern LLM training pipeline.

Concrete examples: Google's FLAN, OpenAI's InstructGPT, and Stanford's Alpaca all used instruction tuning. Alpaca took LLaMA and fine-tuned it on roughly 52,000 instruction-response pairs generated by OpenAI's text-davinci-003, and the result followed instructions dramatically better than the base model despite a tiny training budget. The instructions cover everything from "Summarize this article" to "Translate to French" to "Explain like I'm five," so the model generalizes to unseen instruction types.

Instruction tuning is usually supervised (hence the common name SFT, for supervised fine-tuning). The data can come from humans writing examples, from existing NLP datasets reformatted as instructions, or from a stronger model generating synthetic examples (distillation).

Related concepts: fine-tuning, RLHF, SFT, FLAN, Alpaca, chat models, system prompts.
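To make the mechanics concrete, here is a minimal SFT training loop in the spirit described above. It is a sketch, not a prescribed recipe: the model name (gpt2 as a small stand-in), the prompt template, the toy data, and the hyperparameters are all illustrative assumptions, and real instruction tuning runs on far larger models and datasets. The key idea it demonstrates is masking the instruction tokens out of the loss so the model is trained only to produce the response.

```python
# Minimal supervised fine-tuning (SFT) sketch for instruction tuning.
# Assumes a Hugging Face causal LM; the model name, prompt template,
# and example pairs below are illustrative stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # small stand-in; Alpaca, for instance, used LLaMA-7B

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Toy (instruction, response) pairs in the style of an instruction dataset.
pairs = [
    ("Translate to French: Good morning.", "Bonjour."),
    ("Write a haiku about cats.",
     "Soft paws on the sill\nchasing shadows of sparrows\nthe afternoon sleeps"),
]

def encode(instruction, response):
    # Prompt template is an assumption; FLAN, Alpaca, etc. each use their own.
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    response_ids = tokenizer(response + tokenizer.eos_token,
                             add_special_tokens=False).input_ids
    input_ids = prompt_ids + response_ids
    # Mask the prompt tokens with -100 (the ignore index of the built-in
    # cross-entropy loss) so the loss covers only the response tokens:
    # the model learns to *answer* the instruction, not to reproduce it.
    labels = [-100] * len(prompt_ids) + response_ids
    return torch.tensor(input_ids), torch.tensor(labels)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for instruction, response in pairs:
        input_ids, labels = encode(instruction, response)
        # Passing labels makes the model compute the shifted next-token loss.
        out = model(input_ids=input_ids.unsqueeze(0), labels=labels.unsqueeze(0))
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The prompt masking is the design choice worth noticing: some pipelines skip it and train on the full sequence, which also works but spends part of the gradient signal on reproducing the instruction rather than answering it.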

Last updated: 2026-04-29
