Capabilities that suddenly appear in large models but are absent in smaller ones — like multi-step reasoning, code generation, or following novel instructions.
Emergent abilities are LLM capabilities that are absent in small models and then appear, sometimes sharply, once the model crosses a size or training-compute threshold. Multi-step arithmetic, translation between low-resource language pairs, following novel multi-step instructions, basic code generation: all of these were absent or near random-chance in models below a certain scale, then leaped to usable performance.
They matter because emergence is one of the most surprising features of modern LLMs and a large part of why scaling has driven progress. Nothing about a 1B-parameter model's behavior predicts that a 100B-parameter model will do arithmetic, yet it does. This unpredictability is also why labs are cautious about scaling further: you can't rule out new emergent behaviors, including unwanted ones.
A concrete example: GPT-2 (1.5B parameters) performs at roughly chance on multi-digit arithmetic, while GPT-3 (175B) handles two- and three-digit addition reliably in few-shot settings, and chain-of-thought prompting extends this to harder multi-step problems. The same pattern shows up in dozens of capabilities: absent at small scale, present at large scale, with the threshold differing by task.
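To make "chain-of-thought prompting" concrete, here is a minimal sketch of the two prompt styles. The prompt wording and the `query_model` stub are illustrative assumptions, not any particular lab's API.

```python
# Minimal sketch: direct prompting vs. chain-of-thought prompting for
# arithmetic. The prompts and query_model are illustrative assumptions,
# not a real provider's API.

DIRECT_PROMPT = "Q: What is 24 * 17?\nA:"

# Chain-of-thought: the few-shot example spells out intermediate steps,
# nudging the model to write out its reasoning before answering.
COT_PROMPT = (
    "Q: What is 13 * 12?\n"
    "A: 13 * 12 = 13 * 10 + 13 * 2 = 130 + 26 = 156. The answer is 156.\n"
    "Q: What is 24 * 17?\n"
    "A:"
)

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion call."""
    raise NotImplementedError("plug in your own model client")

print(DIRECT_PROMPT)
print("---")
print(COT_PROMPT)
```

The only difference is the demonstration of intermediate steps; small models ignore the hint, while sufficiently large models imitate the reasoning pattern and become far more accurate.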
Debate: a 2023 Stanford paper (Schaeffer et al., "Are Emergent Abilities of Large Language Models a Mirage?") argued that some apparent emergence is an artifact of how metrics are computed: sharp, all-or-nothing scoring makes smooth underlying improvement look like a sudden jump. But the practical observation, that big models can do things small ones can't, clearly holds.
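The metric-artifact argument is easy to see with a toy calculation. If each output token is independently correct with probability p and the task is scored by exact match over an n-token answer, accuracy is p^n, so a smooth rise in p produces an apparently sharp jump in the headline metric. The answer length and probabilities below are illustrative, not taken from the paper.

```python
# Toy model of the metric-artifact argument: smooth per-token improvement
# (p) produces an apparently sharp jump in exact-match accuracy (p ** n).
# n and the p values are illustrative, not taken from the paper.

n = 10  # length of the target answer, in tokens

for p in [0.50, 0.70, 0.80, 0.90, 0.95, 0.99]:
    print(f"per-token accuracy {p:.2f} -> exact match {p ** n:.3f}")

# Per-token accuracy roughly doubles (0.50 -> 0.99), but exact-match
# accuracy goes from ~0.001 to ~0.904; plotted against model scale,
# that smooth underlying gain looks like sudden emergence.
```

Related: scaling laws, frontier model, in-context learning.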