Temperature (sampling)

A sampling parameter that controls randomness in LLM output: 0 is effectively deterministic and "safe", while higher values make output more diverse but more error-prone.

Temperature is a knob that controls how the LLM picks each next token. The model produces a probability distribution over all possible tokens, and temperature warps that distribution: at 0, you always pick the highest-probability token (deterministic, safe, repetitive); at 1, you sample from the original distribution (natural variety); above 1, low-probability tokens get a boost (creative, weird, error-prone).

The right temperature depends on the task. For structured output (JSON extraction, code, SQL), use temperature 0 or 0.1: you want consistent, repeatable answers. For brainstorming, creative writing, or generating multiple candidates, use 0.7-1.0: you want variety. For poetry or extreme creativity, push higher.

A concrete example: ask a model to translate a legal contract clause. Temperature 0 gives you the same translation every time, which is ideal for review and version control. Ask it to write five Twitter hooks for a product launch, and temperature 0.8 gives you five distinct angles instead of five rephrasings of the same hook.

Note: temperature 0 doesn't actually guarantee identical outputs across calls, because providers sometimes vary their sampling internals. For true determinism, use the seed parameter where supported.
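To make the warping concrete, here's a minimal sketch of temperature sampling in plain NumPy. It illustrates the standard logits-divided-by-T softmax trick, not any provider's actual implementation, and the three-token logits are made up:

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng=None):
    """Pick one token index from raw logits, scaled by temperature."""
    rng = rng or np.random.default_rng()
    if temperature == 0:
        # Greedy decoding: always take the single highest-probability token.
        return int(np.argmax(logits))
    # Divide logits by T before the softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it toward uniform.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# How the same three logits turn into very different distributions:
logits = np.array([2.0, 1.0, 0.1])
for t in (0.2, 1.0, 2.0):
    scaled = logits / t
    probs = np.exp(scaled - scaled.max())
    print(f"T={t}:", np.round(probs / probs.sum(), 3))
# T=0.2 -> [0.993 0.007 0.   ]  almost all mass on the top token
# T=1.0 -> [0.659 0.242 0.099]  the model's original distribution
# T=2.0 -> [0.502 0.304 0.194]  low-probability tokens boosted
```

The explicit `temperature == 0` branch mirrors what APIs do in spirit: as T approaches 0 the softmax collapses onto the argmax, so providers treat 0 as greedy decoding rather than dividing by zero.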
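And here's what the determinism note above looks like in practice, using the OpenAI Python SDK as one example of a provider that exposes both knobs (the model name and prompt are placeholders, and seed is best-effort rather than a hard guarantee):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Translate this contract clause: ..."}],
    temperature=0,  # greedy decoding: repeatable, "safe" output
    seed=42,        # best-effort determinism where the backend supports it
)
print(resp.choices[0].message.content)
```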

Related: top-p, top-k, sampling, decoding.

Last updated: 2026-04-29
