
Diffusion Model

A generative model that creates images (or other data) by learning to reverse a step-by-step process of adding random noise.

A diffusion model is a type of generative model that learns to create new data — most famously images — by reversing a noise-adding process. During training, the model is shown clean images that have had random Gaussian noise progressively added until they become pure static, and it learns to predict and remove that noise one step at a time. At inference, you start from pure random noise and run the model in reverse: step by step, it denoises the image into something coherent.

Diffusion models are the engine behind almost every modern image generator you've heard of — Stable Diffusion, Midjourney, DALL·E 3, Google Imagen, and increasingly video tools like Sora and Runway. They replaced GANs as the dominant approach because they're more stable to train, produce more diverse outputs, and scale well with data and compute.

A useful analogy: imagine taking a photograph and gradually shaking sand over it until you can't see anything. A diffusion model is trained to undo that shaking — given a sand-covered photo, predict what was underneath. Do this hundreds of times in a row and you can summon a photo from nothing but sand.

To control what gets generated, the denoising steps are conditioned on a text prompt (via a text encoder like CLIP), so "a cat astronaut" steers the noise removal toward something matching that description.

Related concepts worth looking up: latent diffusion (the trick that makes Stable Diffusion fast by working in a compressed space), DDPM and DDIM (the foundational sampling algorithms), classifier-free guidance, U-Net, and flow matching (a newer competing formulation).
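To make the mechanics concrete, here is a minimal sketch of the DDPM-style recipe described above: a closed-form forward process that noises clean data, a small network trained to predict the added noise, and a reverse loop that turns pure noise into samples. It assumes PyTorch and uses 2-D toy points rather than images; the tiny MLP, linear noise schedule, and hyperparameters are illustrative choices, not any particular production model.

```python
# Minimal DDPM-style sketch on 2-D toy data (assumes PyTorch; all sizes illustrative).
import torch
import torch.nn as nn

T = 200                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)  # cumulative product of alphas

# Tiny noise-prediction network: input = noisy point + timestep, output = predicted noise.
model = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def sample_data(n):
    # Toy "clean data": points on the unit circle with a little jitter.
    angle = torch.rand(n) * 2 * torch.pi
    return torch.stack([angle.cos(), angle.sin()], dim=1) + 0.05 * torch.randn(n, 2)

# Training: noise clean data at a random step, learn to predict that noise.
for step in range(2000):
    x0 = sample_data(128)
    t = torch.randint(0, T, (128,))
    eps = torch.randn_like(x0)
    a_bar = alpha_bar[t].unsqueeze(1)
    # Closed-form forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps
    xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    t_in = (t.float() / T).unsqueeze(1)          # crude timestep conditioning
    pred = model(torch.cat([xt, t_in], dim=1))   # a text embedding would also enter here in a conditioned model
    loss = ((pred - eps) ** 2).mean()            # simple noise-prediction objective
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: start from pure noise and denoise step by step.
x = torch.randn(500, 2)
with torch.no_grad():
    for t in reversed(range(T)):
        t_in = torch.full((x.shape[0], 1), t / T)
        eps_hat = model(torch.cat([x, t_in], dim=1))
        # DDPM posterior mean update, plus fresh noise at every step except the last.
        x = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)

print(x[:5])  # after training, samples should lie near the unit circle
```

Scaling this to real images mainly swaps the MLP for a U-Net operating on pixels or a compressed latent space, and conditioning (for example a CLIP text embedding) is fed into the network alongside the timestep; classifier-free guidance then mixes conditional and unconditional noise predictions at sampling time.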

Last updated: 2026-04-29
