
Task

Image generation

Producing images from text prompts (text-to-image) or other inputs — typically handled by diffusion models such as Stable Diffusion, DALL-E, Midjourney, Flux, and Imagen.

Image generation is the task of producing an image from a text prompt (text-to-image), an image plus modifications (image-to-image, inpainting), or other inputs. The dominant technical approach is diffusion models, which start from random noise and iteratively refine it into a coherent image, with each denoising step guided by the prompt.

The task had its category-defining moment in 2022, when DALL-E 2, Midjourney, and Stable Diffusion arrived almost simultaneously and made it possible for anyone to generate near-photorealistic or stylized images from a sentence. The visual creative tools market has been reshaped since: graphic design, marketing, concept art, illustration, advertising, e-commerce, and social media all now routinely use AI-generated imagery.

Concrete examples: "a Tang dynasty palace at sunset, oil painting style, dramatic clouds" → Midjourney or Flux produces several polished options in seconds. "Replace the sky in this photo with a stormy night sky" → Stable Diffusion with inpainting handles it. "This product photo, but with a blue background instead of white" → Adobe Firefly or similar handles it.

Key models: the Stable Diffusion family (open-source), DALL-E 3 (OpenAI), Midjourney (closed, web/Discord-only), Imagen (Google), Flux (Black Forest Labs, a mix of open and commercial releases), and Adobe Firefly. For controllability and customization, Stable Diffusion plus ControlNet and LoRAs is unmatched; for top-end aesthetic quality, Midjourney and Flux Pro often win.

Related: diffusion model, Stable Diffusion family, ControlNet, multi-modal.
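The iterative refinement at the heart of diffusion can be sketched in a few lines. This is a toy illustration, not a real model: where Stable Diffusion uses a trained neural network to predict the noise in a sample (conditioned on the prompt), the hypothetical `predict_noise` below cheats by knowing the target "image" (an 8-element vector) in advance. The loop structure — start from pure noise, repeatedly subtract predicted noise, re-inject a little fresh noise at each step except the last — mirrors the DDPM-style reverse process.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.linspace(-1.0, 1.0, 8)      # stand-in for the "image" the prompt describes
steps = 50
betas = np.linspace(1e-4, 0.2, steps)   # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative signal-retention factors

def predict_noise(x, t):
    # Oracle stand-in for the trained denoiser: recover the exact noise
    # that would map `target` to the current sample x at step t.
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

x = rng.standard_normal(8)              # start from pure Gaussian noise
for t in reversed(range(steps)):
    eps = predict_noise(x, t)
    # DDPM mean update: remove the predicted noise component.
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:                           # add fresh noise except at the final step
        x += np.sqrt(betas[t]) * rng.standard_normal(8)

print(np.abs(x - target).max())         # sample has converged onto the target
```

Because the noise prediction here is exact, the loop lands on the target; a real model's prediction is approximate, which is why sampling quality depends on the network, the schedule, and the number of steps.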

Last updated: 2026-04-29

