
Model family

DeepSeek (family)

DeepSeek's open-weight LLM family: DeepSeek-V2/V3 (efficient MoE), DeepSeek-R1 (an open-weight reasoning model rivaling o1), DeepSeek-Coder, and DeepSeek-VL.

The DeepSeek model family is the open-weight lineup from the Chinese lab DeepSeek. Notable releases: DeepSeek-V2 (2024, an efficient mixture-of-experts model that drew attention for its performance per parameter), DeepSeek-V3 (December 2024, a 671B-parameter MoE with roughly 37B parameters active per forward pass, matching GPT-4o on many benchmarks), DeepSeek-R1 (January 2025, the first openly released reasoning model to rival OpenAI o1), DeepSeek-Coder (specialized for programming), and DeepSeek-VL (vision-language).

The family matters because it shifted industry expectations almost overnight in early 2025. DeepSeek-V3 demonstrated frontier-class performance from a non-US lab on a far smaller training budget than expected. DeepSeek-R1 then arrived as an open-weight reasoning model, with its full chain of thought visible, that matched o1 on math and coding evaluations. The release was impactful enough to trigger a sharp drop in Nvidia's stock and to spark debate about the value of US-style massive compute scaling.

Distinctive technical innovations in the family include Multi-head Latent Attention (MLA), a KV-cache compression technique, and large MoE architectures built from many small experts, alongside the aggressive open release of training papers and detailed technique notes.

Licensing is permissive (MIT-style for many releases): the models are fully usable commercially, with weights and full inference code published. The family is widely deployed in self-hosted setups, used as a base for fine-tunes, and integrated into many Chinese AI products.

Related: DeepSeek (company), Mixture of Experts, MLA, open-source, R1.
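
The MoE design is what reconciles 671B total parameters with roughly 37B active per token: a router selects a few experts for each token, and only those experts run. Below is a minimal NumPy sketch of top-k expert routing; the dimensions, expert count, and two-layer FFN experts are illustrative toy values, not DeepSeek's actual architecture or configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 128        # toy model width and expert hidden size
num_experts, top_k = 8, 2      # route each token to 2 of 8 experts

# Each expert is a small two-layer feed-forward network; only selected experts run.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02


def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                               # (tokens, num_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]     # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                      # softmax over the chosen experts only
        for w, e in zip(weights, top[t]):
            w1, w2 = experts[e]
            out[t] += w * (np.maximum(x[t] @ w1, 0.0) @ w2)   # ReLU FFN expert
    return out


tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)   # (4, 64): full-width output, but only 2 of 8 experts ran per token
```

Total parameter count grows with the number of experts, while per-token compute stays pinned to the k experts actually selected, which is the trade-off the V3 design leans on.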

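MLA's KV-cache saving comes from caching a compressed per-token latent rather than full per-head keys and values. The sketch below shows only that core idea under simplified assumptions (no RoPE handling, no attention computation, made-up dimensions and projection names such as `w_down`, `w_up_k`, `w_up_v`); it is not DeepSeek's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

d_model, n_heads, d_head = 64, 4, 16
d_latent = 8    # latent is much smaller than the n_heads * d_head * 2 floats per token of a plain cache

w_down = rng.standard_normal((d_model, d_latent)) * 0.02            # compress hidden state to latent
w_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # expand latent to keys
w_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # expand latent to values

# During decoding, only the latents are kept in the KV cache.
hidden = rng.standard_normal((10, d_model))    # 10 past tokens
kv_cache = hidden @ w_down                     # (10, 8) instead of (10, 128)

# When a new query attends, keys and values are reconstructed from the cached latents.
keys = kv_cache @ w_up_k                       # (10, 64)
values = kv_cache @ w_up_v                     # (10, 64)

full_kv_floats = hidden.shape[0] * n_heads * d_head * 2
print(kv_cache.size, "cached floats vs", full_kv_floats, "for a plain per-head KV cache")
```

The cache shrinks by the ratio of the latent width to the full per-head key/value width, which is where the long-context memory savings attributed to MLA come from.
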
Last updated: 2026-04-29
