Mixtral 8x7B
Mistral's classic open-source MoE — 47B total, 13B active per token.
Specs
- Context window: 32,768 tokens
- Max output: 4,096 tokens
- Modalities: text
- Tool use: —
- Vision: —
- Streaming: ✓
- License: apache-2.0
- Released: 2023-12-11
Overview
Mixtral 8x7B (December 2023) is Mistral's first open-source mixture-of-experts model, released under Apache 2.0. Each layer routes every token to 2 of 8 expert feed-forward networks of roughly 7B parameters each, so of the 47B total parameters only about 13B are active per token, giving roughly 13B-class inference cost with quality competitive with much larger dense models. 32K context. It is the model that proved sparse MoE could be open-sourced cleanly. Successors: Mixtral 8x22B and the Mistral Large family.
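To make the 13B-active figure concrete, here is a minimal PyTorch sketch of the top-2 gating pattern described above. The class names, dimensions, and hyperparameters are illustrative assumptions, not Mistral's reference implementation; the point is that only the two experts selected by the router run for each token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One expert: a gated feed-forward (SwiGLU) block."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)   # gate projection
        self.w3 = nn.Linear(d_model, d_ff, bias=False)   # up projection
        self.w2 = nn.Linear(d_ff, d_model, bias=False)   # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class SparseMoEBlock(nn.Module):
    """Top-2 routed mixture of 8 experts (toy dimensions). Only the two
    selected experts execute per token, which is why a fraction of the
    total parameters is active for any given forward pass."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(d_model, d_ff) for _ in range(n_experts))

    def forward(self, x):                               # x: (n_tokens, d_model)
        logits = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)            # renormalise over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check with toy dimensions:
block = SparseMoEBlock()
print(block(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```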
Editor's verdict
Architecturally important: every later open MoE (DeepSeek V3, Qwen MoE) inherits patterns Mixtral popularised. For new production builds, Llama 3.3 70B or Qwen 2.5 72B beats it on quality at similar serving cost, and DeepSeek V3 clearly outperforms it on Chinese-language tasks. Keep Mixtral on your radar as the canonical Apache-2.0 MoE if licence purity matters; otherwise newer is usually better.
Reviews
No reviews yet.
Last updated: 2026-04-29