Google ships four Gemma 4 variants simultaneously, covering everything from edge to cloud:
- E2B (Effective 2B): edge-optimized, designed to run on phones and Raspberry Pi-class hardware
- E4B (Effective 4B): same edge focus, one tier up in capability
- 26B MoE (Mixture of Experts): balanced speed/quality; active parameters per token are well under 26B
- 31B Dense: flagship, highest quality
The 31B Dense ranks #3 on the LMArena open-source leaderboard; the 26B MoE ranks #6. Google's framing: "outcompetes models 20x its size."
Key technical highlights:
- 256K context: matches mainstream commercial closed models, freeing RAG and long-document workflows from context limits
- Native multimodal input: vision + audio without external encoders
- Native agent workflow support: function calling, multi-step reasoning, tool calls are baked in during training, not bolted on via fine-tuning
- Apache 2.0: same permissive license as Gemma 3, safe for commercial deployment
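To make the "native function calling" highlight concrete from a client's point of view, here is a minimal sketch of an OpenAI-style tool-calling request, the kind a local vLLM or Ollama server would typically accept. The model tag `gemma4-e4b`, the tool name, and the assumption that the server exposes an OpenAI-compatible chat-completions endpoint are all illustrative, not confirmed details of the release.

```python
import json

# Hypothetical model tag; the actual tag shipped by vLLM/Ollama may differ.
MODEL = "gemma4-e4b"

# OpenAI-style tool schema: one function the model may choose to call.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def build_request(user_message: str) -> dict:
    """Assemble a chat-completions payload with tools attached."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call
    }

payload = build_request("What's the weather in Taipei?")
print(json.dumps(payload, indent=2))
```

Because tool use is trained in rather than bolted on, a model like this should emit a structured `tool_calls` entry in its response when the question matches a declared function, instead of free-text instructions.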
For developers, especially in the Chinese-language ecosystem where Gemma has long served as a fine-tuning base, the fourth generation now spans 2B–31B, keeps the permissive license, and covers edge to cloud. For self-hosted inference teams (vLLM, llama.cpp, and Ollama all ship day-one support), the range of options exceeds the previous three generations combined. Together with last week's DeepSeek V4 and Zhipu GLM-5 open-source releases, the 2026 open-source model landscape is closing in on the density of the commercial closed-model field.