Google ships four Gemma 4 variants simultaneously, covering everything from edge to cloud:
- E2B (Effective 2B): edge-optimized, designed to run on phones and Raspberry Pi-class hardware
- E4B (Effective 4B): same edge focus, one tier up in capability
- 26B MoE (Mixture of Experts): balanced speed/quality; active parameters per token are well under 26B
- 31B Dense: flagship, highest quality
The 31B Dense ranks #3 on the LMArena open-source leaderboard; the 26B MoE ranks #6. Google's framing: "outcompetes models 20x its size."
Key technical highlights:
- 256K context: matches mainstream commercial closed models, freeing RAG and long-document workflows from context limits
- Native multimodal input: vision + audio without external encoders
- Native agent workflow support: function calling, multi-step reasoning, tool calls are baked in during training, not bolted on via fine-tuning
- Apache 2.0: same permissive license as Gemma 3, safe for commercial deployment
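To make the "native function calling" highlight concrete from a client's point of view, here is a minimal sketch of an OpenAI-style tool-calling request, the kind a local vLLM or Ollama server would typically accept. The model tag `gemma4-e4b`, the tool name, and the assumption that the server exposes an OpenAI-compatible chat-completions endpoint are all illustrative, not confirmed details of the release.

```python
import json

# Hypothetical model tag; the actual tag shipped by vLLM/Ollama may differ.
MODEL = "gemma4-e4b"

# OpenAI-style tool schema: one function the model may choose to call.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def build_request(user_message: str) -> dict:
    """Assemble a chat-completions payload with tools attached."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call
    }

payload = build_request("What's the weather in Taipei?")
print(json.dumps(payload, indent=2))
```

Because tool use is trained in rather than bolted on, a model like this should emit a structured `tool_calls` entry in its response when the question matches a declared function, instead of free-text instructions.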
For developers, especially in the Chinese-language ecosystem where Gemma has long served as a fine-tuning base, the fourth generation now spans 2B–31B, keeps the permissive license, and covers edge to cloud. For self-hosted inference teams (vLLM, llama.cpp, and Ollama all ship day-one support), the range of options exceeds the previous three generations combined. Together with last week's DeepSeek V4 and Zhipu GLM-5 open-source releases, the 2026 open-source model landscape is closing in on the density of the commercial closed-model field.