Google Drops Gemma 4 with Apache 2.0 License, Finally Listening to Developers

Google’s Gemini models have gotten impressively capable over the last year, but you’re still locked into Google’s ecosystem if you want to use them. The Gemma line was supposed to fix that, offering open-weight alternatives you could run yourself. But Gemma 3 launched over a year ago, and in AI years that’s practically ancient history.

Starting today, developers can finally get their hands on Gemma 4. It comes in four sizes, all optimized for local deployment. And here’s the part that actually matters: Google is ditching its custom Gemma license and switching to Apache 2.0. That’s a big deal.

I’ve heard plenty of grumbling from developers about the old Gemma license. It had restrictions that made it awkward for commercial use, and nobody wants to spend hours parsing legal text just to figure out if they can fine-tune a model for their startup. Apache 2.0 is straightforward, well-understood, and permissive. Google deserves credit for this move.

Four Sizes, Two Architectures

Gemma 4 comes in two flavors: Mixture of Experts (MoE) and Dense, each in two sizes. The 26B MoE variant is the speed demon: it activates only 3.8 billion of its 26 billion parameters for each token during inference. That lets it generate tokens far faster than you’d expect from a model this size. If you’re building a real-time application where latency matters, this is probably the one you want.
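To put the MoE numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the two figures quoted above (26 billion total, 3.8 billion active); the function name is just for illustration, and the result says nothing about real-world throughput, which also depends on memory bandwidth, routing overhead, and the serving stack.

```python
# Figures quoted in the article for the 26B MoE variant.
TOTAL_PARAMS = 26e9
ACTIVE_PARAMS = 3.8e9


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used to process each token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active per token: {fraction:.1%}")
print(f"Ratio of total to active parameters: {1 / fraction:.1f}x")
```

That works out to roughly 15 percent of the weights doing work on any given token. Keep in mind that all 26 billion parameters still have to sit in memory, so the savings are in compute per token, not in footprint.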

The 31B Dense model takes the opposite approach. It activates every parameter for every token, which makes it slower but brings the model’s full capacity to bear on each output. Google expects developers to fine-tune this one for specific tasks where quality trumps speed.

Both the 26B MoE and 31B Dense can run unquantized in bfloat16 on a single 80GB Nvidia H100 GPU. Yes, that’s a $20,000 accelerator, not exactly “local” in the consumer sense. But if you quantize them to lower precision, they’ll fit on consumer GPUs. I’d expect most hobbyists to be running the smaller variants anyway.
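The single-H100 claim is easy to sanity-check with some rough arithmetic. The sketch below estimates memory for the weights alone, assuming 2 bytes per parameter for bfloat16, 1 byte for 8-bit, and half a byte for 4-bit quantization. The parameter counts are the nominal 26B and 31B from the model names, and the estimate ignores the KV cache, activations, and any per-block overhead a real quantization format adds, so treat the numbers as a floor rather than a measurement.

```python
# Approximate storage cost per parameter for each precision (assumed values;
# real quantization formats add some overhead for scales and zero points).
BYTES_PER_PARAM = {"bfloat16": 2.0, "int8": 1.0, "int4": 0.5}

# Nominal parameter counts taken from the model names in the article.
MODELS = {"26B MoE": 26e9, "31B Dense": 31e9}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in MODELS.items():
    for dtype in BYTES_PER_PARAM:
        gb = weight_memory_gb(params, dtype)
        print(f"{name:10s} {dtype:9s} {gb:6.1f} GB")
```

In bfloat16 the weights come to about 52 GB and 62 GB, which leaves headroom on an 80GB card for the cache and activations. At 4 bits they drop to roughly 13 GB and 15.5 GB, which is the range where high-end consumer GPUs start to become plausible.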

Google also claims significant latency improvements over previous Gemma models, which makes sense for the 26B variant given its MoE architecture. The company seems to have focused hard on making these models feel snappy when running locally, rather than just dumping raw parameter counts and calling it a day.

What This Means for Developers

The switch to Apache 2.0 is the headline here, but the model improvements matter too. Gemma 4 feels like Google finally understanding what the open-weight community actually wants: capable models that you can run on your own hardware, modify, and deploy without worrying about licensing gotchas.

I’m curious to see how these stack up against Llama 4 and the latest Mistral models in real-world benchmarks. Google’s numbers always look good in its own testing, but third-party evaluations will tell the real story. Still, having another serious player in the open-weight space with a permissive license is good for everyone.

If you’ve been waiting for a reason to try Gemma, this is probably it. The old licensing friction is gone, and the models look genuinely competitive. I’ll be spinning up the 26B MoE variant this weekend to see how it handles some code generation tasks.
