Google’s Gemini models have gotten seriously good over the past year, but you’re still locked into Google’s ecosystem if you use them. The Gemma open-weight line has always offered more flexibility, but Gemma 3 launched over a year ago and was starting to feel dated. Today, Google is rolling out Gemma 4 in four sizes optimized for local use, and they’ve finally listened to developer complaints about licensing by ditching the custom Gemma license in favor of Apache 2.0.
Like previous Gemma releases, these models are designed to run on local hardware. That’s a broad statement, but Google has specific targets in mind. The two larger variants, a 26B Mixture of Experts model and a 31B Dense model, are built to run unquantized in bfloat16 format on a single 80GB Nvidia H100 GPU. Sure, that’s a $20,000 accelerator, but it’s still local hardware, not a cloud API. If you quantize them to lower precision, the weights shrink enough to fit on consumer GPUs, which is where things get interesting for developers without deep pockets.
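To see why those hardware targets are plausible, here is a rough back-of-the-envelope sketch of weight memory at different precisions. It assumes the nominal parameter counts from the model names (26B and 31B), counts weights only, and ignores KV cache, activations, and the per-block overhead that real quantization formats add, so treat the numbers as lower bounds rather than measurements. The function and table names are mine, not anything from Google.

```python
# Rough weight-memory estimate. Parameter counts are the nominal sizes
# from the model names; bytes-per-parameter values are idealized and
# ignore quantization metadata (scales, zero points).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

# Nominal sizes in billions of parameters (assumption: taken at face value).
MODELS = {"26B MoE": 26.0, "31B Dense": 31.0}


def weight_gb(params_billions: float, fmt: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * BYTES_PER_PARAM[fmt]


def footprint_table() -> dict:
    """Weight footprint for every model at every precision."""
    return {
        name: {fmt: weight_gb(size, fmt) for fmt in BYTES_PER_PARAM}
        for name, size in MODELS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{fmt}: {gb:.1f} GB" for fmt, gb in row.items())
        print(f"{name}: {cells}")
```

By this estimate the 31B Dense weights take about 62 GB in bfloat16, which leaves some headroom on an 80GB card, and about 15.5 GB at 4 bits. Note that a Mixture of Experts model still has to keep all of its parameters in memory, so the 26B MoE needs roughly 52 GB in bfloat16 even though only a fraction of those weights are used per token.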
Google claims it has also focused on reducing latency to make local processing genuinely useful. The 26B MoE model activates only 3.8 billion of its 26 billion parameters for each token during inference, so per-token compute is closer to that of a small model, and Google says this yields significantly higher tokens-per-second than similarly sized dense models. The 31B Dense variant prioritizes quality over speed, and Google expects developers to fine-tune it for specific use cases. I’d like to see independent benchmarks before getting too excited, but the architectural choices here are sensible.
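The arithmetic behind the MoE speed claim is simple enough to sketch. This uses only the two figures quoted above (3.8 billion active out of 26 billion total) plus the 31B Dense size for comparison, and it assumes per-token compute scales roughly with the number of active parameters. That is an idealization: real throughput also depends on memory bandwidth, routing overhead, and batch size, so the ratio below is a ceiling on the expected gap, not a benchmark.

```python
def active_fraction(active_billions: float, total_billions: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_billions / total_billions


def compute_ratio(dense_billions: float, active_billions: float) -> float:
    """Idealized per-token compute of a dense model relative to the MoE.

    Assumes compute is proportional to the parameters touched per token.
    """
    return dense_billions / active_billions


if __name__ == "__main__":
    frac = active_fraction(3.8, 26.0)
    ratio = compute_ratio(31.0, 3.8)
    print(f"MoE active share per token: {frac:.1%}")
    print(f"31B Dense vs MoE per-token compute (idealized): {ratio:.1f}x")
```

So the MoE touches under 15 percent of its weights per token, and under this simplified model the 31B Dense variant does roughly eight times as much work per token.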
The switch to Apache 2.0 is the biggest news for me. Google’s custom Gemma license always felt like unnecessary friction for developers who wanted to build commercial products. Apache 2.0 is well-understood and permissive, and most legal teams already know what it allows, so it rarely needs a bespoke review before shipping. This move signals that Google actually wants developers to adopt these models, not just window-shop.