Google's New TPUs: Two Chips for the Agent Era, Not Just a Faster Nvidia Clone

Everyone and their mother is buying every Nvidia H100 and B200 they can get their hands on. Google? They’ve been doing their own thing with custom Tensor Processing Units (TPUs) for years, and they’re not stopping now.

Last year we got the seventh-gen Ironwood TPU. Now the eighth-gen is here, but it’s not just a faster version of the same chip. Google split it into two distinct flavors: the TPU 8t for training and the TPU 8i for inference.

This is a smart move. Training a frontier model is a very different beast from running it in production. Training is about raw throughput, shoving terabytes of data through the model as fast as possible. Inference is about latency, getting a response back to the user without them getting bored and closing the tab. A single chip tuned for both ends up compromising on each.

Google’s justification is that the “agent era” is fundamentally different from the AI systems we’ve been running. I’ve been skeptical of “agent” hype since it became the buzzword du jour, but I think there’s some truth here. If agents are actually going to do multi-step tasks, they’re going to need a lot more inference compute than a simple chatbot. That puts different pressure on the hardware.
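To put a rough shape on that claim, here's a back-of-the-envelope sketch. Everything in it is my own assumption, not anything from Google: the function names are made up, the token counts are placeholders, and it assumes the agent re-reads its whole context at every step with no prefix caching. Real serving stacks cache a lot of this, so treat it as an upper-bound illustration rather than a measurement.

```python
def chatbot_tokens(prompt_tokens: int, reply_tokens: int) -> int:
    """Tokens processed for one single-turn exchange."""
    return prompt_tokens + reply_tokens


def agent_tokens(prompt_tokens: int, step_output_tokens: int, steps: int) -> int:
    """Tokens processed across a multi-step task, with no prefix caching.

    Assumption: each step re-reads the whole context so far, then appends
    its own output (tool call, observation, reasoning) to that context.
    """
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context + step_output_tokens  # read the context, write the output
        context += step_output_tokens          # the output becomes part of the context
    return total


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the growth pattern.
    single = chatbot_tokens(prompt_tokens=500, reply_tokens=300)
    multi = agent_tokens(prompt_tokens=500, step_output_tokens=300, steps=10)
    print(f"chatbot: {single} tokens, 10-step agent: {multi} tokens")
    print(f"ratio: {multi / single:.1f}x")
```

Under these assumptions the cost grows faster than the step count, because every step drags the accumulated context along with it. That's the kind of pressure that makes a dedicated inference chip look less like marketing.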

The TPU 8t is designed to cut training time from months to weeks. That’s not just a nice-to-have. When you’re spending millions on a single training run, shaving off a month is real money. The TPU 8i, meanwhile, is supposed to handle the inference workload more efficiently, which matters when you’re serving millions of agentic queries a day.

I’m not going to pretend I’ve benchmarked these things. Google hasn’t released raw specs yet, and even if they had, comparing TPUs to Nvidia’s lineup is apples to oranges because Google’s chips are tied to their own cloud infrastructure. But the direction is right. Specialization is how you win in hardware, and splitting training and inference is a natural evolution.

What I’m curious about is whether this will actually matter to anyone outside of Google. Their TPUs are only available through Google Cloud, and while the pricing is competitive, most serious AI labs are already locked into Nvidia’s ecosystem. The switching cost is huge.

Still, I like that Google is thinking about this differently instead of just chasing Nvidia’s benchmarks. The agent era might be overhyped, but the hardware problem is real. Two chips for two jobs makes a lot more sense than one chip trying to do both poorly.
