Google's AI Overviews Still Lying 10% of the Time, Which Is a Lot of Lies

Google’s AI Overviews have been a headache since day one. Launched in 2024, the Gemini-powered search summary box at the top of results has been catching flak for spitting out nonsense. It’s gotten better, sure. But a new analysis from The New York Times suggests “better” still means roughly one in ten answers is flat-out wrong.

The Times worked with a startup called Oumi, which is in the business of building AI models. They used a standard benchmark called SimpleQA — a list of over 4,000 questions with verifiable answers, originally released by OpenAI in 2024. Oumi started testing last year when Gemini 2.5 was still the top dog. Back then, AI Overviews scored 85% accuracy. After the Gemini 3 update, that number climbed to 91%.

91% sounds decent until you do the math. Google handles billions of searches every day. Even if only a fraction of those trigger an AI Overview, a 9% miss rate means tens of millions of incorrect answers getting served up daily. That works out to hundreds of thousands per hour, which is a lot of confidently delivered lies.
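
To make that back-of-the-envelope math concrete, here's a rough sketch in Python. The daily search volume and the share of searches that actually show an AI Overview are placeholder assumptions of mine, not figures from the Times, Oumi, or Google. Only the 91% accuracy comes from the testing described above.

```python
def estimate_wrong_answers(searches_per_day: float,
                           overview_share: float,
                           accuracy: float) -> tuple[float, float]:
    """Return (wrong answers per day, wrong answers per hour)."""
    if not 0.0 <= overview_share <= 1.0 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("overview_share and accuracy must be between 0 and 1")
    # Number of searches that display an AI Overview at all.
    overviews_per_day = searches_per_day * overview_share
    # Miss rate is simply 1 minus accuracy (9% at 91% accuracy).
    wrong_per_day = overviews_per_day * (1.0 - accuracy)
    return wrong_per_day, wrong_per_day / 24


# Hypothetical inputs, for illustration only.
SEARCHES_PER_DAY = 5_000_000_000  # "billions"; the exact figure is an assumption
OVERVIEW_SHARE = 0.05             # assumed share of searches showing an AI Overview
ACCURACY = 0.91                   # SimpleQA accuracy after the Gemini 3 update

per_day, per_hour = estimate_wrong_answers(SEARCHES_PER_DAY, OVERVIEW_SHARE, ACCURACY)
print(f"Wrong answers per day:  {per_day:,.0f}")
print(f"Wrong answers per hour: {per_hour:,.0f}")
```

With these placeholder inputs the estimate lands in the tens of millions per day. Nudge the assumed share upward and the number climbs fast, which is the whole point: at Google's scale, a single-digit miss rate is still an enormous pile of wrong answers.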

Now, I’ve been around long enough to know that benchmarks like SimpleQA have their limits. They test factual recall on well-defined questions, not the messy, ambiguous stuff people actually search for. Real-world accuracy is probably worse. But even if we take this as the best-case scenario, it’s not great for a product that’s supposed to be the front door to the internet.

What bothers me is the framing. Google keeps saying AI Overviews is improving, and technically it is. But the bar was on the floor. Going from 85% to 91% is progress, but it’s still failing roughly one in ten times. If your GPS gave you wrong directions 10% of the time, you’d throw it out the window. Why should search be any different?

The other angle here is Oumi itself. They’re not exactly neutral — they’re an AI startup with skin in the game. The Times disclosed the relationship, but it’s worth noting that Oumi benefits from attention on AI accuracy. Doesn’t make the data wrong, but it’s a reminder to read the fine print.

I’d love to see Google run its own public, transparent benchmarks instead of relying on third parties. But that’s not how the game works. For now, we’re stuck with a search engine that lies to us millions of times a day and calls it progress.
