Google’s AI Overviews Is Wrong 10% of the Time — That’s Millions of Lies Per Hour

I’ve been using Google since the days when you could type “how to tie a tie” and get a list of links, not a robot confidently telling you the wrong way to do it. AI Overviews, Google’s Gemini-powered search assistant that sits at the top of results, has been a mess since launch in 2024. It’s gotten better, sure, but a new analysis from The New York Times suggests “better” isn’t the same as “good.”

The Times teamed up with a startup called Oumi to run the SimpleQA benchmark — a set of over 4,000 questions with verifiable answers, originally released by OpenAI in 2024. They fed these questions into AI Overviews to see how often it got things right. The result? About 90% accuracy. That sounds decent until you do the math.

Ninety percent means one in ten answers is wrong. For a service handling billions of queries daily, that adds up to tens of thousands of wrong answers every minute. The Times extrapolated the error rate to tens of millions of incorrect answers per day. Let that sink in: Google is serving up tens of millions of wrong answers every single day, and that’s considered an improvement.
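For anyone who wants to check the scale themselves, here is a rough sketch of the unit conversion. The only number taken from the reporting is the "tens of millions per day" range; the 30 million figure below is my own illustrative pick from that range, not something the Times or Oumi published, and the function names are made up for this example.

```python
# Back-of-the-envelope conversion of a daily wrong-answer count into
# hourly and per-minute rates. All inputs here are illustrative.

HOURS_PER_DAY = 24
MINUTES_PER_DAY = 24 * 60


def wrong_answers_per_day(answers_per_day: float, accuracy: float) -> float:
    """Expected wrong answers per day for a given accuracy (0.0 to 1.0)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return answers_per_day * (1.0 - accuracy)


def per_hour(daily_count: float) -> float:
    """Spread a daily count evenly across 24 hours."""
    return daily_count / HOURS_PER_DAY


def per_minute(daily_count: float) -> float:
    """Spread a daily count evenly across 1,440 minutes."""
    return daily_count / MINUTES_PER_DAY


if __name__ == "__main__":
    # Hypothetical figure picked from the "tens of millions" range.
    assumed_wrong_per_day = 30_000_000
    print(f"per hour:   {per_hour(assumed_wrong_per_day):,.0f}")
    print(f"per minute: {per_minute(assumed_wrong_per_day):,.0f}")
```

At that assumed 30 million a day, the rate works out to 1.25 million wrong answers per hour, or roughly 20,800 per minute. Real traffic is not spread evenly across the day, so treat these as averages at best.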

Oumi started testing when Gemini 2.5 was still the top model. Back then, accuracy was 85%. After the Gemini 3 update, it jumped to 91%. So yes, it’s getting better, but the bar was on the floor. A gain of six percentage points over a year is not exactly a victory lap. I’ve seen this pattern before with AI products: launch broken, fix some stuff, declare victory, and hope nobody looks too closely at the remaining failures.
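The jump from 85% to 91% can be described in more than one way, so here is a small sketch that keeps the units straight. The two accuracy figures come from the reporting above; the function names are hypothetical and exist only for this example.

```python
def percentage_point_gain(acc_before: float, acc_after: float) -> float:
    """Absolute change in accuracy, expressed in percentage points."""
    return (acc_after - acc_before) * 100


def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction by which the error rate shrank (0.25 means a quarter fewer errors)."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    if err_before == 0:
        raise ValueError("error rate before the change is zero; nothing to reduce")
    return (err_before - err_after) / err_before


if __name__ == "__main__":
    before, after = 0.85, 0.91  # Gemini 2.5 era vs. after the Gemini 3 update
    print(f"gain: {percentage_point_gain(before, after):.1f} percentage points")
    print(f"error rate: {1 - before:.0%} -> {1 - after:.0%}")
    print(f"relative error reduction: {relative_error_reduction(before, after):.0%}")
```

With those two figures, accuracy rises by 6 percentage points and the error rate falls from 15% to 9%, which is a 40% cut in errors. That still leaves about one wrong answer in every eleven.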

The real issue isn’t the 90% — it’s the 10%. If you ask AI Overviews a question and it’s wrong, you might not know it. Unlike a link you can click and verify, the AI just presents its answer as fact. For trivial queries like “What’s the capital of France?” it’s fine. For medical advice, legal questions, or anything where a wrong answer has consequences, that 10% is terrifying.

Google has been playing catch-up since the AI search race started. Microsoft threw Bing into the ring with ChatGPT, and Google scrambled to respond. AI Overviews was rushed, and it shows. The company has made noise about improvements, but the data says it still gets roughly one in ten of these benchmark questions wrong.

I’m not saying AI search is doomed. I use it myself for quick lookups. But pretending that 90% accuracy is acceptable for a product that positions itself as an authoritative source is nonsense. If a human assistant got one in ten facts wrong, you’d fire them. Google gets a pass because it’s free and we’re used to it.

The Times article is behind a paywall, but the key numbers are out there. Oumi’s testing methodology is solid: SimpleQA is a standard benchmark, and they ran it multiple times. The 91% figure after the Gemini 3 update is the best case, and even that leaves a massive gap.

So what’s the takeaway? AI Overviews is better than it was, but “better” is not “good enough.” Google needs to own the failure rate and be transparent about where it falls short. Until then, I’ll keep scrolling past the AI box and clicking the actual links. That’s still the safest bet.
