AI Summaries on Google Remain Prone to Errors

April 20, 2026, 5:46 am

For about two years, Google’s search engine has used AI overviews to summarize the most important search results. These summaries appear at the top of the results page. However, a recent study finds that their reliability is mixed.

False information has been spreading on the internet for years, and Google’s AI summaries could make the problem worse. This is the finding of a study by the New York Times in collaboration with the AI company Oumi. According to the study, the accuracy rate of the AI-generated responses is 91 percent. While this is an improvement over previous values, it also means that about one in ten answers is incorrect. Scaled to the worldwide use of Google search, this corresponds to several hundred thousand false statements per minute and millions per hour.
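The scaling behind these figures can be sketched in a few lines of Python. This is only an illustration: the 91 percent accuracy comes from the study, but the number of AI summaries shown per minute is not stated in the article, so `ASSUMED_SUMMARIES_PER_MINUTE` is a purely hypothetical value chosen to show the arithmetic.

```python
def expected_errors(accuracy: float, answers: float) -> float:
    """Return the expected number of incorrect answers among `answers` responses."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")
    return (1.0 - accuracy) * answers


# Accuracy reported by the study (91 percent).
ACCURACY = 0.91

# ASSUMPTION: hypothetical volume, not a figure from the article or from Google.
ASSUMED_SUMMARIES_PER_MINUTE = 3_000_000

errors_per_minute = expected_errors(ACCURACY, ASSUMED_SUMMARIES_PER_MINUTE)
errors_per_hour = errors_per_minute * 60

print(f"Expected incorrect answers per minute: {errors_per_minute:,.0f}")
print(f"Expected incorrect answers per hour:   {errors_per_hour:,.0f}")
```

With a different assumed volume, the absolute numbers change proportionally, while the share of incorrect answers stays at 9 percent.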

Testing Method and Accuracy Development

The study used “SimpleQA,” a benchmark from OpenAI that checks the reliability of AI systems with more than 4,000 questions. In 2025, the accuracy of Google’s AI search was still at 85 percent. After an update from Gemini 2.5 to version 3.0, the value rose to 91 percent.
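What the jump from 85 to 91 percent means for the error rate can be worked out with a short calculation. The following is a minimal sketch that uses only the two accuracy values reported above; the function names are chosen here for illustration.

```python
def error_rate(accuracy_percent: float) -> float:
    """Share of incorrect answers, in percent."""
    return 100.0 - accuracy_percent


def relative_error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction by which the error rate shrank between two accuracy values."""
    old_error = error_rate(old_accuracy)
    new_error = error_rate(new_accuracy)
    return (old_error - new_error) / old_error


# Accuracy values from the study: 85 percent (2025) and 91 percent (after the update).
reduction = relative_error_reduction(85.0, 91.0)
print(f"Error rate: {error_rate(85.0):.0f}% -> {error_rate(91.0):.0f}%")
print(f"Relative reduction in errors: {reduction:.0%}")
```

In other words, a gain of six percentage points in accuracy corresponds to the error rate falling from 15 to 9 percent.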

In a statement to the New York Times, Google criticized the study’s results. The company said that SimpleQA is unrealistic, contains incorrect information, and does not reflect how real users search. Google itself relies on its own test system called “SimpleQA Verified,” which works with fewer but more carefully selected questions.

Examples Show AI Weaknesses

The study uses individual examples to show that the AI does not always work correctly. For instance, it was asked in which year Bob Marley’s former residence became a museum. The AI consulted several websites as sources but did not find a clear answer and eventually fell back on Wikipedia. However, contradictory dates can be found there, and the AI chose the wrong year.

In another example, the AI was asked when cellist Yo-Yo Ma was inducted into a Hall of Fame for classical music. It incorrectly responded that no such Hall of Fame exists.

Interpreting the Results Remains Difficult

The significance of the study is not clear-cut. Critics note that the test model used may itself contain errors, which could influence the results.

In response to an inquiry from Ars Technica, Google also pointed out that different models are used for different search queries, and that cheaper variants are often used. Additionally, the company itself states that the accuracy of its AI systems ranges between 60 and 80 percent. Against this backdrop, the measured value of 91 percent seems comparatively high.

This article is a machine translation of the original German version of TECHBOOK and has been reviewed for accuracy and quality by a native speaker. For feedback, please contact us at info@techbook.de.