April 20, 2026, 5:46 am | Read time: 3 minutes
For about two years, Google’s search engine has used AI overviews to summarize the most important search results. These summaries appear at the top of the results page. However, a recent study finds that their reliability is mixed.
The spread of false information on the internet has been a problem for years, and Google’s AI summaries could make it worse. That is the finding of a study by the New York Times in collaboration with the AI company Oumi. According to the study, 91 percent of the AI-generated responses are accurate. While this is an improvement over previous values, it also means that about one in ten answers is incorrect. Scaled to the worldwide use of Google search, this corresponds to several hundred thousand false statements per minute and millions per hour.
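The extrapolation from an error rate to absolute numbers can be sketched in a few lines of Python. The volume of AI answers per minute used here is a purely hypothetical placeholder, not a figure from the study; only the 91 percent accuracy comes from the article.

```python
def estimate_wrong_answers(answers_per_minute: int, accuracy_percent: int) -> dict:
    """Scale an accuracy rate to absolute counts of wrong answers.

    Integer arithmetic is used so the result is exact.
    """
    error_percent = 100 - accuracy_percent
    wrong_per_minute = answers_per_minute * error_percent // 100
    return {
        "error_percent": error_percent,
        "wrong_per_minute": wrong_per_minute,
        "wrong_per_hour": wrong_per_minute * 60,
    }


# ASSUMPTION: 3,000,000 AI answers per minute is an illustrative value only.
HYPOTHETICAL_ANSWERS_PER_MINUTE = 3_000_000

estimate = estimate_wrong_answers(HYPOTHETICAL_ANSWERS_PER_MINUTE, accuracy_percent=91)
print(estimate)
```

Under this assumed volume, a 9 percent error rate yields 270,000 wrong answers per minute. The real figure depends on how many searches actually trigger an AI overview, which the article does not state.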
Testing Method and Accuracy Development
The study used “SimpleQA,” a benchmark from OpenAI that checks the reliability of AI systems with more than 4,000 questions. In 2025, the accuracy of Google’s AI search was still at 85 percent. After an update from Gemini 2.5 to version 3.0, the value rose to 91 percent.
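To put the jump from 85 to 91 percent in perspective, it helps to look at the error rate rather than the accuracy. The following minimal sketch uses only the two percentages reported above; the function names are illustrative and not part of SimpleQA.

```python
def error_rate_percent(accuracy_percent: float) -> float:
    """Share of answers that are wrong, in percent."""
    return 100 - accuracy_percent


def relative_error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction by which the error rate shrank between two measurements."""
    old_error = error_rate_percent(old_accuracy)
    new_error = error_rate_percent(new_accuracy)
    return (old_error - new_error) / old_error


old_error = error_rate_percent(85)  # 15 percent wrong in 2025
new_error = error_rate_percent(91)  # 9 percent wrong after the update
reduction = relative_error_reduction(85, 91)
print(f"Error rate: {old_error}% -> {new_error}% ({reduction:.0%} fewer errors)")
```

A six-point gain in accuracy therefore corresponds to the error rate falling from 15 to 9 percent, a relative reduction of 40 percent.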
Google criticized the study’s results in a statement to the New York Times. The company said that SimpleQA is unrealistic, contains incorrect information, and does not reflect how real users search. Google itself relies on its own test system, “SimpleQA Verified,” which uses fewer but more carefully selected questions.
Examples Show AI Weaknesses
The study uses individual examples to show that the AI does not always work correctly. In one case, it was asked in which year Bob Marley’s former residence became a museum. The AI consulted several websites, found no clear answer, and eventually fell back on Wikipedia. The article there contains contradictory dates, and the AI chose the wrong year.
In another example, the AI was asked when cellist Yo-Yo Ma was inducted into a Hall of Fame for classical music. It incorrectly responded that no such Hall of Fame exists.
Interpreting the Results Remains Difficult
The significance of the study is not clear-cut. Critics note that the test model used may itself contain errors, which could skew the results.
In response to an inquiry from Ars Technica, Google also pointed out that different models are used for different search queries, and that these are often cheaper variants. The company itself puts the accuracy of its AI systems at between 60 and 80 percent. Against this backdrop, the measured value of 91 percent seems comparatively high.