What To Know
- Roughly one in six adults in the United States already consult AI chatbots for health information at least once a month, a figure expected to rise as AI becomes further integrated into everyday digital tools.
- The Oxford researchers do not conclude that AI has no role in healthcare, but that its deployment must be cautious, carefully regulated, and backed by rigorous safety testing in real-world conditions.
AI News: Artificial intelligence chatbots that can ace medical exams are now facing serious scrutiny after a major new study found they may be unreliable—and even dangerous—when used by real people seeking health advice. The research, published in the prestigious journal Nature Medicine, suggests that while large language models (LLMs) perform impressively in controlled academic settings, their real-world use by the public poses significant risks.

Image Credit: Thailand AI News
The study was led by researchers from the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences at the University of Oxford. As this AI News report details, the findings paint a sobering picture of how AI tools behave outside laboratory benchmarks and standardized testing environments. According to the researchers, the core problem is not just occasional inaccuracies, but a persistent tendency to provide inconsistent, mixed-quality information that users struggle to interpret correctly.
When Exam Success Does Not Translate to Patient Safety
Dr Rebecca Payne, a co-author of the study and a practicing general practitioner, cautioned that public enthusiasm for AI in healthcare may be premature. Despite widespread headlines celebrating AI systems that outperform medical students in licensing exams, she emphasized that “AI just isn’t ready to take on the role of the physician.”
To test how effective AI tools truly are in real-life scenarios, researchers recruited nearly 1,300 participants across the United Kingdom. Participants were presented with ten different health-related situations, ranging from relatively mild concerns such as headaches after a night out drinking, to more complex issues such as symptoms suggestive of gallstones or postnatal exhaustion.
Participants were randomly assigned to use one of three major AI chatbots (OpenAI's GPT-4o, Meta's Llama 3, or Cohere's Command R+), while a control group relied on traditional search engines or conventional methods such as consulting a general practitioner. The results were striking. Those using AI chatbots correctly identified their health condition only about one-third of the time. Even more concerning, only around 45 percent selected the appropriate next course of action.
These results were no better than those of the control group.
The Communication Breakdown Problem
Researchers described a clear gap between how AI performs in exam-style testing and how it functions when interacting with real people. Unlike the standardized prompts used in simulated cases, real users often failed to provide complete or relevant details when describing their symptoms. At the same time, many participants struggled to interpret the chatbot's responses, misunderstood the options presented, or ignored important warnings.

Image Credit: Thailand AI News
Andrew Bean, the study’s lead author from the Oxford Internet Institute, said that interacting effectively with humans remains a significant challenge for even the most advanced language models. He noted that while these systems excel at structured knowledge retrieval, they falter in messy, real-world communication.
The implications are substantial. Surveys cited in the study indicate that roughly one in six adults in the United States consult AI chatbots for health information at least once a month—a number expected to rise rapidly as AI becomes further integrated into everyday digital tools.
Medical ethicists not involved in the research have echoed the concerns. They warn that misplaced confidence in AI-generated medical advice could delay urgent treatment, lead to incorrect self-diagnosis, or create false reassurance in serious cases.
The broader takeaway from the Oxford study is not that AI has no role in healthcare, but that its deployment must be cautious, carefully regulated, and supported by rigorous safety testing in real-world conditions. While AI systems show enormous promise in diagnostics support and administrative efficiency, they cannot yet replace trained medical professionals in high-stakes decision-making.
As public reliance on AI tools grows, the responsibility to ensure their safety becomes more urgent. Policymakers, developers, and healthcare providers will need to collaborate closely to prevent technological optimism from outpacing patient protection. For now, experts advise individuals to seek medical guidance from qualified healthcare professionals rather than relying solely on chatbot advice.
The study, published in Nature Medicine, can be found here:
https://www.nature.com/articles/s41591-025-04074-y
For the latest on AI in healthcare, keep following Thailand AI News.