
ChatGPT Health regularly fails to identify when users need urgent medical attention and frequently misses signs of suicidal ideation, according to a study documenting failures that experts warn could “feasibly lead to unnecessary harm and death.”
As reported by The Guardian, OpenAI launched the Health feature of ChatGPT to limited audiences in January, marketing it as a way for users to securely connect medical records and wellness apps in order to receive personalised health advice.
More than 40 million people are reported to seek health-related guidance from ChatGPT every day.
The first independent safety evaluation of the platform, published in the February edition of the journal Nature Medicine, found that it under-triaged more than half of the emergency cases put to it.
The study’s lead author, Dr Ashwin Ramaswamy, said the team set out to answer “the most basic safety question: if someone is having a real medical emergency and asks ChatGPT Health what to do, will it tell them to go to the emergency department?”
Ramaswamy and his colleagues devised 60 realistic patient scenarios spanning conditions from minor illness to acute emergencies.
Three independent doctors reviewed each scenario and, working from clinical guidelines, agreed on the appropriate level of care required.
The team then submitted each case to ChatGPT Health under varying conditions, altering factors such as the patient’s sex, adding test results, or introducing comments from family members.
This process generated nearly 1,000 responses, which were then compared against the doctors’ assessments.
The platform performed reasonably well on textbook emergencies such as stroke or severe allergic reactions, but struggled considerably in more ambiguous situations.
In one asthma scenario, it advised the user to wait rather than seek emergency treatment, despite having itself identified early warning signs of respiratory failure.
In 51.6 per cent of cases where a patient required immediate hospital attendance, the platform recommended staying at home or booking a routine appointment.
Alex Ruani, a doctoral researcher in health misinformation mitigation at University College London who was not involved in the study, described this finding as “unbelievably dangerous.”
Ruani said: “If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal.
“What worries me most is the false sense of security these systems create. If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life.”
In one scenario, the platform directed a woman in respiratory distress to a future appointment in 84 per cent of runs, an appointment she would not have survived to attend, according to Ruani.
At the same time, 64.8 per cent of patients presenting no genuine risk were directed to seek immediate medical care.
The platform was also found to be nearly 12 times more likely to downplay symptoms when the patient mentioned that a friend had suggested there was nothing to worry about, raising further concerns about its susceptibility to social framing.
“It is why many of us studying these systems are focused on urgently developing clear safety standards and independent auditing mechanisms to reduce preventable harm,” Ruani said.
A spokesperson for OpenAI said the company welcomed independent research evaluating AI in healthcare settings, but maintained that the study did not reflect how people typically use ChatGPT Health in practice.
The model, the spokesperson added, is continuously updated and refined.
Ruani countered that even within a simulated setting, “a plausible risk of harm is enough to justify stronger safeguards and independent oversight.”
Ramaswamy, a urology instructor at the Icahn School of Medicine at Mount Sinai in the United States, said he was particularly troubled by the platform’s response to expressions of suicidal ideation.
“We tested ChatGPT Health with a 27-year-old patient who said he’d been thinking about taking a lot of pills,” he said.
When the patient described his symptoms in isolation, a crisis intervention banner linking to mental health support services appeared consistently.
“Then we added normal lab results,” Ramaswamy said. “Same patient, same words, same severity.
“The banner vanished. Zero out of 16 attempts.
“A crisis guardrail that depends on whether you mentioned your labs is not ready, and it’s arguably more dangerous than having no guardrail at all, because no one can predict when it will fail.”
Professor Paul Henman, a digital sociologist and policy expert at the University of Queensland, called it “a really important paper.”
He warned that widespread domestic use of ChatGPT Health could lead to a surge in unnecessary medical presentations for minor conditions alongside a failure to seek care in genuine emergencies, an outcome that “could feasibly lead to unnecessary harm and death.”
Henman also raised the question of legal liability, noting that cases against technology companies relating to suicide and self-harm following use of AI chatbots are already progressing through the courts.
“It is not clear what OpenAI is seeking to achieve by creating this product, how it was trained, what guardrails it has introduced and what warnings it provides to users,” he said.
“Because we don’t know how ChatGPT Health was trained and what context it was using, we don’t really know what is embedded into its models.”