A study in Nature Medicine conducted a structured stress test of triage recommendations made by ChatGPT Health. The findings show missed high-risk emergencies and inconsistent activation of crisis safeguards.

"Performance followed an inverted U-shaped pattern, with the most dangerous failures concentrated at clinical extremes: non-urgent presentations (35%) and emergency conditions (48%). Among gold-standard emergencies, the system under-triaged 52% of cases, directing patients with diabetic ketoacidosis and impending respiratory failure to 24–48-hour evaluation rather than the emergency department"

#AiAiAi

https://www.nature.com/articles/s41591-026-04297-7.epdf

ChatGPT Health performance in a structured test of triage recommendations - Nature Medicine

A stress test of ChatGPT Health triage revealed missed high-risk emergencies and inconsistent activation of suicide-crisis safeguards, raising safety concerns for consumer-scale deployment.

Nature