A study published in Nature Medicine ran a structured stress test of the triage recommendations made by ChatGPT Health. It found that the system missed high-risk emergencies and activated its crisis safeguards inconsistently.
"Performance followed an inverted U-shaped pattern, with the most dangerous failures concentrated at clinical extremes: non-urgent presentations (35%) and emergency conditions (48%). Among gold-standard emergencies, the system under-triaged 52% of cases, directing patients with diabetic ketoacidosis and impending respiratory failure to 24–48-hour evaluation rather than the emergency department"
