2/8 We analysed 2,133 medical cases and 40,762 physician diagnoses from the Human Diagnosis Project to compare human-only, AI-only and hybrid collectives. Combining AI and physician expertise yields more accurate diagnoses than either alone.
3/8 Our findings show that humans and AI make different types of errors, and their complementary strengths lead to higher diagnostic accuracy. When AI misses a diagnosis, humans often get it right, and vice versa. This synergy is key for superior performance.
4/8 We had state-of-the-art large language models such as Anthropic Claude 3 Opus, Google Gemini Pro 1.0, Meta Llama 2 70B, Mistral Large, and OpenAI GPT-4 diagnose the same medical cases as the human doctors and aggregated their responses into collective diagnoses.
5/8 Medical specialties like cardiology, gastroenterology, and infectious diseases all benefited from this hybrid approach. The study highlights the broad applicability and potential for improving diagnostic accuracy across various medical fields.
6/8 Using SNOMED CT healthcare terminology and advanced NLP techniques, we automatically harmonized and aggregated diagnoses from both humans and AI, eliminating the need for human intervention in this step.
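For readers who want to see the idea in code, here is a minimal sketch of pooling harmonized diagnoses into one collective ranking. It is an illustration under stated assumptions, not the study's pipeline: the lookup table `TERM_TO_CONCEPT`, the placeholder concept IDs (these are not real SNOMED CT codes), the helper names, and the 1/rank scoring rule are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical lookup from free-text terms to placeholder concept IDs.
# These IDs are illustrative only and are NOT real SNOMED CT codes.
TERM_TO_CONCEPT = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
    "mi": "C001",
    "pulmonary embolism": "C002",
    "pe": "C002",
    "pneumonia": "C003",
}


def harmonize(term):
    """Map a free-text diagnosis to a concept ID, or None if unmapped."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


def aggregate(differentials, top_k=3):
    """Pool ranked differentials from several diagnosticians.

    Each differential is a list of free-text diagnoses, best guess first.
    Assumed scoring rule: a concept earns 1/rank per diagnostician, where
    rank counts only mapped, non-duplicate entries in that list.
    """
    scores = defaultdict(float)
    for ranked_terms in differentials:
        seen = set()
        rank = 0
        for term in ranked_terms:
            concept = harmonize(term)
            if concept is None or concept in seen:
                continue  # skip unmapped terms and repeated concepts
            seen.add(concept)
            rank += 1
            scores[concept] += 1.0 / rank
    # Highest score first; ties broken by concept ID for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ordered[:top_k]]


physicians = [["Heart attack", "PE"], ["Pneumonia", "myocardial infarction"]]
llms = [["MI", "pneumonia"]]
collective = aggregate(physicians + llms)
print(collective)
```

Because synonyms such as "Heart attack" and "MI" map to the same concept before scoring, votes from humans and models land on one entry instead of being split across spellings.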
7/8 Diagnostic errors cause nearly 795,000 deaths and permanent disabilities annually in the U.S. alone. Our approach explores ways to reduce these errors and improve patient outcomes without significantly increasing costs.
8/8 We used case vignettes in text form. Future research could explore the integration of multimodal data. It will also be critical to assess performance in authentic clinical contexts and across diverse populations, while monitoring and accounting for potential biases.