1/8 🚀📄 New HACID project preprint! http://arxiv.org/pdf/2406.14981 Our study shows human-AI collectives, combining human expertise with AI, significantly improve diagnostic accuracy. @mpib_berlin @stefanherzog
2/8 🤝🤖🧠✨ We analysed 2,133 medical cases and 40,762 physician diagnoses from the Human Diagnosis Project to compare human-only, AI-only and hybrid collectives. The combination of AI and physician expertise produces better results than either alone.
3/8 🩺❌💡⚖️ Our findings show that humans and AI make different types of errors, and their complementary strengths lead to higher diagnostic accuracy. When AI misses a diagnosis, humans often get it right, and vice versa. This synergy is key for superior performance.
4/8 🤖🧠🩺 We had state-of-the-art large language models such as Anthropic Claude 3 Opus, Google Gemini Pro 1.0, Meta Llama 2 70B, Mistral Large, and OpenAI GPT-4 diagnose the same medical cases as the human doctors and aggregated their responses into collective diagnoses.
5/8 📚 Medical specialties like cardiology, gastroenterology, and infectious diseases all benefited from this hybrid approach. The study highlights the broad applicability and potential for improving diagnostic accuracy across various medical fields.
6/8 🛠️🔄 Using SNOMED CT healthcare terminology and advanced NLP techniques, we automatically harmonized and aggregated diagnoses from both humans and AI, eliminating the need for human intervention in this step.
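For readers who want a feel for the harmonize-then-aggregate step, here is a minimal sketch and not the preprint's actual pipeline. The synonym table, the placeholder concept IDs (not real SNOMED CT codes), the function names, and the 1/rank weighting are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical lookup from free-text diagnosis to a concept identifier.
# The IDs are placeholders for illustration, NOT real SNOMED CT codes.
SYNONYM_TO_CONCEPT = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
    "mi": "C001",
    "pulmonary embolism": "C002",
    "pe": "C002",
    "pneumonia": "C003",
}


def harmonize(diagnosis):
    """Map a free-text diagnosis to a concept ID, or None if unknown."""
    return SYNONYM_TO_CONCEPT.get(diagnosis.strip().lower())


def aggregate(ranked_lists):
    """Combine ranked differential diagnoses into one collective ranking.

    Each list comes from one rater (physician or LLM). A concept at
    position r among a rater's recognized, de-duplicated concepts adds
    1/r to its score (an assumed weighting). Unrecognized entries are
    skipped and do not use up a rank position.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        seen = set()
        rank = 0
        for text in ranked:
            concept = harmonize(text)
            if concept is None or concept in seen:
                continue
            seen.add(concept)
            rank += 1
            scores[concept] += 1.0 / rank
    # Highest score first; ties broken by concept ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Made-up example: two physicians and one LLM on the same case.
physicians = [["Heart attack", "PE"], ["pneumonia", "MI"]]
llms = [["Myocardial infarction", "pneumonia"]]
collective = aggregate(physicians + llms)
```

In this toy case, "Heart attack", "MI" and "Myocardial infarction" all count toward the same concept, which is why harmonizing before voting matters.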
7/8 🌍📈 Diagnostic errors cause nearly 795,000 deaths and permanent disabilities annually in the U.S. alone. Our approach explores ways to reduce these errors and improve patient outcomes without significantly increasing costs.
8/8 🔮 We used case vignettes in text form. Future research could explore the integration of multimodal data. It will also be critical to assess performance in authentic clinical contexts and across diverse populations, while monitoring and accounting for potential biases.