8/8 We used text-based case vignettes. Future research could explore integrating multimodal data. It will also be critical to assess performance in real-world clinical settings and across diverse populations, while monitoring and accounting for potential biases.
7/8 Diagnostic errors cause nearly 795,000 deaths and permanent disabilities annually in the U.S. alone. Our approach explores ways to reduce these errors and improve patient outcomes without significantly increasing costs.
6/8 Using the SNOMED CT clinical terminology and NLP techniques, we automatically harmonized and aggregated the diagnoses from both humans and AI, so this step required no human intervention.
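A minimal Python sketch of what harmonizing and aggregating free-text diagnoses can look like. It assumes a hypothetical exact-match synonym table (`SYNONYMS`) standing in for the NLP matching step, and placeholder concept IDs (`CONCEPT_MI` etc.) standing in for real SNOMED CT codes. It is an illustration, not the pipeline used in the preprint.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical synonym table mapping surface forms to placeholder concept IDs.
# A real pipeline would map free text to SNOMED CT concepts with NLP matching;
# these IDs are illustrative placeholders, not real SNOMED CT codes.
SYNONYMS = {
    "myocardial infarction": "CONCEPT_MI",
    "heart attack": "CONCEPT_MI",
    "mi": "CONCEPT_MI",
    "pulmonary embolism": "CONCEPT_PE",
    "pe": "CONCEPT_PE",
    "pneumonia": "CONCEPT_PNEUMONIA",
}


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(text.lower().split())


def harmonize(diagnoses: list[str]) -> list[str | None]:
    """Map each free-text diagnosis to a concept ID, or None if unmatched."""
    return [SYNONYMS.get(normalize(d)) for d in diagnoses]


def aggregate(diagnoses: list[str]) -> Counter:
    """Count how many respondents named each harmonized concept."""
    return Counter(c for c in harmonize(diagnoses) if c is not None)


responses = ["Heart attack", "Myocardial  infarction", "PE", "pneumonia", "MI"]
print(aggregate(responses).most_common())
# [('CONCEPT_MI', 3), ('CONCEPT_PE', 1), ('CONCEPT_PNEUMONIA', 1)]
```

Once differently worded answers share one concept ID, they can be counted or ranked together; unmatched strings are simply dropped in this sketch.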
5/8 Specialties such as cardiology, gastroenterology, and infectious diseases all benefited from this hybrid approach, which points to its broad applicability for improving diagnostic accuracy across medical fields.
4/8 We had state-of-the-art large language models, including Anthropic Claude 3 Opus, Google Gemini Pro 1.0, Meta Llama 2 70B, Mistral Large, and OpenAI GPT-4, diagnose the same cases as the physicians, and aggregated their responses into collective diagnoses.
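One plausible way to pool ranked differential diagnoses from several models is reciprocal-rank scoring, sketched below. This rule is an assumption for illustration and may differ from the aggregation used in the study; the function name `pool_rankings` and the concept labels are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def pool_rankings(rankings: list[list[str]], top_n: int = 5) -> list[str]:
    """Pool ranked differential diagnoses with reciprocal-rank weights.

    A concept at 1-based position r in one ranking earns 1/r points; points
    are summed across rankings, and the top_n concepts are returned. Ties are
    broken alphabetically so the output is deterministic.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        seen: set[str] = set()
        for position, concept in enumerate(ranking, start=1):
            if concept in seen:  # ignore duplicates left after harmonization
                continue
            seen.add(concept)
            scores[concept] += 1.0 / position
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return ordered[:top_n]


# Hypothetical ranked outputs from three models for one case.
model_outputs = [
    ["CONCEPT_PE", "CONCEPT_MI", "CONCEPT_PNEUMONIA"],
    ["CONCEPT_MI", "CONCEPT_PE"],
    ["CONCEPT_PE", "CONCEPT_PNEUMONIA"],
]
print(pool_rankings(model_outputs, top_n=2))
# ['CONCEPT_PE', 'CONCEPT_MI']
```

Reciprocal-rank weights reward diagnoses that several sources place near the top, while still giving some credit to lower-ranked alternatives.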
3/8 Our findings show that humans and AI make different types of errors, so their strengths are complementary: when AI misses a diagnosis, humans often get it right, and vice versa. This complementarity is what lets hybrid collectives reach higher diagnostic accuracy.
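To make the complementarity idea concrete, here is a small sketch that takes per-case correctness for a human collective and an AI collective and reports how often one is right when the other is wrong. The outcome lists are made-up toy data, not study results, and `either_right` is only an upper bound on what a perfect combiner could reach.

```python
from __future__ import annotations

import math


def complementarity(human_correct: list[bool], ai_correct: list[bool]) -> dict[str, float]:
    """Summarize how often one collective is right when the other is wrong."""
    if len(human_correct) != len(ai_correct) or not human_correct:
        raise ValueError("need two equally long, non-empty lists of outcomes")
    pairs = list(zip(human_correct, ai_correct))
    # Human outcomes on cases the AI missed, and AI outcomes on cases humans missed.
    human_when_ai_wrong = [h for h, a in pairs if not a]
    ai_when_human_wrong = [a for h, a in pairs if not h]

    def rate(outcomes: list[bool]) -> float:
        return sum(outcomes) / len(outcomes) if outcomes else math.nan

    return {
        "human_accuracy": rate(human_correct),
        "ai_accuracy": rate(ai_correct),
        "human_right_when_ai_wrong": rate(human_when_ai_wrong),
        "ai_right_when_human_wrong": rate(ai_when_human_wrong),
        # Upper bound for any combiner: at least one collective was right.
        "either_right": rate([h or a for h, a in pairs]),
    }


# Toy outcomes for eight hypothetical cases (True = correct diagnosis).
human = [True, True, False, True, False, True, True, False]
ai = [True, False, True, True, True, False, True, False]
print(complementarity(human, ai))
# each alone: 0.625; either_right: 0.875
```

In this toy data both sources are right 62.5% of the time on their own, but because their misses mostly fall on different cases, at least one is right in 87.5% of cases.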
2/8 We analysed 2,133 medical cases and 40,762 physician diagnoses from the Human Diagnosis Project to compare human-only, AI-only, and hybrid collectives. Combining AI with physician expertise produced better results than either alone.
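A hedged sketch of how human-only, AI-only, and hybrid collectives could be scored against the correct diagnosis, assuming top-k accuracy as the metric (the preprint's exact evaluation may differ). The cases, labels, and rankings are hypothetical toy data.

```python
from __future__ import annotations


def top_k_accuracy(rankings: list[list[str]], truths: list[str], k: int = 5) -> float:
    """Fraction of cases whose correct diagnosis is among the top k entries."""
    if len(rankings) != len(truths) or not truths:
        raise ValueError("need one ranking per case and at least one case")
    hits = sum(truth in ranking[:k] for ranking, truth in zip(rankings, truths))
    return hits / len(truths)


# Hypothetical toy data: correct diagnosis and each collective's ranking per case.
truths = ["PE", "MI", "PNEUMONIA"]
collectives = {
    "human-only": [["PE", "MI"], ["PE", "PNEUMONIA"], ["PNEUMONIA"]],
    "AI-only": [["MI", "PNEUMONIA"], ["MI"], ["PE", "PNEUMONIA"]],
    "hybrid": [["PE", "MI"], ["MI", "PE"], ["PNEUMONIA", "PE"]],
}
for name, rankings in collectives.items():
    print(name, round(top_k_accuracy(rankings, truths, k=1), 3))
# human-only 0.667, AI-only 0.333, hybrid 1.0
```

Scoring every collective with the same function on the same cases keeps the comparison fair; changing `k` shows how much credit near-misses in the differential receive.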
1/8 New HACID project preprint!
http://arxiv.org/pdf/2406.14981 Our study shows that human-AI collectives, which combine human expertise with AI, significantly improve diagnostic accuracy.
@mpib_berlin @stefanherzog
Our approach did not improve accuracy in every patient case. Future work could focus on understanding the conditions under which pooling independent diagnoses in general practice is beneficial (or not).
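As a simplified illustration of why pooling can help in some settings but not others, the classic Condorcet jury calculation gives the probability that a majority of n independent voters is correct on a binary choice when each is correct with probability p. Real diagnosis is not binary and GPs are not fully independent, so treat this as intuition rather than the model used in the study.

```python
import math


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes a binary choice and that every voter is correct with probability p
    (the Condorcet jury setting); n must be odd so there are no ties.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Pooling helps when individuals beat chance and hurts when they do not.
print(round(majority_accuracy(0.6, 5), 3))  # 0.683
print(round(majority_accuracy(0.4, 5), 3))  # 0.317
```

With five voters, pooling lifts 60% individual accuracy to about 68%, but drags 40% individual accuracy down to about 32%.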
Pooled decisions achieved the highest accuracy when individual GPs used a Decision Support System (DSS) during diagnosis, showing that DSS and collective intelligence approaches can be synergistic.