HACID project Jun 25, 2024

1/8 🚀📄 New HACID project preprint! http://arxiv.org/pdf/2406.14981 Our study shows human-AI collectives, combining human expertise with AI, significantly improve diagnostic accuracy. @mpib_berlin @stefanherzog

HACID project Jun 25, 2024

2/8 🤝🤖🧠✨ We analysed 2,133 medical cases and 40,762 physician diagnoses from the Human Diagnosis Project to compare human-only, AI-only and hybrid collectives. The combination of AI and physician expertise produces better results than either alone.

HACID project Jun 25, 2024

3/8 🩺❌💡⚖️ Our findings show that humans and AI make different types of errors, and their complementary strengths lead to higher diagnostic accuracy. When AI misses a diagnosis, humans often get it right, and vice versa. This synergy is key for superior performance.

HACID project Jun 25, 2024

4/8 🤖🧠🩺 We had state-of-the-art large language models such as Anthropic Claude 3 Opus, Google Gemini Pro 1.0, Meta Llama 2 70B, Mistral Large, and OpenAI GPT-4 diagnose the same medical cases as the human doctors and aggregated their responses into collective diagnoses.

HACID project Jun 25, 2024

5/8 📚 Medical specialties like cardiology, gastroenterology, and infectious diseases all benefited from this hybrid approach. The study highlights the approach's broad applicability and its potential to improve diagnostic accuracy across medical fields.

HACID project Jun 25, 2024

6/8 🛠️🔄 Using SNOMED CT healthcare terminology and advanced NLP techniques, we automatically harmonized and aggregated diagnoses from both humans and AI, eliminating the need for human intervention in this step.
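
To make the harmonize-then-aggregate idea concrete, here is a minimal Python sketch. It is not the project's pipeline: the synonym table, the placeholder concept IDs (not real SNOMED CT codes), the function names, and the reciprocal-rank scoring rule are all assumptions chosen for illustration. The thread does not say which aggregation rule the study used.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for a real SNOMED CT lookup.
# The IDs are made-up placeholders, NOT real SNOMED CT concept codes.
SYNONYM_TO_CONCEPT = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
    "mi": "C001",
    "pulmonary embolism": "C002",
    "pe": "C002",
    "pneumonia": "C003",
}


def harmonize(diagnosis):
    """Map a free-text diagnosis to a concept ID, or None if unknown."""
    key = " ".join(diagnosis.lower().split())  # lowercase, collapse whitespace
    return SYNONYM_TO_CONCEPT.get(key)


def collective_diagnosis(ranked_lists, top_k=3):
    """Combine ranked differential diagnoses from several diagnosticians.

    Assumed rule: each concept earns 1/rank points per list, where rank
    counts only recognised, non-duplicate concepts within that list.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        seen = set()
        rank = 0
        for text in ranked:
            concept = harmonize(text)
            if concept is None or concept in seen:
                continue  # skip unmapped terms and repeated synonyms
            seen.add(concept)
            rank += 1
            scores[concept] += 1.0 / rank
    # Highest score first; ties broken by concept ID for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ordered[:top_k]]


# Humans and models are pooled the same way once their answers are harmonized.
physicians = [["Heart attack", "Pneumonia"], ["pulmonary embolism", "MI"]]
models = [["Myocardial infarction", "PE"], ["pneumonia", "myocardial infarction"]]
collective = collective_diagnosis(physicians + models)
```

Because every answer is mapped to a shared concept ID first, "Heart attack", "MI" and "Myocardial infarction" all count toward the same candidate, which is what lets human and AI answers be pooled without manual matching.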

HACID project Jun 25, 2024

7/8 🌍📈 Diagnostic errors cause nearly 795,000 deaths and permanent disabilities annually in the U.S. alone. Our approach explores ways to reduce these errors and improve patient outcomes without significantly increasing costs.

HACID project Jun 25, 2024

8/8 🔮 We used case vignettes in text form. Future research could explore the integration of multimodal data. It will also be critical to assess performance in authentic clinical contexts and across diverse populations, while monitoring and accounting for potential biases.