1/8 New HACID project preprint!
http://arxiv.org/pdf/2406.14981 Our study shows human-AI collectives, combining human expertise with AI, significantly improve diagnostic accuracy.
@mpib_berlin @stefanherzog
2/8 We analysed 2,133 medical cases and 40,762 physician diagnoses from the Human Diagnosis Project to compare human-only, AI-only and hybrid collectives. The combination of AI and physician expertise produces better results than either alone.
3/8 Our findings show that humans and AI make different types of errors, and their complementary strengths lead to higher diagnostic accuracy. When AI misses a diagnosis, humans often get it right, and vice versa. This synergy is key for superior performance.
4/8 We had state-of-the-art large language models such as Anthropic Claude 3 Opus, Google Gemini Pro 1.0, Meta Llama 2 70B, Mistral Large, and OpenAI GPT-4 diagnose the same medical cases as the human doctors and aggregated their responses into collective diagnoses.
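For the curious, here is a minimal sketch of how ranked diagnoses from several responders (physicians or LLMs) could be pooled into one collective ranking. The reciprocal-rank scoring, the function name `aggregate_diagnoses`, and the example responses are illustrative assumptions, not necessarily the procedure used in the preprint.

```python
from collections import defaultdict


def aggregate_diagnoses(ranked_lists, top_k=5):
    """Combine ranked differential diagnoses into one collective ranking.

    Each diagnosis earns 1/rank points from every responder who lists it.
    This scoring rule is an assumption chosen for illustration.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        seen = set()
        for rank, diagnosis in enumerate(ranked, start=1):
            if diagnosis in seen:  # count each diagnosis once per responder
                continue
            seen.add(diagnosis)
            scores[diagnosis] += 1.0 / rank
    # Highest score first; ties broken alphabetically so the result is deterministic
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [diagnosis for diagnosis, _ in ordered[:top_k]]


# Hypothetical responses: two physicians and one LLM on the same case
responses = [
    ["pulmonary embolism", "pneumonia"],
    ["pneumonia", "pulmonary embolism", "heart failure"],
    ["pulmonary embolism", "heart failure"],
]
collective = aggregate_diagnoses(responses)
```

A diagnosis ranked highly by several responders rises to the top, while one listed by a single responder can still make the collective list.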
5/8 Medical specialties such as cardiology, gastroenterology, and infectious diseases all benefited from this hybrid approach. The study highlights the broad applicability and potential for improving diagnostic accuracy across various medical fields.
6/8 Using SNOMED CT healthcare terminology and advanced NLP techniques, we automatically harmonized and aggregated diagnoses from both humans and AI, eliminating the need for human intervention in this step.
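To make the harmonization idea concrete, here is a toy sketch that maps free-text diagnoses onto shared concept identifiers so that synonyms count as the same diagnosis. The lookup table, the placeholder IDs (not real SNOMED CT codes), and the helper names `normalize` and `harmonize` are all assumptions for illustration; the actual pipeline relies on SNOMED CT and NLP matching rather than an exact-match dictionary.

```python
import re

# Toy synonym table; the IDs are placeholders, not real SNOMED CT codes
SYNONYM_TO_CONCEPT = {
    "myocardial infarction": "CONCEPT_MI",
    "heart attack": "CONCEPT_MI",
    "mi": "CONCEPT_MI",
    "pneumonia": "CONCEPT_PNEUMONIA",
    "lung infection": "CONCEPT_PNEUMONIA",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def harmonize(diagnoses):
    """Map free-text diagnoses to concept IDs.

    Unmatched entries are dropped and duplicates are removed,
    preserving the order of first appearance.
    """
    concepts = []
    for raw in diagnoses:
        concept = SYNONYM_TO_CONCEPT.get(normalize(raw))
        if concept is not None and concept not in concepts:
            concepts.append(concept)
    return concepts


harmonized = harmonize(["Heart attack", "MI", "Lung-infection!", "unknown illness"])
```

Once every answer is expressed as a concept ID, human and AI diagnoses can be compared and aggregated on equal footing.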