#AI / #ML science has a lot to learn from #HCI, #Psychology, and #economics about evaluation methods.
Datasets are often curated with implicit assumptions about what the use case is. E.g., #GenAI systems (#MedFlamingo, #MedPalm) are evaluated with QA from #USMLE. The questions in these datasets are designed to test a clinician's knowledge and memory. A model's performance on these datasets tells us NOTHING about whether it is a good information source for laypeople. (2/n)
Mystery AI Hype Theater 3000, Episode 16 - Med-PaLM or Facepalm? A Second Opinion On LLMs In Healthcare
https://peertube.dair-institute.org/w/2hrc8RNfmY5bL2q5bxezQV

I'd already have a few alternatives up my sleeve...
After the emergency room: AI model to determine the hospital stay
The Universitätsklinikum Schleswig-Holstein plans to develop an AI for the emergency room together with DFKI and the company singularIT.
At retrieving medical knowledge, the language model Med PaLM is as good as humans, a Nature study shows. According to experts, that is not yet enough in practice.
https://www.heise.de/news/Digital-Health-Med-PaLM-im-Medizinertest-auf-Augenhoehe-9214542.html #DigitalHealth #medpalm #ki #Gesundheitsdaten
Is it worth studying #Medicine 😜?
Is #Radiology really the medical specialty allegedly most endangered by artificial intelligence 😎?
Google Research and DeepMind unveiled a ChatGPT-like chatbot for medicine, #MedPaLM, an open-source language model.
It was evaluated on six existing open-ended question-answering datasets, as well as a new one called HealthSearchQA.
92.6% of Med-PaLM's responses were judged to be in line with scientific consensus, on par with answers written by clinicians (92.9%).
#AI #ArtificialIntelligence
https://pharmaphorum.com/news/google-and-deepmind-share-work-on-medical-chatbot-med-palm/