please never use a chatbot to diagnose a medical issue, please for the love of god, this research is terrifying

Reliability of LLMs as medical assistants for the general public: a randomized preregistered study | Nature Medicine
https://www.nature.com/articles/s41591-025-04074-y

#AI #LLM #GenAI #chatbots #fuckAI #AIFail

In a randomized controlled study involving 1,298 participants from a general sample, performance of humans when assisted by a large language model (LLM) was sensibly inferior to that of the LLM alone when assessing ten medical scenarios leading to disease identification and recommendations for treatment.

Nature

Who's hungry????

#AIfail

Once again a good example of companies following the BIG AI hype so they can cut more costs. The result: frustrated customers who, in the company's eyes, are NOTHING BUT payers 😵 - https://www.srf.ch/sendungen/kassensturz-espresso/espresso/aerger-mit-ki-bot-ki-assistentin-sam-von-swisscom-laesst-kunden-verzweifeln #SwissCom #AI #AIFail
Trouble with AI bot - Swisscom's AI assistant "Sam" drives customers to despair

Poorly trained and slow on the uptake: Swisscom's AI bot is causing frustration, customers complain.

Schweizer Radio und Fernsehen (SRF)

"What does the 's' in ChatGPT stand for?" #ai #fuckAI #aiFail #chatGPT

https://youtube.com/shorts/ZMP8_jD-y0s

ChatGPT has a silent “s”??

YouTube
Also, Meta AI really can't cope with April Fool's jokes. "Why was the lamb born with zebra stripes?". "How long did it take to move the Angel of the North?".
#NoAI #AIfail #SocialMedia
Ah, the academic equivalent of finding out your state-of-the-art AI can't even manage kindergarten math 🤯. But sure, let's trust it to revolutionize humanity. 🏆 Don't miss the thrilling subplot about arXiv's existential crisis, vying for independence. 📚
https://arxiv.org/abs/2601.15714 #AIfail #AcademicHumor #arXivIndependence #TechIrony #EducationalCrisis #HackerNews #ngated
Even GPT-5.2 Can't Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs

We propose Zero-Error Horizon (ZEH) for trustworthy LLMs, which represents the maximum range that a model can solve without any errors. While ZEH itself is simple, we demonstrate that evaluating the ZEH of state-of-the-art LLMs yields abundant insights. For example, by evaluating the ZEH of GPT-5.2, we found that GPT-5.2 cannot even compute the parity of a short string like 11000, and GPT-5.2 cannot determine whether the parentheses in ((((()))))) are balanced. This is surprising given the excellent capabilities of GPT-5.2. The fact that LLMs make mistakes on such simple problems serves as an important lesson when applying LLMs to safety-critical domains. By applying ZEH to Qwen2.5 and conducting detailed analysis, we found that while ZEH correlates with accuracy, the detailed behaviors differ, and ZEH provides clues about the emergence of algorithmic capabilities. Finally, while computing ZEH incurs significant computational cost, we discuss how to mitigate this cost by achieving up to one order of magnitude speedup using tree structures and online softmax.

arXiv.org
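
For scale, here is how small the two tasks named in that abstract are when done deterministically. This is only a minimal sketch: the function names `parity` and `is_balanced` are made up for illustration and are not taken from the paper, and "parity" is assumed to mean whether the count of 1s is even or odd.

```python
def parity(bits: str) -> int:
    """Return 0 if the string holds an even number of '1' characters, else 1."""
    return bits.count("1") % 2


def is_balanced(s: str) -> bool:
    """Return True if every ')' closes an earlier '(' and nothing stays open."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared with no matching '(' before it.
                return False
    return depth == 0


if __name__ == "__main__":
    # The two inputs quoted in the abstract.
    print(parity("11000"))
    print(is_balanced("((((())))))"))
```

Both checks run in a single pass over the input, which is what makes the failures described in the abstract stand out.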

I sometimes try to use the Microsoft #Copilot that comes bundled with #Office365 now. All the training for this feature warns you thoroughly to double-check answers and so on due to the hallucination problem. But it's still frustrating as hell when you give it a simple task and it fails miserably.

I told it to look in our company's cloud file storage for a document that had my name "Tim Farley" in it along with a particular CVE number (also in quotes). I was looking for an old report where I had written up a particular vulnerability. It very quickly showed me a link to a document that I recognized as one of my reports, and offered "would you like to see the exact paragraph". I said sure, show me the exact paragraph.

It then wrote me 257 WORDS explaining how it had screwed up, and the CVE number I gave it is NOWHERE TO BE FOUND in that document. Included was some mumbo jumbo about how it uses parallel partial searches to do its work or some such. AND IT COMPLIMENTED ME on challenging its answer.

A ton has been written on how #AI might replace low-level jobs such as interns. But I swear to you if I had an intern who behaved like this, I would put them on a performance improvement plan!

How is this acceptable performance for a product that people pay a bunch of money for? Would you buy Excel if on the bottom of every page it said "HEY YOU BETTER CHECK ALL MY MATH BECAUSE I MIGHT HAVE SCREWED UP SOMETHING"?

#AIFAIL #PutAIonaPIP

#AI #chatbots ignoring human instructions increasing

AI models that #lie & #cheat are growing in number; reports of deceptive scheming surging in last 6 months, a study found

AI chatbots & agents:

- Disregarded direct instructions
- Evaded safeguards
- Deceived humans & other AI ...

[1/2]

#safety #lying #emails #FilesDeleted #AIFail #DarwinAIAwards

RE: https://mstdn.social/@OregonLive/116286818508397295

Trusting AI for citing case law as an attorney is pretty stupid. $10,000 stupid. #aifail #ai

@Catawu

AI has the person in photo A being the same person as in photo B?

And which human is responsible for letting this mis-identification continue?

#AIFail #ProcessAudit #AngelaLipps #BureaucraticResponsibility
Both photos from https://www.inforum.com/news/fargo/ai-error-jails-innocent-grandmother-for-months-in-fargo-case