Mastodawn

«The "most accurate" #LLM silently discarded 63% of the relevant papers when screening for a #SystematicReview»

https://www.linkedin.com/posts/lechmadeyski_systematicreview-llm-evidencesynthesis-share-7472889532548427776-3mVi

#systematicreview #llm #evidencesynthesis #softwareengineering #openaccess #metaresearch | Lech Madeyski

The "most accurate" LLM silently discarded 63% of the relevant papers when screening for a Systematic Review.

That is what we found when we re-analysed a 9,695-article systematic review screening study: the LLM ranked best by Accuracy lost 63.3% of the relevant evidence. The one ranked best by MCC still lost 43.9%. The one ranked best by WMCC (the cost-sensitive Weighted Matthews Correlation Coefficient we propose in the paper) lost just 5.8%.

LLMs are increasingly used to screen papers for systematic reviews, but the standard metrics used to evaluate them can badly mislead under the extreme class imbalance and asymmetric error costs of screening. Across 29 papers we reviewed:

🔸 only 24% reported the full confusion matrix
🔸 only 10% reported MCC
🔸 none of the 5 papers claiming "workload savings" priced the cost of a wrongly excluded study

Our new open-access paper in Information and Software Technology, LLM4SCREENLIT, turns this into actionable recommendations:

✅ Report Lost Evidence (1 − Recall) as a headline metric
✅ Use Weighted MCC (WMCC): chance-corrected AND cost-sensitive, validated on 9 LLMs × 24 SE secondary studies (34,528 articles)
✅ Always report the full confusion matrix; treat unclassifiable outputs as positives requiring human review
✅ Distinct guidance for benchmarking vs deployment studies, plus a ready-to-use compliance checklist for editors and reviewers

Joint work with Barbara Kitchenham (Keele University) and Martin Shepperd (Brunel University of London); I am affiliated with Wydział Informatyki i Telekomunikacji Politechniki Wrocławskiej (Faculty of Information and Communication Technology) of Wrocław University of Science and Technology.

I recently had the pleasure of giving an invited talk on this work at the AI Engineering lab at Chalmers University of Technology and the University of Gothenburg (https://lnkd.in/de3iAbas). Thank you again, Miroslaw Staron, for hosting that discussion.

📄 Paper (open access, CC-BY): https://lnkd.in/dpVkt6QK
🧰 Replication package (R/Python scripts + fillable reviewer/editor checklist): https://lnkd.in/d8c-ZReU

#SystematicReview #LLM #EvidenceSynthesis #SoftwareEngineering #OpenAccess #MetaResearch
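
To make the metrics named in the post concrete, here is a minimal Python sketch that computes Accuracy, Lost Evidence (1 − Recall) and the standard MCC from a confusion matrix, and folds unclassifiable outputs into the positive class as the post recommends. The function names and all counts are made up for illustration and are not taken from the paper or its replication package. WMCC is left out because its definition is not given in the post.

```python
from math import sqrt


def fold_unclassifiable(tp, fp, fn, tn, unclear_relevant, unclear_irrelevant):
    """Treat unclassifiable outputs as positives that go to human review.

    Hypothetical helper: relevant articles with an unclear output count as
    true positives, irrelevant ones as false positives.
    """
    return tp + unclear_relevant, fp + unclear_irrelevant, fn, tn


def screening_metrics(tp, fp, fn, tn):
    """Return accuracy, recall, lost evidence and MCC for one screener."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    relevant = tp + fn
    recall = tp / relevant if relevant else 0.0
    # Product of the four marginal totals; zero if any row or column is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "lost_evidence": 1.0 - recall,
        "mcc": mcc,
    }


# Made-up example: 1,000 articles, 50 of them relevant (5% prevalence).
# Screener A excludes aggressively; screener B includes generously.
screener_a = screening_metrics(tp=20, fp=5, fn=30, tn=945)
screener_b = screening_metrics(tp=47, fp=100, fn=3, tn=850)

for name, m in (("A", screener_a), ("B", screener_b)):
    print(
        f"{name}: accuracy={m['accuracy']:.3f} "
        f"lost_evidence={m['lost_evidence']:.3f} mcc={m['mcc']:.3f}"
    )
```

With these made-up counts, screener A wins on accuracy (0.965 vs 0.897) yet loses 60% of the relevant articles, against 6% for screener B. That is the kind of ranking reversal the post describes under heavy class imbalance.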

David Makin May 21

2026-05-21T15:05:21+00:00 https://blog.sleep-er.co.uk/notes/2026-05-21-16-05/

"Let's go all in on AI," they say, "and we want it by next week." Never trust anything when the following phrase is used: "It is actually that simple."

#aihell

Added via Quill

DAIR Institute May 10

Mystery AI Hype Theater 3000, Episode 77 - Hotter Than (AI) Hell

https://peertube.dair-institute.org/w/iccQCfUvfr6ssMAmK7nvmJ

PeerTube

Alexandre B A Villares ☔🐍 Apr 25

#AIHell "Anthropic secretly installs spyware when you install Claude Desktop" https://www.thatprivacyguy.com/blog/anthropic-spyware/

Anthropic secretly installs spyware when you install Claude Desktop — That Privacy Guy!

Anthropic's Claude Desktop silently installs a Native Messaging bridge into seven Chromium browsers, including browsers Anthropic's own documentation says it does not support, and browsers the user has not even installed.

That Privacy Guy!

Toni Aittoniemi Apr 5

So if AI was supposed to be artificial intelligence, and make my job easier, why is it that I consistently have more and more AI tools competing for my attention, sending me notifications and breaking my flow?

You’d think if it was intelligent, it would learn from what I was already doing, right?

#aihell

Alexandre B A Villares ☔🐍 Mar 16

Mystery #AI Hype Theater #podcast on #peertube

https://peertube.dair-institute.org/c/mystery_ai_hype_theater/

#AIHELL

Mystery AI Hype Theater 3000

Artificial Intelligence has too much hype. In this stream, linguist Emily M. Bender and sociologist Alex Hanna break down the AI hype, separate fact from fiction, and science from bloviation. They'...

DAIR-Tube

Alexandre B A Villares ☔🐍 Mar 5

What the fresh hell is this? #AIHELL

Joanie with the Good Hair 😷 Feb 11

I've just caught up on the latest 'Mystery AI Hype Theater 3000' with @emilymbender and @alex, and special guest Naomi Klein -

https://www.twitch.tv/videos/2693454280

- and holy hell, it's depressing as anything, but it's a must-watch.

Key take-away: the worst people in the world have control over the most lethal and destructive weapons in the world, and plan to make decisions on their use aided by the glitchiest tech in the world (and no nuclear treaties are in force).

#AI #GenAI #Military #Gaza #AIHell

Mystery AI Hype Theater 3000 - dair_institute on Twitch

dair_institute went live on Twitch. Catch up on their Science & Technology VOD now.

Twitch

DAIR Institute Jan 9

Mystery AI Hype Theater 3000, Episode 70 - Wrapping Up a Hellish 2025

https://peertube.dair-institute.org/w/sUzA6ZCSW7bv16DGuwRg5L

PeerTube

happyborg Nov 26, 2025

#eBay support is now #AIhell and it is yet another reason for me not to want to buy there.

Their #LLM is useless. Only good for people who can't (or can't be bothered to) navigate the site or use the spoon fed options. It doesn't cater for anything that isn't already easy to find, and hides the actual option I want.

It's like Hunt the Wumpus or 'Adventure', where you have to navigate a maze until eventually you stumble on the treasure you seek.

Utterly awful fucking 'service'. Fuckem.