"New study reveals that when used to summarize scientific research, generative AI is nearly five times LESS accurate than humans. Many haven't realized, but Gen AI's accuracy problem is worse than initially thought."

https://royalsocietypublishing.org/doi/epdf/10.1098/rsos.241776

@gerrymcgovern And yet @Nature has gone all in. Suggesting we use it to write papers and review them. I will not be publishing with their journals until they remove that.
@Microplastics101 Fun fact: Nature (for all its faults) never did it, and all editors were quick to clarify that their rules explicitly forbid using AI to write or review papers.
@gerrymcgovern @Nature
@j_bertolotti @gerrymcgovern @Nature I hadn't heard that. I just saw their magazine articles promoting it.

@gerrymcgovern

Just taught an econ of AI course for the first time. I tried to make the case that there is a big accuracy problem, but I'm not sure my students fully bought it.

This will be a good supplement to that case next time I try the course.

@FantasticalEconomics @gerrymcgovern
What good does accepting that the answer is wrong do a student‽ 🙃
@gerrymcgovern The Intelligence is Artificial. It's right there in the name.

@gerrymcgovern

If the story is that AI has been massively overblown by the people who promote it, I'm not seeing anything in any article to dissuade me from that view.

@gerrymcgovern nothing beats a diligent and thorough human being
@gerrymcgovern I am starting to feel like avoiding AI and keeping your skills sharp is going to become a major advantage in the near future. You can’t deeply learn without repetition, whether intentional or not, and AI removes all repetition.

@hackeryarn

A few years ago I 'kinda dated' a guy who grabbed his phone after any question. I asked, for instance, what he thought of a piece of art, or whether he knew what shoes would work in a particular situation. Each time, he grabbed his phone.
I once tried to get him to visualise something: being at the beach, waves lapping at the shore, gulls calling. And then, suddenly, a black shape rising from the waves.
"What would you do?"
He couldn't answer. His phone had no idea.

@gerrymcgovern

@pascaline @gerrymcgovern oof, that’s rough. Well, there are two ways to get someone hooked on a product: make it so engaging and good that people can’t live without it, or make it so automatic and easy that people forget how to do the thing the product automates. Looks like we’ve stepped into automating away thinking.

@hackeryarn

Yes, it was rough. I tried to get through very gently, for months, but he was incapable of any original thoughts. Sad 😔

@gerrymcgovern

@gerrymcgovern We were told hallucinations would decrease; as far as I can tell, they have increased.

LLMs are completely unacceptable for any purpose beyond shitposting/brainstorming. No LLM content is fit to show another human being, even with "this is from an LLM" warning, let alone without.

@TheZeldaZone @gerrymcgovern
The reduction in accuracy can be attributed to the crawler bots that try to get more 'training data': they are taking in more and more generated slop, so the effectiveness is dropping sharply.

@MeiLin @TheZeldaZone @gerrymcgovern

Or the model architecture and the preset prompts getting too complex.

@glitzersachen @TheZeldaZone @gerrymcgovern
Ah yes... The Preset Prompts, designed to filter the responses in such a way that specific things don't get mentioned by the LLM.

@TheZeldaZone @gerrymcgovern

The problem is that "hallucination" is a human term to describe when we NOTICE that an AI is wrong. It's not doing anything different when it hallucinates. By its nature, all of its output is a hallucination; we've just created a term to delineate when it clashes with what we know is correct.

If you're making an AI that is general purpose, >50% accuracy is unlikely. If you're making an AI to solve a specific problem, you can get shockingly close to 100%.

@TheZeldaZone @gerrymcgovern There's an AI trained to detect cancer in biopsy slides, and it's reached something like 98% accuracy, often finding pre-cancer anomalies that experts in the field can't spot.

If you try to write a chatbot that can answer any question anyone asks, you're trying to solve a "Very Hard" problem, the sort of thing where chipping away at a crumb is a PhD thesis.

AI hucksters will have you believe it's inevitable.

@adanufgail @TheZeldaZone @gerrymcgovern

But I note also that that is not a program that pretends to chat or convey meaning or understand the world. It's just an image classifier, a visual pattern matcher.

I always thought it a bit overblown to call this "AI", even if it uses ML.

@glitzersachen @TheZeldaZone @gerrymcgovern

I'd say it's AI. It's not "General AI" which is the near human-level intelligence (or beyond) all those companies are promising is right around the corner.

It's also much more "AI" than all the companies who are slapping "AI-powered" on things that are running the same algorithms they were for years.

@adanufgail @TheZeldaZone @gerrymcgovern i totally agree, & would like to add that the AIs making a positive impact on society are not generative (like LLMs). also, many other examples of good "AI" are just traditional algorithms or expert systems. we are wasting billions on unethical LLMs that are never going to achieve human-level intelligence, whilst we should be funding research in areas where the goal is to benefit humanity (& not just the rich)

@jpaulgibson @TheZeldaZone @gerrymcgovern Truly! I remember when IBM's Watson was going to help fight hospital mistakes by being a diagnostic tool to assist doctors. Except they were probably a decade too early, and so spent billions on tech that didn't work.

AI has made massive leaps in protein-folding and helping to design fusion reactors (we'll see if it's ever actually possible). It could do so much good creating climate saving solutions, rather than burning our world to make memes.

@gerrymcgovern

Having demonstrated how the text generators simply produce the statistically most likely next word over and over, the authors then unaccountably write about "potential mitigation strategies".

Rather than stating "do not use these things".
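For readers unfamiliar with the "statistically most likely next word" mechanism, here is a deliberately tiny sketch using a toy bigram table. Real LLMs learn these statistics over subword tokens with a neural network, and every word and count below is made up for illustration; only the sampling step resembles the real thing.

```python
import random

# Toy bigram table: for each word, the observed next words and their
# counts. This stands in for the learned statistics of a real model.
bigram_counts = {
    "the": {"cat": 3, "dog": 1},
    "cat": {"sat": 2, "ran": 1},
    "sat": {"on": 3},
    "on": {"the": 4},
}

def next_word(word, greedy=True):
    """Pick the statistically most likely (or a sampled) next word."""
    candidates = bigram_counts.get(word, {})
    if not candidates:
        return None
    if greedy:
        return max(candidates, key=candidates.get)  # most likely continuation
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return random.choices(words, weights=weights)[0]

def generate(start, max_len=5):
    """Repeat the next-word step over and over, as the post describes."""
    out = [start]
    while len(out) < max_len:
        w = next_word(out[-1])
        if w is None:
            break
        out.append(w)
    return " ".join(out)

print(generate("the"))  # → "the cat sat on the"
```

Note that nothing in this loop checks the output against reality; it only asks "what usually comes next?", which is the point the post above is making.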

@michael_w_busch Well, exactly. There's this bending over backwards to accommodate this scammy, crappy tech. It's supposed to be a Great Big Facts Machine and instead it's a Great Big Crapping Machine. And it's getting worse. But this deification of technology, this bowing to the tech lords and their latest scam. Because AI does some fancy tricks, so many are wide-eyed and so willing to believe.

@gerrymcgovern I'm having difficulty getting over the idea of using LLMs to SUMMARIZE ABSTRACTS.

THE ABSTRACTS ARE THE SUMMARY! THE ARTICLE IS ALREADY SUMMARIZED! IT'S CALLED AN ABSTRACT!!

@ergative @gerrymcgovern

But it's <bleating> TOO DIFFICULT </bleating>. Everybody needs to be able to pretend they are a scientist (or even educated), so they need those summaries.

@gerrymcgovern I don't think most people understand what “accurate” means in this context.

When a human isn't accurately summarizing something, there are holes and minor misunderstandings.

An LLM AI invents context that doesn't exist but sounds plausible to people not familiar with the field. On a much larger scale than a human misunderstanding.

@Stefan_S_from_H @gerrymcgovern perfect fit for conservatives. Their main source of information is "I made it fuck up" anyway, so now they can say that "AI made it fuck up"
@gerrymcgovern
From an AI-marketing point of view this could be overcome: just keep pushing AI and wait until human reviewers' performance worsens (because of AI-boosted cognitive decline) to the level where GenAI performance becomes comparable.
@gerrymcgovern - Thank you; My thoughts on this subject as well. I cannot trust something that is not corporeal.
@gerrymcgovern I was talking with a colleague who sets up "AI" systems for some of our clients. Stuff like chat bots, for searches in specialized knowledge bases. They estimate that at least 30% of answers are made up or wrong (their optimistic estimate, based on the actual reports they receive), to the point of the bot going completely off topic (once recommending a list of VPNs to try to a user who couldn't find a procedure).
@gerrymcgovern another issue related to that is that the cost of iterating to improve on that is massive. Data collection, anonymization, and reclassification would already send every project way over budget, and make it more expensive than training people, without even accounting for the actual cost of training and running the models. And that would only ever improve things, with zero guarantees of correct results.
@bovaz Yeah, that's a good point. Re-training is hugely expensive because there's SO much data, and also it's really hard to know WHY the wrong answer was given in the first place, so knowing how to even fix things is a huge challenge. These AI systems quickly spiral into levels of the deepest complexity and unknowability.
@gerrymcgovern so much better than just having a guy and asking them...
@bovaz How staggeringly bad is that? It's astonishing! And yet this is acceptable? But then, having spent almost 30 years working with support content, I'm not terribly surprised. Management don't give a crap about support. It's always the cheapest option.

@gerrymcgovern @bovaz

The whole point of these chat bots is to make it harder to bother a real human. They hope that you’ll just get frustrated and go away when you find that asking for help is more of a problem than the one you’re trying to resolve.

Personally, I have never had a useful or successful experience with one, other than the few times that it is set up to escalate to a human.

#techgarbage #enshittification

@saprentice
As soon as I see one of those chatbots, I say, that's it, I'm not contacting support. They are, as you indicate, a cost saving device meant to scare customers away.

@bovaz

@saprentice Almost all of them have a failsafe to cut to a human agent

Not always easy to find.

@gerrymcgovern @bovaz

@androcat @gerrymcgovern @bovaz

I will typically just ask .. “May I chat with a human?” .. if that doesn’t work, I’ll curse at it and leave.

@saprentice

I wonder if they are just looking for the keyword "Human". I hate wasting breath on an edifice of dehumanization.

@gerrymcgovern @bovaz

@saprentice @gerrymcgovern but the thing is, the "chat bot set up to escalate to a human" was a thing that was already perfectly solved. First couple of steps of a decision tree to help a human search through a knowledge base, then hand that search over to an operator. That had been working for years, at least.

@bovaz @gerrymcgovern

Exactly. As a decision tree based on content from the website (or other static data), this can be a sensible first step to providing support .. #noai needed.

This has been around for years, and is likely as effective as an AI-based system, for much less money and damage.
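The decision-tree flow described in these posts is simple enough to sketch in a few lines. This is only an illustration of the idea; every node name, prompt, and canned answer below is hypothetical, and a real system would attach the gathered context when handing off to the operator.

```python
# Minimal sketch of a pre-LLM support bot: a fixed decision tree
# narrows the topic, and anything it can't resolve escalates to a
# human. All node names and prompts are made up for illustration.
TREE = {
    "start": ("Is your question about billing or a technical issue?",
              {"billing": "billing", "technical": "technical"}),
    "billing": ("See our invoices FAQ. Did that answer your question?",
                {"no": "human"}),
    "technical": ("See our troubleshooting guide. Did that help?",
                  {"no": "human"}),
}

def run(answers):
    """Walk the tree with scripted answers; return the path of nodes visited."""
    node, path = "start", []
    while node in TREE:
        prompt, branches = TREE[node]
        path.append(node)
        reply = answers.pop(0) if answers else None
        node = branches.get(reply, "human")  # unknown reply → escalate
    path.append(node)  # terminal node: "human" hand-off
    return path

print(run(["billing", "no"]))  # → ['start', 'billing', 'human']
```

The design choice worth noticing: the tree is static data, auditable at a glance, and it fails toward a human rather than toward a made-up answer.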

@saprentice @gerrymcgovern @bovaz

That very well summarizes why chat bots are used in customer service (and for fobbing off employees).

@gerrymcgovern And in science details matter

@gerrymcgovern

"I used AI to... " is nothing more than "Listen, I'm not an asshole but... " for the 21st Century

#FuckAI

@gerrymcgovern
My favorite(?) line:
"Notably, newer models exhibited significantly greater inaccuracies in generalization than earlier versions"

My translation - this crap is getting crappier. So much for "Give us more data and AI will get better!"

@gerrymcgovern summarising is a high-level, very (VERY) complex task, but human brains are so good at it that we do it with relative ease.

Why anybody thought that a machine could do it is something that is beyond my comprehension.

@gerrymcgovern
Hmm, it's published in Royal Society Open Science.
Maybe Musk will resign from the Royal Society in protest? ;)

@gerrymcgovern
"It's weird how generative AI is so amazing with all the topics I don't know about, but always wrong with the topics I'm knowledgeable about. Funny coincidence, don't you think?"
https://firefish.city/notes/a7ss0rixe120ub44

People who aren't knowledgeable about anything never have this revelation. Same with people who don't believe in (anybody else's) expertise.

David B. Himself (@DavidBHimself) on Firefish