Mastodawn

https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/

Reed Mideke Apr 1, 2024

Today's #AIIsGoingGreat (HT @pluralistic https://mastodon.social/@pluralistic@mamot.fr/112196496077034192)

Tired: Typo squatting
Wired: Hallucination squatting

#AI #LLM

AI hallucinates software packages and devs download them – even if potentially poisoned with malware

Simply look out for libraries imagined by ML and make them real, with actual malicious code. No wait, don't do that

The Register

Reed Mideke Apr 2, 2024

Seems like you could put your thumb on scale for which (non existent) libraries show up with #LLM training set poisoning attacks (previously https://mastodon.social/@reedmideke/110850376856613599)

Set up a site that, when it detects known AI scrapers, serves up code or documentation that references a non-existent library, along text associating with whatever kind of code and industry you want to target

OTOH, this would leave much more of trail than just observing bogus ones that show up naturally

Reed Mideke Apr 2, 2024

In which the gang discovers Amazon Fresh "Just walk out" checkout was powered by Type II #AI https://gizmodo.com/amazon-reportedly-ditches-just-walk-out-grocery-stores-1851381116

Amazon Ditches 'Just Walk Out' Checkouts at Its Grocery Stores

Amazon Fresh is moving away from a feature of its grocery stores where customers could skip checkout altogether.

Gizmodo

Reed Mideke Apr 4, 2024

"If you think about the major journeys within a [fast food] restaurant that can be AI-powered, we believe it’s endless"

Sir this a fucking Wendy's and people come here to buy a fucking burger, not "take major journeys" https://arstechnica.com/information-technology/2024/04/ai-hype-invades-taco-bell-and-pizza-hut/

AI hype invades Taco Bell and Pizza Hut

Everything is suddenly "AI" in corporate food marketing, and we may have hit peak buzz.

Ars Technica

Reed Mideke Apr 4, 2024

Also uh, can't imagine anything that could possibly go wrong with this: "This enhancement would allow team members to ask the [AI chatbot] app questions like "How should I set this oven temperature?" directly instead of asking a human being"

Reed Mideke Apr 5, 2024

Some scientists theorized that after over 30 years of continuous development, it was physically impossible to make Adobe Reader worse, but once again, Adobe engineers have found a way

Reed Mideke Apr 5, 2024

I actually kinda wanted to see it summarize the spurious scholar (https://tylervigen.com/spurious-scholar) paper I was reading when it popped up, but… not enough to log in

Spurious Scholar

Spurious research papers based on real correlations with p < 0.05, generated by a large language model.

https://arstechnica.com/security/2024/04/ivanti-following-years-of-critical-vpn-exploits-pledges-new-era-of-security/

Reed Mideke Apr 5, 2024

Today's #AIIsGoingGreat brought to you by #Ivanti: 'Among the details is the company's promise to improve search abilities in Ivanti's security resources and documentation portal, "powered by AI," and an "Interactive Voice Response system" … also "AI-powered"'

Ah yes, hard to think of any better way to fix a pattern of catastrophic security failures than *checks notes* filtering highly technical, security critical information through a hyper-confident BS machine

Ivanti CEO pledges to “fundamentally transform” its hard-hit security model

Part of the reset involves AI-powered documentation search and call routing.

Ars Technica

Reed Mideke Apr 6, 2024

Today's #AIIsGoingGreat brought to you by the artist formerly known as Twitter https://mashable.com/article/elon-musk-x-twitter-ai-chatbot-grok-fake-news-trending-explore

X's AI chatbot Grok made up a fake trending headline about Iran attacking Israel

The AI-generated false headline was promoted by X in its official trending news section.

Mashable

Reed Mideke Apr 10, 2024

Here's a helpful #AI chatbot to assist you with thing that requires domain specific knowledge and has significant real-world consequences for errors… oh, by the way, you'll need to already have that same domain specific knowledge to confirm whether the answers are correct or complete BS

Who thinks this is a good idea?🤔

#AIIsGoingGreat

Reed Mideke Apr 10, 2024

Texas Education Agency talks a lot about the supposed safeguards in the don't-call-it-#AI "automated scoring engine" but no mention of any testing to determine whether it is fit for purpose (they do mention training it on 3K manually scored questions). Maybe they did and it just didn't get mentioned, but seems like a very good #FOIA target
https://www.texastribune.org/2024/04/09/staar-artificial-intelligence-computer-grading-texas/

Texas will use computers to grade written answers on this year’s STAAR tests

The state will save more than $15 million by using technology similar to ChatGPT to give initial scores, reducing the number of human graders needed. The decision caught some educators by surprise.

The Texas Tribune

https://noyb.eu/en/chatgpt-provides-false-information-about-people-and-openai-cant-correct-it

Reed Mideke Apr 30, 2024

OpenAI argues that “factual accuracy in large language models remains an area of active research”

…in the sense that Bigfoot and Nessie remain areas of active research?

ChatGPT provides false information about people, and OpenAI can’t correct it

noyb today filed a complaint against the ChatGPT maker OpenAI with the Austrian DPA

noyb.eu

https://arstechnica.com/information-technology/2024/05/microsoft-launches-ai-chatbot-for-spies/

Reed Mideke May 8, 2024

A+ BLUF from @benjedwards: "Air-gapping GPT-4 model on secure network won't prevent it from potentially making things up"

Microsoft launches AI chatbot for spies

Air-gapping GPT-4 model on secure network won't prevent it from potentially making things up.

Ars Technica

Reed Mideke May 9, 2024

Oh hey, remember #AdVon, the definitely-not-an-ai-company caught publishing #AI dreck in Sports Illustrated? (previously https://mastodon.social/@reedmideke/111486230567895424)
Futurism has another update, and it's a doozy
https://futurism.com/advon-ai-content

Meet AdVon, the AI-Powered Content Monster Infecting the Media Industry

Our investigation into AdVon Commerce, the AI contractor at the heart of scandals at USA Today and Sports Illustrated.

Futurism

Reed Mideke May 24, 2024

Google+ comparison is very apt, but also that opening example really hits the problem I've been yelling about since the #LLM hype cycle started: The fundamental mismatch between a system that randomly makes shit up and the uses it's being hyped for https://www.computerworld.com/article/2117752/google-gemini-ai.html

Gemini is the new Google+

Google's cutting-edge AI technology has a familiar connection to the past — and in this case, that isn't a good thing.

Computerworld

Reed Mideke May 24, 2024

This, right here: "Erm, right. So you can rely on these systems for information - but then you need to go search somewhere else and see if they’re making something up? In that case, wouldn’t it be faster and more effective to, I don’t know, simply look it up yourself in the first place?"

https://arstechnica.com/information-technology/2024/05/googles-ai-overview-can-give-false-misleading-and-dangerous-answers/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

Google's current #AIIsGoingGreat moment really checks all the bad #AI boxes. Starting with the dismissive "examples we've seen are generally very uncommon queries and aren’t representative of most people’s experiences" - Sure *sometimes* the answers are complete BS and possibly dangerous, but what about the times they aren't? Checkmate, Luddites!

Google’s “AI Overview” can give false, misleading, and dangerous answers

From glue-on-pizza recipes to recommending "blinker fluid," Google's AI sourcing needs work.

Ars Technica

And as always, they insist they are fixing it: "We conducted extensive testing before launching this new experience and will use these isolated examples as we continue to refine our systems overall" - with *zero* indication they have a technical or even theoretical path to solving the general problem that #LLMs don't have any concept of truth

And then, the whole thing is made worse by positioning it as a replacement for search, in the top spot with google branding. The "eat rocks" article ranks high in the regular organic search results the same query, but users have a lot more clues that it was a joke

Why am I so sure #AI companies have no serious technical or theoretical solution to the underlying problem that #LLMs have no concept of truth? The fact their approach so far is manually band-aiding results that go viral or put them in legal jeopardy is a pretty big hint! https://www.theverge.com/2024/5/24/24164119/google-ai-overview-mistakes-search-race-openai

Google scrambles to manually remove weird AI answers in search

The company confirmed it is ‘taking swift action’ to remove some of the AI tool’s bizarre responses.

The Verge

👉 "Gary Marcus, an AI expert and an emeritus professor of neural science at New York University, told The Verge that a lot of AI companies are “selling dreams” that this tech will go from 80 percent correct to 100 percent. Achieving the initial 80 percent is relatively straightforward since it involves approximating a large amount of human data, Marcus said, but the final 20 percent is extremely challenging. In fact, Marcus thinks that last 20 percent might be the hardest thing of all"

What they're doing now seems like selling a calculator, and when a screenshot of it saying 2+2=5 goes viral on social media, they add a statement like "if x=2 and y=2 return 4" at the top of the program and say "see, we fixed it!"

https://futurism.com/the-byte/ceo-google-ai-hallucinations

Straight from Google CEO Sundar Pichai's mouth: 'these "hallucinations" are an "inherent feature" of AI large language models (LLM), which is what drives AI Overviews, and this feature "is still an unsolved problem"'

but they're gonna keep band-aiding until it's good, promise! ""Are we making progress? Yes, we are … We have definitely made progress when we look at metrics on factuality year on year. We are all making it better, but it’s not solved""

#AIIsGoingGreat

CEO of Google Says It Has No Solution for Its AI Providing Wildly Incorrect Information

Google CEO Sundar Pichai says problems with its AI can't be solved because hallucinations are an inherent problem in these AI tools.

Futurism

Reed Mideke May 29, 2024

Just occurred to me Mitchell and Webb predicted our current pizza-gluing, gasoline spaghetti #AIIsGoingGreat moment 16 years ago https://www.youtube.com/watch?v=B_m17HK97M8

Mitchell & Webb: Cheesoid

YouTube

Reed Mideke May 31, 2024

Today's #AIIsGoingGreat - Meta's chatbot helpfully "confirms" a scammers number is a legitimate facebook support number

(of course, #LLMs just predict likely sequences of text, and for a question like this, "yes" is one of the high probability answers. There's no indication any of the companies hyping LLMs as a source of information have any serious solution for this kind of thing)

https://www.cbc.ca/news/canada/manitoba/facebook-customer-support-scam-1.7219581

Winnipeg man caught in scam after AI told him fake Facebook customer support number was legitimate | CBC News

A Winnipeg man who says he was scammed out of hundreds of dollars when he called what he thought was a Facebook customer support hotline wants to warn others about what can go wrong.

CBC

Reed Mideke Jun 4, 2024

Kyle Orland hammers on my off-repeated complaint (https://mastodon.social/@reedmideke/110063208987793683) that filtering your information through an #LLM *removes* useful context: "When Google's AI Overview synthesizes a new summary of the web's top results, on the other hand, all of this personal reliability and relevance context is lost. The Reddit troll gets mixed in with the serious cooking expert"
https://arstechnica.com/ai/2024/06/googles-ai-overviews-misunderstand-why-people-use-google/

#AIIsGoingGreat

Google’s AI Overviews misunderstand why people use Google

Answers that are factually "wrong" are only part of the problem.

Ars Technica

Reed Mideke Jun 4, 2024

On the same note, today's #AIIsGoingGreat courtesy of @ppossej, observing Microsoft copilot helpfully "summarizing" a phishing email. Even leaving aside obvious problem here, what exactly is supposed to be the value having an already short email filtered through spicy autocomplete? https://mastodon.social/@[email protected]ocial/112555512126646188

Reed Mideke Jun 6, 2024

I initially dismissed today's #AIIsGoingGreat (HT @zhuowei) as a joke, but no* : "aiBIOS leverages an LLM to integrate AI capabilities into Insyde Software’s flagship firmware solution, InsydeH2O® UEFI BIOS. It provides the ability to interpret the PC user’s request, analyze their specific hardware, and parse through the LLM’s extensive knowledge base of BIOS and computer terminology to make the appropriate changes to the BIOS Setup"
* not an intentional one, anyway
https://www.insyde.com/press_news/press-releases/insyde%C2%AE-software-brings-higher-intelligence-pcs-aibios%E2%84%A2-technology-be

https://www.theverge.com/2024/6/3/24168733/zoom-ceo-ai-clones-digital-twins-videoconferencing-decoder-interview

Reed Mideke Jun 6, 2024

Today's #AIIsGoingGreat features Zoom CEO Eric Yuan blazed out of his mind on his own supply: "Today for this session, ideally, I do not need to join. I can send a digital version of myself to join so I can go to the beach. Or I do not need to check my emails; the digital version of myself can read most of the emails. Maybe one or two emails will tell me, “Eric, it’s hard for the digital version to reply. Can you do that?”"

The CEO of Zoom wants AI clones in meetings

Zoom founder Eric Yuan on AI-powered “digital twins,” taking on Microsoft and Google, and the future of remote work,

The Verge

Reed Mideke Jun 6, 2024

"I truly hate reading email every morning, and ideally, my AI version for myself reads most of the emails. We are not there yet"

OK, points for recognizing we're "not there yet", in roughly the same sense the legend of Icarus foresaw intercontinental jet travel but was "not there yet"

Actually interesting thing in that Eric Yuan interview "every day, I personally spend a lot of time on talking with our customer’s prospects. Guess what? First question they all always ask me now is “What’s your AI strategy? What do you do to embrace AI?…”"
- even if exaggerated, seems like a good indicator of how deeply C-suite types have bought into the hype, which in turn means they all need an "AI strategy" no matter how ludicrous

and the thing is, in terms of their personal incentives, they're probably not wrong. The analysts and shareholders and trade press want the new shiny thing, and if their current business gets caught on the wrong side of the bubble, they keep whatever bonuses they got in the interim and it probably won't hurt their future career prospects much

Bonus #AIIsGoingGreat from NYT with a deep look at (now defunct) skeevy news out BNN Breaking: "employees were asked to put articles from other news sites into the [#LLM] tool so that it could paraphrase them, and then to manually “validate” the results by checking them for errors… Employees did not want their bylines on stories generated purely by A.I., but Mr. Chahal insisted on this. Soon, the tool randomly assigned their names to stories"
https://www.nytimes.com/2024/06/06/technology/bnn-breaking-ai-generated-news.html?u2g=i&unlocked_article_code=1.x00.zn0r.s2tFDDFWR0fo&smid=url-share
#GiftArticle #GiftLink #BNN

The Rise and Fall of BNN Breaking, an AI-Generated News Outlet

BNN Breaking had millions of readers, an international team of journalists and a publishing deal with Microsoft. But it was full of error-ridden content.

The New York Times

#BNN founder Gurbaksh Chahal seems to be an all-around charming fellow "In 2013, he attacked his girlfriend at the time, and was accused of hitting and kicking her more than 100 times, generating significant media attention because it was recorded by a video camera he had installed in the bedroom … After an arrest involving another domestic violence incident with a different partner in 2016, he served six months in jail"

Reed Mideke Jun 10, 2024

Some might argue that making the #AI acronym do double duty with "Apple Intelligence" is a recipe for confusion, but after all the hype I find it refreshingly honest to position the product as "about as smart as a piece of fruit"

https://link.springer.com/article/10.1007/s10676-024-09775-5

That "#ChatGPT is bullshit" paper I boosted earlier does a nice job of laying out why the "hallucination" terminology is harmful "what occurs in the case of an #LLM delivering false utterances is not an unusual or deviant form of the process it usually goes through… The very same process occurs when its outputs happen to be true"

ChatGPT is bullshit - Ethics and Information Technology

Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persistent inaccuracies in their output; these are often called “AI hallucinations”. We argue that these falsehoods, and the overall activity of large language models, is better understood as bullshit in the sense explored by Frankfurt (On Bullshit, Princeton, 2005): the models are in an important way indifferent to the truth of their outputs. We distinguish two ways in which the models can be said to be bullshitters, and argue that they clearly meet at least one of these definitions. We further argue that describing AI misrepresentations as bullshit is both a more useful and more accurate way of predicting and discussing the behaviour of these systems.

SpringerLink

The authors rightly object to "confabulation" for similar reasons "This term also suggests that there is something exceptional occurring when the LLM makes a false utterance, i.e., that in these occasions - and only these occasions - it “fills in” a gap in memory with something false. This too is misleading. Even when the ChatGPT does give us correct answers, its process is one of predicting the next token"

They are far from the first to make the connection between #LLMs and Frankfurtian bullshit, but humor aside, they do make a compelling case that the terminology matters https://link.springer.com/article/10.1007/s10676-024-09775-5#Sec12

ChatGPT is bullshit - Ethics and Information Technology

SpringerLink