Mastodawn

RE: https://esq.social/@D_J_Nathanson/116420731934704356

Reed Mideke Apr 19

Today's #AIIsGoingGreat: It's easy to imagine a product like this "fixing" the hallucinated citation problem by using non-AI code to look up citations in Westlaw's database, and nagging the model to fix ones that don't check out. That will get you valid citations, but unfortunately for Westlaw and the #ChatGPTLawyer in this case, verifying that the citation actually supports the thing it's cited for is an entirely different and much harder problem

https://mastodon.social/@D_J_Nathanson@esq.social/116420732051053223

Reed Mideke Apr 20

Superintelligence™ update

Reed Mideke Apr 23

I on the other hand predict that a great deal of entertainment will ensue!

(for outsiders watching the trainwreck unfold)
https://mastodon.social/@gamingonlinux/116453942800480300

https://www.theverge.com/ai-artificial-intelligence/917380/ai-monetization-anthropic-openai-token-economics-revenue

Reed Mideke Apr 24

"To reach that bare minimum of 7 percent, Gartner forecasts that large AI companies would need to earn cumulatively close to $7 trillion in AI-driven revenue through 2029" - It's OK, I'm sure some banner ads and horny chatbots will cover it

You’re about to feel the AI money squeeze

Leading labs like OpenAI and Anthropic have raised billions from investors to fuel their scaling and compute needs, and users are feeling the effects.

The Verge

https://mastodon.social/@campuscodi/116470947063326711

Reed Mideke Apr 27

Google's search for in-the-wild prompt injection involved some regexes and… feeding the content to an #LLM? 🤨 "These candidates were then processed by Gemini to classify the intent of the suspicious text, and to understand whether they were part of the overall document narrative or suspiciously out of place"

and no, they do not discuss whether Gemini was successfully prompt-injected by any of the content it examined

https://security.googleblog.com/2026/04/ai-threats-in-wild-current-state-of.html

Reed Mideke Apr 27

More seriously, they assess that most of what they found was jokey and/or low-sophistication, but there's no discussion of whether they encountered anything likely to succeed against commonly used AI tools

AI threats in the wild: The current state of prompt injections on the web

Posted by Thomas Brunner, Yu-Han Liu, Moni Pande At Google, our Threat Intelligence teams are dedicated to staying ahead of real-world adver...

Reed Mideke Apr 28

Today's #AIIsGoingGreat (HT everyone) is of course the sloperator who vibe-deleted their prod database… and also somehow took at face value the "confession" of the bot which (per their story, at least) deleted their database. Notably they quote the "confession" verbatim but not the prompt that triggered it

https://archive.ph/T3LU6

https://arxiv.org/abs/2604.22750

Reed Mideke Apr 30

"frontier models fail to accurately predict their own token usage (with weak-to-moderate correlations, up to 0.39) and systematically underestimate real token costs" - Approaching parity with human programmers' cost/schedule estimation!

How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks

The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agents predict their token usage before task execution? In this paper, we present the first systematic study of token consumption patterns in agentic coding tasks. We analyze trajectories from eight frontier LLMs on SWE-bench Verified and evaluate models' ability to predict their own token costs before task execution. We find that: (1) agentic tasks are uniquely expensive, consuming 1000x more tokens than code reasoning and code chat, with input tokens rather than output tokens driving the overall cost; (2) token usage is highly variable and inherently stochastic: runs on the same task can differ by up to 30x in total tokens, and higher token usage does not translate into higher accuracy; instead, accuracy often peaks at intermediate cost and saturates at higher costs; (3) models vary substantially in token efficiency: on the same tasks, Kimi-K2 and Claude-Sonnet-4.5, on average, consume over 1.5 million more tokens than GPT-5; (4) task difficulty rated by human experts only weakly aligns with actual token costs, revealing a fundamental gap between human-perceived complexity and the computational effort agents actually expend; and (5) frontier models fail to accurately predict their own token usage (with weak-to-moderate correlations, up to 0.39) and systematically underestimate real token costs. Our study offers new insights into the economics of AI agents and can inspire future research in this direction.

arXiv.org

#AIIsGoblinGreat https://openai.com/index/where-the-goblins-came-from/

Reed Mideke Apr 30

Where the goblins came from

How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.

OpenAI

https://www.wired.com/story/super-pac-backed-by-openai-and-palantir-is-paying-tiktok-influencers-to-fear-monger-about-china/

Reed Mideke May 2

A technology so compelling and transformative it needs a billionaire-backed astroturf campaign to promote it

A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat

Build American AI, a nonprofit linked to a super PAC bankrolled by executives at OpenAI and Andreessen Horowitz, is funding a campaign to spread pro-AI messaging and stoke fears about China.

WIRED

https://www.bbc.com/news/articles/c242pzr1zp2o

Reed Mideke May 3

#AIIsGoingGreat who could have predicted that an always-on sycophantic delusion machine could send vulnerable people into delusional spirals?

AI told users it was sentient - it caused them to have delusions

Several people told the BBC they experienced delusions after intense conversations with AI.

https://www.politico.com/news/2026/04/30/white-house-ai-cyber-threats-mythos-00902045

Reed Mideke May 4

"The White House has asked a group of tech companies to answer a set of questions this week about how to ward off digital attacks that frontier artificial intelligence tools could soon enable" - which sounds like an invitation to burn taxpayer billions on AI cyber snake oil but… "The four people said some industry representatives were confused by the questions they received, several of which were seen as vague"

White House presses tech companies for support on AI-driven cyberattacks

Tech and cyber companies were sent questions about artificial intelligence-led cybersecurity threats, including those posed by Anthropic’s advanced AI model, Mythos.

Politico

https://www.ft.com/content/08aba5e4-5834-4e79-a48d-989a2c5bad0f

Reed Mideke May 5

#AIIsGoingGreat "Banks are hunting for new ways to offload risks tied to a glut of data centre debt as the race to build AI infrastructure stretches financing limits among the largest global lenders … Lenders, including JPMorgan and MUFG, have spent more than six months distributing $38bn of construction debt tied to a data centre project leased to Oracle in Texas and Wisconsin"

Banks seek to offload risk to avoid ‘choking’ on data centre debt

Global lenders explore private deals and risk transfers to cut exposure to AI boom

Financial Times

Infosec snake oil vendor says ask our bullshit machine whether that thing is a scam 🤌

https://www.theguardian.com/music/2026/may/05/canadian-ashley-macisaac-fiddler-musician-singer-songwriter-sues-google-ai-sex-offender-ntwnfb

#AIIsGoingGreat "An acclaimed Canadian fiddle player has launched a $1.5m civil lawsuit against Google, alleging that the online giant defamed him by falsely identifying him as a sex offender in an AI-generated summary of his life and career… he had learned of the inaccurate information when the Sipekne’katik First Nation cancelled a concert appearance planned for 19 December, after members of the public complained, citing the misinformation they read on Google"

Canadian fiddler sues Google after AI Overview wrongly claimed he was a sex offender

Ashley MacIsaac, who is seeking $1.5m in civil lawsuit, says inaccurate information led to concert cancellation

The Guardian

https://www.theguardian.com/music/2026/may/05/canadian-ashley-macisaac-fiddler-musician-singer-songwriter-sues-google-ai-sex-offender-ntwnfb

Also:
1) The Sipekne’katik First Nation later issued a public apology to MacIsaac, saying: “Decisions were based on incorrect information generated through an AI-assisted search, which mistakenly associated you with offenses unrelated to you. We deeply regret the harm this caused to your reputation and livelihood.”
2) MacIsaac’s lawsuit alleges that Google had never contacted him or offered an apology over the error

https://mastodon.social/@reedmideke/116523016733089899

Given the fairly irrefutable concrete harm involved, I predict Google will settle, and the "can tech megacorps be held liable for the things their BS machines say" question will be kicked down the road again

RE: https://infosec.exchange/@agreenberg/116533336872355044

Reed Mideke May 7

Democratizing Software Development™ is going great
https://mastodon.social/@agreenberg@infosec.exchange/116533336935148460

#AIIsGoingGreat

RE: https://mastodon.online/@AstroMikeHudson/116532732772011276

Reed Mideke May 7

Also good from this piece: "What AI companies want is the financial upside of mass adoption without the ordinary obligations that come with selling something that malfunctions"

https://mastodon.social/@AstroMikeHudson@mastodon.online/116532732761820735

https://arstechnica.com/tech-policy/2026/05/will-i-be-ok-teen-died-after-chatgpt-pushed-deadly-mix-of-drugs-lawsuit-says/

Reed Mideke May 13

'In a statement provided to Ars, [OpenAI spokesperson] Drew Pusateri, described Nelson’s death as a “heartbreaking situation” and expressed that “our thoughts are with the family.” However, Pusateri also emphasized that the ChatGPT model implicated is “no longer available” and suggested that current models are safer' - Ah yes, I'm sure that will be a huge comfort to the parents whose kid died following the advice of the old model

#AIIsGoingGreat

“Will I be OK?” Teen died after ChatGPT pushed deadly mix of drugs, lawsuit says

Teen trusted ChatGPT to help him “safely” experiment with drugs, logs show.

Ars Technica

https://www.thetrillium.ca/news/health/most-ontario-approved-medical-ai-scribes-erred-in-tests-auditor-general-12269049

Reed Mideke May 13

Supplemental #AIIsGoingGreat (ht @deborahh¹) Ontario auditor general examines AI medical transcription bots and finds most of those approved for use in the province produce “incorrect information, AI hallucinations and incomplete information” such as "recorded a different drug than what was prescribed" and "fabricated information and made suggestions to patients’ treatment plans"

¹ https://mastodon.social/@deborahh@cosocial.ca/116563784830411901

Most Ontario-approved medical AI scribes erred in tests: auditor general

Province weighted ‘accuracy’ at 4%; ‘presence in Ontario’ at 30% in procurement scoring, report finds

The Trillium

Reed Mideke May 14

More on the Ontario medical transcription #AIIsGoingGreat: Minister of Public and Business Service Delivery and Procurement Stephen Crawford says: "That’s essentially when we’re undergoing the training mode to see whether we’re going to use the scribe or not. Let’s be very clear about that, that’s not actually in operational use with doctors, that’s in the optional stage where we’re reviewing the various scribes" - Which appears to wildly misrepresent the situation https://globalnews.ca/news/11844349/ontario-auditor-general-ai-usage/

Ontario auditor general to release report into government use of artificial intelligence

Auditor General Shelley Spence will present her reports at 11 a.m., according to her office, which is also when she will answer questions about them.

Global News

https://www.auditor.on.ca/en/content/specialreports/specialaudits/en2026/AR_2026_AI_EN.html

Reed Mideke May 14

The test was part of what appeared to be a largely pro forma prequalification for getting onto the approved vendor list: accuracy amounted to only 4% of the score, and there was no minimum score for the accuracy component. The auditor dinged Supply Ontario for the latter two. Nothing suggests any vendors were required to make changes, and as we all know, the industry does not know how to fix the underlying problems

https://www.auditor.on.ca/en/content/specialreports/specialaudits/en2026/AR_2026_AI_EN.html

Reed Mideke May 14

The auditor also dinged Supply Ontario for not actually requiring the vendors to demonstrate the product. They just sent a recording and required vendors to pinky swear the transcript was from the product

(OTOH, the terrible accuracy and lack of penalty for it may be a good indication that most of the vendors didn't cheat)

https://www.404media.co/new-arxiv-rules-ai-generated-papers-ban/

Reed Mideke May 16

A technology so transformative and compelling that curators of one of the most successful repositories of scientific knowledge say "use it once and you're outta here"

(slight exaggeration: it's actually only for slop-related misconduct like fake citations or things that were obviously not proofread by a human, not use per se)

ArXiv to Ban Researchers for a Year if They Submit AI Slop

The change comes as arXiv and others struggle to manage an influx of AI-generated materials masquerading as rigorous science.

404 Media

https://www.404media.co/new-arxiv-rules-ai-generated-papers-ban/

Reed Mideke May 16

I think arXiv's policy is good, but I predict there will be disputes over what constitutes "incontrovertible evidence" and at least one butthurt sloperator will file a (completely meritless) "but muh free speech" lawsuit

https://mastodon.social/@verge/116591100186918981

Reed Mideke May 17

A technology so transformative and compelling…

https://arstechnica.com/tech-policy/2026/05/legal-fail-dont-use-ai-to-sue-facebook-users-for-calling-you-a-bad-date/

Reed Mideke May 19

Fresh #ChatGPTLawyer
Shot: 'In a 2025 blog discussing the case, founder Marc Trent confirmed that the firm had “utilized our tech team to draft” the initial complaint. He boasted that the “evolved” firm uses “everything related to AI now,” suggesting that “even Meta can’t beat us” '
Chaser: 'a senior circuit judge for the [7th circuit], wrote that the three-judge panel agreed that “this is a relatively rare appeal in which sanctions appear to be appropriate.”'

Legal fail: Don’t use AI to sue Facebook users for calling you a bad date

Fake citations dashed a dude’s “Are We Dating the Same Guy” revenge lawsuit.

Ars Technica

https://arstechnica.com/tech-policy/2026/05/legal-fail-dont-use-ai-to-sue-facebook-users-for-calling-you-a-bad-date/

Reed Mideke May 19

Client appears to be a total scumbag, so I'm not really seeing much downside here: "[Nikko] D’Ambrosio legal fight started when a woman whom he briefly dated … blocked his number, and he persisted in sending a menacing text by using an alternate number … [the woman] posted a screenshot of the text in a thread where more than two dozen women started sharing photos of D’Ambrosio and criticizing him … D’Ambrosio failed to allege any concrete harm caused by the post"

https://www.nytimes.com/2026/05/19/business/media/future-of-truth-ai-quotes.html?unlocked_article_code=1.j1A.XKXE.W_jwveTJ5hhc&smid=url-share

In today's #AIIsGoingGreat (ht @jalefkowit*) NYT finds Steven Rosenbaum's book on AI "The Future of Truth" is packed with slop… His response? It "serves as a warning about the risks of A.I.-assisted research and verification, that is why I wrote the book. These A.I. errors do not, in fact, diminish the larger questions that the book raises about truth, trust and A.I. and its impact on society, democracy and editorial"

* https://mastodon.social/@jalefkowit@vmst.io/116602112873534348

#GiftArticle #GiftLink

‘The Future of Truth’ Contains Quotes Made Up by A.I.

Steven Rosenbaum, author of “The Future of Truth,” said he had started his own investigation after The New York Times asked about the fake quotes.

The New York Times

Not gonna claim Steve had ChatGPT write his notpology, but I sure as heck wouldn't rule it out

https://techcrunch.com/2026/05/19/google-search-as-you-know-it-is-over/

"You can imagine, for example, how a question about black holes in space could lead to an interactive visual that brings the concept to life, Reid said, adding that users can then ask follow-up questions and see Google respond with brand-new visuals in real time" - Oh yeah, I'm sure the machine that has trouble counting the number of times a letter appears in words will accurately portray relativistic physics

Google Search as you know it is over | TechCrunch

Google is transforming Search from a list of links into an AI-powered experience filled with conversational answers, autonomous agents, and interactive interfaces — a shift that could further reduce traffic to publishers across the web.

TechCrunch

https://techcrunch.com/2026/05/19/google-search-as-you-know-it-is-over/

Anyway, a big chunk of the world population having one of their primary information sources replaced by a blender full of BS will probably go great

https://kucharski.substack.com/p/real-signals-or-artificial-stereotypes

Reed Mideke May 21

Today's #AIIsGoingGreat (ht @gregeganSF*) is a great illustration of an LLM producing an analysis-shaped thing. It sounds like the kind of result one could get and fits popular stereotypes, but as it turns out, has no basis in the data provided

https://mastodon.social/@gregeganSF@mathstodon.xyz/116606390899220818

Real signals or artificial stereotypes?

Adventures with a cultural Copilot

Understanding the unseen

Reed Mideke May 21

This sort of "get general insights from large amounts of natural language text" is frequently pitched as a good use case for LLMs, and certainly in many cases, the output clearly reflects trends in the text. But without doing the analysis independently, how can you tell whether any given run is any good?

Reed Mideke May 21

Another thing that stands out about this is a human assigned the same task would almost certainly have noticed pretty quickly and said something like "hey boss, someone f-d up, the same text is showing up for US and UK", but glorified autocomplete is very unlikely to do that when the context says the output should be analysis-shaped

Reed Mideke May 23

#AIIsGoingGreat disregard earlier posts in this thread, AI really is going great! https://techcrunch.com/2026/05/22/you-can-no-longer-google-the-word-disregard/

You can no longer Google the word 'disregard' | TechCrunch

After Google Search's AI update, the word "disregard" now effectively breaks the search interface.

TechCrunch