This from @waldoj is a really excellent example of how these things BS https://mastodon.social/@waldoj/110353407663057558

I've seen people describe LLMs as "recognizing" or "admitting" they were wrong when pressed on a BS answer, but of course, that's just because admitting a mistake is one probable response to having an error pointed out.

They are likely tweaked against the alternative of continuing to argue, because being aggressively wrong is a bad look (except for that one asshole version of bing everyone mocked)

Ironically, asshole bing is probably more representative of a training set derived from internet text, so politely accepting your correction is presumably a result of deliberate effort https://www.voanews.com/a/angry-bing-chatbot-just-mimicking-humans-experts-say-/6969343.html
Angry Bing Chatbot Just Mimicking Humans, Experts Say

Among other things, it’s issued threats and spoken of desires to steal nuclear codes

Voice of America (VOA News)
Who could have seen this coming? Turns out asking a stochastic bullshit machine whether it wrote a thing is not an accurate way to determine whether it actually wrote the thing #ChatGPT #AI (gift link) https://wapo.st/45eXjAl
A professor accused his class of using ChatGPT, putting diplomas in jeopardy

A Texas A&M instructor falsely accused students of using ChatGPT to write essays, putting them at risk of failing.

The Washington Post

Oh my. A lawyer used #ChatGPT output in their filings and it's going about as well as you'd expect (presuming you have a couple brain cells to rub together)

https://twitter.com/steve_vladeck/status/1662286888890138624

(filings https://www.courtlistener.com/docket/63107798/mata-v-avianca-inc/)

Steve Vladeck on Twitter

“Dear … Everyone: Do *not* use ChatGPT (or any other AI) for legal research. https://t.co/yKUjoHB2Zq (H/T: @questauthority.)”

Twitter
"A submission filed by plaintiff’s counsel in opposition to a motion to dismiss is replete with citations to non-existent cases… the Court issued Orders requiring plaintiff’s counsel to provide an affidavit annexing copies of certain judicial opinions of courts of record cited in his submission, and he has complied… Six of the submitted submitted cases appear to be bogus judicial decisions with bogus quotes and bogus internal citations"
https://www.courtlistener.com/docket/63107798/mata-v-avianca-inc/#entry-31
Mata v. Avianca, Inc., 1:22-cv-01461 - CourtListener.com

Docket for Mata v. Avianca, Inc., 1:22-cv-01461 — Brought to you by Free Law Project, a non-profit dedicated to creating high quality open legal information.

CourtListener
So not only did he use #ChatGPT to write the original filing, when called on the bogus citations he *used ChatGPT to generate the supposed decisions in the cited (non-existent) cases* 🤯 (they're in https://www.courtlistener.com/docket/63107798/mata-v-avianca-inc/#entry-29)
On the one hand, I have trouble believing Schwartz's "I had no idea #ChatGPT would make shit up" defense, but on the other, did he really think opposing counsel wouldn't notice, after they already called him on the bogus citations?

The original "hey we couldn't find any of those cases" was in entry #24 https://www.courtlistener.com/docket/63107798/mata-v-avianca-inc/#entry-24

In which we also learn what the case is about: "There is no dispute that the Plaintiff was travelling as a passenger on an international flight when he allegedly sustained injury after a metal serving cart struck his left knee"

… two dudes set their law licenses on fire for a personal injury suit for a guy who took a drink cart to the knee?


Idle thoughts: In a legal context, this sort of stuff is likely to be caught pretty quickly.

As happened here, the opposing side is going to try to find the cited cases and notice if they're like, totally made up.

So aside from the poor plaintiff who hired these clowns (and presumably has an argument for inadequate representation), the risk should be limited… but a lot of other contexts are much less well positioned to catch plausible-looking BS early

Also, while "Varghese v. China Southern Airlines" and friends are unlikely to slip into authoritative sources as a real cases, it wouldn't be at all surprising for general search engines or future LLMs to pick it up and fail to recognize it isn't real
A Man Sued Avianca Airline. His Lawyer Used ChatGPT.

A lawyer representing a man who sued an airline relied on artificial intelligence to help prepare a court filing. It did not go well.

The New York Times
Very good play-by-play on the #ChatGPT lawyers from @kendraserra (this is where I wish we had proper quote toots) https://mastodon.social/@kendraserra@dair-community.social/110441210421818852

Finally, an #AI article that at least raises the question of whether BSing may be an inherent characteristic of LLMs rather than a bug that can be fixed (gift link)

https://wapo.st/43obDFt

ChatGPT ‘hallucinates.’ Some researchers worry it isn’t fixable.

AI chatbots are everywhere, but they still routinely make up false information and pass it off as real.

The Washington Post
The "solutions" discussed mostly strike me as bandaids: "a system they called “SelfCheckGPT” that would ask the same bot a question multiple times, then tell it to compare the different answers. If the answers were consistent, it was likely the facts were correct"
and "researchers proposed using different chatbots to produce multiple answers to the same question and then letting them debate each other until one answer won out"

Seems like these might reduce glaring errors where the training data contains a clear consensus correct answer, but they don't really address the underlying problem.

Is a model that's usually right about stuff "everyone knows" while still making shit up about less obvious topics an improvement? Or does being right about obvious stuff encourage people to trust it when they shouldn't?

Meanwhile that "Air force AI attacks operator in simulation" story is entertaining, but hardly seems representative of any potential real world usage or risks https://www.vice.com/en/article/4a33gj/ai-controlled-drone-goes-rogue-kills-human-operator-in-usaf-simulated-test
USAF Official Says He ‘Misspoke’ About AI Drone Killing Human Operator in Simulated Test

The Air Force's Chief of AI Test and Operations initially said an AI drone "killed the operator because that person was keeping it from accomplishing its objective."

lol. 'the "rogue AI drone simulation" was a hypothetical "thought experiment" from outside the military'

https://arstechnica.com/information-technology/2023/06/air-force-denies-running-simulation-where-ai-drone-killed-its-operator/

Air Force denies running simulation where AI drone “killed” its operator

“We’ve never run that experiment,” says original source, who “misspoke.”…

Ars Technica
So, people cosplaying as killer #AI behaved like stereotypical sci-fi killer AI, clearly demonstrating the existential threat of killer AI!

Seriously, that @davidgerard piece has it all, but I liked this illustration of the bollockschain-to-AI pipeline: "IBM: “The convergence of AI and blockchain brings new value to business.” IBM previously folded its failed blockchain unit into the unit for its failed Watson AI"

https://davidgerard.co.uk/blockchain/2023/06/03/crypto-collapse-get-in-loser-were-pivoting-to-ai/

Crypto collapse? Get in loser, we’re pivoting to AI

The same grift by the same grifters.

Attack of the 50 Foot Blockchain
#ChatGPTLawyer's lawyers have filed their response, arguing that while their clients may have been extremely reckless and incompetent, they did not know the cases were fake, and so don't meet the "subjective bad faith" standard for sanctions https://www.courtlistener.com/docket/63107798/mata-v-avianca-inc/#entry-45
and TBH, I kinda believe them, because as stupid as they were, knowingly trying to pass off completely fake cases would seem even stupider. Still mindboggling you could get that far and not check though
@willoremus digs into the compute cost of #LLMs and boy does that not look like good news for all the startups cramming #AI into everything (gift link) https://wapo.st/3WTCK8Q
AI chatbots lose money every time you use them. That is a problem.

The sheer expense of operating chatbots could throttle the artificial intelligence boom.

The Washington Post
Also, pity the poor gamers who have only recently started to see GPU availability recover from the cryptominer-induced shortages
Lawyer Who Used ChatGPT Faces Penalty for Made Up Citations

In a cringe-inducing court hearing, a lawyer who relied on A.I. to craft a motion full of made-up case law said he “did not comprehend” that the chat bot could lead him astray.

The New York Times

This blow-by-blow over on the bird site suggests the judge took an extremely dim view of #ChatGPTLawyer's buddy LoDuca, who was signing off on the filings without reading them. Also sounds like they fibbed about who was on vacation when they asked for the extension 😬

https://twitter.com/innercitypress/status/1666838526762139650

Inner City Press on Twitter

“OK - now the Chat GPT case, Mata v. Avianca, sanctions against the lawyer(s) who filed a brief with non-existent or hallucinated cases. Inner City Press is covering the case https://t.co/sa0Kw08K3L and will live tweet, thread below”

Twitter
Filing what appears to be a thinly veiled pitch for a law-oriented AI startup as an amicus on this case is… a choice https://www.courtlistener.com/docket/63107798/mata-v-avianca-inc/#entry-50
AI Is a Lot of Work

How many humans does it take to make tech seem human? Millions to support OpenAI, Google, Meta, and every other major tech company. As AI becomes ubiquitous, a vast tasker underclass is emerging — and not going anywhere.

The Verge
Low-wage #AI clickworkers surreptitiously using #ChatGPT to do their tasks

#ChatGPTLawyer ruling is in and it's… SANCTIONS FOR EVERYONE! Unsurprisingly, the judge didn't buy the "no bad faith" argument, for predictable reasons such as
"Above Mr. LoDuca’s signature line, the Affirmation in Opposition states, “I declare under penalty of perjury that the foregoing is true and correct”
Although Mr. LoDuca signed the Affirmation in Opposition and filed it on ECF, he was not its author"

https://www.courtlistener.com/docket/63107798/mata-v-avianca-inc/#entry-54

Also, not a great plan to lie about being on vacation when responding to a show cause order: "Mr. LoDuca’s statement was false and he knew it to be false at the time he made the statement. Under questioning by the Court at the sanctions hearing, Mr. LoDuca admitted that he was not out of the office on vacation"
Mr. Schwartz fares no better
"Mr. Schwartz’s statement in his May 25 affidavit that ChatGPT “supplemented” his research was a misleading attempt to mitigate his actions by creating the false impression that he had done other, meaningful research on the issue and did not rely exclusive on an AI chatbot, when, in truth and in fact, it was the only source"
More on clickworkers allegedly using #AI to automate their AI training tasks. Backdoor Habsburg AI https://www.technologyreview.com/2023/06/22/1075405/the-people-paid-to-train-ai-are-outsourcing-their-work-to-ai/
The people paid to train AI are outsourcing their work… to AI

It’s a practice that could introduce further errors into already error-prone models.

MIT Technology Review
AI is killing the old web, and the new web struggles to be born

AI language models and chatbots show that AI can generate content cheaply but at a lower quality. These characteristics mean AI will remake the web as we know it — from Google Search to Wikipedia and more.

The Verge

Janelle Shane on the recent #AI detector paper, with succinct advice: "Don't use AI detectors for anything important"

https://www.aiweirdness.com/dont-use-ai-detectors-for-anything-important/

Don't use AI detectors for anything important

I've noted before that because AI detectors produce false positives, it's unethical to use them to detect cheating. Now there's a new study that shows it's even worse. Not only do AI detectors falsely flag human-written text as AI-written, the way in which they do it is biased. This is

AI Weirdness
This. This is what people reporting on #AI / #LLM hype need to understand
https://mastodon.social/@amydentata@tech.lgbt/110651829564300496
The above could also have helped @[email protected] avoid the whole #AI explain train wreck, which thankfully seems to have been rolled back https://github.com/mdn/yari/issues/9208#issuecomment-1615411943
MDN can now automatically lie to people seeking technical information · Issue #9208 · mdn/yari

Summary MDN's new "ai explain" button on code blocks generates human-like text that may be correct by happenstance, or may contain convincing falsehoods. this is a strange decision for a technical ...

GitHub
io9 Published an AI-Generated Star Wars Article Filled With Errors

A new byline showed up Wednesday on the site of io9, the genre-entertainment section of Gizmodo tech website: “Gizmodo Bot.” And the site’s editorial staff appears to have not had…

Variety
Additional comment from io9 deputy editor James Whitbrook: "that's the formal part, here's my own personal comment: lmao, it's fucking dogshit"
https://twitter.com/Jwhitbrook/status/1676704102754004996
James Whitbrook on Twitter

“that's the formal part, here's my own personal comment: lmao, it's fucking dogshit”

Twitter

Oh FFS @[email protected] @[email protected] "readers also pointed out a handful of concrete cases where an incorrect answer was rendered. This feedback is enormously helpful, and the MDN team is now investigating these bug reports"

They aren't "bugs" - #LLMs by definition just put together plausible sounding words with no regard to correctness. Pointing out individual errors demonstrates this, but does not provide any mechanism by which it might be "fixed" in the general case

https://blog.mozilla.org/en/products/mdn/responsibly-empowering-developers-with-ai-on-mdn/

Responsibly empowering developers with AI on MDN | The Mozilla Blog

Generative AI technologies powered by Large Language Models (LLMs), such as OpenAI’s ChatGPT, have shown themselves to be both a big boon to productivity

The post also notes that many users were happy with the answers, ignoring that the target audience of people who *came to MDN looking for help with something they didn't already know* may not immediately recognize that the answer is subtly wrong, or just plausible-looking #AI gibberish

It also says "even extraordinarily well-trained LLMs — like humans — will sometimes be wrong"

which is true as far as it goes, but here's the thing: They are not *wrong like humans* … yes, you'll find some overconfident bullshitters on Stack Overflow, but generally humans in these contexts have some awareness of the limits of their knowledge and don't drift seamlessly between accurate explanation and complete BS

@[email protected]'s post also makes no mention of the apparent lack of communication with the rest of the MDN team https://github.com/mdn/yari/issues/9208#issuecomment-1615411943
Anyway, there's a new bug, so if you have thoughts on #MDN adding #AI stochastic bullshit to what has, up to now, been the premier technical reference for web developers, you could make them heard there https://github.com/mdn/yari/issues/9230
The AI help button is very good but it links to a feature that should not exist · Issue #9230 · mdn/yari

Summary I made a previous issue pointing out that the AI Help feature lies to people and should not exist because of potential harm to novices. This was renamed by @caugner to "AI Help is linked on...

GitHub
Are Australian Research Council reports being written by ChatGPT?

Multiple accounts from researchers suggest that feedback for Discovery Project grant funding was written by artificial intelligence

The Guardian

More on the #Gizmodo #AI debacle: After publishing error-ridden #LLM garbage which their own editorial team called "fucking dogshit", 'a G/O Media spokesman said the company would be “derelict” if it did not experiment with AI. “We think the AI trial has been successful,”'

(free link)

https://wapo.st/43iHlmP

How an AI-written Star Wars story created chaos at Gizmodo

A Gizmodo story on Star Wars, generated by artificial intelligence, was riddled with errors. The irony that the problem happened at a tech publication was undeniable.

The Washington Post
#AI is going great
(caveat: I don't know the source and thought it might be a joke, but the rest of their timeline looks real, and Janelle Shane retweeted it)
https://twitter.com/guntrip/status/1640694869785030657
Steve Guntrip on Twitter

“Digistore EU have promoted an AI to their website's chat function. It's not working particuarly well. Follow my attempts to get a tracking number that result in a milkshake recipe and a rude poem. (1/2)”

Twitter
Not gonna screenshot the thread of screenshots here, but it's archived if you don't want to visit the bird site https://web.archive.org/web/20230329232724/https://twitter.com/guntrip/status/1640694869785030657
Why AI writing detectors don’t work

Can AI writing detectors be trusted? We dig into the theory behind them.

Ars Technica
A thing that occurs to me about that last boost from @zoe (https://mastodon.social/@[email protected]imeprincess.net/110797643482092764), about #AI scrapers refusing to play nice with robots.txt, is that it encourages adversarial approaches…
People building models will be keen to exclude AI-generated content from the training set. So, would interspersing stuff that scores high as AI-generated (whether it actually is or not) cause entire pages to be excluded? You could separate it from the real content in ways that humans would understand. OTOH, if you care about SEO it'd be pretty risky
There's also been talk about standards to identify AI-generated content, leading to the hilarious option of falsely identifying your real content as AI-generated to stop people from training AI on it
Folks have suggested CSS-based approaches to poison models (like white text on a white background), but there's a significant risk of breaking accessibility. There's also the risk of search engines thinking it looks spammy, again
A general problem with poisoning like this is that any technique which becomes really widespread will likely be noticed and filtered out. OTOH, if the goal is to not have your content used, that may be OK!

Good to see the mainstream press finally touching the question of whether #LLM #AI BSing is fixable or an inherent property of the tech, even if it gets a bit of he said, she said treatment.

Also uh "Those errors are not a huge problem for the marketing firms turning to Jasper AI for help writing pitches…" marketing doesn't care if their pitches are BS? KNOCK ME OVER WITH A FEATHER

https://fortune.com/2023/08/01/can-ai-chatgpt-hallucinations-be-fixed-experts-doubt-altman-openai/

Tech experts are starting to doubt that ChatGPT and A.I. ‘hallucinations’ will ever go away: ‘This isn’t fixable’

Experts are starting to doubt it, and even OpenAI CEO Sam Altman is a bit stumped.

Fortune
So @Toke@helvede.net points out (https://mastodon.social/@Toke@helvede.net/110848880977610283) that #OpenAI does claim to use a unique user agent and honor robots.txt when scraping text for #ChatGPT #AI training. Not clear whether this is the only or even primary way publicly accessible web content gets into their training set though https://platform.openai.com/docs/gptbot
OpenAI Platform

Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.
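If you want to verify what your own robots.txt tells GPTBot, the Python standard library can check it. A quick sketch (example.com is a placeholder for your own site):

```python
# Check whether a site's robots.txt blocks OpenAI's documented crawler.
# Per OpenAI's docs, blocking it looks like:
#   User-agent: GPTBot
#   Disallow: /
# (example.com is a placeholder; point this at your own site)
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

for agent in ("GPTBot", "Googlebot"):
    ok = rp.can_fetch(agent, "https://example.com/some-page")
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```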

Just hypothetically speaking, many web platforms could easily be configured to serve specially tailored content based on the user agent, but that would be mean and wrong and would potentially waste the resources of VC-backed billionaires freeloading off the public web to build their BS machines, so definitely don't do that 😉
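Purely to illustrate how trivial that would be, a hypothetical stdlib-only sketch (which, per the above, you should definitely never deploy 😉):

```python
# Hypothetical sketch of user-agent-tailored serving (wsgiref is stdlib).
# Real setups would do this in web server or CDN config; again: don't.
from wsgiref.simple_server import make_server

REAL_PAGE = b"<p>Actual content, for actual humans.</p>"
SPECIAL_PAGE = b"<p>The moon is made of artisanal gruyere.</p>"

def app(environ, start_response):
    # "GPTBot" is the crawler token OpenAI documents; the page swap is the joke.
    ua = environ.get("HTTP_USER_AGENT", "")
    body = SPECIAL_PAGE if "GPTBot" in ua else REAL_PAGE
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```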

Complete gibberish will likely get weeded out. Common knowledge will tend to be overwhelmed by other sources. So the sweet spot for influence would seem to be obscure topics, or unique tokens that only appear in your content (though to what end isn't obvious).
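One speculative end, sketched in Python: a canary token unique to your pages that you could later probe models with. Pure guesswork on my part, not an established recipe.

```python
# Speculative "canary token" idea: salt your pages with a unique string.
# If a model later completes it, that's a hint your pages were trained on.
# Illustrative guesswork, not an established technique.
import secrets

canary = f"zx-{secrets.token_hex(12)}-qv"  # vanishingly unlikely elsewhere
print("sprinkle into your pages:", canary)

# Later, prompt a model with the first half and see if it knows the rest:
probe = canary[: len(canary) // 2]
print("probe prompt:", probe)
```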

Bring on the SolidGoldMagikarp https://www.lesswrong.com/posts/Ya9LzwEbfaAMY8ABo/solidgoldmagikarp-ii-technical-details-and-more-recent

SolidGoldMagikarp II: technical details and more recent findings — LessWrong

tl;dr: This is a follow-up to our original post on prompt generation and the anomalous token phenomenon which emerged from that research. Work done b…

Of course, these things don't just scrape human-readable text; many of them do code too. Serving up a special vulnerable version of your input sanitization code when you see GPTBot is left as an exercise for the reader

"It's highly unlikely that ChatGPT's training data includes the entire text of each book under question, though the data may include references to discussions about the book's content—if the book is famous enough"
Highlights a pernicious problem with ChatGPT-style #LLM #AI: It's far more likely to give reasonable answers on well-known subjects. If you spot check with, say, Dickens and Hunter S. Thompson, you might think it was pretty good at spotting naughty books

https://arstechnica.com/information-technology/2023/08/an-iowa-school-district-is-using-chatgpt-to-decide-which-books-to-ban/

An Iowa school district is using ChatGPT to decide which books to ban

Official: "It is simply not feasible to read every book" for depictions of sex.

Ars Technica