This is infuriating and also was totally predictable. Thank you @daveyalba for the reporting.

https://www.bloomberg.com/news/articles/2023-05-01/ai-chatbots-have-been-used-to-create-dozens-of-news-content-farms

A few reactions in thread:

AI Chatbots Have Been Used to Create Dozens of News Content Farms

A new report documents 49 new websites populated by AI tools like ChatGPT and posing as news outlets 

Bloomberg

There is actually no way to "train" an LLM, run as a text synthesis machine, to not generate fake news -- short of watermarking the output (and it's not yet clear whether this is possible).

>>

Google is being particularly squirrely here. It's like they want to leave open the possibility that synthetic text could EVER be a reasonable source of information.

Really bad look for a company ostensibly "organizing the world's information" -- surely that project is only hampered when the information ecosystem is polluted with unending supplies of synthetic text.

@emilymbender Great profile in NYT today by Cade Metz on Geoffrey Hinton, a lead AI developer at Google. It seems Mr. Hinton has doubts about his life’s work. Working on the problem since the 1970s, he now states that bad actors can easily be imagined using this technology for nefarious ends. Enjoy it, these are the good old days.
@Csosorchid @emilymbender Illustrates my thoughts about techies getting so absorbed in and besotted with the technology that they don't consider the negative consequences. This guy took 40 years to see what could go wrong.
@anne_twain @emilymbender Imagine how bad it must be. He worked in this area from its very start. A company he started was bought by Google, so he must have cashed in. Then he works for Google with one of his colleagues leading the project, and him retiring from some sort of corporate emeritus position.
That guy now thinks it possible his life’s work will lead to something very bad. That guy is smart, he is probably right.
@Csosorchid @emilymbender There are different kinds of smart. Some people have less ability to see the social consequences of things.
@emilymbender what an indirect and weird way for Crovitz to say that publishers should continue to rely on humans to produce news stories
@emilymbender I feel like we're going to end up where "we" (writers, creators, journalists, researchers) have to escrow or validate our work-in-progress because it's impossible to compel watermarking the generative stuff.
@GavinChait
YyyyyyyuuuuuuuuuP. As always it becomes a problem for the 'victim'.
@emilymbender

@GavinChait @emilymbender having a version history works very well, at least for technical writing.

Generating a plausible proof of work with an LLM would not be very difficult, though.

You would need an hours-long video of yourself sitting and writing. Generating a deepfake video (or even making one with clever editing) would not be too difficult, either...

The intersection of pulp writing, Mechanical Turk ghostwriters, and language models is not small.

Claiming authorship is not trivial.

@janvenetor @emilymbender Which is where escrow comes in. We'll need a whole new industry of trusted proof-of-work auditors.

@emilymbender
YyyyyyyuuuuuuuuuP. How does one "watermark" plain text?

Short answer: one doesn't.

@notroot @emilymbender I read about a scheme where the LLM deliberately uses unlikely words at regular intervals. The text is still “sensical” but a human would be unlikely to do that, so it serves as a watermark. (As long as you don’t edit the text.) I don’t know if this is actually a good scheme because I’m not an expert, but it seemed possible?
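[Editor's note: the scheme described above resembles published "green list" watermark proposals, in which the previous token seeds a pseudorandom split of the vocabulary and generation is biased toward one half; a detector then counts how many tokens land in their "green" half. A toy sketch of that idea, with an invented VOCAB and a uniform stand-in for a real model, not any vendor's actual implementation:]

```python
import hashlib
import random

# Tiny stand-in vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug",
         "quickly", "slowly", "big", "small", "red", "blue"]

def green_list(prev_token: str, fraction: float = 0.5) -> set:
    """Deterministically partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = sorted(VOCAB)
    rng.shuffle(shuffled)
    return set(shuffled[:int(len(VOCAB) * fraction)])

def generate_watermarked(length: int, seed: int = 0) -> list:
    """Toy 'model': sample uniformly, but only from the current green list."""
    rng = random.Random(seed)
    tokens = ["the"]
    for _ in range(length):
        tokens.append(rng.choice(sorted(green_list(tokens[-1]))))
    return tokens

def green_fraction(tokens: list) -> float:
    """Detector: fraction of tokens that fall in their green list.

    Watermarked text scores near 1.0; human text should hover near 0.5,
    since each green list covers half the vocabulary.
    """
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev))
    return hits / (len(tokens) - 1)
```

This also illustrates the thread's objection: paraphrasing or word-swapping the output reshuffles which tokens are "green" and washes the signal out.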

@paulmather007 @emilymbender I mean... I'm sure there's some grifters using unaltered chatbot output that could be detected that way, but the minute that's implemented, some Macedonian teen will write a script to strip the "se Socal" and we're back to undetectable plain text.

The problem is "how to watermark plain text", and there's just no solution to that. Period. At all. It can always be stripped.

@notroot @emilymbender Well, the scheme requires LLMs to, in good faith, implement the watermark. (Meant to say “sensical,” ironically autocorrect messed me up.) Yes, of course you could strip the watermark. But since people are already turning in essays without reading them that start with “as an AI, I can’t write an essay,” I suppose there’s a low bar and a watermark would be helpful in some circumstances.
@notroot @emilymbender I mean the thing about any kind of watermark is it can be stripped. They’re a speed bump.

@paulmather007 @emilymbender You're absolutely right IMO... a speed bump is better than nothing. And, yup... the Genie and the Lamp have forever parted company. Now that "we" know how to make LLMs, "we" will make them until "we're" bored.

We really do live in a state of existential anarchy. Human "laws" are just patterns emerging in the chaos, no more binding than a speed limit less than C.

@paulmather007 @emilymbender If I were a philosopher, I'd probably say "Hobbes Was Right". We'll go to any lengths to rationalize away the incontrovertible fact that we're all -- the Earth is -- insignificant specks in an unimaginably vast cosmos. We'll do anything to escape existential angst. Any social order that puts humanity at the center is better than facing the fact that *we have ALWAYS lived in anarchy*. That's why we make governments. We don't like it.

@emilymbender if I understand LLMs correctly (I am not an AI person), they can in theory be used as a steganographic channel. (This understanding hinges on them essentially being an elaborate expander from some entropy source.)

If that's the case, then it should also be possible to embed a watermark in the output.

What's questionable is whether the bitrate of the steganographic channel is high enough to embed enough bits in a typical-sized output to be meaningful.
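[Editor's note: the steganographic-channel idea above can be sketched concretely. If the encoder and decoder share the same deterministic top-k candidate list at each step, the choice among k candidates carries log2(k) bits per token. Everything below (VOCAB, the candidates function) is an invented toy standing in for a real model; it is an illustration of the channel, not a real watermarking system:]

```python
import hashlib
import random

# Toy vocabulary; a real model would supply its own top-k candidates.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug",
         "quickly", "slowly", "big", "small", "red", "blue"]

def candidates(prev_token: str, k: int = 4) -> list:
    """Stand-in for a model's top-k next-token list (deterministic toy)."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    return random.Random(seed).sample(VOCAB, k)

def embed(bits: str, k: int = 4) -> list:
    """Encode a bit string into token choices: log2(k) bits per token."""
    step = k.bit_length() - 1  # k must be a power of two
    tokens = ["the"]
    for i in range(0, len(bits), step):
        idx = int(bits[i:i + step].ljust(step, "0"), 2)
        tokens.append(candidates(tokens[-1], k)[idx])
    return tokens

def extract(tokens: list, k: int = 4) -> str:
    """Recover the bits by replaying the same candidate lists."""
    step = k.bit_length() - 1
    return "".join(format(candidates(prev, k).index(tok), f"0{step}b")
                   for prev, tok in zip(tokens, tokens[1:]))
```

At k=4 this is 2 bits per token, which is exactly the bitrate question raised above: a short email-sized output carries only a few hundred bits, and any paraphrase or edit desynchronizes the decoder.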