This is infuriating and also was totally predictable. Thank you @daveyalba for the reporting.
A few reactions in thread:
There is actually no way to "train" an LLM, run as a text synthesis machine, to not generate fake news -- short of watermarking the output (and it's not yet clear whether this is possible).
>>
Google is being particularly squirrely here. It's like they want to leave open the possibility that synthetic text could EVER be a reasonable source of information.
Really bad look for a company ostensibly "organizing the world's information" -- surely that project is only hampered when the information ecosystem is polluted with unending supplies of synthetic text.
@GavinChait @emilymbender having a version history works very well, at least for technical writing.
Generating a plausible proof of work with a LLM would not be very difficult, though.
You would need an hours-long video of yourself sitting and writing. Generating a deep-fake video (or even making one with clever editing) would not be too difficult, though...
The cross-section of pulp, Mechanical Turk ghost writers, and language models is not small.
Claiming authorship is not trivial.
@emilymbender
YyyyyyyuuuuuuuuuP. How does one "watermark" plain text?
Short answer: one doesn't.
@paulmather007 @emilymbender I mean... I'm sure there are some grifters using unaltered chatbot output that could be detected that way, but the minute that's implemented, some Macedonian teen will write a script to strip the watermark and we're back to undetectable plain text.
The problem is "how to watermark plain text", and there's just no solution to that. Period. At all. It can always be stripped.
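To make the "it can always be stripped" point concrete, here is a toy sketch (my own illustration, not any deployed scheme): a "watermark" embedded in plain text via zero-width Unicode characters, removed by a one-line filter. All function names are mine.

```python
# Toy plain-text "watermark": hide bits as invisible zero-width characters
# after each word, then show how trivially a filter removes them all.

ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

def embed(text: str, bits: str) -> str:
    """Append one invisible character per watermark bit after each word."""
    words = text.split(" ")
    marked = [w + (ZW0 if b == "0" else ZW1) for w, b in zip(words, bits)]
    return " ".join(marked + words[len(bits):])

def strip_watermark(text: str) -> str:
    """Delete every zero-width character -- the whole watermark is gone."""
    return text.replace(ZW0, "").replace(ZW1, "")

original = "the quick brown fox jumps over the lazy dog"
marked = embed(original, "1011")
assert marked != original                     # watermark is present...
assert strip_watermark(marked) == original    # ...and trivially removable
```

Any normalization pass (copy-paste through a plain-text field, ASCII filtering, re-typing) does the same job as `strip_watermark`, which is the point: a mark carried by the text itself survives only as long as nobody bothers to remove it.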
@paulmather007 @emilymbender You're absolutely right IMO... a speed bump is better than nothing. And, yup... the Genie and the Lamp have parted company forever. Now that "we" know how to make LLMs, "we" will make them until "we're" bored.
We really do live in a state of existential anarchy. Human "laws" are just patterns emerging in the chaos, no more binding than a speed limit less than C.
@emilymbender if I understand LLMs correctly (I am not an AI person), they can in theory be used as a steganographic channel. (This understanding hinges on them essentially being an elaborate expander from some entropy source.)
If that's the case, then it should also be possible to embed a watermark in the output.
What's questionable is whether the bitrate of the steganographic channel is high enough to embed anything meaningful in an output of typical size.
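The "LLM sampler as steganographic channel" idea above can be sketched with a toy version of a green-list watermark (in the style of published sampling-bias schemes): the vocabulary is pseudo-randomly split at each step, seeded by the previous token; the watermarked sampler prefers the "green" half, and a detector counts green tokens and computes a z-score. A uniform sampler stands in for the model here, and all names and parameters are my own assumptions.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(1000)]  # toy vocabulary
GAMMA = 0.5  # fraction of the vocabulary on the "green" list each step

def green_tokens(prev: str) -> list:
    """Pseudo-random vocabulary split, deterministically seeded by the previous token."""
    seed = int(hashlib.sha256(prev.encode()).hexdigest(), 16)
    return random.Random(seed).sample(VOCAB, int(GAMMA * len(VOCAB)))

def generate(n: int, watermark: bool, rng: random.Random) -> list:
    """Stand-in for an LLM sampler: uniform over vocab, optionally biased green."""
    out = ["w0"]
    for _ in range(n):
        if watermark and rng.random() < 0.9:  # strongly prefer green tokens
            out.append(rng.choice(green_tokens(out[-1])))
        else:
            out.append(rng.choice(VOCAB))
    return out

def z_score(tokens: list) -> float:
    """How far the green-token count deviates from the unwatermarked expectation."""
    hits = sum(t in set(green_tokens(p)) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

rng = random.Random(0)
watermarked = z_score(generate(200, watermark=True, rng=rng))    # large positive
unmarked = z_score(generate(200, watermark=False, rng=rng))      # near zero
```

The bitrate worry in the post shows up directly in the detector's math: the z-score grows only as the square root of the token count, so short outputs give weak evidence, and any paraphrase that reshuffles tokens erodes the signal further.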