Mastodawn

schizoidman Jan 16

Judge orders Anna’s Archive to delete scraped data; no one thinks it will comply

https://lemmy.zip/post/57152697

cross-posted from: https://lemmy.ca/post/58748253

AeronMelon Jan 17

Hey judge, order AI companies to delete THEIR illegally-scraped data.

db2 Jan 17

Couldn’t they use that to argue that they’re being unjustly targeted because they’re smaller and easier to pick on?

SolacefromSilence Jan 17

No one cares if they're small or unjustly picked on. If they want to beat the charges, they need to announce their own AI trained on the data.

tempest

It would make me laugh if they could train an LLM that could only regurgitate content verbatim.

ilinamorato Jan 17

Well, it’s not an LLM, but “AI” doesn’t have a defined meaning, so from that perspective they kind of already did.

Natanael Jan 17

It’s actually kinda easy. Neural networks are just a weirder kind of logic gate circuit. You can program them the same way and insert explicit, controlled logic and deterministic behavior. Somebody who doesn’t know the details of LLM training wouldn’t be able to tell much of a difference. It would be packaged as a bundle of node weights and work with the same interfaces and all.

The reason that doesn’t work well if you try to insert strict logic into a traditional LLM, despite the node properties being well known, is how intricately interwoven and mutually dependent all the different parts of the network are (that’s why it’s a LARGE language model). You can’t just arbitrarily edit anything, insert more nodes, or replace logic, because you don’t know what you might break. It’s easier to place the inserted logic outside of the LLM network and train the model to interact with it (“tool use”).
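
To make the "program a network like a logic circuit" point concrete, here is a minimal sketch of a tiny network whose weights are set by hand rather than trained. It assumes a plain step activation, and the names `step`, `neuron`, and `xor_net` are made up for this illustration; nothing here reflects how any real LLM is built.

```python
def step(x: float) -> int:
    """Hard threshold activation: fires (1) only for positive input."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias):
    """One node: weighted sum of the inputs plus a bias, then the threshold."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_net(a: int, b: int) -> int:
    """Two-layer network with hand-picked weights that computes XOR."""
    h_or = neuron([a, b], [1, 1], -0.5)    # hidden node acting as an OR gate
    h_and = neuron([a, b], [1, 1], -1.5)   # hidden node acting as an AND gate
    # Output fires when OR is on and AND is off, which is XOR.
    return neuron([h_or, h_and], [1, -1], -0.5)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))
```

From the outside this is still just "a bundle of node weights", but its behavior is fully deterministic. The hard part described above is doing this inside a large trained network without disturbing everything else.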

Dran Jan 17

en.wikipedia.org/wiki/Markov_chain

Before the advent of AI, I wrote a Slack bot called slackbutt that made Markov chains of random lengths between 2 and 4 out of the chat history of the channel. It was surprisingly coherent. Making an “LLM” like that would be trivial.
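
For anyone who hasn't seen one, a word-level Markov chain generator can be sketched roughly as below. This is not the slackbutt code. It assumes "lengths between 2 and 4" means the number of previous words used as context, and `build_chain` and `generate` are names invented for this example.

```python
import random
from collections import defaultdict


def build_chain(words, order):
    """Map each run of `order` consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(text, max_words=30, rng=None):
    """Generate text by repeatedly sampling a follower of the current context."""
    rng = rng or random.Random()
    words = text.split()
    order = rng.randint(2, 4)           # assumed meaning of "lengths between 2 and 4"
    if len(words) <= order:
        return " ".join(words)          # too little history to build a chain
    chain = build_chain(words, order)
    state = rng.choice(list(chain))     # random starting context
    out = list(state)
    while len(out) < max_words:
        followers = chain.get(state)
        if not followers:               # context only appeared at the very end
            break
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)      # slide the context window forward
    return " ".join(out)


if __name__ == "__main__":
    history = "the judge ordered the archive to delete the data and the archive ignored the judge"
    print(generate(history, max_words=12))
```

With a small history and a longer context, most contexts have exactly one follower, so the output is mostly verbatim chunks of the source, which is pretty much the "regurgitate only" model joked about upthread.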

SlurpingPus Jan 17

Reddit has at least one sub where the posts and the comments are generated by Markov-chain bots. More than a few times I’ve gotten a post from there in my feed and read through it, confused, for several minutes before realizing. IIRC it’s called subreddit_simulator.

Meron35 Jan 17

The original subreddit simulator ran on simple Markov chains.

Subreddit simulator GPT2 used GPT2, and was already so spookily accurate that IIRC its creators specifically said they wouldn’t create one based on GPT3, out of fear that people wouldn’t be able to tell the difference between real and generated content.