Mastodawn

mesa Apr 4, 2024

Reddit has struck a $60m deal with Google that lets the search giant train AI models on its posts

It's a bit old, but I just learned about it via the Retro Dodo article here: https://retrododo.com/google-is-killing-retro-dodo/

Rayspekt

Is it just me, or is 60 million a ridiculously small price for that whole dataset?

bobburger Apr 4, 2024

To be fair it's a pretty terrible dataset. The AI is just going to say "this" to every question you ask

rarkgrames Apr 4, 2024

This.

Altima NEO Apr 4, 2024

and “and my axe”

and “rock and stone”

& Knuckles, featuring Dante from the Devil May Cry series

AwkwardLookMonkeyPuppet Apr 4, 2024

"Reject humanity. Return to monke."

OfCourseNot Apr 4, 2024

Yeah, and Google already has everything scraped and indexed

lol Apr 4, 2024

I don’t think it’s terrible; the opposite really. It’s likely incredibly useful for creating LLMs with specific knowledge or behavior. The categorization into subreddits alone opens up so many possible applications. Imagine for example training a conversational AI with data from specific subreddits like science, askscience, biology, physics, astronomy,… or posts by users that frequent such subreddits in order to create sort of an academic AI.

You could do the same for all sorts of topics: Want a sports commentator AI, use sports related subreddits; an AI that supports you in writing a novel, use creative writing subreddits etc. Don’t want your AI to spew political opinions, exclude political subreddits from your data; don’t want it to use offensive language, only use well-moderated subreddits etc.

Hemingways_Shotgun Apr 5, 2024

This presumes that Reddit is populated by so-called experts answering questions and posting in those subs.

But the overwhelming truth is that most people pretending to be experts are just regurgitating answers they read in another Reddit post, and so on, and so on.

You might as well just train your AI on the “confidently incorrect” sub and call it a day.

MBM Apr 6, 2024

It’s always an eye-opener when you look at an ELI5 thread where you’re actually knowledgeable about the topic

AwkwardLookMonkeyPuppet Apr 4, 2024

Or “just Google it”.

GBU_28 Apr 5, 2024

AI:

😭 I’m trying

captainlezbian Apr 6, 2024

My heel turn as a mod back in the day was having automod remove lmgtfy links

brygphilomena Apr 6, 2024

It was a weird day when I recently went to teach someone about lmgtfy and found the website dead. There are clones, but the original was so simple and great.

jkrtn Apr 5, 2024

Hey, now, be fair. There are some Top 40s song lyrics in there too.

qjkxbmwvz Apr 4, 2024

I wonder if Google’s unlimited legal budget plays a role. Not a lawyer, so probably way off here…

But, for example, reddit’s success in part depends on Google ingesting their data — reddit shows up in Google searches all the time, which can only happen if Google uses reddit’s content. So reddit telling Google “you can’t use our content” doesn’t work, and they need to say something like, “you can use our content for search results but you can’t consume it as training data.”

This is a pretty straightforward statement/request/demand, but one could imagine Google lawyers maliciously complying and throwing their hands up dramatically, claiming “well we use some amount of AI in our search results, so if we can’t use your content for AI training then we can’t risk using it for search results.” Which would, I imagine, really, really hurt reddit (no Google results would be catastrophic I suspect).

So, perhaps the “low” 60M figure is just Google using their leverage.

Or not. As a random person on the Internet, I can say I’m probably not contributing anything meaningful here…

Zaktor Apr 4, 2024

I’m personally curious whether Reddit actually has any ability to protect that database. I don’t remember Reddit TOS, but usually those things give them license to use and copy the data, maybe even to sell it, but not actually the copyright on it. So if someone made a Reddit scraper and copied the comments, wouldn’t only the actual commenter be able to sue?

$60M may be reflecting that, in that it’s more a convenience fee to shield Google against individual Redditors going after them than something that Reddit itself could actually sue over.

GBU_28 Apr 5, 2024

How quickly you forget that half of it is just “I also choose this guy’s wife” and “the narwhal bacons at midnight”

PoliticalAgitator Apr 5, 2024

It’s more than they were making from third party apps, hence the ridiculous API fees.

trolololol Apr 5, 2024

Considering it’s full of Nazis and bots, and that once you filter all of them out you’re left with reposts and low-quality memes followed by comments that represent the hostile side of each of us… I’d say anything over $5 is a good deal for spez.

Now, I hope Google uses this data exclusively for detecting inappropriate answers. Can you imagine it giving answers based on the endless threads of “I’m not your mate, bro; I’m not your bro, dude…”?