Just wondering how you collect as much Mastodon content as possible for AI training purposes?
#AI #Mastodon #trainingdata
Meta slurps up EU user data for AI training

Meta users in Europe will have their public posts swept up and ingested for AI training, the company announced this week.

Malwarebytes
Novel technique overcomes spurious correlations problem in AI

AI models often rely on "spurious correlations," making decisions based on unimportant and potentially misleading information. Researchers have now discovered these learned spurious correlations can be traced to a very small subset of the training data and have demonstrated a technique that overcomes the problem. The work has been published on the arXiv preprint server.

Open Web Crawl is such a security vulnerability that I don’t know why it isn’t at the top of the news every day.

If you turn on a general suction hose, how do you not realise there’s going to be a party of attackers right there feeding it all the #propaganda they possibly can?

How can you be so nonchalant about it? How do you not realise you created the biggest attack vector in the history of computing?

#ai #trainingdata #crawlers

🚀 The EU’s #AI Challenge: Can Europe Compete Without Enough #TrainingData?

Daniel Friedlaender explains why AI #innovation depends on #data diversity – and how outdated #privacy approaches are a disadvantage in the global AI race.

#AIAct #DataProtection #EuropeanAIroundtable

https://www.youtube.com/watch?v=hEAzCWVjdu4

The EU’s AI Challenge: Can Europe Compete Without Enough Training Data?

YouTube
The recent copyright decisions against AI have made it imperative to rethink how we train AI. Ultimately, we should aim to build a free training data repository to train genuinely free AI.
#AI #trainingdata
https://www.korte.co/3iqx
The AI Copyright Apocalypses: Boon or Threat to Developers

The latest AI copyright cases could throw AI into chaos yet relieve exploited creatives. Let's examine AI's copyright problem and find a better solution.

Kevin Korte - AI and Cybersecurity for the Boardroom
Many critics maintain that AI cannot be open sourced in principle (cited: Lessing, Casado, Stoica). To me it seems clear that everyone investing in public uses for AI has a duty to demand legal clarity and open access to #trainingdata // @simonschlauri delivers a constructive and balanced exposé at #Winterkongress https://winterkongress.ch/2025/talks/open_source_artificial_intelligence/
Open Source Artificial Intelligence

Talk by Simon Schlauri, Winterkongress 2025

Winterkongress 2025

@RuthMalan

The worst-case scenario here is you get sued?

Apparently, this author/text was not included in the Books3 dataset of pirated content used for LLM #trainingdata.

First Legal Ruling on AI, Copyright, and Training Data Goes the Way of Creators

This case will be cited in future legal arguments.

PetaPixel
Copyright debate intensifies over AI training data use

Analysis of Andreessen Horowitz's position on AI model training and fair use in response to US Copyright Office inquiry

PPC Land