Crovia Spider analyzed the LAION-5B dataset and found many licensing gaps, giving it a compliance score of only 14/100. AI models trained on LAION-5B will inherit these legal problems. Crovia Spider helps extract evidence and suggest licenses to ensure compliance.
#AICompliance #AIGovernance #LAION5B #CroviaSpider #TuânThủAI #QuảnLýAI #DữLiệuMôHình

https://www.reddit.com/r/LocalLLaMA/comments/1pfjnec/crovia_spider_laion5b_evidence_snapshot_real/

AI trains on kids’ photos even when parents use strict privacy settings

Human Rights Watch (HRW) continues to reveal how photos of real children casually posted online years ago are being used to train AI models powering image generators—even when platforms prohibit scraping and families use strict privacy settings.

#HRW #LAION5B #privacy #ArtificialIntelligence #AI #GenAI #TrainingData #data #BigData #technology #tech

https://arstechnica.com/tech-policy/2024/07/ai-trains-on-kids-photos-even-when-parents-use-strict-privacy-settings/

Even unlisted YouTube videos are used to train AI, watchdog warns.

Ars Technica

Just wow... amazing website/visualization about LAION-5B, a large dataset that a lot of generative AI models are trained on.
https://knowingmachines.org/models-all-the-way
#AI #bigdata #LAION5B #trainingdata #CSAM
Models All The Way Down

LAION-5B is an open-source foundation dataset. It contains 5.8 billion image and text pairs—a size too large to make sense of. We follow the construction of the dataset to better understand its contents, implications and entanglements.

"As artists, academics, practitioners, or as journalists, dataset investigation is one of the few tools we have available to gain insight and understanding into the most complex systems ever conceived by humans."

https://knowingmachines.org/models-all-the-way#section3

via @ethanz

#Luxembourg #Laion5b #AIModels #Laion #DatasetInvestigation

🌐💡Companies' reliance on #internet data for training #AI models is a double-edged sword: it provides vast resources, but it also includes illegal and problematic content that the resulting AI models can inherit.

🔍🤖 Researchers discovered links to over 1,000 images depicting child sexual abuse in the #laion5b dataset, which links to over 5 billion images.

🌟➡️ This calls for open-sourcing and auditing datasets to improve transparency and accountability in AI development. #ethics #responsibleai

LAION-5B: Researchers discover links to child sexual abuse images

Stanford researchers have found links to images of child abuse in the LAION-5B training dataset for AI image generators.

heise online

#CSAM Was Found In a Major #AI #Dataset. Researchers Aren’t Surprised.
The #LAION5B dataset is the basis for numerous AI models. It contains links to 5 billion images scraped from the internet. Researchers have long warned that massive training datasets are poorly audited. https://www.vice.com/en/article/3aky5n/child-sex-abuse-material-was-found-in-a-major-ai-dataset-researchers-arent-surprised

Lesson learned: always delete #Exif data before sharing images, and be careful with the #Bildbeschreibung (image description). #TIL

An analysis of what is probably the world's largest publicly available training dataset for #KI #Bildgenerierung (AI image generation) found masses of data that can be used to identify people: faces & names, geographic coordinates, email addresses, account numbers. The #LAION5B dataset consists of five billion links to images & their descriptions on the internet.
#datenschutz #AI
https://www.tagesschau.de/wissen/technologie/ki-trainingsdaten-privat-datenschutz-100.html

AI training data contains private information

Training data is the raw material for AI systems. It consists of huge amounts of images and text from the web. A BR investigation now shows that this includes a lot of private data, a problem for data protection. By K. Brunner and E. Harlan.

tagesschau.de