🔍 A study analyzing 23+ models from 7 labs finds that a model's "thermodynamic" character depends more on its developer than on its parameter count. EleutherAI models (Pythia, GPT‑NeoX) tend to dampen the signal (G<1), while Meta/OpenAI models (LLaMA, OPT, GPT‑2) amplify it (G>1). Fine‑tuning only changes the magnitude and rarely flips the sign, so the choice of base model matters. #AI #NLP #LLM #MachineLearning #Mô_hình #DeepLearning #Research #EleutherAI #Meta #LLaMA #Finetuning

https://ww

Can AI be powerful without copyright infringement? With "Common Pile v0.1", EleutherAI shows what ethical training on 8 TB of free and licensed sources can look like. Is that enough to take on the industry's giants? Click through and judge for yourself. #EleutherAI #CommonPile #KI 👇
https://www.all-ai.de/news/news24/ki-training-free
Clean, smart, strong: this is how AI training works today

Comma v0.1 shows what is possible with legal data. Is this the end of the copyright debate in AI?

All-AI.de

TechCrunch: EleutherAI releases massive AI training dataset of licensed and open domain text. “The dataset, called the Common Pile v0.1, took around two years to complete in collaboration with AI startups Poolside, Hugging Face, and others, along with several academic institutions. Weighing in at 8 terabytes in size, the Common Pile v0.1 was used to train two new AI models from EleutherAI, […]

https://rbfirehose.com/2025/06/07/techcrunch-eleutherai-releases-massive-ai-training-dataset-of-licensed-and-open-domain-text/

TechCrunch: EleutherAI releases massive AI training dataset of licensed and open domain text | ResearchBuzz: Firehose

ResearchBuzz: Firehose | Individual posts from ResearchBuzz

Another open-source LLM alternative to discover:

GPT-NeoX / GPT-Neo / GPT-J: developed by researchers at EleutherAI, a non-profit AI research lab

#EleutherAI #GPT-NeoX
#openLLM #LLMalternatif

EleutherAI is a grassroots non-profit AI research group, formed in July 2020 by Connor Leahy, Sid Black, and Leo Gao. The group is known for creating open-source models like GPT-Neo, GPT-J, and GPT-NeoX, and its Pile dataset is widely used for training large language models. In early 2023, it incorporated as the EleutherAI Institute. #AI #OpenSource #EleutherAI #MachineLearning #GPT
https://eleuther.ai
EleutherAI

"Recently it was revealed that an AI research lab called #EleutherAI had harvested subtitles from YouTube videos without the creators' consent. This data was then combined with data from Wikipedia, the U.K. Parliament and Enron Staff emails and added to a dataset called “the Pile.”"
(Tom's Guide 7/22/2024)

YouTube creators surprised to find Apple and others trained AI on their videos

AI models at Apple, Salesforce, Anthropic, and other major technology players were trained on tens of thousands of YouTube videos without the creators' consent and potentially in violation of YouTube's terms

#Apple #Salesforce #Anthropic #EleutherAI #YouTube #data #BigData #TrainingData #ArtificialIntelligence #AI #technology #tech

https://arstechnica.com/ai/2024/07/apple-was-among-the-companies-that-trained-its-ai-on-youtube-videos/

YouTube creators surprised to find Apple and others trained AI on their videos

Apple only used the model in question for research purposes, though.

Ars Technica

This is a sentence and a half: “Proof News' article also mentions that it was trained on videos of a parrot, so AI models are parroting a parrot, parroting human speech, as well as parroting other AIs, parroting humans.” #EleutherAI

https://arstechnica.com/ai/2024/07/apple-was-among-the-companies-that-trained-its-ai-on-youtube-videos/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

@arstechnica

AI companies used YouTube videos without permission to train models

An investigation by Proof News has found that several top AI companies, including Apple, Nvidia, and Anthropic, have used transcripts from thousands of

Stack Diary
Did you remember #EleutherAi exists?
I honestly completely forgot about them.

#ArtificialIntelligence #AI