US #publishers, represented by #DigitalContentNext, have sent a cease and desist letter to #CommonCrawlFoundation, demanding they stop #scraping and sharing #copyrighted content from their member companies. https://pressgazette.co.uk/media_law/common-crawl-ai-news-publishers-scraping-cease-and-desist-letter/?eicker.news #tech #media #news
US publishers tell Common Crawl to stop scraping and delete archive

Digital news publishers in the US have raised “significant legal concerns” over the continued scraping of their content by Common Crawl.

Press Gazette
Learned a lot about #airflow at scale over the last few weeks doing a POC for #CommonCrawlFoundation - some of it is not super well documented, so I wrote up my learnings: https://www.jason-grey.com/posts/2025/airflow-at-scale/
Scaling Airflow Dataset Scheduling: Lessons from Common Crawl

Insights on implementing Apache Airflow's dataset-based scheduling for petabyte-scale data processing at Common Crawl

At Civic Hall today, for ...

AI and the Right To Learn on an Open Internet: A Conversation Convened by Common Crawl Foundation and Professor Jeff Jarvis

#commoncrawlfoundation

https://lu.ma/3g9vhzvd

AI & The Right to Learn on an Open Internet: A Conversation Convened by Common Crawl Foundation and Professor Jeff Jarvis. Date: April 30, 2024. The Common Crawl…