Teaching AI Ethics

Update: since I wrote this original post covering the nine areas, I've expanded each one into a complete article. Have a read through this post, and then when you're ready to dive deeper into AI ethics, check out the full series here. If you linked to this post as part of a course or university resource, I suggest updating your links with the complete series. https://leonfurze.com/ai-ethics/ As we head into the start of Term 1 it's already looking like Artificial Intelligence is going to be […]

https://leonfurze.com/2023/01/26/teaching-ai-ethics/

The AI Iceberg: Understanding ChatGPT

Analogies are useful for understanding complex ideas, and there are plenty of complexities for educators trying to wrap their heads around ChatGPT. In this post, I’ll try to explain some of the features of the chatbot and the model it’s built on top of. I'm deliberately avoiding any kind of analogy that represents the AI as magical, mythical, human, or godlike - we've seen enough of them. I’m not claiming that this analogy is watertight or that there is no better way to conceptualise […]

https://leonfurze.com/2023/05/18/the-ai-iceberg-understanding-chatgpt/

Idea for a new prize: big LLM maker segments its training data (or maybe even just #CommonCrawl) by originating person, runs DataSHAP on the segments, gives a prize to the highest scorer.

I have no idea how to think about who it would be.

📸🤦‍♂️ Nathan Rooy discovers that flashy websites are like McDonald's cheeseburgers: popular for being just "good enough." Instead of a gourmet web experience, it's a buffet of #mediocrity sourced from Common Crawl's greatest hits. Web connoisseurs, prepare to feast on the bland! 🍔💻
https://nry.me/posts/2025-10-09/small-web-screenshots/ #flashywebsites #webdesign #cheeseburgers #CommonCrawl #HackerNews #ngated
One million (small web) screenshots

nry.me
The Company Quietly Funneling #Paywalled Articles to #AI Developers
#CommonCrawl's website states that it scrapes the internet for "freely available content" without "going behind any '#paywall.'" Yet the organization has taken articles from major news websites that people normally have to pay for — allowing AI companies to train their #LLMs on high-quality journalism for free.
In #2020, #OpenAI used Common Crawl’s archives to train #GPT3.
https://www.msn.com/en-us/money/news/the-company-quietly-funneling-paywalled-articles-to-ai-developers/ar-AA1PMBHE
MSN

Mashable: Common Crawl accused of feeding paywalled content to AI companies. “In a detailed investigation for The Atlantic, reporter Alex Reisner reveals that several major AI companies have quietly partnered with the Common Crawl Foundation — a nonprofit that scrapes the web to build a massive public archive of the internet for research purposes.”

https://rbfirehose.com/2025/11/09/mashable-common-crawl-accused-of-feeding-paywalled-content-to-ai-companies/

ResearchBuzz: Firehose | Individual posts from ResearchBuzz
Common Crawl - Setting the Record Straight: Common Crawl’s Commitment to Transparency, Fair Use, and the Public Good https://commoncrawl.org/blog/setting-the-record-straight-common-crawls-commitment-to-transparency-fair-use-and-the-public-good #AI #CommonCrawl #data #WebArchiving (wow, that Atlantic piece was bad, needing this rebuttal)
Common Crawl defends archive practices amid deletion claims: Nonprofit Common Crawl issued November 4 statement defending data collection methods, citing technical constraints preventing content deletion. https://ppc.land/common-crawl-defends-archive-practices-amid-deletion-claims/ #CommonCrawl #DataArchive #Nonprofit #DigitalPreservation #DataCollection
PPC Land