Mastodawn

Web Feeds in 2026: A Survey

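Mark Nottingham surveyed the state of web feeds in 2026 using Common Crawl data and AI agents. The survey found that 35.9% of the top 500,000 sites support feed autodiscovery, but only 19.1% have high-quality feeds that are actually maintained, confirming that many feeds have been abandoned. Feeds generated by CMSs such as WordPress, Drupal, and Blogger are of particularly low quality, which underscores the need for better feed management in CMSs. RSS and Atom feeds still coexist, and multilingual feeds are almost nonexistent, with most feeds offered in a single language. Because an autodiscovery link does not guarantee a high-quality feed, the post also proposes a new autodiscovery mechanism.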

https://mnot.net/blog/2026/feed-survey

#webfeeds #commoncrawl #rss #atom #cms

Web Feeds in 2026: A Survey

I looked through Common Crawl and found over 300,000 parseable RSS/Atom feeds, confirming that Web feeds are still a major part of the Open Web. But most aren’t high quality, and autodiscovery often points users at stale or abandoned feeds.

Mark Nottingham
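
For readers unfamiliar with feed autodiscovery, here is a minimal sketch of the conventional mechanism: a page advertises its feeds with `<link rel="alternate">` elements carrying an RSS or Atom media type. This is not the survey's own tooling; the names `FeedLinkParser` and `discover_feeds` and the example URL are hypothetical. As the survey notes, finding such a link says nothing about whether the feed behind it is maintained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media types conventionally used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        mime = attributes.get("type", "").strip().lower()
        href = attributes.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return the feed URLs a page advertises (hypothetical helper)."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


page = (
    '<html><head>'
    '<link rel="alternate" type="application/atom+xml" href="/feed.atom">'
    '<link rel="stylesheet" href="/style.css">'
    '</head></html>'
)
print(discover_feeds(page, "https://example.org/blog/"))
```

A quality check would still have to fetch each discovered URL and inspect how recent its entries are.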

PPC Land May 4

FYI: News publishers target Common Crawl, the AI training data backdoor: News/Media Alliance sent a formal letter to Common Crawl demanding it stop unauthorized scraping and block AI companies from using news content for training. https://ppc.land/news-publishers-target-common-crawl-the-ai-training-data-backdoor/ #News #Media #CommonCrawl #AITraining #DataPrivacy

News publishers target Common Crawl, the AI training data backdoor

News/Media Alliance sent a formal letter to Common Crawl demanding it stop unauthorized scraping and block AI companies from using news content for training.

PPC Land

PPC Land May 2

ICYMI: News publishers target Common Crawl, the AI training data backdoor: News/Media Alliance sent a formal letter to Common Crawl demanding it stop unauthorized scraping and block AI companies from using news content for training. https://ppc.land/news-publishers-target-common-crawl-the-ai-training-data-backdoor/ #AI #NewsMedia #CommonCrawl #DataPrivacy #WebScraping

PPC Land May 1

News publishers target Common Crawl, the AI training data backdoor: News/Media Alliance sent a formal letter to Common Crawl demanding it stop unauthorized scraping and block AI companies from using news content for training. https://ppc.land/news-publishers-target-common-crawl-the-ai-training-data-backdoor/ #CommonCrawl #AITechnology #NewsPublishers #MediaAlliance #DataPrivacy

HECHT INS GEFECHT Apr 14

Amazon.de has an HC rank of 201. Mediamarkt.de: 418,396, roughly 2,000 times further out in the web network, with almost identical backlink strength.

Common Crawl uses harmonic centrality as its crawl priority. 64% of all large language models train on Common Crawl data (Mozilla 2024).

Anyone optimizing for Domain Authority is optimizing for Google. For AI citations, what counts is the position in the link graph.

https://hechtinsgefecht.de/harmonic-centrality/

#SEO #GEO #LLMSichtbarkeit #CommonCrawl

Harmonic Centrality: The Ranking Signal for AI Systems 🔗

Harmonic centrality from the Common Crawl web graph shows how close your domain is to the core of the web, and how often LLMs cite it. With real data.

HECHT INS GEFECHT

benny windolph Mar 30

Mediamarkt.de has almost the same PageRank as Otto.de, but an HC rank of 418,396 vs. 5,153. 📊

Harmonic centrality measures where a domain sits in the web network: not who links to you, but how central you are. Common Crawl uses exactly this value for crawl priority, and 64% of all LLMs train on Common Crawl data.

Backlink strength and network position are not the same thing.

https://hechtinsgefecht.de/harmonic-centrality/

#SEO #HarmonicCentrality #GEO #KI #CommonCrawl
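
To make the difference between backlink strength and network position concrete, here is a rough sketch of harmonic centrality on a toy link graph. It uses the textbook definition (for each node, the sum of 1/distance over all other nodes that can reach it, with unreachable nodes contributing nothing). Whether Common Crawl computes its ranks exactly this way is an assumption, and the graph and the function name `harmonic_centrality` are made up for illustration.

```python
from collections import deque


def harmonic_centrality(graph):
    """graph maps each node to the nodes it links to (directed edges).

    Returns, for every node v, the sum of 1 / d(u, v) over all other
    nodes u that can reach v by following links.
    """
    # Reverse the edges so a BFS from v walks the links backwards.
    reverse = {node: [] for node in graph}
    for source, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)

    scores = {}
    for v in reverse:
        distance = {v: 0}
        queue = deque([v])
        total = 0.0
        while queue:
            node = queue.popleft()
            for previous in reverse[node]:
                if previous not in distance:
                    distance[previous] = distance[node] + 1
                    total += 1.0 / distance[previous]
                    queue.append(previous)
        scores[v] = total
    return scores


# Hypothetical toy graph: "hub" is linked directly by two pages and
# indirectly by a third, so it sits closest to the core.
toy_graph = {"a": ["hub"], "b": ["hub"], "c": ["a"], "hub": []}
scores = harmonic_centrality(toy_graph)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

In this toy graph, "a" and "b" each have exactly one outgoing link to "hub", yet only "a" earns a nonzero score, because only "a" is itself linked from elsewhere.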

Dr Leon Furze Jan 26, 2023

Teaching AI Ethics

Update: since I wrote this original post covering the nine areas, I've expanded each one into a complete article. Have a read through this post, and then when you're ready to dive deeper into AI ethics, check out the full series here. If you linked to this post as part of a course or university resource, I suggest updating your links with the complete series. https://leonfurze.com/ai-ethics/ As we head into the start of Term 1 it's already looking like Artificial Intelligence is going to be […]

https://leonfurze.com/2023/01/26/teaching-ai-ethics/

Dr Leon Furze May 18, 2023

The AI Iceberg: Understanding ChatGPT

Analogies are useful for understanding complex ideas, and there are plenty of complexities for educators trying to wrap their heads around ChatGPT. In this post, I’ll try to explain some of the features of the chatbot and the model it’s built on top of. I'm deliberately avoiding any kind of analogy that represents the AI as magical, mythical, human, or godlike - we've seen enough of them. I’m not claiming that this analogy is watertight or that there is no better way to conceptualise […]

https://leonfurze.com/2023/05/18/the-ai-iceberg-understanding-chatgpt/

Colin Rowat Feb 18

Idea for a new prize: big LLM maker segments its training data (or maybe even just #CommonCrawl) by originating person, runs DataSHAP on the segments, gives a prize to the highest scorer.

I have no idea how to think about who it would be.
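
A rough sketch of how such a prize could be scored, assuming "DataSHAP" means a Data Shapley-style valuation estimated by Monte Carlo sampling over random orderings of the segments. The segment names, the weights, and the additive `utility` function are invented stand-ins; a real run would replace `utility` with something like "train on these segments and measure held-out performance", which is far more expensive.

```python
import random


def monte_carlo_shapley(segments, utility, rounds=200, seed=0):
    """Estimate each segment's Shapley value by averaging its marginal
    contribution to utility over randomly shuffled orderings."""
    rng = random.Random(seed)
    totals = {segment: 0.0 for segment in segments}
    for _ in range(rounds):
        order = list(segments)
        rng.shuffle(order)
        used = []
        previous = utility(used)
        for segment in order:
            used.append(segment)
            current = utility(used)
            totals[segment] += current - previous
            previous = current
    return {segment: total / rounds for segment, total in totals.items()}


# Hypothetical per-author contributions; purely illustrative numbers.
weights = {"alice": 3.0, "bob": 1.0, "carol": 0.5}


def utility(used_segments):
    # Toy stand-in for model quality: the sum of the included weights.
    return sum(weights[name] for name in used_segments)


values = monte_carlo_shapley(list(weights), utility)
winner = max(values, key=values.get)
print(winner, values)
```

With an additive utility each segment's value is simply its own weight, so the toy winner is obvious. The interesting (and open) question in the post is what happens when the utility is a real model's performance and contributions interact.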