How Hyper-Datafication Impacts the Sustainability Costs in Frontier AI
Sophia N. Wilson, Sebastian Mair, Mophat Okinyi, Erik B. Dam, Janin Koch, Raghavendra Selvan

Large-scale data has fuelled the success of frontier artificial intelligence (AI) models over the past decade. This expansion has relied on sustained efforts by large technology corporations to aggregate and curate internet-scale datasets. In this work, we examine the environmental, social, and economic costs of large-scale data in AI through a sustainability lens. We argue that the field is shifting from building models from data to actively creating data for building models. We characterise this transition as hyper-datafication, which marks a critical juncture for the future of frontier AI and its societal impacts. To quantify and contextualise data-related costs, we analyse approximately 550,000 datasets from the Hugging Face Hub, focusing on dataset growth, storage-related energy consumption and carbon footprint, and societal representation using language data. We complement this analysis with qualitative responses from data workers in Kenya to examine the labour involved, including direct employment by big tech corporations and exposure to graphic content. We further draw on external data sources to substantiate our findings by illustrating the global disparity in data centre infrastructure. Our analyses reveal that hyper-datafication does not merely increase resource consumption but systematically redistributes environmental burdens, labour risks, and representational harms toward the Global South, precarious data workers, and under-represented cultures. Thus, we propose Data PROOFS recommendations spanning provenance, resource awareness, ownership, openness, frugality, and standards to mitigate these costs. Our work aims to make visible the often-overlooked costs of data that underpin frontier AI and to stimulate broader debate within the research community and beyond.
People's resistance against #hyperscale
#datacenters in #Denmark
https://www.information.dk/moti/2026/04/nej-tak-kan-godt-skrive-stort
[paywall]
the line between #vulnerability #disclosure and #AI #advertisement becomes ever more blurry ....
#thereIsNoAI
#thereIsInParticularNoSustainableAI
#alsoNoReponsibleAI
CVE-2026-31431 #copyFail
Clearly, humanity needed "AI"
""There is compelling and concerning #data that explicit deepfakes have increased on the #internet as much as 550% year on year since 2019," Julie Inman Grant wrote after advising parliament on the new laws in 2024.
"It's a bit shocking to note that #pornographic videos make up 98% of the #deepfake material currently online and 99% of that imagery is of women and girls.""
Always a favorite - FruitML = teaching simple #TinyML with #fruit detection as the task, in our #IoT course
#yolo #mobileNets #InternetOfThings #MachineLearning #EdgeImpulse #TeachableMachines
#thereIsNoAI
#thereIsInParticularNoSustainableAI
but some tiny ML is fun
"AI" synthesizing conference sessions - that's the first time I've seen that one offered.
We basically don't need conferences anymore -
we can have "AI"s hallucinate our favorite viewpoints on demand.
"Subscribers to the Premium Edition have access to enhanced features and productivity tools
Access new AI features such as short synopses, article summaries, content recommendations, and podcasts synthesizing conference sessions"
“#Meta, #Google and #Microsoft have all baked [generative AI] deep into their systems,” Joshi says. “I see this all as very much part of the tactic of trying to embed these systems into society and instil dependency in a fashion similar to the growth of single-use plastics in the 1970s.”
https://www.theguardian.com/australia-news/2026/mar/13/ai-datacentres-environmental-impacts
@thomasfuchs
while this is an expected and legitimate rhetorical move, the thing is that we actually know quite a bit about human consciousness - if mostly by exclusion.
We know it does not make the key mistake of digitizing,
we know there's no storage medium,
it does not scrape,
it does not 'improve' by throwing more MegaWatts at itself, but instead runs just fine on 10-12 W.
So we do know it is fundamentally different.
in case you ever considered #Verification
of #Identity on #LinkedIn
https://thelocalstack.eu/posts/linkedin-identity-verification-privacy/
#IdentityVerification #AI #thereIsNoAI #OpenAI #Microsoft #Microslop #DPF