In this post we explore a first set of data quality challenges towards a crowdsourced database of European #DataCenters.

The core dataset we use comes from #OpenStreetMap (OSM).

The ultimate aim is a comprehensive environmental footprint database ( #ODbL )

https://www.openriskmanagement.com/data-quality-discussion-crowdsourced-database-european-data-centers/

@openrisk
thank you for doing this interesting work. Detailed posts look impressive.

Lithuania on the map is empty, but it does have data centers. I will research where they are located to fill in OpenStreetMap (OSM) data. I expect to find 10-20.

And that means that OSM data is quite incomplete right now.

@mindaugas Yes, I suspect all EU countries have at least some. Overall the OSM dataset is quite impressive in terms of data center identification (given that it's purely crowdsourced and not specifically built for this task 😍 ). In many cases it has not just points but the entire shape of the buildings. But there are various issues to fix. In future steps the plan is to create more detailed reports to help people identify incompleteness and other data quality issues. But it looks promising 🌱

@openrisk on sustainability: I found a datacenter in Lithuania which is to provide heating and hot water for the municipality (reusing energy), as part of the EU-wide "ReUseHeat" movement https://www.euroheat.org/dhc/eu-projects/re-use-heat (they say it is 8000 MWh of energy per year, or heating for 1300 apartments)

This datacenter also will host Lithuania's new "AI factory" equipment (project cost €130 mln) https://lithuania.lt/governance-in-lithuania/lithuania-launches-e130-million-national-ai-centre-litai/
https://digital-strategy.ec.europa.eu/en/policies/ai-factories

The owner of this datacenter is the government itself (the government owns 100% of the shares of the company, and this company owns 4 datacenters in total)

I just marked this datacenter on the map https://www.openstreetmap.org/search?lat=54.687272&lon=25.216080&zoom=17 (maybe 600 sq.m. whitespace, 1800 sq.m. total). Exactly next to the TV tower.

An energy reuse metric could be incorporated into the database, because this practice seems to significantly increase the sustainability of a data center. But it requires special conditions:
1) Colder climate;
2) Central heating (and water heating) systems;
3) Datacenter location within the city (to be able to connect to the system);
4) Liquid-cooled data centers (far more efficient reuse).
So this practice is more common in cities of Northern Europe.
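
One candidate for such a metric is an energy reuse factor: the share of a facility's energy that is reused outside it. A minimal sketch, using the 8000 MWh/year and 1300 apartments quoted above; the total consumption figure is a made-up placeholder, not a reported number.

```python
def energy_reuse_factor(reused_mwh: float, total_mwh: float) -> float:
    """Share of a data center's total energy that is reused elsewhere (0..1)."""
    if total_mwh <= 0:
        raise ValueError("total_mwh must be positive")
    if not 0 <= reused_mwh <= total_mwh:
        raise ValueError("reused_mwh must lie between 0 and total_mwh")
    return reused_mwh / total_mwh


REUSED_MWH = 8000.0           # figure quoted for the Lithuanian project
APARTMENTS = 1300             # figure quoted for the same project
ASSUMED_TOTAL_MWH = 20000.0   # hypothetical annual consumption, for illustration only

erf = energy_reuse_factor(REUSED_MWH, ASSUMED_TOTAL_MWH)
heat_per_apartment_mwh = REUSED_MWH / APARTMENTS  # MWh per apartment per year
average_heat_mw = REUSED_MWH / 8760               # mean heat output over a year, in MW
```

With the quoted figures this works out to roughly 6 MWh per apartment per year and just under 1 MW of average heat output; the reuse factor itself depends entirely on the assumed total.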

@mindaugas

love this, I was just re-running the queries for various double checks and found somebody added a DC in Lithuania two hours ago 🤣 🤣

https://www.openstreetmap.org/way/355168270#map=16/54.68717/25.21783
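
For anyone who wants to repeat this kind of check, here is a rough sketch of how a per-country query could be built in Overpass QL. The tag choices (`telecom=data_center`, `building=data_center`) and the helper name `build_overpass_query` are assumptions for illustration, not necessarily the queries used for the post.

```python
def build_overpass_query(iso_country: str, timeout_s: int = 60) -> str:
    """Return Overpass QL selecting data center features inside one country."""
    code = iso_country.strip().upper()
    if len(code) != 2 or not code.isalpha():
        raise ValueError("expected a two-letter ISO 3166-1 code, e.g. 'LT'")
    return (
        f'[out:json][timeout:{timeout_s}];\n'
        # Resolve the country boundary to an Overpass area.
        f'area["ISO3166-1"="{code}"][admin_level=2]->.country;\n'
        '(\n'
        # nwr = nodes, ways and relations; both tags are assumed conventions.
        '  nwr["telecom"="data_center"](area.country);\n'
        '  nwr["building"="data_center"](area.country);\n'
        ');\n'
        # Emit tags plus a centre point so ways and relations get one coordinate.
        'out tags center;\n'
    )


query = build_overpass_query("lt")
```

The resulting string can be pasted into Overpass Turbo or sent to an Overpass API endpoint; the network call is left out here on purpose.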

yes, documenting (any) energy reuse will be important. It is already part of EU GPP criteria, so when the procurer is the public sector there might be some relevant data.

https://digital-strategy.ec.europa.eu/en/policies/green-cloud

It might also be in sustainability reports from operator side. In NL the TNO has created a blueprint:

https://publications.tno.nl/publication/34645393/NMNmfucO/TNO-2025-R12900.pdf

@openrisk

Lithuania has ~20 datacenters.

Now sending another example from another company, which operates 3 datacenters in Lithuania (https://delska.com/data-centers/)

Attaching pictures of each (DC1, DC2, DC3) showing what the buildings look like, and their parameters (MW and rack count) from the company website. Also added street view pictures.

Which of these buildings are worth changing to 'data_center' in OSM? How does adding them (or not) affect the calculations?

DC1: it has a lot of generators outside, but the building seems too big given the power rating or rack count. I guess half of it is DC

DC2: the building has 3 high-ceiling levels on one side, and 5 low-ceiling levels on the other side. I guess less than half of it is DC

(P.S. I just noticed that it has an unexpected shape in OSM: it is a larger building than it initially appeared. I added that shape to the collage as well)

DC3: this new building is complex. The number of levels changes between sections. The DC area may take up only a small percentage of this building. But DC3 is larger than DC2, and even comparable to DC1 by rack count.

@mindaugas interest in properly mapping #datacenters in #openstreetmap seems to be spiking up. I just found a very recent discussion and proposal for further data center specific attributes which is very relevant:

https://wiki.openstreetmap.org/wiki/Proposal_talk:Data_center_technical_attributes

Reaching accurate values for every DC instance will have to be an iterative process. The approach is to keep track of multiple sources (company reports, surface area estimates, public sector data etc.) and gradually resolve to ground truth.
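
A minimal sketch of what tracking multiple sources per attribute could look like. The `Estimate` record, the source labels and the confidence-weighted mean are illustrative assumptions, not the project's actual data model, and the numbers are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Estimate:
    source: str        # hypothetical labels, e.g. "company_report", "osm_footprint"
    value: float       # the attribute being estimated, e.g. IT power in MW
    confidence: float  # subjective weight in (0, 1]


def resolve(estimates: list[Estimate]) -> Optional[float]:
    """Confidence-weighted mean of all estimates, or None if there are none."""
    total_weight = sum(e.confidence for e in estimates)
    if total_weight == 0:
        return None
    return sum(e.value * e.confidence for e in estimates) / total_weight


# Invented numbers for one data center, purely for illustration.
estimates = [
    Estimate("company_report", 2.0, 0.8),
    Estimate("osm_footprint", 3.0, 0.2),
]
best_guess = resolve(estimates)
```

Keeping every estimate alongside its source means a later, better source can simply be added with a higher weight instead of overwriting earlier values.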

@openrisk the discussion suggests using 'building:part=data_center' in the future to describe the part of a building that is a datacenter.

For now I will mark DC1 on the map (~50% DC is good enough for such a simple building), and skip DC2 and DC3.

I will apply a similar view to the rest, and after reviewing all datacenters, we will see what percentage of them end up on the map.

@openrisk after review, 5 datacenters should be on the OSM map in Lithuania (25% of the total).

Existing datacentermap.com looks to be very comprehensive and accurate.

But there is a case where they group 6 datacenters into 1
https://www.datacentermap.com/lithuania/vilnius/baltic-data-center/

Their addresses are unclear, but I managed to find at least two from this list, and confirming visually, they do appear to be separate datacenters in different locations (likely also within other buildings), meaning the total number is above 20.

OSM data compared against datacentermap.com could reveal some new insights.
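
A rough sketch of how such a comparison could work: flag reference entries that have no OSM feature within some distance. The 200 m threshold and the two reference sites are hypothetical; only the first coordinate pair comes from this thread.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def unmatched(reference, osm, max_dist_m: float = 200.0):
    """Names of reference sites with no OSM feature within max_dist_m."""
    return [
        name
        for name, lat, lon in reference
        if not any(haversine_m(lat, lon, olat, olon) <= max_dist_m for _, olat, olon in osm)
    ]


osm_sites = [("DC next to the TV tower", 54.687272, 25.216080)]
reference_sites = [
    ("site A", 54.6875, 25.2165),  # hypothetical, a few tens of metres away
    ("site B", 54.7200, 25.3000),  # hypothetical, several kilometres away
]
missing_from_osm = unmatched(reference_sites, osm_sites)
```

Distance alone will not untangle grouped listings like the one above, so anything flagged this way would still need a manual look.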

@openrisk at least one more will appear in OSM next year. For now it is a "construction site" :)

@mindaugas future #datacenters at various stages of development and how they are captured in OSM and other databases is yet another can of worms to address😅

In principle, for such an evolving portfolio you need a decomposition into past (historical / now defunct, but useful for trend analysis), current and performing (where the focus is on operational impact), and future developments, which can be used to project future impact (but also to estimate embedded impacts from the construction phase)
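
As a sketch, that decomposition could be carried as an explicit lifecycle stage on each record. The `Stage` values and the `Site` record are illustrative assumptions, and the sites listed are placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DEFUNCT = "defunct"            # historical, kept for trend analysis
    OPERATIONAL = "operational"    # current, drives operational impact
    CONSTRUCTION = "construction"  # future, drives embedded and projected impact


@dataclass(frozen=True)
class Site:
    name: str
    stage: Stage


def count_by_stage(sites: list[Site]) -> Counter:
    """Number of sites in each lifecycle stage."""
    return Counter(site.stage for site in sites)


# Placeholder records, not real entries.
portfolio = [
    Site("site A", Stage.OPERATIONAL),
    Site("site B", Stage.OPERATIONAL),
    Site("site C", Stage.CONSTRUCTION),
]
stage_counts = count_by_stage(portfolio)
```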

@openrisk I am not aware of any defunct datacenters. But I know many datacenters often change ownership, going from one company to another, and then to another.

datacentermap.com has accurate current ownership, and also has that 'under construction' datacenter in its listings, saying it is under construction.

The OSM construction site also accurately states that it is the construction site of a datacenter (but in words, not tags), and also has an opening_date ("Expected Opening Date") listed (as 2027-06)

@mindaugas I have also found datacentermap to be a comprehensive and seemingly long-running database. It is a commercial project, but the basic info seems to come from the disclosures of companies. Once we have the list of operators active in a country, their own disclosures offer the most up-to-date picture. But they vary a lot in what they disclose, especially around environmental imprint.

The grouping of buildings is a very interesting data modeling challenge that I am still working on.

1/2

@mindaugas When the company reports of the large operators mention relevant details about #datacenter specifications, they don't break them down by building, but rather per "campus", a collection of buildings. This creates a mismatch when trying to align the reported intensity of energy, water use etc. with actual OSM features, surface area etc.

The #OSM data model does have a "relations" mechanism to group objects, but it's not used much in the datacenter context

https://wiki.openstreetmap.org/wiki/Relation
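
One possible way to bridge the campus vs. building mismatch is to split a campus-level figure across member buildings in proportion to footprint area. This pro-rata rule is an assumption for illustration, not how operators report, and the identifiers and numbers below are invented.

```python
def allocate_by_area(campus_total: float, areas_m2: dict[str, float]) -> dict[str, float]:
    """Split a campus-level quantity across buildings pro rata by footprint area."""
    total_area = sum(areas_m2.values())
    if total_area <= 0:
        raise ValueError("total footprint area must be positive")
    return {
        building: campus_total * area / total_area
        for building, area in areas_m2.items()
    }


# Hypothetical campus: two OSM ways with invented footprints (m²)
# and an invented annual energy figure (MWh).
campus_buildings = {"way/1": 1800.0, "way/2": 600.0}
energy_per_building = allocate_by_area(12000.0, campus_buildings)
```

If the member buildings were grouped in an OSM relation, the keys of `campus_buildings` could come straight from the relation's member list.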
