Everything is a lot right now. Want to tune it all out for ~10 minutes and help us better understand your energy data needs? We're kicking off our first (hopefully annual!) PUDL community survey. Please boost! Current and prospective PUDL users are both encouraged to respond.

https://forms.gle/cT1ZY4yunhza3YLK9

#EnergyTransition #PUDL #OpenData #EnergyData #OpenSource

2025 Public Utility Data Liberation Ecosystem Survey

Welcome to the PUDL ecosystem survey! We want to learn a few things:

What kinds of energy data work are people doing? This helps us understand our potential impact.

What is most important to energy data users? This helps us prioritize the many potential improvements and data integrations on our list.

How are people interested in participating in the PUDL project? This helps us figure out where to focus on outreach and process improvements.

Energy data practitioners as a group are historically less diverse than the US as a whole. To help us better serve historically marginalized users and ensure a more representative set of stakeholders in energy policymaking, we are collecting some (optional) demographic information.

For the energy data nerds: we've got a new data release out. PUDL v2024.11.0 includes quarterly updates to EIA 860M, EIA 923 year-to-date, EIA 930, and EPA CEMS, plus final 2023 data for EIA 861. Comment in this GitHub discussion if you find anything weird (or just to say hi 👋).

https://github.com/orgs/catalyst-cooperative/discussions/3967

#EnergyTransition #OpenData #PUDL

PUDL v2024.11.0 is available! · catalyst-cooperative · Discussion #3967

Overview PUDL v2024.11.0 is a regular quarterly release, incorporating a few updates to the following datasets that have come out since the special release we did in October. New Data Coverage EIA ...

We want to apply to the Google Season of Docs for #PUDL but have never worked with an outside technical writer before. Does anybody have someone to recommend? It's a #Python project focused on producing open data describing the US energy system.

Cc: @turingway @choldgraf @yabellini @leahawasser

#PyData #WriteTheDocs #EnergyTransition #OpenData #OpenSource #EnergyMastodon

@catalystcoop @ZaneSelvans are there any other open utility databases/projects besides #PUDL?

Now that we're putting all our denormalized output tables and analyses into the #PUDL DB, we've got a lot more #metadata to manage, and are trying to figure out how to best combine existing tools to do it.

GitHub Discussion: https://github.com/orgs/catalyst-cooperative/discussions/2546

Currently we store column, table, and dataset level information in big JSON-ish #python data structures, which are converted into objects using @pydantic models based (loosely) on the #FrictionlessData tabular data package abstractions.
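
To make the idea concrete, here is a minimal sketch of turning a JSON-ish dictionary into validated metadata objects. It is not PUDL's actual code: it swaps in standard-library dataclasses for Pydantic models so it runs without dependencies, and the table name, column names, and the `Field` / `Resource` classes are all hypothetical. The field type names are borrowed from the Frictionless Table Schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional, Tuple

# Hypothetical raw metadata in a JSON-ish Python structure.
# The table and columns are illustrative only, not PUDL's real schema.
RAW_METADATA = {
    "name": "example_plants",
    "description": "One row per power plant (illustrative).",
    "fields": [
        {"name": "plant_id", "type": "integer", "description": "Plant identifier."},
        {"name": "capacity_mw", "type": "number", "description": "Nameplate capacity.", "unit": "MW"},
    ],
    "primary_key": ["plant_id"],
}

# A subset of the field types defined by the Frictionless Table Schema.
ALLOWED_TYPES = {"integer", "number", "string", "boolean", "date", "datetime", "year"}


@dataclass(frozen=True)
class Field:
    """Column-level metadata."""

    name: str
    type: str
    description: str = ""
    unit: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject type names that are not in the allowed set.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"Unknown field type {self.type!r} for column {self.name!r}")


@dataclass(frozen=True)
class Resource:
    """Table-level metadata holding a tuple of Field objects."""

    name: str
    description: str
    fields: Tuple[Field, ...]
    primary_key: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Every primary key column must be one of the declared fields.
        known = {f.name for f in self.fields}
        missing = [key for key in self.primary_key if key not in known]
        if missing:
            raise ValueError(f"Primary key columns not in table {self.name!r}: {missing}")

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Resource":
        """Build a Resource (and its Fields) from a raw dictionary."""
        return cls(
            name=raw["name"],
            description=raw.get("description", ""),
            fields=tuple(Field(**f) for f in raw["fields"]),
            primary_key=tuple(raw.get("primary_key", ())),
        )


table = Resource.from_dict(RAW_METADATA)
```

With Pydantic, most of the hand-written checks above would presumably move into model field types and validators; the dictionary-in, objects-out shape stays the same.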

#datadon

Existing tools for managing our metadata? · catalyst-cooperative · Discussion #2546

We have a lot of metadata describing the hundreds of tables and thousands of columns that are part of PUDL, and a somewhat homebrew system for managing it, using a mix of Pydantic and SQLAlchemy. I...

The @dagster folks interviewed us and did a write-up of our migration of #PUDL from a messy DIY #Python ETL to their orchestration framework, which has thus far been a very positive experience. Unlike most of their users, we are producing #OpenData outputs. Very curious to see if other non-profit / open-data users will adopt the platform:

https://dagster.io/blog/catalyst-cooperative-case-study

#DataEngineering #datadon #EnergyMastodon #OpenSource #EnergyTransition

Catalyst Cooperative: Liberating Public Utility Data with Dagster | Dagster Blog

The PUDL Project cleans and distributes analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.

As #PUDL moves toward distributing only data (and much more of it) rather than expecting everyone to run the software (with its 500+ dependencies...) we're going to deprecate our output management layer.

We see two possible deprecation paths. Should we go slow? Or rip the band-aid off now?

#OpenData #EnergyTransition #OpenSource #EnergyMastodon #datadon #pydata #EnergyTwitter

Discussion on GitHub here: https://github.com/orgs/catalyst-cooperative/discussions/2503

Deprecation of the `PudlTabl` output caching class · catalyst-cooperative · Discussion #2503

Part of the motivation behind our move to Dagster is the proliferation of useful output tables that are derived from the public data we curate. Some of these are simple denormalized tables that are...

I did not realize you can post up to 100GB of data to #Kaggle and they provide access to computational resources and #Jupyter notebooks.

We're thinking about automatically posting all our #PUDL data there, and maybe running community competitions to help solve entity matching, anomaly detection, and imputation problems. Is there any downside to doing this?

#OpenData #MachineLearning #DataScience #EnergyTransition #EnergyTwitter #EnergyMastodon

https://www.kaggle.com/datasets/zaneselvans/catalyst-cooperative-pudl

Catalyst Cooperative PUDL

US Electricity System Data from EIA, FERC, and EPA

A few #PUDL announcements!

https://github.com/orgs/catalyst-cooperative/discussions/2475

Our migration to @dagster is progressing rapidly. If you use PUDL and run the ETL yourself, and need help getting Dagster set up, feel free to sign up for office hours:

https://calendly.com/catalyst-cooperative/pudl-office-hours

Or ask for help in our GitHub discussions:

https://github.com/orgs/catalyst-cooperative/discussions

#OpenSource #OpenData #EnergyTransition #EnergyMastodon #datadon #DataEngineering #EnergyTwitter

PUDL now using Dagster, GitHub Projects, and Python 3.11 · catalyst-cooperative · Discussion #2475

Dagster Orchestration If you follow the repo at all, you've probably already noticed that we've made some big changes to the architecture. The biggest is the shift to using Dagster to orchestrate o...
