An interesting paper from @emeryberger et al. showing that, in contrast to prior work, energy use across programming languages is (in my words) essentially a proxy for how long a program takes to execute, and that other factors don't meaningfully affect energy usage. https://arxiv.org/abs/2410.05460
It's Not Easy Being Green: On the Energy Efficiency of Programming Languages

Does the choice of programming language affect energy consumption? Previous highly visible studies have established associations between certain programming languages and energy consumption. A causal misinterpretation of this work has led academics and industry leaders to use or support certain languages based on their claimed impact on energy consumption. This paper tackles this causal question directly: it develops a detailed causal model capturing the complex relationship between programming language choice and energy consumption. This model identifies and incorporates several critical but previously overlooked factors that affect energy usage. These factors, such as distinguishing programming languages from their implementations, the impact of the application implementations themselves, the number of active cores, and memory activity, can significantly skew energy consumption measurements if not accounted for. We show -- via empirical experiments, improved methodology, and careful examination of anomalies -- that when these factors are controlled for, notable discrepancies in prior work vanish. Our analysis suggests that the choice of programming language implementation has no significant impact on energy consumption beyond execution time.


@ltratt @emeryberger

It's a shame they don't dig a bit more into the parallelism. The paper says:

Further, RAPL samples include all cores, even if the program under test only uses a single core. If a benchmark is single-threaded or generally uses fewer cores than available, idle cores will be included in the energy consumption measurement. Therefore, using a varying level of parallelism across benchmark implementations can result in unfair comparison, as idle cores will add some constant energy consumption to each sample.

Which makes me scream 'yes but!'. Modern SoCs can independently adjust the clock speed of individual cores and typically mix cores with different power / performance tradeoffs. Voltage scaling and leakage current mean that it's far more power efficient if you can run the same workload in 1s on two cores clocked down to 800 MHz than on one core clocked at 1.6 GHz.

But there are confounding factors here. A few years ago, we found that turning off CPU affinity entirely in the FreeBSD scheduler made some workloads much faster. The workloads were bounded by the performance of a single-threaded component, but pinning that component to a single core made the core hot, at which point the CPU throttled its clock speed and made it slower. Letting the thread move around unpredictably spread the heat across the die, which allowed the heat sink to work better.

I'm quite surprised by Fig 11d. I wonder how this varies across systems: actively reading DRAM consumes a lot more power than simply refreshing it (the paper says refresh accounts for 40%, though I think this varies a bit across DRAM types), but perhaps the base load is so high and the read rates so low that this doesn't make a difference. Or maybe the cache miss rates are all very low?

The highest numbers are around 90 M LLC misses per second. I think Intel chips do 128 B burst reads, so that's around 10 GB/s, which is around 2.5% of my laptop's peak memory bandwidth. Desktop / server RAM can sustain higher read rates. The difference between 0% and 2.5% of the maximum read rate may not be very big.

The paper notes: "On this benchmark, Boost's library performs significantly worse than PCRE. This outlier alone accounts for the entire reported gap between C and C++."

That's pretty embarrassing for Boost. I'd expect that a C++ RE implementation could build an efficient state machine at compile time and feed that through the compiler for further specialisation, whereas PCRE has to do it dynamically.