Very nice report on the fire at North Hyde substation which took down Heathrow in March.

tl;dr:

* The fire was caused by moisture ingress in a high-voltage bushing; the moisture was detected in 2018 but the finding got lost in the system.
* North Hyde is an ageing substation which didn't have adequate protection against the spread of fire.
* Heathrow didn't think that a loss of grid feed was a plausible risk, so they assumed 10-12 hours was a reasonable recovery time from that.

https://www.neso.energy/document/363891/download

Also missed that Heathrow published their own report last week:

https://www.heathrow.com/latest-news/kelly-review-published

Just to add: the GB electricity transmission system is still one of the most reliable in the world, with a reliability of 99.999930% last year. This incident lined up a lot of holes in the Swiss cheese, and I'm sure it will be learned from.

I'd rather it wasn't privatised, but National Grid Electricity Transmission is definitely one of the more successful of our privatised utilities. (Though I think at least some of the credit for that has to go to the regulator.)

And here comes the blame game. (Which I am, for the avoidance of doubt, not particularly interested in.)

https://www.bbc.co.uk/news/articles/cly22eelnxjo

Heathrow shutdown caused by problem found seven years ago

An investigation into National Grid has been launched by the energy regulator after it was found that issues at a substation had not been fixed.

BBC News

Now is a very good time to read this, especially if you are about to reply to me with a hot take. It's not very long.

https://how.complexsystems.fail/

How Complex Systems Fail

@russss I was just thinking of that and the whole notion of cascading failures.

We only ever get to hear about those which slip past all the barriers.
Nice mismatch of assumptions: LHR in 2.11 ("so rare we can support a 12-hour failover") vs the supplier in 2.10 ("we assumed you could handle a failure of one of your three independent substations, as that is why you have three of them").

@russss Interesting. A lot of that seems very reminiscent of *Systemantics* / *The Systems Bible* by John Gall (1975). Especially Gall's particularly memorable *Fundamental failure-mode theorem* - "Complex systems usually operate in failure mode" which is echoed so closely by Cook's "Complex systems run in degraded mode."

https://en.wikipedia.org/wiki/Systemantics

Systemantics - Wikipedia

@russss what does "reliable" mean in this context?
@pft had to look that one up - it's the amount of energy which was actually delivered compared to the estimated amount which should have been delivered.
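
A rough back-of-the-envelope to put that 99.999930% figure in context - a minimal sketch in Python, where the annual energy figure is my own assumption for illustration, not from the report:

```python
# Toy illustration of the transmission reliability metric:
# reliability = energy actually delivered / energy that should have been delivered.

annual_demand_twh = 250.0   # assumed GB annual transmitted energy, illustration only
reliability = 0.99999930    # figure quoted above

unserved_twh = annual_demand_twh * (1 - reliability)
unserved_mwh = unserved_twh * 1_000_000  # 1 TWh = 1,000,000 MWh

print(f"Unserved energy: {unserved_twh:.6f} TWh = {unserved_mwh:,.0f} MWh over the year")
# With these assumed numbers, that's roughly 175 MWh of unserved energy across GB in a year.
```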

@russss @pft

I think the term we ought to be looking for is redundant. If one substation goes down, the others should be capable of taking over and supplying the required power. This was not done. Given that GB electricity supply reliability is north of 99%, that should have been easier to accomplish.

I hope the proposed Heathrow expansion does tackle this. It is unbelievable: one substation takes down with it a crucial node in worldwide travel.

@welkin7 @russss I consider redundancy different from this definition of reliability. I can imagine having redundant but unreliable systems.
@russss thank you! That is very interesting. I suppose the scope is then the whole GB, right? I'm totally new to this 😅
@russss There's a classic IT one in there; they lost their remote access systems at about 3:30am, possibly due to loss of cooling in an otherwise powered datacentre - as normal, the cooling for the data centre wasn't on the backup.
13.17 is interesting though - they can't open a terminal if the emergency lighting batteries are flat after having operated for a long time; I bet most places don't think of that...

@russss I don't think I can get away with reading a 77 page document while at work, but the highlights certainly look interesting. One for later!

Certainly interesting that the suppliers don't necessarily know that there is CNI hanging off a particular substation

@FlorianTischner @wishy @russss Just the 2 pages of executive summary are a doozy. Heathrow has 3 independent supply points, but needs all 3 running to cover all critical systems, with failover being a manual process requiring 12 hours of work. Insane!
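
To put the same point another way - a minimal sketch, with made-up capacities and load figures (nothing here is from the report), of why "three supply points" isn't the same as redundancy if the site needs all three:

```python
# Toy model: a site fed from several supply points, each with a fixed capacity.
# It's only redundant if the remaining feeds can carry the full load after one is lost.

def survives_single_failure(feed_capacities_mw, site_load_mw):
    """True if the site stays fully supplied after losing any one feed."""
    return all(
        sum(feed_capacities_mw) - lost >= site_load_mw
        for lost in feed_capacities_mw
    )

feeds = [25, 25, 25]  # three supply points, 25 MW each (illustrative numbers only)

print(survives_single_failure(feeds, site_load_mw=70))  # False: needs all three running
print(survives_single_failure(feeds, site_load_mw=45))  # True: any two feeds would do
```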

@FlorianTischner @wishy @russss It's a present state that sounds insane; but the organic process by which just a bit of additional load here, and a little bit over there, got added in response to various new requirements, with the OK of assorted people, was probably exceptionally common.

It's practically a full-time job (and needs institutional backing) to keep stepping back and looking at whether what we are changing compromises anything that was the way it was for a reason.

@russss DC operators: We have at least two separate feeds from the outside to your equipment + UPS and generators.
Heathrow: Power can fail? *surprised pikachu*

@nicoduck @russss Maybe Heathrow also bought into the belief that the GB electric grid had a reliability of about 99%, so they didn't need to invest in any redundant systems, like UPS/batteries/generators/fuel cells/etc.

Basic systems engineering 101: always have fallback and redundant systems in place.

Would love to know what Heathrow's system design principles for power were.

@welkin7 @nicoduck @russss I can guarantee that when the power design was done (I'll bet it was decades ago) it would have been reasonably adequate.

Fast forward to now-ish and a lot of the assumptions made during the creation of the design are now incorrect/outdated.

I see that sort of thing more than I'd like to admit.

Perhaps in my lifetime I'll get to see LHR announce power works, but I won't hold my breath.

@quikkie @nicoduck @russss

Even decades ago, whether before or after the fall of the Berlin Wall, the principles were the same. The tools, i.e. batteries/generators/fuel cells, change. A globally crucial airport being supplied by a single substation, without any redundant substation in play, is befuddling. Do Gatwick and Stansted suffer from the same malady?

Plan A: North Hyde
Plan B: A different substation.
Plan C: Generators or Batteries.

@welkin7 @quikkie @russss did you read the report? They have multiple substations, just not configured as a redundant setup.
@russss why wasn't there a redundant substation? Seems odd there was a single point of failure.

@russss

And the responsible persons took extra large bonus payments for reducing maintenance costs.
Personal consequences for management will be nil as usual.

@lohankuo @russss
The deferral of the 2022 basic maintenance was not done by Heathrow Airport Limited; it was allegedly done by one part of the GB electric grid. So I'm not sure Heathrow can be faulted for that. It can be faulted on basic systems engineering principles.

I wonder how many such deferrals have happened in other parts of the GB electric grid since 2019, or post-COVID. This is not painting a pretty picture.

@russss Was there no other substation which could pick up the slack after the loss of North Hyde? Was Heathrow supplied, and is it still supplied, by a single substation?

Also, how come moisture ingress that happened in 2018 resulted in a fire in 2025? I get that deferring the 2022 maintenance on one of the transformers was a contributing factor. But it's surprising that all 3 transformers were in such close proximity that a fire in one caused a cascading failure in the others.

cc @Lydie

@welkin7 @russss My guess is that the other substations are already at capacity - there just wasn't capacity to pick up the slack. And it's not easy to switch which substation feeds a site if it doesn't have existing redundancy - the wires only go one place.
@welkin7 @russss @Lydie You'd be best reading the report, or at least the exec summary. TL;DR: they're supplied from 3 substations, but switchover is manual and takes 10 hours, as Heathrow Airport Limited believed losing a supply was an unlikely event because the upstream was "redundant".
@russss That's a pretty nicely written report! Page 28 explains why it took so long to let the fire brigade in, which I'd thought was weird when I first heard it; but they had to do work at ~3 other substations to make that one safe. It's pretty worrying how much urgent work they'd managed to delay, both the transformer itself and the fire suppression systems that were out (albeit they don't think it would have helped that much).
@russss oh, linking recovery time to probability of failure. I have seen that before, and never understood the reasoning.
In my case it was a tiny chance of flooding, and the recovery time was then basically one week.
Except that this would have made the company go bust.
So in the case of flooding, the disaster recovery documents said, in a roundabout way, that we go bankrupt. Problem "solved"?

@russss Haven't had a chance to sit down and read this until now. 7.16 (Fire suppression system installed in 2002 inop for at least 3 years) is a real popcorn moment.

Someone not flagging a high moisture reading properly is one of those things that happens in large complex systems, but tagging out a fire suppression system for 3 years...

7.21 suggests that it might not have been enough anyway.