Mastodawn

Fulmotondro Sep 7, 2023

Everyone is reporting on "Air traffic chaos caused by 'one in 15 million' event". https://www.bbc.co.uk/news/business-66723586

But the report says "This scenario had never been encountered before, with the system having previously processed more than 15 million flight plans" https://www.nats.aero/news/nats-report-into-air-traffic-control-incident-details-root-cause-and-solution-implemented/

Air traffic chaos caused by 'one in 15 million' event

The UK's air traffic control system shut itself down after software confusion over an unusual flight path.

BBC News


Gary James Sep 6, 2023

@standupmaths The important thing here isn't how many times things have gone correctly, but when the next colossal cock-up will take place - and how to mitigate the effects of same.

Yes, having exact reporting would be nice, but more interesting is the question of where other, as yet unencountered, weaknesses lie in wait.

Given the closed nature of the system, without an opportunity for people to TRY and break it, this is going to happen again sooner rather than later.

Optimism, right? :D


Robee? Na! 🌈 Sep 6, 2023

@standupmaths I’m super interested in why the system imported bad data and then decided to halt because of the data, instead of maybe halting before it imported the data. 🤷‍♀️ Actually I’m even more interested in how the teams that wrote the code are structured… did nobody on any team consider this could happen? Is there a Jira ticket in the backlog to address it that’s been there a while? So many questions!


Amo Bishop Rodent Sep 6, 2023

@RobeeShepherd @standupmaths Jira never made anything better


Robee? Na! 🌈 Sep 6, 2023

@pikesley @standupmaths


Serge Matveenko ♻️☮️ ⩜⃝ Sep 6, 2023

@standupmaths So, it was zero in 15 million event then:)


Matt 🔶 (LordMatt) Sep 6, 2023

@standupmaths The article says the system was designed to stop when it encountered bad data. Generally bad data should be isolated and an urgent alert flagged for a human to do something. Otherwise, it looks like an easy DoS attack vector.


Eddie Coldrick 💻 Sep 6, 2023

@lordmatt @standupmaths This is what I thought. Surely you'd let it just flag an error, but continue doing other flight plans. Unless it is assuming that if that data is bad, all other data may be bad. Plus, they said they need to ask the manufacturer to find the error. Can the system not just tell you there was an exception, e.g. an email notification or text in capitals in the logs? 🤣 I'm sure they'll have reasonable explanations for at least some of these things, but I'm still confused!
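
A minimal sketch of the "flag it and carry on" idea raised in these two replies. Everything here is hypothetical: the `FlightPlan` and `Processor` names, the toy validation rule, and the in-memory alert list are invented for illustration and say nothing about how the real NATS system works or why it was designed to stop.

```python
from dataclasses import dataclass, field


@dataclass
class FlightPlan:
    plan_id: str
    waypoints: list


@dataclass
class Processor:
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def validate(self, plan):
        # Toy rule, purely illustrative: at least two non-empty waypoints.
        if len(plan.waypoints) < 2 or not all(plan.waypoints):
            raise ValueError(f"plan {plan.plan_id}: malformed route")

    def submit(self, plan):
        try:
            self.validate(plan)
        except ValueError as exc:
            # Isolate the bad plan and record an alert for a human,
            # then keep processing the remaining plans.
            self.quarantined.append(plan)
            self.alerts.append(str(exc))
            return False
        self.accepted.append(plan)
        return True


proc = Processor()
plans = [
    FlightPlan("A1", ["EGLL", "LFPG"]),
    FlightPlan("B2", ["EGLL"]),  # fails the toy rule
    FlightPlan("C3", ["EGKK", "EHAM"]),
]
for plan in plans:
    proc.submit(plan)

print([p.plan_id for p in proc.accepted])
print(proc.alerts)
```

The trade-off is the one already mentioned above: quarantining only makes sense if one bad plan does not imply that the data around it is also bad, which a safety-critical system may not be willing to assume.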


Matt 🔶 (LordMatt) Sep 6, 2023

@eddie @standupmaths I hope they figure that out and change because now that bad actors know this is a thing...


Dewi Ioan Sep 12, 2023

@eddie @lordmatt @standupmaths It looks like, from what little they have said, if a flight plan is bad then it gets sent to be manually checked. The failure that happened meant all flight plans were having to be checked manually rather than automatically, which is why it was slow, not stopped, and everything already in the system was fine.
I suspect a flight plan was manually checked, corrected and was still wrong, and the system went into a fail-safe state ...


steelman Sep 6, 2023

@standupmaths some more technical context https://www.theregister.com/2023/08/30/uk_air_traffic_woes_invalid_data/

UK air traffic woes caused by 'invalid flight plan data'

Former BA boss slams resilience, says explanation 'doesn't stand up from what I know of the system'

The Register


Dave Sep 6, 2023

@standupmaths I suppose we're going to have to wait another 30 or so million flight plans to see if that statistic holds up with any reasonable level of confidence.


Punica granatum Sep 6, 2023

@zornslemmon @standupmaths
Currently, we have observed things going wrong only once, which means that the estimated probability would be 1/(15×10^6). As we have only one observation like that, there is a lot of uncertainty and variability in that estimate. We would need a sample size several orders of magnitude larger before we can say much more about that probability. Currently, we are led to believe that it’s incredibly unlikely, but there’s a possibility that we just got super lucky 15 million times in a row. What if the true probability is closer to 1/100 or something?
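
A rough sketch of the arithmetic behind this, under a strong assumption: each flight plan is treated as an independent trial with the same fixed failure probability p. That is questionable for a software fault, where failure depends on the input rather than on chance. The function names are made up for illustration. The code computes the exact 95% upper bound on p after N failure-free plans (the "rule of three" approximates it as 3/N), and how likely "at most one failure in N plans" would be for a given p.

```python
import math

N = 15_000_000  # flight plans processed, per the NATS statement quoted in the thread


def upper_bound_zero_failures(n, confidence=0.95):
    """Exact one-sided upper bound on p after n trials with zero failures."""
    # Solve (1 - p)**n = 1 - confidence for p, using expm1 for precision.
    return -math.expm1(math.log(1.0 - confidence) / n)


def log10_prob_at_most_one_failure(n, p):
    """log10 of P(X <= 1) for X ~ Binomial(n, p), computed in log space."""
    log_zero = n * math.log1p(-p)  # log P(X = 0)
    log_one = math.log(n) + math.log(p) + (n - 1) * math.log1p(-p)  # log P(X = 1)
    hi, lo = max(log_zero, log_one), min(log_zero, log_one)
    # log-sum-exp of the two terms, converted from natural log to log10.
    return (hi + math.log1p(math.exp(lo - hi))) / math.log(10)


exact = upper_bound_zero_failures(N)
rule_of_three = 3 / N
print(f"95% upper bound after {N} clean plans: {exact:.3e} (rule of three: {rule_of_three:.3e})")

for p in (1 / 100, 1 / N):
    print(f"p = {p:.3e}: log10 P(at most one failure) = {log10_prob_at_most_one_failure(N, p):.3f}")
```

Under that independence assumption, a true rate near 1/100 would make 15 million nearly clean plans astronomically unlikely, while rates within a small factor of 1/N all remain plausible. A single observed failure cannot narrow it down further.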


Hazel ( she / they / her ) Sep 6, 2023

@standupmaths
Oh noooo they broke probability...


gunstick Sep 6, 2023

@standupmaths how many million rows does that spreadsheet hold? 😁