I'm sorry, but CrowdStrike's "technical details" post is well below the expected standard in our field. What kind of logic error was it? More importantly, what was your QA process like? Why did CI not catch it? Why was there not a staged rollout? These are the absolute basics that should be expected.

https://www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/

This is absolutely not a post that inspires confidence or trust. CISOs who have signed contracts with CrowdStrike should pay close attention to this post.
For what I think is a set of details that meets this standard, this from Meta/Facebook (my former workplace) is a good example: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
"Systems that are not currently impacted will [...] have no risk of experiencing this event in the future." -- this is infuriating. How can you possibly say "no risk"? What kind of systemic protections could you possibly put into place within 2 days?

The Meta post is great. It was for an outage not nearly as bad as this one, and it doesn't even go too deep into the details, but:

it resolves most of the questions someone knowledgeable would ask

it talks about systemic improvements (the "storm" drills talked about were incredibly important and made many great improvements happen)

and it is personally signed by a VP of infrastructure, rather than just "Executive Viewpoint"

I guess CS does say at the end that they're committed to publishing a followup with more detail, but (a) I think this is too little detail even in an interim post, and (b) please don't call this kind of post "technical details". But anyway, I guess we should hold them accountable for their promise.

you know what really annoyed me?

they never said they were sorry

@rain

cop mindset, innit, tho? Since they're 'protecting' you, any harms done are the cost of that 'protection'.

@rain That they aren't asking the hard questions rather suggests that they already know where the problem is.
@BenAveling @rain they won’t acknowledge that their QA is broken to cut costs
@rain everything has to go through legal because it is all evidence in the countless lawsuits and hearings to come
@rain ("Executive Viewpoint" looks like a category of post on their blog, rather than an author line. i think that means its intended audience is in fact CISO/CTO/CEO types who would be asking, well, executive questions. for better or worse.)
@rain yeah https://www.crowdstrike.com/blog/embersim-large-databank-for-similarity-research-in-cybersecurity/ suggests that the author line in the incident post is actually just "Crowdstrike", not even "executive team"

@rain

I expect that crowdstrike's legal team is -super- squirrelly about publishing details that could constitute an admission of liability, which may well play into the paucity of detail there.

Not to mention the whole issue with security companies -constantly- eliding necessary detail because it would "compromise customer operations" or whateverthefuck excuse.

Only way to get the real story in infosec is via backchannels half the time :-/

@munin @rain I saw this with a vendor that got ransomwared last year, and it was obvious their lawyers were in charge of comms completely. It took us 6m to find out enough to know if our data was compromised.

It was exhausting and infuriating.

@munin @rain to this day we don’t have a clear understanding of 1) how it happened; 2) what they’re doing to prevent it again.

Vendor risk is such a nearly unspeakably huge toxic dump at most companies. We are burning so much money and energy to rein it in, but it’s an uphill battle when the part of the org that “benefits” from the “cost savings” isn’t the one that bears the cost of the compromise.

@petrillic @rain

Completely agree. And trying to get vendors to even let you see their SOC2 - assuming they even have one, and assuming whoever signed the contract thought to put auditing provisions in in the first place - is a huge fucking pile of NDA horseshit.

@munin @rain well for example, the contract allowed for “one audit” per year and so when Citrixbleed happened and we asked them to attest they had patched, they said “sorry, you’re already past the 1 audit” and wouldn’t answer.

They got popped by Citrixbleed.

@petrillic @rain

Oh yeah. Well and truly familiar with both sides of -those- provisions lol

Thing is tho, -everyone- is trying to do their "one audit" a year, and ain't nobody hiring ......well, a -me- to handle coordinating those, or managing the communication to customers around relative exposure to emergent threats.

@petrillic
My experience exactly.

If risk could be transferred to an outsider, execs were like...
@munin @rain

@munin @rain Given how afraid sales engineers were to answer simple "How does X work?"-style questions even though we had signed an NDA for evaluating their product, I find this entirely plausible.
@rain can absolutely tell who does not really consider that the 24/7 service they operate is a 24/7 service with deeply critical impact on their users
@rain I think they just mean "this particular unpaged memory access at this particular source location caused by this particular kind of corruption in our config files", not "crashes because of invalid configs"
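
(To make that distinction concrete, here is a minimal sketch in C. Every name and structure in it is invented for illustration and is not taken from CrowdStrike's code or file format: "this event" could mean one specific unchecked read, at one source location, tripped by one specific kind of corruption. Guarding that single read removes this particular unpaged memory access without making the parser robust against invalid configs in general.)

```c
/* Hypothetical sketch only: names and layout are invented, not CrowdStrike's.
 * It illustrates the difference between fixing "this event" (one unchecked
 * read) and handling invalid configs in general. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct rule {
    uint32_t name_offset;   /* offset into the config file's string table */
};

/* Narrow fix for "this event": validate this one offset before using it.
 * The buggy original would have been
 *     return (const char *)(file + r->name_offset);
 * with no bounds check; a corrupted file with a huge offset makes that read
 * touch unmapped memory, which in kernel mode (where a sensor driver runs)
 * is a bugcheck, i.e. a bluescreen. */
static const char *rule_name(const uint8_t *file, size_t file_len,
                             const struct rule *r)
{
    if (r->name_offset >= file_len)
        return NULL;
    return (const char *)(file + r->name_offset);
}

int main(void)
{
    uint8_t file[32] = {0};
    memcpy(file + 8, "example-rule", 13);

    struct rule good    = { .name_offset = 8 };
    struct rule corrupt = { .name_offset = 0xFFFFFF00u };  /* garbage offset */

    printf("good:    %s\n", rule_name(file, sizeof file, &good));
    printf("corrupt: %s\n", rule_name(file, sizeof file, &corrupt) ? "readable"
                                                                   : "rejected");
    return 0;
}
```

A fix scoped that narrowly is exactly why "no risk of experiencing this event in the future" is a much weaker claim than it sounds.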
@rain @munin personally I’m willing to wait through end-of-business on Monday for them to add the promised actual root cause analysis, since I don’t expect or want every single employee to be working through Saturday and Sunday. if it’s not there by then, though, I’ll be pretty offended

@rain it was nonsensical.

I am trying to extend a modicum of grace and believe they will put something more substantive out after they've had time to gather information and do a post-incident write-up.

@rain Managers and PR people always get twitchy when engineers want to speak plainly.

But when the engineers aren't allowed to speak plainly, the reader must always assume that letting the engineers speak plainly would be horrifically embarrassing so the cageyness always has a net negative PR value.

@rain because companies like this largely operate with security through obscurity, the more they explain the more they believe their security will be impacted.

which, given you can brick their system by nuking a definitions file, i think it's just crap software.
@rain
they don't want to get into the technical details in their technical details report, it's too technical

@rain

A test run?

My Spidey-sense tells me it was planned. The #Biden #economy is too good, #StockMarket doing too well, yanno what I mean?

Nahhh, couldn't be, I must be wrong

But then again...

@rain Meaningless post to protect investors.

@rain "and have no risk of experiencing this event in the future."

That is a brave statement!

@revk @rain It'll just be 'very different' next time.
@revk @rain risky to promise safe updates in future given the number of crowdstrike sensor configuration update fuckups they have had in the last few months
@revk
Actually, not brave at all, but 100% guaranteed (by definition of the word). Completely non-informational though, of course. Because "event" is defined (among other things) by the time at which it happened. So if exactly the same technical bug occurs in the future, well, that is *another* (similar) event, but definitely not "this event". Customers are just not fluent enough in #Newspeak ...
@rain
@rain Every business has a test system. The fortunate ones also have a production system. CrowdStrike uses the whole world as a test system.
@rain To be fair they do touch on this: “we are doing a thorough root cause analysis to determine how this logic flaw occurred. This effort will be ongoing. We are committed to identifying any foundational or workflow improvements that we can make to strengthen our process. We will update our findings in the root cause analysis as the investigation progresses.”
@rain so what they basically say: "we don't test things and we don't do staged distribution. This is normal operations mode at CrowdStrike."
I think I would rather run a system without these risks.
@rain I think what this tells us so far is that CrowdStrike doesn't have a generative culture; otherwise we would see them sharing rather than managing

@rain

fuck microsoft, windows, and all the associated bullshit.

@rain

Another thing:

Is it just me, or is that bird GOING DOWN IN FLAMES...?

@rain @koehntopp indeed, it remains vague regarding the mechanism leading to the failure. RCA is pending, so it is not impossible that more details may follow, but i doubt it.
What I'm wondering: why is it that such a "channel file", which, naively speaking, contains a description/recipe of how to identify attacks, can cause a bluescreen of the whole OS? It should be read in such a way that invalid input is disregarded. Exceptions should be caught. At most, one process should die (& auto-restart)?
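
(A minimal user-mode sketch in C of what that could look like; the header layout, constants, and names are all assumptions for illustration, not CrowdStrike's actual channel-file format. The idea: validate everything in the untrusted file before dereferencing anything, reject malformed input, and at worst let a supervisor restart the one process that failed.)

```c
/* Hypothetical sketch of defensive parsing of an untrusted definitions file.
 * channel_header, CHANNEL_MAGIC, MAX_RULES and RULE_SIZE are all invented;
 * the point is that malformed input is rejected up front and, at worst, only
 * this one process exits (to be restarted by a supervisor), not the whole OS. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHANNEL_MAGIC 0x4348414Eu   /* "CHAN", made up */
#define MAX_RULES     4096u         /* plausibility cap, made up */
#define RULE_SIZE     64u           /* fixed-size rule record, made up */

struct channel_header {
    uint32_t magic;
    uint32_t version;
    uint32_t rule_count;
};

/* Returns 0 on success, -1 if the file is malformed. Never dereferences an
 * offset or count it has not validated against the actual buffer length. */
static int load_channel_file(const uint8_t *buf, size_t len)
{
    struct channel_header hdr;

    if (len < sizeof hdr)
        return -1;                    /* truncated: reject, don't read past the end */
    memcpy(&hdr, buf, sizeof hdr);

    if (hdr.magic != CHANNEL_MAGIC)
        return -1;                    /* wrong or corrupted header */
    if (hdr.rule_count > MAX_RULES)
        return -1;                    /* implausible count: likely corruption */
    if ((len - sizeof hdr) / RULE_SIZE < hdr.rule_count)
        return -1;                    /* body shorter than the header claims */

    /* ... parse hdr.rule_count records from buf + sizeof hdr, each bounded
     *     by the checks above ... */
    return 0;
}

int main(void)
{
    /* A deliberately bogus "channel file": all zeros. A validating reader
     * simply refuses it and keeps the previous known-good definitions. */
    uint8_t bogus[4096] = {0};

    if (load_channel_file(bogus, sizeof bogus) != 0) {
        fprintf(stderr, "channel file rejected; keeping previous definitions\n");
        return EXIT_FAILURE;   /* worst case: this process exits and gets restarted */
    }
    return EXIT_SUCCESS;
}
```

Of course, code running inside the kernel doesn't get the luxury of just crashing and being restarted, which is exactly why validating untrusted input before touching it (or doing the parsing outside the kernel in the first place) matters so much there.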