I'm sorry, but CrowdStrike's "technical details" post is well below the expected standard in our field. What kind of logic error was it? More importantly, what was your QA process like? Why did CI not catch it? Why was there not a staged rollout? These are the absolute basics that should be expected.

https://www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/

This is absolutely not a post that inspires confidence or trust. CISOs who have signed contracts with CrowdStrike should pay close attention to this post.
For what I think is a set of details that meets this standard, this from Meta/Facebook (my former workplace) is a good example: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
"Systems that are not currently impacted will [...] have no risk of experiencing this event in the future." -- this is infuriating. How can you possibly say "no risk"? What kind of systemic protections could you possibly put into place within 2 days?

The Meta post is great. It was for an outage not nearly as bad as this one, and it doesn't even go too deep into the details, but:

it resolves most of the questions someone knowledgeable would ask

it talks about systemic improvements (the "storm" drills talked about were incredibly important and made many great improvements happen)

and it is personally signed by a VP of infrastructure, rather than just "Executive Viewpoint"

I guess CS does say at the end that they're committed to publishing a followup with more detail, but (a) I think this is too little detail even in an interim post, and (b) please don't call this kind of post "technical details". But anyway, I guess we should hold them accountable for their promise.

you know what really annoyed me?

they never said they were sorry

@rain

cop mindset, innit, tho? Since they're 'protecting' you, any harms done are the cost of that 'protection'.

@rain That they aren't asking the hard questions rather suggests that they already know where the problem is.
@BenAveling @rain they won’t acknowledge that their QA is broken to cut costs
@rain everything has to go through legal because it is all evidence in the countless lawsuits and hearings to come
@rain ("Executive Viewpoint" looks like a category of post on their blog, rather than an author line. i think that means its intended audience is in fact CISO/CTO/CEO types who would be asking, well, executive questions. for better or worse.)
@rain yeah https://www.crowdstrike.com/blog/embersim-large-databank-for-similarity-research-in-cybersecurity/ suggests that the author line in the incident post is actually just "Crowdstrike", not even "executive team"

@rain

I expect that crowdstrike's legal team is -super- squirrelly about publishing details that could constitute an admission of liability, which may well play into the paucity of detail there.

Not to mention the whole issue with security companies -constantly- eliding necessary detail because it would "compromise customer operations" or whateverthefuck excuse.

Only way to get the real story in infosec is via backchannels half the time :-/

@munin @rain I saw this with a vendor that got ransomwared last year, and it was obvious their lawyers were in charge of comms completely. It took us 6m to find out enough to know if our data was compromised.

It was exhausting and infuriating.

@munin @rain to this day we don’t have a clear understanding of 1) how it happened; 2) what they’re doing to prevent it again.

Vendor risk is such a nearly unspeakably huge toxic dump at most companies. We are burning so much money and energy to rein it in, but it’s an uphill battle when the part of the org that “benefits” from the “cost savings” isn’t the one that bears the cost of the compromise.

@petrillic @rain

Completely agree. And trying to get vendors to even let you see their SOC2 - assuming they even have one, and assuming whoever signed the contract thought to put auditing provisions in in the first place - is a huge fucking pile of NDA horseshit.

@munin @rain well for example, the contract allowed for “one audit” per year and so when Citrixbleed happened and we asked them to attest they had patched, they said “sorry, you’re already past the 1 audit” and wouldn’t answer.

They got popped by Citrixbleed.

@petrillic @rain

Oh yeah. Well and truly familiar with both sides of -those- provisions lol

Thing is tho, -everyone- is trying to do their "one audit" a year, and ain't nobody hiring ......well, a -me- to handle coordinating those, or managing the communication to customers around relative exposure to emergent threats.

@petrillic
My experience exactly.

If risk could be transferred to an outsider, execs were like...
@munin @rain

@munin @rain Given how afraid sales engineers were to answer simple "How does X work?"-style questions even though we had signed an NDA for evaluating their product, I find this entirely plausible.
@rain can absolutely tell who does not really consider that the 24/7 service they operate is a 24/7 service with deeply critical impact on their users
@rain I think they just mean "this particular unpaged memory access at this particular source location caused by this particular kind of corruption in our config files", not "crashes because of invalid configs"
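
(To make that distinction concrete, here is a minimal sketch in C. Every name and structure in it is invented for illustration and is not taken from CrowdStrike's code or file format: "this event" could mean one specific unchecked read, at one source location, tripped by one specific kind of corruption. Guarding that single read removes this particular unpaged memory access without making the parser robust against invalid configs in general.)

```c
/* Hypothetical sketch only: names and layout are invented, not CrowdStrike's.
 * It illustrates the difference between fixing "this event" (one unchecked
 * read) and handling invalid configs in general. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct rule {
    uint32_t name_offset;   /* offset into the config file's string table */
};

/* Narrow fix for "this event": validate this one offset before using it.
 * The buggy original would have been
 *     return (const char *)(file + r->name_offset);
 * with no bounds check; a corrupted file with a huge offset makes that read
 * touch unmapped memory, which in kernel mode (where a sensor driver runs)
 * is a bugcheck, i.e. a bluescreen. */
static const char *rule_name(const uint8_t *file, size_t file_len,
                             const struct rule *r)
{
    if (r->name_offset >= file_len)
        return NULL;
    return (const char *)(file + r->name_offset);
}

int main(void)
{
    uint8_t file[32] = {0};
    memcpy(file + 8, "example-rule", 13);

    struct rule good    = { .name_offset = 8 };
    struct rule corrupt = { .name_offset = 0xFFFFFF00u };  /* garbage offset */

    printf("good:    %s\n", rule_name(file, sizeof file, &good));
    printf("corrupt: %s\n", rule_name(file, sizeof file, &corrupt) ? "readable"
                                                                   : "rejected");
    return 0;
}
```

A fix scoped that narrowly is exactly why "no risk of experiencing this event in the future" is a much weaker claim than it sounds.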
@rain @munin personally I’m willing to wait through end-of-business on Monday for them to add the promised actual root cause analysis, since I don’t expect or want every single employee to be working through Saturday and Sunday. if it’s not there by then, though, I’ll be pretty offended

@rain it was nonsensical.

I am trying to extend a modicum of grace and believe they will put something more substantive out after they've had time to gather information and do a post-incident write-up.

@rain Managers and PR people always get twitchy when engineers want to speak plainly.

But when the engineers aren't allowed to speak plainly, the reader must always assume that letting the engineers speak plainly would be horrifically embarrassing so the cageyness always has a net negative PR value.

@rain because companies like this largely operate with security through obscurity, the more they explain the more they believe their security will be impacted.

which, given you can brick their system by nuking a definitions file, i think it's just crap software.
@rain
they don't want to get into the technical details in their technical details report, it's too technical

@rain

A test run?

My Spidey-sense tells me it was planned. The #Biden #economy is too good, #StockMarket doing too well, yanno what I mean?

Nahhh, couldn't be, I must be wrong

But then again...

@rain Meaningless post to protect investors.

@rain "and have no risk of experiencing this event in the future."

That is a brave statement!

@revk @rain It'll just be 'very different' next time.
@revk @rain risky to promise safe updates in future given the number of crowdstrike sensor configuration update fuckups they have had in the last few months
@revk
Actually, not brave at all, but 100% guaranteed (by definition of the word). Completely non-informational though, of course. Because "event" is defined (among other things) by the time at which it happened. So if exactly the same technical bug occurs in the future, well, that is *another* (similar) event, but definitely not "this event". Customers are just not fluent enough in #Newspeak ...
@rain
@rain Every business has a test system. The fortunate ones also have a production system. CrowdStrike uses the whole world as a test system.
@rain To be fair they do touch on this: “we are doing a thorough root cause analysis to determine how this logic flaw occurred. This effort will be ongoing. We are committed to identifying any foundational or workflow improvements that we can make to strengthen our process. We will update our findings in the root cause analysis as the investigation progresses.”
@rain so what they basically say: "we don't test things and we don't do staged distribution. This is normal operations mode at CrowdStrike."
I think I would rather run a system without these risks.
@rain I think what this tells us so far is that CrowdStrike doesn't have a generative culture; otherwise we would see them sharing rather than managing

@rain

fuck microsoft, windows, and all the associated bullshit.

@rain

Another thing:

Is it just me, or is that bird GOING DOWN IN FLAMES...?

@rain @koehntopp indeed, it remains vague regarding the mechanism leading to the failure. RCA is pending, so it is not impossible that more details may follow, but i doubt it.
What I'm wondering: why is it that such a "channel file", which, naively speaking, contains a description/recipe of how to identify attacks, can cause a bluescreen of the whole OS? It should be read in such a way that invalid input is disregarded. Exceptions should be caught. At most, one process should die (& auto-restart)?
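
(A minimal user-mode sketch in C of what that could look like; the header layout, constants, and names are all assumptions for illustration, not CrowdStrike's actual channel-file format. The idea: validate everything in the untrusted file before dereferencing anything, reject malformed input, and at worst let a supervisor restart the one process that failed.)

```c
/* Hypothetical sketch of defensive parsing of an untrusted definitions file.
 * channel_header, CHANNEL_MAGIC, MAX_RULES and RULE_SIZE are all invented;
 * the point is that malformed input is rejected up front and, at worst, only
 * this one process exits (to be restarted by a supervisor), not the whole OS. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHANNEL_MAGIC 0x4348414Eu   /* "CHAN", made up */
#define MAX_RULES     4096u         /* plausibility cap, made up */
#define RULE_SIZE     64u           /* fixed-size rule record, made up */

struct channel_header {
    uint32_t magic;
    uint32_t version;
    uint32_t rule_count;
};

/* Returns 0 on success, -1 if the file is malformed. Never dereferences an
 * offset or count it has not validated against the actual buffer length. */
static int load_channel_file(const uint8_t *buf, size_t len)
{
    struct channel_header hdr;

    if (len < sizeof hdr)
        return -1;                    /* truncated: reject, don't read past the end */
    memcpy(&hdr, buf, sizeof hdr);

    if (hdr.magic != CHANNEL_MAGIC)
        return -1;                    /* wrong or corrupted header */
    if (hdr.rule_count > MAX_RULES)
        return -1;                    /* implausible count: likely corruption */
    if ((len - sizeof hdr) / RULE_SIZE < hdr.rule_count)
        return -1;                    /* body shorter than the header claims */

    /* ... parse hdr.rule_count records from buf + sizeof hdr, each bounded
     *     by the checks above ... */
    return 0;
}

int main(void)
{
    /* A deliberately bogus "channel file": all zeros. A validating reader
     * simply refuses it and keeps the previous known-good definitions. */
    uint8_t bogus[4096] = {0};

    if (load_channel_file(bogus, sizeof bogus) != 0) {
        fprintf(stderr, "channel file rejected; keeping previous definitions\n");
        return EXIT_FAILURE;   /* worst case: this process exits and gets restarted */
    }
    return EXIT_SUCCESS;
}
```

Of course, code running inside the kernel doesn't get the luxury of just crashing and being restarted, which is exactly why validating untrusted input before touching it (or doing the parsing outside the kernel in the first place) matters so much there.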