So managers are starting to spew the whole "well I didn't do anything wrong, it affected everyone else, so we're not liable" bullshit.

Did you allow a third party vendor to have the highest privilege access to all of your systems AND let them run Remote Code Execution on your systems whenever they want?

You didn't have a test environment set up to test each update or patch that is applied to your systems before you push them to prod? No? Just let it auto-update?

Yeah, that "Risk Transference" didn't work so well as your GRC policy seemed to think it would, huh? I know they're a security company and they SHOULD have tested it, but they didn't, did they?

I know everyone else does it, but if everyone else jumped off a bridge, would you?

Just because everyone else fucked up, doesn't mean you didn't fuck up.

There's gonna be a lot of deep discussions in this post-mortem and hopefully orgs will change. Those that don't will just be hit again... and again... and again.

#crowdstrike

Eh, reading the wonderful responses to this thread, who am I kidding?

Just "Risk Acceptance" all of it, budget some money to deal with it when it happens again, and move on with your merry life.

Not like the proles matter at all.

@tinker right in the unknown unknowns budget
@tinker No tech company can be trusted to treat severe adverse impact on ordinary people's lives due to their fuck ups as anything other than a (very small, here's $5 and worse than useless credit monitoring) cost of doing business.

@tinker ". . .hopefully orgs will change."

Many will!

. . .and then they'll lay off their current staff, hire new people who find the old processes tedious, and start all over again.

@tinker "Just let it auto-update?" Most EDRs (including CrowdStrike) don't give us that option. I bet they have that feature release pretty soon.

@Xavier - Indeed. How many folks with budget pushed back before this? How many will now?

A "feature" that everyone requests will be implemented.

@tinker I used to work for an EDR company. We did allow customers to tweak the content updates, and so many customers shot themselves in the foot.

CrowdStrike has always been the macOS of EDRs. Not very flexible, and many features are hidden from the end user. I used to call it EDR for dummies. You didn't need a dedicated team to run it. It was mostly set-and-forget.

CS has come a long way in flexibility, but still has a while to go.

@tinker @Xavier your company chose to run CrowdStrike and Windows. Your company HAD a choice.

@Xavier @tinker this wasn't even a 'product update', it was a 'definitions' update.

This same sort of thing happened many years ago, when an Office binary was deleted by our antivirus, basically stopping business for a day. This seems like sort of a nightmare edition of that 'bad defs' problem that used to happen more often.

I sort of don't want to be in a world where we are testing the day-to-day definition updates for our EDR. I'd rather put this into perspective as one bad day, and maybe add this scenario to our DR procedures.

@DarcMoughty @Xavier @tinker your take is too sane. Not nearly spicy enough.

@Xavier @tinker tbh it was a risk that we identified when we did our assessment. I checked, and we put it down as a once-in-a-decade type of event. It was unlikely, but there was potential. This was more fringe than anything.

Now if it happens again…

Today we found out who had good BCPs and who didn't; hopefully orgs whose BCPs weren't great learn why.

Also you can have a good BCP and still have a bad time. BCP doesn't mean you're going to have fun; you're minimizing downtime. It sucks, but boy am I proud.

@tinker

*shrug* most orgs are gonna see this, ultimately, as a black swan event.

Setting up test envs to -actually evaluate updates- before deployment requires specific expertise, which requires paying for it.

In my experience, they don't want to pay for this; the business process incentives do not align to make that a regular part of operations, due to the increased friction in IT operations that results, etc. etc. etc.

And given the systemic removal of in-org IT ops in favor of contracted MSP shit - who deploys endpoint agents like this? A contractor under contract who is -not in the control structure of the org directly- and as such is not meaningfully a part of the organization and not a part of these conversations.

Yeah.

This one's not fixable by telling people to do the things that have been standard practice in well-run orgs for decades; if they're not doing the 'right' thing, then that's due to some kind of internal organizational dysfunction that cannot be treated with generic advice.

"Happy families are all alike; every unhappy family is unique" and all that - if they're refusing to do the correct workflow, there is some organizational trauma, unique to the org, preventing it.

Ain't no such thing as a "business psychotherapist" to unfuck that - tho, lol, if someone's willing to pay me enough I could take a stab at it.

@munin - business psychotherapist.... That's a vCISO right?

@tinker

No. A vCISO is not capable of debugging the structure of the organization itself.

The problem lies in the org, not in the tech the org uses.

@munin - My joke lies in the idea of vCISOs providing consultant "policies and procedures" via GRC gigs, etc.

Just cause a psychotherapist tells you what to do, doesn't mean you apply that therapy to yourself.

Anyhow. I completely agree with everything you have said.

@tinker

"only one, but the lightbulb has to want to change" lol

@tinker @munin

Just when you thought that there was no such thing as "Organisational Psychotherapy" ...

https://flowchainsensei.wordpress.com/2012/04/29/the-nine-principles-of-organisational-psychotherapy/

@tinker @munin

Honestly, I keep thinking that much of the modern corporate world these days is "certifiably insane."

@JeffGrigg @tinker

Sanity is a societal convention; assuming it has a meaning beyond "conforming with the norms of an organization or context" is prolly not useful.

@munin @tinker

And also, half the time the InfoSec industry is like "patch all your things with all the updates within 5 minutes or you are toast".... and then they are "oh, you don't have robust test, then canary, and full rollback and response plans for every single kind of update to everything you own? tsk tsk"

@munin @tinker

99% of orgs struggle to get stuff patched in anything like a timely manner. Anti-malware is a compensating control for that being slow... so now we do that slowly as well. What is the compensating control for the compensating control for being slow, being slow?

@mmaibaum @tinker

Yes, it requires systemic examination of the organization as a whole in order to determine how to build up a comprehensive approach to understanding the vulnerability surfaces that the company has, and the way in which security controls can be applied to those surfaces relevantly - working with, rather than providing friction to, the workflows that the company requires to do business.

This is what competent blue teaming does, and it requires paying for people who are willing to engage with this problem, paying for their education, and giving them enough political clout within the organization that meaningful change can take place.

@munin @tinker yep, and I have worked with good people like this - tbh this was more a comment on the incredibly naive commentary around in general (not this thread) from people who quite obviously never worked in a large complex org

@mmaibaum @tinker

for context, I've done a lot of work in the past addressing these specific issues - tinker knows my background there with btv and the whole focus on the blue team education pipeline from them lol

@munin @tinker best person I worked with on this persuaded the wider tech function this was all basically a quality issue :)

@mmaibaum @tinker

It is. Security is part of QA and part of Ops generally. The tooling overlaps in both cases, and all three of those departments can - and in my opinion ought to - work synergistically.

@munin @tinker I worked on a project where we engaged a "business psychotherapist" once. It was amazing. He was an organizational behaviorist. Basic message was that most people in the workplace aren't behaving like adults. Really interesting stuff.
@munin @tinker this update can't stop won't stop. Even if you're n-2
@tinker this was pushed to all customers regardless of whether they had auto-update on

@djnick - The core part of allowing a third party remote code execution at highest priv still stands.

But! What are you going to do? Trusting Trust and all that.

@tinker @djnick The core of the problem is the OS.

Namely, an OS with a kernel lacking any meaningful fault isolation.

The research for making OSes not vulnerable to that sort of problem has been completed since at least the 80s.

There really isn't an excuse.

The research for making /performant/ equivalents that do not require special hardware is newer (Singularity OS project is one example from the very same OS publisher, for instance), but was also mostly completed a few decades back.

Trusting Trust is not about the same problem (malice of the component doesn't matter so much if it never has access to anything it shouldn't anyway), but even so David A. Wheeler's paper was published a while ago now.

(Caveat of course being that malicious components can still potentially DoS the system and hardware vulnerabilities can also enable complete compromise despite there being no logic-level flaw in the isolation.)
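
(To make the fault-isolation point concrete: below is a minimal userspace sketch - my own illustration, not code from any OS named above - of the property being described. A supervisor process outlives and restarts a crashing component instead of the whole system going down with it, which is roughly what a kernel with real fault isolation gives you for drivers.)

```c
/* Minimal fault-isolation sketch: the buggy component runs in its own
 * process, so its crash is contained and the supervisor keeps running.
 * Illustrative only; microkernel-style OSes apply this idea to drivers. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static void faulty_component(void) {
    /* Simulate the bug: a NULL dereference, e.g. from parsing a
     * malformed definitions file without validation. */
    volatile int *p = NULL;
    *p = 42;  /* SIGSEGV kills only this process */
}

int main(void) {
    for (int attempt = 1; attempt <= 3; attempt++) {
        pid_t pid = fork();
        if (pid == 0) {              /* child: the isolated component */
            faulty_component();
            _exit(0);
        }
        int status;
        waitpid(pid, &status, 0);
        if (WIFSIGNALED(status))
            printf("component died with signal %d; restarting (%d/3)\n",
                   WTERMSIG(status), attempt);
        else
            break;                   /* clean exit, nothing to restart */
    }
    puts("supervisor still up: the fault was contained");
    return 0;
}
```

(Contrast with a kernel-mode driver, where that same bad dereference takes the whole machine down.)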

@lispi314 @djnick @tinker This, but also: AV shitware vendors *insist on bypassing any isolation that exists* because they deem themselves the most important thing on the machine. On Linux they would demand you load sketchy kernel modules to give them the same backdoors. (The history of how fanotify came to exist was basically trying unsuccessfully to avoid that shit happening.)

"AV" and "security products" need to become widely understood as malware, and rejected.

@djnick @tinker auto-update is just for the driver. Channel updates cannot be stopped.
@tinker We're looking at potentially leveraging this event as a reason to get rid of it. We originally looked at another tool with better features and no remote access. But then a new IT director went behind security's back and signed a three-year contract, because he had it at his last job, and told IT to install it.
@tinker our company had one machine go down...the one testing crowdstrike

@tinker This whole situation just shows how cyclic our industry, and the tech industry in general, is. How many decades ago were Windows updates failing while we preached this? Oh, but then it got better, so nobody does staged upgrades anymore. Likewise for old AV vendors. McAfee comes to mind, but I feel that Symantec was in there as well.

We preach patching, and I'm not immune to that pulpit either, but I agree we need to do better at adding that it must be done safely, in a controlled environment, and not just blindly accepted.

@JohnsNotHere @tinker As an industry, we desperately need to invest in software (and hardware!) diversity.

Software monocultures reduce our resiliency against threats both malicious and benign (including stupid mistakes, like pushing out a malformed update).

We should plan for adequate software and hardware diversity as we design and architect systems and networks. Doing so will increase our security, accessibility, and availability postures.

@tinker Unfortunately management types *like* that cloud MSPs offload liability, and this is a major reason they go with them in the first place. They buy into this crap of letting someone else manage their IT infrastructure so that when it breaks, it's not their own fault. More often than not, these MSPs have dangerous levels of access, but the orgs that use them don't care as long as they can check the box with auditors saying they're compliant. Then when shit goes sideways at the MSP, everyone acts surprised that they're up shit creek without a paddle.

@tinker

> I know everyone else does it, but if everyone else jumped off a bridge, would you?

Would it improve shareholder returns?

@tinker

The best way to avoid problems is to avoid Windows.

@tinker @Taco_lad Have been saying for years, should have stuck with QIC+sneakernet
@tinker
We try our best to keep everything in our control, but it is REALLY difficult with our management constantly being spammed by salespeople trying to push cloud services and SaaS. We dodged this particular bullet because of it, but that just means mgmt won't remember it.
It is also REALLY difficult when using Windows products for desktop users, because Microsoft deliberately bundles and creates so many dependencies on their cloud services, which we can't maintain or control.
@tinker Sound like where I work?
@tinker don't hold your breath. An occasional disaster is not on the books, or it's covered by the cost-of-doing-business risk budget, whereas teams to test and certify updates are committed costs showing up every quarter. So - fat chance 😮 the org will change. The goal is to optimize profits, not to provide reliable (customer) service ...
#clownstrike

@tinker

This is true. It doesn't go far enough.

Discussions on software recently have talked about OS-agnostic apps, and NASA has a policy of triple backups of necessary, critical systems.

Triple backups. NASA doesn't put new equipment into rockets, and when it does get in, another system stands ready to do the job INSTEAD.

Corners are being cut.

@tinker Even companies who used the N-1 and N-2 sensor update policies were hit equally, as were companies who manually incremented the version in their sensor policies. This update didn't go through the normal channels where it would have been caught.
@tinker but let’s throw all our money into AI because securing data and making sure the bugs are all gone isn’t sexy.
@tinker I get what you're saying and I agree, but I feel like these kinds of events are just going to become normalized like data breaches have, and the costs absorbed by the customers.
@tinker Thank you, I feel the same about M$. M$ got rid of testers about 20 years ago. I was at the highest level, 4, and they did it to make more money, pushing the burden onto the developers, because testing your own code is so foolproof! 🤬
CrowdStruck - Ed Zitron's Where's Your Ed At

@tinker Sounds like you have a bit of experience with CrowdStrike. (I only have experience watching exploits get thrown at it as a moderator during a wargame, only to have each and every one get flagged and neutralized immediately...)

Do they even offer an option for staged deployments? I was under the impression that it behaved closer to AV clients, where new definitions were sent globally as soon as they pushed the button.

@Okanogen and @FeralFeminist I feel your pain; I work for a company that's throwing all their money into building SaaS solutions integrated with AI rather than actually fixing our current stack; it's just as annoying. (Well, that and buying up a bunch of companies.)