Mastodawn

Alexander Bochmann Aug 14, 2018

Looks like we'll be planning for the next round of VMware updates due to #L1TF tomorrow: https://www.vmware.com/security/advisories/VMSA-2018-0020.html

"vCenter Server, ESXi, Workstation, and Fusion updates include Hypervisor-Specific Mitigations for L1 Terminal Fault - VMM. This issue may allow a malicious VM running on a given CPU core to effectively read the hypervisor’s or another VM’s privileged information that resides sequentially or concurrently in the same core’s L1 Data cache."

#infosec

VMSA-2018-0020

VMware vSphere, Workstation, and Fusion updates enable Hypervisor-Specific Mitigations for L1 Terminal Fault - VMM vulnerability.

Alexander Bochmann Aug 15, 2018

So, #L1TF is kinda the "hyperthreading is dead" one... VMware introduces a new "ESXi Side-Channel-Aware Scheduler".
Quote: "Currently, this scheduler provides the Hyper-Threading-aware mitigation by scheduling on only one Hyper-Thread of a Hyper-Thread-enabled core. As described in more detail below, careful capacity planning is required prior to enabling the ESXi Side-Channel-Aware Scheduler as it could have a performance impact for enterprise applications."

https://kb.vmware.com/s/article/55767

VMware Knowledge Base

Alexander Bochmann

#VMware has released a PowerShell tool to help assess the effects of activating the "Side-Channel-Aware Scheduler" on the hosts of an ESXi cluster: https://kb.vmware.com/s/article/56931

Not sure how useful the output is since I only ran it against our test cluster up to now, which, as I was reminded by the tool, has CPUs that don't do hyperthreading anyways 🙄

Still waiting for results on the first production cluster...

Alexander Bochmann Aug 22, 2018

According to the tool, we should mostly be fine after activating the ESXi SCA scheduler.
It generates a report for each host, listing how much time the host spent in a band of CPU utilisation, and warns if there were times of more than 70% usage. Also finds VMs that have more vCPUs than cores available on each host (minus HT).
We have two hosts with CPU usage above the threshold, and will need to juggle some resources. Don't expect any problems with that.

On to patching >50 ESXi hosts then.
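
The kind of per-host report described above can be sketched roughly as follows. This is not VMware's tool (which is PowerShell) and the band edges are made up for illustration; only the 70% warning threshold comes from the post. It assumes CPU usage samples given as percentages between 0 and 100.

```python
from collections import Counter

# Hypothetical band edges in percent; the real tool's bands are not given in the thread.
BAND_EDGES = [0, 30, 50, 70, 100]
WARN_THRESHOLD = 70  # percent, the threshold mentioned in the post


def band_label(usage_pct):
    """Return the label of the band a single CPU usage sample falls into."""
    usage = min(max(usage_pct, 0), 100)  # clamp out-of-range samples
    for low, high in zip(BAND_EDGES, BAND_EDGES[1:]):
        if low <= usage < high:
            return f"{low}-{high}%"
    # A sample of exactly 100% lands in the last band.
    return f"{BAND_EDGES[-2]}-{BAND_EDGES[-1]}%"


def summarize_host(samples):
    """Count samples per band and warn if any sample exceeds the threshold."""
    bands = Counter(band_label(s) for s in samples)
    over = sum(1 for s in samples if s > WARN_THRESHOLD)
    return {"bands": dict(bands), "samples_over_threshold": over, "warn": over > 0}
```

Calling `summarize_host` once per host with that host's usage history gives a quick list of which hosts need a closer look before the scheduler is switched on.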

Alexander Bochmann Aug 22, 2018

Should come as little surprise, but activating the SCA scheduler on hosts with hyperthreading actually, really, halves the number of available logical processors.
I can see how having VMs with 20 vCPUs (our current maximum, and it's just one of those in all of the environment) on #ESXi hosts that now have, for example, 24 logical processors (instead of 48) will not be funny.
I think #VMware is slightly understating the effects this will have on some setups.
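
A back-of-the-envelope version of that arithmetic, as a sketch only: it assumes two hyperthreads per core and that the SCA scheduler uses exactly one of them, as the KB quote earlier in the thread describes. The function names and the VM inventory format are hypothetical.

```python
def schedulable_lps(physical_cores, hyperthreading, sca_enabled):
    """Logical processors the scheduler can use, assuming 2 threads per core with HT."""
    if hyperthreading and not sca_enabled:
        return physical_cores * 2
    return physical_cores


def oversized_vms(vm_vcpus, physical_cores, hyperthreading=True, sca_enabled=True):
    """Return names of VMs with more vCPUs than schedulable logical processors."""
    limit = schedulable_lps(physical_cores, hyperthreading, sca_enabled)
    return sorted(name for name, vcpus in vm_vcpus.items() if vcpus > limit)


# A 24-core host with HT: 48 logical processors before, 24 after enabling SCA.
before = schedulable_lps(24, hyperthreading=True, sca_enabled=False)
after = schedulable_lps(24, hyperthreading=True, sca_enabled=True)
print(before, after)
```

A 20-vCPU VM still fits on such a host after the change, but it then covers most of the 24 remaining logical processors instead of less than half of 48.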
Screenshots from two old test hosts:

Alexander Bochmann Aug 22, 2018

The slide on #VMware KB55806, "mitigation process for CVE-2018-3646", also easily makes it into my favourites collection.

1) update
2) find out you now have capacity issues
3a) buy new hardware
3b) (NOT RECOMMENDED) accept risk

https://kb.vmware.com/s/article/55806

The Gibson 🅅 Aug 22, 2018

@galaxis

um, not my experience...

after patching a few test environments, we were unable to vmotion due to high CPU.

even on hosts that weren't using Hyperthreading to start with...

do some testing.

Alexander Bochmann Aug 22, 2018

@thegibson Yeah, planning on that. Tool or not, there has to be some visible fallout from removing half of the CPU threads from a cluster (and ESXi seemed to treat hyperthreads mostly as full cores up to now).

The Gibson 🅅 Aug 22, 2018

@galaxis It is a HUGE hit on a well used host.

You know your stuff.. proceed cautiously.

vautee Aug 22, 2018

@galaxis We patched last Thursday without using the aforementioned tool and ended up quite fine.