#HPC #supercomputing #Infiniband While troubleshooting a performance issue on our NDR fabric, where nodes would randomly report high latency and lower-than-expected bandwidth (up to 50% lower), I discovered a setting in opensm.conf that configured routing to be randomized rather than distributed/round-robin. Once I changed the setting (scatter_ports) to the _DEFAULT_, I saw immediate and consistent performance improvements. See the before and after images. So, FYI: if your users are reporting random latency and bandwidth issues, double-check the routing settings in your opensm.conf. Also, I was using NVIDIA/Mellanox's clusterkit tool.
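
If you want a quick way to do that double-check, here is a minimal Python sketch that looks for uncommented scatter_ports lines in an opensm.conf. It assumes the usual "key value" layout with "#" comment lines. The helper name `find_option` is made up for this example, and the file path is passed in by you because the location varies by install. It only reports what is set. It does not know what the default is for your OpenSM build, so compare the result against your own documentation.

```python
import sys
from typing import List, Tuple


def find_option(conf_text: str, key: str) -> List[Tuple[int, str]]:
    """Return (line_number, value) for every uncommented line that sets `key`.

    Assumes whitespace-separated "key value" lines and '#' for comments.
    """
    hits = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if parts[0] == key:
            value = parts[1].strip() if len(parts) > 1 else ""
            hits.append((lineno, value))
    return hits


# Usage: python check_opensm.py <path to your opensm.conf>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        found = find_option(fh.read(), "scatter_ports")
    if not found:
        print("scatter_ports is not set explicitly (built-in default applies)")
    for lineno, value in found:
        print(f"line {lineno}: scatter_ports {value}")
```

If the script prints an explicit scatter_ports line, that is the one to compare against the default before rerunning your latency and bandwidth tests.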