@grr had a reading group attempt to read "Paxos Made Simple" (spoiler: it did not).
Re implementation tweaks, I saw but cannot find now some blog post where Amazon translated some complex distributed system of theirs into a machine checked model (probably Lean?) and it found a bug in their implementation. Feels like the future, maybe?