Jaana Dogan ヤナ ドガン (@rakyll)
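A team spent three weeks debugging a cluster that kept electing new leaders. The cause: they had never profiled the persistent contention affecting certain threads, and the resulting starvation triggered repeated leader re-elections. A practical lesson on why profiling matters.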
https://x.com/rakyll/status/2012581644025385392
#distributedsystems #leaderelection #debugging #profiling

Jaana Dogan ヤナ ドガン (@rakyll) on X
I heard from a friend that they spent 3 weeks debugging why their cluster kept electing new leaders. Turns out their team never profiled the contention affecting certain threads continuously online. Starvation ends up being the cause precisely when we assume it wouldn't.
X (formerly Twitter)

Leader Election in Go with a Postgres database using Kubernetes tooling
Leader election is an important part of building distributed, scalable web applications. Applications often run multiple replicas and may be deployed in a number of ways. Using the principle of read-many-write-one and Raft consensus, application replicas can coordinate over a lease so that only the lease holder performs write-sensitive tasks, avoiding data duplication and conflicts. Whether the application is a monolith or solely a scheduler, ensuring that only one replica writes greatly increases reliability.

Paxos vs Raft: Have we reached consensus on distributed consensus?
Distributed consensus is a fundamental primitive for constructing fault-tolerant, strongly-consistent distributed systems. Though many distributed consensus algorithms have been proposed, just two dominate production systems: Paxos, the traditional, famously subtle, algorithm; and Raft, a more recent algorithm positioned as a more understandable alternative to Paxos.
In this paper, we consider which algorithm, Paxos or Raft, is the better solution to distributed consensus. We analyse both to determine exactly how they differ by describing a simplified Paxos algorithm using Raft's terminology and pragmatic abstractions.
We find that both Paxos and Raft take a very similar approach to distributed consensus, differing only in their approach to leader election. Most notably, Raft only allows servers with up-to-date logs to become leaders, whereas Paxos allows any server to be leader provided it then updates its log to ensure it is up-to-date. Raft's approach is surprisingly efficient given its simplicity as, unlike Paxos, it does not require log entries to be exchanged during leader election. We surmise that much of the understandability of Raft comes from the paper's clear presentation rather than being fundamental to the underlying algorithm being presented.
arXiv.org
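The difference in leader election described in this abstract can be sketched in Go. `atLeastAsUpToDate` follows Raft's published voting rule: compare the terms of the last log entries, and if they are equal, the longer log wins; a voter grants its vote only if the candidate is at least as up-to-date. `mergeFromQuorum` is a hypothetical, simplified illustration of the Paxos-style alternative. It assumes voters send their log entries to the new leader, which keeps, at each position, the entry with the highest term, analogous to a Paxos proposer adopting the highest-ballot accepted value. It omits details such as commit indexes and re-stamping entries with the new term, and is not the paper's exact algorithm.

```go
package main

import "fmt"

// Entry is a log entry tagged with the term in which it was created.
type Entry struct {
	Term    uint64
	Command string
}

// atLeastAsUpToDate reports whether a candidate's log is at least as
// up-to-date as a voter's, using the terms and indexes of their last entries.
func atLeastAsUpToDate(candLastTerm, candLastIndex, voterLastTerm, voterLastIndex uint64) bool {
	if candLastTerm != voterLastTerm {
		return candLastTerm > voterLastTerm
	}
	return candLastIndex >= voterLastIndex
}

// mergeFromQuorum is a hypothetical sketch of a Paxos-style catch-up: given
// the logs returned by a quorum of voters, keep the highest-term entry at each
// slice position (position i corresponds to log index i+1).
func mergeFromQuorum(logs [][]Entry) []Entry {
	var merged []Entry
	for _, voterLog := range logs {
		for i, e := range voterLog {
			if i >= len(merged) {
				merged = append(merged, e)
			} else if e.Term > merged[i].Term {
				merged[i] = e
			}
		}
	}
	return merged
}

func main() {
	// A voter whose last entry is (term 3, index 5), compared with two candidates.
	fmt.Println(atLeastAsUpToDate(4, 2, 3, 5)) // later last term wins despite a shorter log
	fmt.Println(atLeastAsUpToDate(3, 4, 3, 5)) // same term but shorter log: vote refused

	quorum := [][]Entry{
		{{1, "x=1"}, {2, "x=2"}},
		{{1, "x=1"}, {3, "x=3"}, {3, "y=1"}},
	}
	fmt.Println(mergeFromQuorum(quorum))
}
```

The contrast matches the abstract's point: the Raft check needs only two numbers from each side, while the Paxos-style path has to ship log entries during the election so that any elected server can bring its log up to date.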