@sqlrush

7/ Code and honest status board: https://github.com/sqlrush/pgrac · https://pgrac.dev/features Build log #2, written from the source. If you have shipped I/O fencing and think cooperative-first is the wrong order, tell me why.
GitHub - sqlrush/pgrac: Bringing Oracle RAC-style shared-disk clustering to PostgreSQL — a shared-everything multi-node cluster (Cache Fusion / SCN / GES), reimplemented on PG 16.13. Early, built in public.

6/ Honest scope: this is cooperative fencing. It only stops a node that is still running the gate. Hardware fencing (STONITH / SCSI-3 / cloud) is a later layer. Demonstrated on 2- and 3-node shared-FS CI: a self-fenced node's writes all fail closed. Faithful-crash auto-recovery is mechanism-proven but SKIP-with-limitation in a single-machine harness. Not claiming more.
5/ The hard part was not fencing a dead node. It was not fencing a healthy one. An idle cluster has no fence marker, so the lease expires and everyone starves. Fix: a steady-state baseline marker that keeps renewing the lease, which a real fence still supersedes instantly.
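A minimal sketch of the baseline-marker idea in 5/, not pgrac's code. It assumes a baseline marker is simply a marker with an empty dead-bitmap, that each node refreshes its token from the newest readable marker, and that the lease length is a fixed constant. `Marker`, `Token`, `refresh`, and `LEASE_SECONDS` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 5.0  # hypothetical lease length, not a pgrac setting


@dataclass(frozen=True)
class Marker:
    epoch: int
    dead_bitmap: int  # assumption: 0 means "baseline", bit i set means node i is fenced


@dataclass(frozen=True)
class Token:
    authorized_epoch: int
    lease_deadline: float  # seconds on some monotonic clock
    self_fenced: bool


def refresh(token: Token, marker: Optional[Marker], node_id: int, now: float) -> Token:
    """Fold the newest readable marker into this node's local token."""
    if marker is None:
        # Nothing readable (e.g. voting disks lost): do not renew, so the lease runs out.
        return token
    if marker.epoch < token.authorized_epoch:
        # Stale marker from an older epoch: ignore it.
        return token
    if marker.dead_bitmap & (1 << node_id):
        # A real fence naming this node supersedes any lease immediately.
        return Token(token.authorized_epoch, now, True)
    # Baseline marker, or a fence aimed at another node: renew the lease.
    return Token(marker.epoch, now + LEASE_SECONDS, token.self_fenced)
```

In this sketch an idle cluster keeps writing because the baseline marker renews the lease on every refresh, while a marker with this node's bit set ends the lease on the next refresh regardless of how much time was left.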
4/ Hot path: each node distills that marker into a local shmem token (authorized epoch, lease, self-fenced bit). Six storage entry points check a lock-free judge before writing: exact-epoch match, live lease, not self-fenced. Otherwise the write fails closed with 53R51, or PANICs if it is inside a critical section. Lose the voting disks, the lease expires, the node fences itself.
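The three checks in 4/ can be sketched as a pure function over an immutable token snapshot. This is an illustration under assumptions, not the pgrac implementation: `FenceToken`, `may_write`, `guarded_write`, and `WriteFenced` are hypothetical names, and the shared-memory layout, lock-free read, and PANIC path are not modeled.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class FenceToken:
    authorized_epoch: int
    lease_deadline: float  # monotonic-clock seconds
    self_fenced: bool


class WriteFenced(Exception):
    """Raised instead of performing the write (fail closed)."""


def may_write(token: FenceToken, my_epoch: int, now: float) -> bool:
    """All three conditions must hold: exact epoch, live lease, not self-fenced."""
    return (
        token.authorized_epoch == my_epoch
        and now < token.lease_deadline
        and not token.self_fenced
    )


def guarded_write(
    token: FenceToken,
    my_epoch: int,
    do_write: Callable[[], T],
    now: Optional[float] = None,
) -> T:
    """Run do_write only if the judge allows it; otherwise raise without writing."""
    if now is None:
        now = time.monotonic()
    if not may_write(token, my_epoch, now):
        # 53R51 is the error code named in the thread; its exact semantics are not modeled.
        raise WriteFenced("53R51")
    return do_write()
```

The epoch comparison is an equality check rather than `>=`, matching the "exact-epoch match" wording: a token from a newer or older epoch than the writer expects is treated as no authority at all.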
3/ pgrac just shipped the layer above it: an in-process cooperative write-fence, default-ON. Authority is a CRC'd marker (epoch + generation + dead-bitmap) on a quorum-majority of voting disks. Reconfig fails closed if a majority does not ack.
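A rough sketch of the marker described in 3/: three fields protected by a CRC, published only if a strict majority of voting disks acknowledge the write. The byte layout, the choice of CRC-32, and the names `FenceMarker`, `encode_marker`, `decode_marker`, and `publish_marker` are assumptions for illustration; the real on-disk format is in the repo.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical layout: epoch, generation, dead-bitmap as little-endian u64, then a u32 CRC.
_BODY = struct.Struct("<QQQ")
_CRC = struct.Struct("<I")


@dataclass(frozen=True)
class FenceMarker:
    epoch: int
    generation: int
    dead_bitmap: int  # bit i set means node i is declared dead


def encode_marker(m: FenceMarker) -> bytes:
    body = _BODY.pack(m.epoch, m.generation, m.dead_bitmap)
    return body + _CRC.pack(zlib.crc32(body))


def decode_marker(raw: bytes) -> Optional[FenceMarker]:
    """Return the marker, or None if the length or CRC does not check out."""
    if len(raw) != _BODY.size + _CRC.size:
        return None
    body = raw[:_BODY.size]
    (crc,) = _CRC.unpack(raw[_BODY.size:])
    if zlib.crc32(body) != crc:
        return None
    return FenceMarker(*_BODY.unpack(body))


def publish_marker(marker: FenceMarker, disks: Sequence[Callable[[bytes], bool]]) -> bool:
    """Write to every voting disk; succeed only with a strict majority of acks."""
    raw = encode_marker(marker)
    acks = 0
    for write in disks:
        try:
            if write(raw):
                acks += 1
        except OSError:
            pass  # an unreachable disk counts as a missing ack
    return acks > len(disks) // 2
```

Each voting disk is modeled here as a callable that takes the encoded bytes and returns whether the write was acknowledged. A `False` result from `publish_marker` corresponds to the fail-closed case: the caller would abandon the reconfiguration rather than proceed on a minority.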
2/ Oracle RAC fences that node below its own software: STONITH, SCSI-3 reservations, a watchdog. You cannot trust a dead node's software to behave, so you cut it off at the hardware. That layer is real work. It is not the layer pgrac built first.
1/ In a shared-everything DB cluster the scariest failure is not a node dying. It is a node that has been declared dead but is not, waking up to finish a write to shared storage that another node already remastered. Two owners, one disk, silent corruption.
5/ I wrote the four hard problems up, broken down to code level (with links into the real source). Run RAC or any shared-storage cluster? I want your "this breaks under X."
Deep-dive: https://dev.to/sqlrush/rebuilding-oracle-racs-core-machinery-on-postgresql-the-four-problems-that-fight-back-2dhl
Repo (⭐): https://github.com/sqlrush/pgrac
#distributedsystems
Rebuilding Oracle RAC's core machinery on PostgreSQL — the four problems that fight back

4/ Still in progress: live-holder Cache Fusion transfer, full cross-node GES locking, crash recovery + fencing. The anchor under all of it: the --disable-cluster build is binary-identical to upstream PG 16.13 and passes the full 219-test regression suite.
3/ Running today, on real code paths: the global SCN clock, dual-track cross-node MVCC, Cache Fusion's data plane, cluster catalog invalidation, the interconnect+heartbeat substrate. Honest caveat: cross-node *behavioral* test coverage is still being built.