The Post-Mortem Problem

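In software engineering, post-mortems often degrade into a formality and fail to serve as a real learning tool. An effective post-mortem should be written quickly, right after the incident while emotions and context are still fresh, and should convey the experience of the incident as a narrative rather than a list of logs. AI can help with repetitive work such as incident summaries and first drafts, but humans should lead the analysis and the drawing of lessons. It also matters to build a culture that lowers the bar for writing post-mortems, raises the bar for reading them, and manages follow-up actions concretely and with clear ownership.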

https://incident.io/blog/the-post-mortem-problem

#postmortem #incidentmanagement #softwareengineering #aiassistedwriting #devculture

Post-mortems are one of the most consistently underperforming rituals in software engineering. Most teams do them. Most teams know theirs aren't working. And most teams reach for the same diagnosis: the templates are too long, nobody has time, nobody reads them anyway.

When two Hetzner servers died at the same time

On May 12, 2026, two of my Arch Linux + LUKS servers at Hetzner became unreachable at the same moment. Both had been running for 4+ months without issue. Both had received the same pacman -Syyu the day before, but had stayed on the old kernel until the morning the websites stopped responding. I rebooted — SSH never came back. nmap -Pn -p 22 showed filtered from anywhere. No ping. No banner. The Hetzner Robot panel insisted the hardware was fine.

Several hours went into hypotheses that turned out to be wrong:

  • The encryptssh initcpio hook referencing a /usr/lib/initcpio/udev/11-dm-initramfs.rules file that no longer exists. Real bug, no boot impact — the initramfs rebuilds anyway.
  • PermitRootLogin no in sshd_config. Real misconfiguration, fixed it, didn’t help. A refusing sshd shows closed, not filtered.
  • Predictable interface-naming drift after the systemd 260 upgrade. Patched the .network config to match by MAC. Useful hardening; not the cause.
  • Stale GRUB stage1 + core.img in the MBR. Arch never re-runs grub-install after a grub package upgrade. Refreshed it. Still filtered.
  • Kernel 7.0.5 regression. Downgraded to 6.18.3, the kernel that had run for 4 months. Still filtered. So the kernel itself wasn’t it either.
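
The closed-versus-filtered distinction in the list above is what ruled out sshd early. A minimal sketch of that check, assuming a plain TCP connect is a good enough stand-in for nmap's classification (the function name classify_port is hypothetical, not part of any tool mentioned here):

```python
import socket


def classify_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Roughly mimic nmap's open / closed / filtered labels with a TCP connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # handshake completed, something is listening
    except ConnectionRefusedError:
        # The host answered with a TCP RST: the machine is up and its
        # network stack works, but nothing listens on this port.
        return "closed"
    except OSError:
        # Timeouts and unreachable errors: no usable answer came back.
        # This is the state the servers in this post were stuck in.
        return "filtered"
```

A "closed" result would have meant the network stack was up and only the listener was missing. A "filtered" result on every attempt points further down, at the interface never getting configured at all.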

The clue was in the persistent journal: a single recorded boot from December 31 to May 12 10:13 UTC, and absolutely nothing after. Every reboot since the upgrade was failing before systemd-journald could flush to disk — so the failure had to be in the initramfs, before the root filesystem was even mounted.

What it almost certainly was

Hetzner Dedicated servers configure the initramfs network with ip=dhcp on the kernel command line. That depends on Hetzner’s DHCP server replying to whatever request format the current kernel sends. Somewhere between kernel 6.18 / iproute2 6.18 and kernel 7.0 / iproute2 7.0, the request format changed enough that Hetzner’s DHCP stopped responding. Effects:

  • Old kernel at runtime kept the interface already configured (Phase A — 32 hours of healthy operation after the package upgrade).
  • New kernel cold-boots, hits DHCP, never gets an IP, dropbear cannot listen, port 22 stays filtered.

Hetzner’s own documentation has been quietly moving away from ip=dhcp toward static IPv4 in the kernel command line. The fix is exactly that:

GRUB_CMDLINE_LINUX="cryptdevice=/dev/md1:cryptroot ip=A.B.C.D::GATEWAY:255.255.255.255:hostname:eth0:none"

One line in /etc/default/grub, grub-mkconfig, reboot. No more dependency on Hetzner’s DHCP responding to whatever your current kernel sends.
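
For reference, the fields in that ip= value follow the kernel's documented order: client IP, server IP (left empty here), gateway, netmask, hostname, device, autoconf. A small sketch that assembles the string and validates the addresses, assuming the /32 netmask and eth0 device from the line above as defaults (the helper name build_static_ip_param is hypothetical):

```python
import ipaddress


def build_static_ip_param(
    address: str,
    gateway: str,
    netmask: str = "255.255.255.255",  # assumption: /32 as in the line above
    hostname: str = "",
    device: str = "eth0",  # assumption: device name as in the line above
) -> str:
    """Build a static ip= kernel parameter with autoconf disabled."""
    # ipaddress raises ValueError on anything that is not a valid IPv4 address.
    for value in (address, gateway, netmask):
        ipaddress.IPv4Address(value)
    # Field order: client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf
    return f"ip={address}::{gateway}:{netmask}:{hostname}:{device}:none"


# Example with documentation-range addresses (placeholders, not real hosts).
print(build_static_ip_param("192.0.2.10", "192.0.2.1", hostname="web1"))
```

The trailing "none" is what removes the DHCP dependency: the kernel configures the interface from the given values and sends no request at all.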

Why it matters for anyone running this stack

If you run Arch on Hetzner Dedicated with full-disk encryption and remote unlock via dropbear, the ip=dhcp shipped by installimage is a latent bug. It can keep working for years and then break overnight, on every machine you have, after a routine pacman -Syyu. The static-IP version is what Hetzner now recommends and removes the entire dependency.

Tooling

While debugging, I turned the whole rescue / chroot / diagnose / fix workflow into a Python CLI (hal) — including hal fix static-ip, which derives the static cmdline directly from your existing systemd-networkd .network file:

github.com/kevinveenbirkenbach/hetzner-arch-luks

Single command, idempotent, reversible (the original /etc/default/grub is backed up to .hal-backup). If you’re on this stack, switch to static IP before the next kernel upgrade catches you.
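
To illustrate what "idempotent and reversible" can look like for this kind of change, here is a rough sketch. It is not the code from hal. The function names and the exact backup file name (the original name plus ".hal-backup") are assumptions, and it only handles a double-quoted GRUB_CMDLINE_LINUX line.

```python
import re
import shutil
from pathlib import Path

CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX=")(.*)(")\s*$')


def set_ip_param(grub_default_text: str, ip_param: str) -> str:
    """Replace any ip=... token in GRUB_CMDLINE_LINUX with ip_param."""
    lines = []
    for line in grub_default_text.splitlines():
        match = CMDLINE_RE.match(line)
        if match:
            # Drop every existing ip= token (ip=dhcp or an older static one),
            # then append the new one, so repeated runs give the same text.
            tokens = [t for t in match.group(2).split() if not t.startswith("ip=")]
            tokens.append(ip_param)
            line = match.group(1) + " ".join(tokens) + match.group(3)
        lines.append(line)
    return "\n".join(lines) + "\n"


def apply_to_file(path: Path, ip_param: str) -> bool:
    """Rewrite the file in place; return True only if something changed."""
    original = path.read_text()
    updated = set_ip_param(original, ip_param)
    if updated == original:
        return False  # already in the desired state, nothing to do
    backup = path.with_name(path.name + ".hal-backup")  # assumed backup name
    if not backup.exists():
        shutil.copy2(path, backup)  # keep the first original for rollback
    path.write_text(updated)
    return True
```

Writing the backup only once means a second run can never overwrite the pristine original with an already-modified file, which is what keeps the change reversible.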

#ArchLinux #bootFailure #debugging #DevOps #DHCP #Dropbear #fullDiskEncryption #GRUB #Hetzner #initramfs #kernelUpgrade #Linux #LUKS #mkinitcpio #pacman #postmortem #PythonCLI #serverOutage #sysadmin #systemdNetworkd
GitHub - kevinveenbirkenbach/hetzner-arch-luks: Guide to install Arch Linux with LUKS encryption on an hetzner server

L’âme mécanique is online.
Not a short-story format… Episodes to follow!
Read it on my Panodyssey space: https://chk.me/uRlZVks
#amemecanique #haroldcath #IA #robotique #postmortem #jeuneauteur #1erroman

Post-mortem of my failed attempt to vibe-code a metroidvania game

Holy civilised AI thread. And good on you for not only noticing that it doesn’t work but also telling others about it. I see a lot of people justify vibe coding by saying that the programmer telling them already knows how to code so it’s different. And this is great for that. Thnx.

Godot Forum

The Disturbing Origins of the Camera

Ordinary objects. Disturbing origins. Taking a photo feels natural. You capture a moment. Save a memory. Move on. It’s something you do without thinking. But in its earliest days, photography had a very different purpose. It wasn’t about preserving life. It was about preserving death.

A Time Before Easy Memories

In the 19th century, photography was still new, expensive, and far from accessible. Most people would never have their picture taken. Not while they were alive. For […]

https://darkbydesign7.wordpress.com/2026/05/05/the-disturbing-origins-of-the-camera/

The Strange Origins of Perfume

Ordinary objects. Disturbing origins. Perfume is supposed to smell pleasant. Clean. Elegant. Luxurious. A small detail people use every day without thinking much about it. But perfume wasn’t originally created to make people smell good. It was created to cover something far worse.

A World That Smelled Different

For most of human history, cities smelled terrible. Waste filled the streets. Bathing was inconsistent. Disease spread easily through crowded populations. And during certain […]

https://darkbydesign7.wordpress.com/2026/05/07/the-strange-origins-of-perfume/

Show HN: Agent Postmortem Skill – Force AI coding agents to prove their work

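agent-postmortem-skill is an open-source verification tool that forces AI coding agents to present actual evidence when they claim a task is complete. It collects hard signals such as git status, diffs, and command execution results to verify whether the work was really completed, blocking false completion states up front and standardizing quality. It is compatible with any coding agent that can run shell commands and check git state, and it generates a post-task verification report that can be shared and audited. It does not replace CI; it provides focused detection of false claims in an agent's account of its work.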
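
To make the idea of a "hard signal" concrete, here is a minimal sketch that compares the files an agent claims to have changed against what git actually reports. It is not the tool's implementation. All function names are hypothetical, and it assumes the porcelain v1 output of git status with unquoted paths.

```python
import subprocess


def parse_porcelain(output: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in output.splitlines():
        if len(line) <= 3:
            continue
        path = line[3:]  # two status characters, one space, then the path
        if " -> " in path:  # renames are reported as "old -> new"
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def collect_changed_paths(repo_dir: str = ".") -> list[str]:
    """Ask git which files differ from HEAD or are untracked."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)


def unsupported_claims(claimed: list[str], changed: list[str]) -> list[str]:
    """Return claimed paths that git shows no change for."""
    return sorted(set(claimed) - set(changed))
```

An empty result from unsupported_claims does not prove the work is correct. It only shows that the claim "I edited these files" is backed by the repository state, which is the narrow kind of check described above.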

https://github.com/plus8bit/agent-postmortem-skill

#aiagent #verification #softwarequality #opensource #postmortem

You know that kind of bug that hides in plain sight, passes every test, and by the time you notice is generating hundreds of false positives? Well, I went through exactly that recently while working on Ollanta's static analysis engine.

One of its components, ollanta-scanner, suddenly started reporting that any function call in JavaScript code was a dangerous eval(), whether it was a simple view render or a system configuration call.

I dug in and the rabbit hole went much deeper, down to how the Go binding for the tree-sitter API works. The problem was not in the query itself but in a single missing method call needed to evaluate the semantic predicates.

The code compiled without complaint and the tests stayed green because they only validated the happy path of detection. In practice, the tool was returning all the raw structural matches and completely ignoring the rule's filters.
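
The failure mode is easy to reproduce in miniature. The toy model below is not the tree-sitter API or Ollanta's code. Every name in it is made up for illustration. It only shows why a happy-path test stays green when the predicate step is skipped, and why a negative test does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    function: str
    line: int


def structural_matches(calls: list[Call]) -> list[Call]:
    """Stage 1: the pattern's shape alone matches every call expression."""
    return list(calls)


def eval_rule(calls: list[Call], apply_predicate: bool = True) -> list[Call]:
    """Stage 2 keeps only calls whose function name is exactly 'eval'."""
    matches = structural_matches(calls)
    if not apply_predicate:
        return matches  # the bug: raw structural matches, filter never run
    return [c for c in matches if c.function == "eval"]


calls = [Call("eval", 1), Call("render", 2), Call("configure", 3)]

# Happy-path test: passes with or without the predicate step.
assert Call("eval", 1) in eval_rule(calls, apply_predicate=False)
assert Call("eval", 1) in eval_rule(calls, apply_predicate=True)

# Negative test: only passes when the predicate is actually applied.
assert Call("render", 2) not in eval_rule(calls, apply_predicate=True)
```

With apply_predicate=False the negative assertion would fail immediately, which is exactly the signal the happy-path suite could never produce.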

I ended up writing an article on my personal blog, https://scovl.github.io/2026/05/08/fixgo02/, detailing the whole investigation, which in the end turned into quite a reflection on API design, the danger of temporal coupling, and the classic confirmation bias we fall into when we forget to write negative tests.

If you work with #Go, tool #desenvolvimento (development), or simply enjoy reading a post-mortem about hunting headache-inducing #bugs, I think you'll like the piece.

#go #golang #desenvolvimento #development #developers #ast #cst #treesitter #postmortem

The predicate nobody called | scovl

How one missing line of code made 20 static analysis rules produce hundreds of false positives for months

A year ago I released my turn-based horror strategy about underground oil drilling — and it quietly taught me more than I expected.

So I wrote a postmortem. Here are a few things I'd tell myself at the start.

Full postmortem here 👇

https://www.reddit.com/r/GameDevelopment/comments/1t7vcuo/anoxia_station_postmortem/

Anoxia didn't become a big hit. But it's slowly recouping its budget, and we're already deep into our next game — Bonereader, a Balatro-like deck-builder set in a shamanic Purgatory.

#gamedev #indiegame #postmortem

👉🏼 You can now read my review of «Post Mortem» (AESC, 2025) by Juan Carlos Martínez Buitrago on La Jungla de las Letras: https://lajungladelasletras.com/post-mortem/

Have a great day. 😉🖖🏼

#juancarlosmartinezdon #postmortem #AESC #lajungladelasletras #victormorata #reseñasliterarias #HablandoDeLibros

https://www.instagram.com/p/DX53wTZgcr_/?igsh=MWp0dXIzNDJkYTVxZg==

Post Mortem: review | La Jungla de las Letras

Review of Post Mortem, by Juan Carlos Martínez Don: theater, metafiction, horror, and resilience in an intimate and deeply original work.

LaJUnglaDElasLETras