Mastodawn

SRE is about sleeping well 🌙

The goal is not midnight heroics.
It is building systems that fail safely so humans can rest.

https://webdad.eu/2026/05/14/%f0%9f%98%b4-sre-is-about-sleeping-well/

😴 SRE Is About Sleeping Well - WebDaD - Web Development and Design

Learn what SRE really means through the metaphor of bedtime routines. This simple guide explains why Site Reliability Engineering exists to reduce heroics, enable safe failure, protect human energy, and help teams sleep through the night.


DSigmund Mar 19

SLIs, SLOs, and Error Budgets explained with pizza 🍕

Reliability isn’t about perfection.
It’s about delivering most pizzas on time — and having room to improve.

#SRE #DevOps #ReliabilityEngineering

https://webdad.eu/2026/03/19/%f0%9f%8d%95-slis-slos-and-error-budgets-explained-with-pizza-delivery/

🍕 SLIs, SLOs, and Error Budgets Explained with Pizza Delivery - WebDaD - Web Development and Design

Confused about SLIs, SLOs, and error budgets? This simple pizza delivery metaphor explains reliability engineering concepts in an easy and memorable way for developers, SREs, and DevOps teams.
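A minimal sketch of the arithmetic behind the pizza metaphor, assuming the SLI is the share of pizzas delivered on time and the SLO is a target for that share. The function name `error_budget`, the `BudgetReport` fields, and all numbers are hypothetical and not taken from the linked article.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    sli: float           # measured share of on-time deliveries
    allowed_late: float  # late deliveries the SLO tolerates in this window
    remaining: float     # error budget left (negative means overspent)
    slo_met: bool


def error_budget(slo_target: float, total_deliveries: int, late_deliveries: int) -> BudgetReport:
    """Compare measured on-time delivery (the SLI) against a target (the SLO)."""
    if total_deliveries <= 0:
        raise ValueError("total_deliveries must be positive")
    if not 0 < slo_target <= 1:
        raise ValueError("slo_target must be in (0, 1]")

    sli = (total_deliveries - late_deliveries) / total_deliveries
    # The error budget is whatever the SLO leaves over: (1 - target) of all deliveries.
    allowed_late = (1 - slo_target) * total_deliveries
    return BudgetReport(
        sli=sli,
        allowed_late=allowed_late,
        remaining=allowed_late - late_deliveries,
        slo_met=sli >= slo_target,
    )


# Illustrative numbers only: a 95% on-time target, 1000 pizzas, 30 of them late.
report = error_budget(0.95, 1000, 30)
print(f"SLI {report.sli:.1%}, late pizzas still in budget: {report.remaining:.0f}")
```

With these made-up numbers the SLO allows 50 late pizzas, 30 are used, and about 20 remain as "room to improve".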


relysam Mar 17

🚀 RelySAM v1.1.0 released! Now train all 10 AI models with your own custom data via Web UI, CLI, scheduler, or 11 new API endpoints. Auto quality checks. 🔗 codeberg.org/0ai/relysam/

#ReliabilityEngineering #AI #MachineLearning #FreeBSD #OpenSource

Smidmart Mar 8

Spacecraft controls don’t get a second chance — every input must be reliable, clean, and redundant. 🚀🔘
Know more: https://zurl.co/zmWFi
#Smidmart #SpaceTech #AerospaceComponents #SealedSwitches #LowOutgassing #Spacecraft #ReliabilityEngineering #Redundancy #Avionics

TechNadu Mar 4

Global availability incident: Facebook.
Meta confirmed service disruptions affecting account access, alongside a high level of disruption reported in Ad Manager and business APIs.
Operational characteristics:
• Sudden spike in user reports (~4:15 PM ET)
• Global impact footprint
• No immediate root cause transparency
• Service restoration within ~2 hours
Availability is a security pillar — and outages expose:
- Centralization risk
- Cascading dependency exposure
- Business continuity gaps
- API reliance vulnerabilities

For security and reliability engineers:
Are social platforms integrated into your risk register and DR modeling?

Source: https://www.bleepingcomputer.com/news/technology/facebook-hit-with-worldwide-outage-stating-accounts-are-unavailable/

Engage below.
Follow @technadu for infrastructure resilience, cybersecurity, and outage intelligence.
Repost to inform your network.

#Infosec #ServiceAvailability #CloudRisk #Meta #FacebookOutage #BusinessContinuity #DigitalInfrastructure #ReliabilityEngineering #CyberResilience #PlatformRisk #ITOperations

relysam Feb 27

🚀 RelySAM v1.0.2 Released – Reliability Engineering + AI Gets Even Better on FreeBSD 🚀

#ReliabilityEngineering #AIinEngineering #FreeBSD #OpenSource #FMEA #HRA

Feedback, stars, or contributions are very welcome, especially from reliability engineers or FreeBSD users.

Jeff Bailey Feb 18

Your service call said “BRB” and never came back.

Learn how to stop waiting forever on slow dependencies.

https://jeffbailey.us/blog/2026/02/01/fundamentals-of-timeouts/

#DistributedSystems #BackendEngineering #SystemDesign #ReliabilityEngineering #Software #Programming #SoftwareDevelopment #SoftwareEngineering

Fundamentals of Timeouts

Timeout fundamentals for software: why timeouts exist, connection vs read vs write, choosing values, and avoiding cascading failures in distributed systems.

Jeff Bailey
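A small sketch of the connection-versus-read distinction mentioned in the post, using only Python's standard `socket` module. The function name `fetch_with_timeouts` and the default values are assumptions chosen for illustration, not recommendations from the linked article.

```python
import socket


def fetch_with_timeouts(host: str, port: int, payload: bytes,
                        connect_timeout: float = 1.0,
                        read_timeout: float = 2.0) -> bytes:
    """Send payload and return the first chunk of the reply.

    Raises socket.timeout (an alias of TimeoutError on Python 3.10+)
    if connecting or reading takes longer than allowed.
    """
    # Connection timeout: bounds how long we wait for the TCP handshake.
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        # Read/write timeout: bounds each later blocking send or recv call,
        # so a dependency that accepts the connection but never answers
        # cannot hold this caller forever.
        sock.settimeout(read_timeout)
        sock.sendall(payload)
        return sock.recv(4096)
```

Keeping the two limits separate lets a caller fail fast on an unreachable host while still giving a healthy but slow dependency a little more time to answer.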

relysam Feb 14

🚀 RelySAM v1.0.0 is here!
Open-source reliability engineering + AI/ML power
50+ tools • 10 AI models • 9 HRA methods • Weibull/FMEA/RCA • fully offline

FreeBSD-native

⚙️ Try it now:
codeberg.org/0ai/relysam

#ReliabilityEngineering #AI #FreeBSD #OpenSource

Ismail Kovvuru Jan 21

The 2024 CrowdStrike outage caused Windows Blue Screen crashes worldwide, impacting airlines, banks, and enterprises.
This deep dive explains how DevOps & SRE teams mitigated impact, recovered systems, and prevented total failure.
🔗 https://shorturl.at/VLqxz

#CrowdStrikeOutage #DevOps #SRE #IncidentManagement #CyberResilience #CloudOps #PostMortem #ReliabilityEngineering #aws