24 hours until the CfP for "SREday London 2026 Q1" closes: https://papercall.io/cfps/6456/submissions/new
System Administration, Week 1: Core Principles
In this video, we present a few core principles that will guide us throughout the semester: Scalability, Security, and Simplicity. We'll also get to know a few basic "laws", well known by any System Administrator. If you're wondering what all this has to do with Legos, please tune in...

🚀 The Best of #CloudComputing & #DevOps in 2025
#InfoQ published some serious heavy hitters last year. These 5 deep dives are essential reading for engineers who want to #StayAhead of the curve 👇
➡️ Designing Resilient Event-Driven Systems at Scale by Rajesh Kumar Pandey
https://bit.ly/3HlYOpa
➡️ Being Functionless: How to Develop a Serverless Mindset to Write Less Code! by Sheen Brisals
https://bit.ly/4rhWXmM
➡️ Checklist for Kubernetes in Production: Best Practices for SREs by Utku Darilmaz
https://bit.ly/43GZ4rO
➡️ When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale by Mitendra Mahto
https://bit.ly/4nZJTR3
➡️ Why Is My Docker Image So Big? A Deep Dive with “dive” to Find the Bloat by Chirag Agrawal
https://bit.ly/44os5ar
📚 Knowledge is power! 💪
#SystemDesign #Serverless #Kubernetes #SRE #Docker #CloudNative
A lot of “scalability work” is really “making side effects predictable.”
Idempotency, retries, timeouts, and clear ownership of state sound boring until your first incident teaches you they were the product all along.
When a system is calm under failure, it is not because it never fails.
It is because 𝗶𝘁 𝗳𝗮𝗶𝗹𝘀 𝗶𝗻 𝘄𝗮𝘆𝘀 𝘆𝗼𝘂 𝗽𝗹𝗮𝗻𝗻𝗲𝗱 𝗳𝗼𝗿.
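The post's point about idempotency, retries, and timeouts can be made concrete with a small sketch. This is a minimal Python illustration under stated assumptions, not any particular system's design. `IdempotentStore`, `call_with_retries`, `charge`, and the key `order-42` are all hypothetical names, and the in-memory dict stands in for whatever durable store a real service would use. It shows one "planned" failure: the downstream call applies its side effect, the reply is lost (simulated as a `TimeoutError`), and the retry replays the recorded result instead of charging twice.

```python
import time
from typing import Any, Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class IdempotentStore:
    """Hypothetical in-memory store keyed by idempotency key.

    It remembers the result of the first successful action per key, so a
    retried request replays that result instead of repeating the side effect.
    """

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run_once(self, key: str, action: Callable[[], T]) -> T:
        if key in self._results:
            return self._results[key]  # replay: no second side effect
        result = action()
        self._results[key] = result
        return result


def call_with_retries(
    op: Callable[[float], T],
    attempts: int = 3,
    timeout_s: float = 0.5,
    base_backoff_s: float = 0.01,
) -> T:
    """Call op up to `attempts` times, backing off exponentially on timeouts.

    Each call is handed a timeout budget (timeout_s) that op is expected to
    enforce on its own I/O; this sketch only retries on TimeoutError.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return op(timeout_s)
        except TimeoutError as exc:
            last_error = exc
            time.sleep(base_backoff_s * (2 ** attempt))
    if last_error is None:
        raise ValueError("attempts must be at least 1")
    raise last_error


# Hypothetical demo: a flaky downstream "charge" endpoint.
ledger: list = []
store = IdempotentStore()
calls = {"n": 0}


def charge(timeout_s: float) -> str:
    # A real client would pass timeout_s to its network call; ignored here.
    def apply() -> str:
        ledger.append(100)  # the side effect we must not repeat
        return "charged-100"

    result = store.run_once("order-42", apply)
    calls["n"] += 1
    if calls["n"] == 1:
        # First attempt: the effect was applied, but the reply is "lost".
        raise TimeoutError("reply lost after side effect was applied")
    return result


print(call_with_retries(charge))  # charged-100
print(ledger)  # [100]: one charge despite the retry
```

The retry loop is only safe because the side effect is keyed: without `run_once`, the same loop would append a second charge to the ledger on the retry.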
#SoftwareEngineering #DistributedSystems #Reliability #SRE #SystemDesign #EngineeringBasics #ByernNotes
I’m currently looking for full-time or contract work in SRE / DevOps / IT Operations.
Portland, OR. Open to hybrid, on-site, or remote. Willing to relocate to Seattle.
Schedule: Any
Tools: Python, Bash, PowerShell, Terraform, Jenkins, Puppet, Ansible, Splunk, Grafana, BigPanda
CI/CD: Jenkins, Bitbucket, container builds with Docker/Podman, deployments to OpenShift.
I have worked as an IT Operations Engineer in enterprise production environments, supporting on-prem VMware (RHEL and Windows) alongside Azure and AWS. My role included on-call rotations and incident command for high-severity outages.
My responsibilities included monitoring, alert triage, and root cause analysis across infrastructure and application layers, as well as coordinating with infrastructure, development, and product teams to isolate failures, restore service, and prevent recurrence.
My focus was developing Python tooling for automation and production support, with Ansible used for routine infrastructure tasks.
I worked extensively with Splunk, Grafana, and BigPanda, building dashboards for investigation, event correlation, and metric trends.
Additional experience includes:
Terraform for cloud provisioning and Puppet for configuration enforcement
Network troubleshooting across Cisco and Arista environments
Production database support: Oracle, SQL Server, MongoDB, Postgres
My DMs are open! Feel free to message me for my resume.
Git: github.com/Aleph0x
Web: https://www.al3f.com
A huge thank you to our #opensource community for landing Coroot in the top 30 most popular observability projects on GitHub (out of 3,300+ entries!)
Love #Coroot and want to help share it with the world? Add your ⭐️ to the galaxy: https://github.com/coroot/coroot
#linux #ebpf #observability #softwarelibre #devops #sre #tech
System Administration, Week 1: The Job of a System Administrator
In this video, we try to capture the job of a System Administrator. We show the things SysAdmins may encounter in their day-to-day routine, ranging from blade servers and routers to cable ties and power tools and everything in between. As we try to define the job, we find out it's not quite that easy...
It's duct tape and WD40 all the way down.

After years in DevOps, I learned the most not from certifications, but from 2 AM production outages and big-dollar cloud mistakes.
This post breaks down what 100 real incidents taught me about reliability, cost, and calm decision-making.
#DevOps #CloudEngineering #AWS #SRE #ProductionIncidents #CloudCosts #FinOps
Rejected again. Picked someone over me, again. Sisyphean rollercoaster all over again. If this happens one more time, I will have to start counting on two hands.
Seven months unemployed now.
The depression is really smacking me around.
I appreciate all the help with leads and roles that didn't work out. Still trying to find work.