24 hours until the CfP for "SREday London 2026 Q1" closes: https://papercall.io/cfps/6456/submissions/new

#cfp #conference #Sre #Reliability #Devops #Cloud #Ai

PaperCall.io

System Administration, Week 1: Core Principles

In this video, we present a few core principles that will guide us throughout the semester: Scalability, Security, and Simplicity. We'll also get to know a few basic "laws", well known by any System Administrator. If you're wondering what all this has to do with Legos, please tune in...

https://youtu.be/bfqP6PlS6Og

#SysAdmin #devops #sre

CS615 System Administration, Week 01, Segment 03 - SysAdmin Core Principles and Rules

YouTube

🚀 The Best of #CloudComputing & #DevOps in 2025

#InfoQ published some serious heavy hitters last year. These 5 deep dives are essential reading for engineers who want to #StayAhead of the curve 👇

➡️ Designing Resilient Event-Driven Systems at Scale by Rajesh Kumar Pandey
https://bit.ly/3HlYOpa

➡️ Being Functionless: How to Develop a Serverless Mindset to Write Less Code! by Sheen Brisals
https://bit.ly/4rhWXmM

➡️ Checklist for Kubernetes in Production: Best Practices for SREs by Utku Darilmaz
https://bit.ly/43GZ4rO

➡️ When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale by Mitendra Mahto
https://bit.ly/4nZJTR3

➡️ Why Is My Docker Image So Big? A Deep Dive with “dive” to Find the Bloat by Chirag Agrawal
https://bit.ly/44os5ar

📚 Knowledge is power! 💪

#SystemDesign #Serverless #Kubernetes #SRE #Docker #CloudNative

A lot of “scalability work” is really “making side effects predictable.”

Idempotency, retries, timeouts, and clear ownership of state sound boring until your first incident teaches you they were the product all along.

When a system is calm under failure, it is not because it never fails.
It is because 𝗶𝘁 𝗳𝗮𝗶𝗹𝘀 𝗶𝗻 𝘄𝗮𝘆𝘀 𝘆𝗼𝘂 𝗽𝗹𝗮𝗻𝗻𝗲𝗱 𝗳𝗼𝗿.
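
A rough sketch of what "failing in ways you planned for" can look like, assuming a toy in-memory service. Every name here (`PaymentService`, `TransientError`, `call_with_retries`, the `idempotency_key` parameter) is hypothetical and made up for illustration, not taken from any real library:

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


class PaymentService:
    """Toy in-memory service that remembers results by idempotency key."""

    def __init__(self, failures_before_success=0):
        self.results = {}
        self.charges_applied = 0
        self._failures_left = failures_before_success

    def charge(self, idempotency_key, amount):
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.charges_applied += 1
        result = {"key": idempotency_key, "amount": amount, "status": "charged"}
        self.results[idempotency_key] = result
        # Simulate a lost response: the side effect happened,
        # but the caller only sees a failure.
        if self._failures_left > 0:
            self._failures_left -= 1
            raise TransientError("response lost")
        return result


def call_with_retries(operation, attempts=3, base_delay=0.01, deadline=1.0):
    """Retry with exponential backoff, bounded by attempts and a time budget."""
    start = time.monotonic()
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            delay = base_delay * (2 ** attempt)
            out_of_attempts = attempt == attempts - 1
            out_of_time = time.monotonic() - start + delay > deadline
            if out_of_attempts or out_of_time:
                raise
            time.sleep(delay)
```

The retry is only safe because the key makes the side effect repeatable: the first call charges and then "loses" the response, the second call gets the stored result back, and the customer is charged once.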

#SoftwareEngineering #DistributedSystems #Reliability #SRE #SystemDesign #EngineeringBasics #ByernNotes

I’m currently looking for full-time or contract work in SRE / DevOps / IT Operations.
Portland, OR. Open to hybrid, on-site, or remote. Willing to relocate to Seattle.
Schedule: Any

Tools: Python, Bash, PowerShell, Terraform, Jenkins, Puppet, Ansible, Splunk, Grafana, BigPanda
CI/CD: Jenkins, Bitbucket, container builds with Docker/Podman, deployments to OpenShift.

I have worked as an IT Operations Engineer in enterprise production environments, supporting on-prem VMware (RHEL and Windows) alongside Azure and AWS. My role included on-call rotations and incident command for high-severity outages.

My responsibilities included monitoring, alert triage, and root cause analysis across infrastructure and application layers, coordinating with infrastructure, development, and product teams to isolate failures, restore service, and prevent recurrence.

My focus was developing Python tooling for automation and production support, with Ansible used for routine infrastructure tasks.

I worked extensively with Splunk, Grafana, and BigPanda, building dashboards for investigation, event correlation, and metrics and trends.

Additional experience includes:

Terraform for cloud provisioning and Puppet for configuration enforcement

Network troubleshooting across Cisco and Arista environments

Production database support: Oracle, SQL Server, MongoDB, Postgres

My DMs are open! Feel free to message me for my resume.

Git: github.com/Aleph0x
Web: https://www.al3f.com

#fedihire #hiring #getfedihired #sre #DevOps #SysADmin

The Infinite Archive

formless and empty

A huge thank you to our #opensource community for landing Coroot in the top 30 most popular observability projects on GitHub (out of 3,300+ entries!)

Love #Coroot and want to help share it with the world? Add your ⭐️ to the galaxy: https://github.com/coroot/coroot

#linux #ebpf #observability #softwarelibre #devops #sre #tech

System Administration, Week 1: The Job of a System Administrator

In this video, we try to capture the job of a System Administrator. We show what things SysAdmins may encounter in their day to day routine, ranging from blade servers and routers to cable ties and power tools and everything in between. As we try to define the job, we find out it's not quite that easy...

It's duct tape and WD40 all the way down.

https://youtu.be/osIO9CbqHQo

#sysadmin #devops #sre

CS615 System Administration, Week 01, Segment 02 - The Job of a System Administrator

YouTube

After years in DevOps, I learned the most not from certifications, but from 2 AM production outages and big-dollar cloud mistakes.
This post breaks down what 100 real incidents taught me about reliability, cost, and calm decision-making.

🔗 https://shorturl.at/Cr4oJ

#DevOps #CloudEngineering #AWS #SRE #ProductionIncidents #CloudCosts #FinOps

What 100 Outages and a Million-Dollar Cloud Bill Taught Me

If you spend enough years in DevOps and Cloud, you realize the best lessons don’t come from certifications, vendor slides, or slick demos…

Medium

Some real-world production bugs don't crash the system or surface a clear error, but they still leave it in the wrong state: users get blocked, transactions don't go through, webhooks never get sent... Data "silently" drifts while everything still looks normal. These bugs hide in glue code, environment differences, timing edge cases, or forgotten fallback paths. Is the product really "haunted"? 🤯 #SoftwareEngineering #SystemReliability #Debugging #SRE #LỗiẨn #K

Rejected again. Picked someone over me, again. Sisyphean rollercoaster all over again. If this happens one more time, I will have to start counting on two hands.

Seven months unemployed now.

The depression is really smacking me around.

I appreciate all the help with leads and roles that didn't work out. Still trying to find work.

#SRE #FediHire