Mastodawn

The fourth talk recording from DevOpsDays Zürich 2026 is online. 🎥

Melchior Thambipillai on Automated Root Cause Analysis: real-time dependency graphs, RAGs, and LLMs tracing cascading degradation back to the manual action that triggered it.

Not a concept talk. A live demo from production.

Watch on Vimeo: https://vimeo.com/1195129638
Or YouTube: https://youtu.be/l6YmnyHfVJw

#DevOpsDays #Observability #AIOps

Calling all papers 6d ago

24 hours until the CfP for "Charlemos de SQL Server" closes: https://sessionize.com/charlemos-de-sql-server

#cfp #conference #aiops #agenticsre #sretrends

Charlemos de SQL Server: Call for Speakers

Charlemos de SQL Server is the global meeting point for the Spanish-speaking SQL community. Driven jointly by the Comunidad de SQL Server Es...

connexify May 29

Myth: cloud-native NMS is always better. Reality: telcos with sovereign data requirements, low-latency NOCs, and air-gapped sites need on-premise. The future is hybrid, not cloud-only. #TelecomAI #AIOps #StreamingTelemetry

Agree or disagree? Reply below.

Dmitrijs Zaicevs | AI Agents May 27

A lot of project delays are not execution problems. They are update problems.

Old way: Trello boards, Slack followups, and weekly status meetings to figure out what is already off track.

New way: an AI project copilot that reads task changes, docs, and team chatter, then updates timelines and flags blockers in real time.

That means fewer meetings and fewer surprises. If you want to automate your business, DM me. #AIOps #ProjectManagement #WorkflowAutomation

connexify May 27

Myth: 'we cannot monitor multi-vendor networks easily'. Reality: we do it for ISPs running MikroTik, Ubiquiti, Cambium, Huawei OLTs, Nokia OLTs, Calix, and FiberHome — together. The 'cannot' is a tooling problem, not a physics problem. #TelecomAI #AIOps #StreamingTelemetry

Calling all papers May 21

24 hours until the CfP for "Microsoft Speakers Hub en Español - Fabric Days 2026" closes: https://sessionize.com/microsoft-speakers-hub-en-espanol

#cfp #conference #aiops #agenticsre #sretrends

Microsoft Speakers Hub en Español - Fabric Days 2026: Call for Speakers

FABRIC DAY Online is a virtual event organized by the Microsoft Speakers Hub en Español community, as part of the global movement around Mic...

Beth Pariseau May 21

How can enterprise IT buyers choose among the plethora of AI automation tools now on the market from major vendors? Can they trust AI agent-driven infrastructure automation yet? Should they?

Steven Dickens, CEO and principal analyst at HyperFrame Research, offers his answers to these questions and more from the show floor at #DellTechWorld.

In today’s episode, we’ll cover…

· #Dell vs #AI infrastructure competitors: size matters

· The rollout of Dell Automation Platform

· The agentic #AIOps dilemma for IT organizations

And more!

Check it out here: https://youtu.be/ZfbiNMlfCO0

IT Ops Query: Digesting Dell Technologies World AI automation news

YouTube

Habr May 20

How I made friends between Zabbix and an LLM in my spare time. An architectural overview of working with a neural network. Part 3: HLD and a bit of LLD

This is the third article in a series about how I tried to make Zabbix alerts in my home lab a little smarter by bolting a local LLM onto them, without ending up with an architectural Frankenstein's monster. In the first part we worked out the problem statement and the requirements specification, then we picked a favorite among the local LLMs, and now we turn to the boring part: design. This article covers drawing up the HLD, why a human should do that, and what can already be handed to a neural network for help. While I was writing, the material grew to an enormous size, so I had to split it into four parts. The most interesting part is still ahead: the final one, covering what came out in the end. I plan to prepare it in 2-3 weeks, since this is just a hobby. Part 1: Introduction and drafting the requirements specification. Part 2: Choosing a local LLM. Part 3: Drafting the HLD and a bit of LLD -> you are here. Part 4: What came of it.

https://habr.com/ru/articles/1037466/

#zabbix #llm #aiops #мониторинг #алерты #автоматизация #itинфраструктура #hld #lld #c4

How I made friends between Zabbix and an LLM in my spare time. An architectural overview of working with a neural network. Part 3: HLD and a bit of LLD

The paws helped the kitty! This is the third article in a series about how, with a properly framed problem and a sound approach to architecture, you can put together a self-hosted system for analyzing alerts with...

Habr

Khan May 16

New survey on arXiv: Large Language Models for Agentic NetOps and AIOps.

The paper looks at LLM-based agents for incident diagnosis, root-cause analysis, configuration and change planning, policy checking, human approval, and safer operational decisions.

The core argument: reliability will depend less on the model alone, and more on evidence traces, tool boundaries, verification gates, rollback, and governance.

https://arxiv.org/abs/2605.12729

#AIOps #NetOps #LLM #SRE

Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Large language models are increasingly being used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited self-healing. In both NetOps and AIOps, this shift is changing how tasks are managed. Agent-based operations work as workflows, from gathering evidence to taking action, following permissions, policies, and checks, and providing rollback options when necessary. This is crucial because operational decisions can have instant impacts. To make the argument concrete, we organise the relevant literature around the hierarchy of autonomy, tool scope, evidence traces, and assurance contracts. These contracts define what an agent may observe, propose, and execute. They also define the checks that must pass before any action is allowed. A consistent pattern appears across work on telemetry query recommendation, diagnosis, root-cause analysis, configuration synthesis, change planning, and limited self-healing. Operational reliability does not come chiefly from the model itself. It depends on the machinery around the model. We also argue that evaluation should go beyond static question answering. Agentic NetOps and AIOps systems require workflow-centred evaluation, including trace quality, bounded tool use, safe proposal generation, replay in sandboxed environments, and canary trials with rollback-aware scoring. Without these measures, a system may appear robust yet remain too fragile. Finally, we examine security, privacy, and governance risks that become acute when agents sit close to operational control surfaces. Taken together, the survey concludes that progress in intelligent NetOps and AIOps will depend on treating autonomy as a constrained operational control problem, whose outputs must be reliable, auditable, and securely deployable.

arXiv.org

Habr May 12

How I made friends between Zabbix and an LLM in my spare time. An architectural overview of working with a neural network. Part 2: "Choosing a model"

This is the second article in a series about how I tried to make Zabbix alerts in my home lab a little smarter by bolting a local LLM onto them, without ending up with an architectural Frankenstein's monster. In the first part we worked out the problem statement and the requirements specification, and now it is time to choose the model itself. In this part we define the criteria for the LLM (separately from the overall requirements specification), compare small open-weight models for a self-hosted scenario, and pick one of them. While I was writing, the material grew to an enormous size, so I had to split it into four parts. I will add links as the parts are released (roughly once every one to two weeks). Part 1: Introduction and drafting the requirements specification. Part 2: Choosing a local LLM -> you are here. Part 3: Drafting the HLD and a bit of LLD. Part 4: What came of it.

https://habr.com/ru/articles/1033798/

#zabbix #llm #aiops #мониторинг #алерты #itинфраструктура #rca

How I made friends between Zabbix and an LLM in my spare time. An architectural overview of working with a neural network. Part 2: "Choosing a model"

Introduction. The kitty has more than just paws. This is the second article in a series about how, with a properly framed problem and a sound approach to architecture, you can put together a self-hosted system for analyzing...

Habr