Trying to understand the different selfhosted monitoring solutions

Note: It seems my original post from last week didn't get posted on lemmy.world from kbin (I can't seem to find it) so I'm reposting it. Apologies to those who may have already seen this....

https://kbin.social/m/selfhosted@lemmy.world/t/279506


check_mk is what I use at home and at work. It’s a fork of Nagios/Icinga that works with agents, Nagios plugins, or SNMP, and if you somehow can’t find a check for what you want to monitor, writing a custom one is as easy as writing a bash script.
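To give an idea of what a custom check looks like, here is a rough sketch of a local check. It is written in Python rather than bash, on the understanding that the agent runs any executable placed in its local checks directory (typically /usr/lib/check_mk_agent/local on Linux; verify the path on your install). Each output line has the form: status, service name, metric, summary text. The service name, metric name and thresholds below are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a Checkmk-style local check that reports root filesystem usage."""
import shutil

# Hypothetical thresholds (percent used); pick values that suit your hosts.
WARN_PCT = 80.0
CRIT_PCT = 90.0


def status_for(pct_used, warn=WARN_PCT, crit=CRIT_PCT):
    """Map a usage percentage to a Nagios-style status: 0 OK, 1 WARN, 2 CRIT."""
    if pct_used >= crit:
        return 2
    if pct_used >= warn:
        return 1
    return 0


def local_check_line(service, pct_used, warn=WARN_PCT, crit=CRIT_PCT):
    """Build one output line: <status> "<service name>" <metric> <summary text>."""
    status = status_for(pct_used, warn, crit)
    # Metric syntax is name=value;warn;crit (the names here are assumptions).
    metric = f"used_pct={pct_used:.1f};{warn:.0f};{crit:.0f}"
    return f'{status} "{service}" {metric} {pct_used:.1f}% of disk space in use'


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    pct = 100.0 * usage.used / usage.total
    print(local_check_line("Root disk usage", pct))
```

Make the file executable, drop it into the local checks directory, and the service should turn up at the next service discovery for that host.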

I opted for checkmk as well and don’t want to switch. It’s got good defaults for Linux monitoring, and it tells me about random things to fix after reboots, or that memory or disk space is getting low, so I can fix it quickly.

When I was monitoring 15 virtual machines on one physical host, the default of checking every machine once a minute pushed the host’s temperature above 80 degrees Celsius and triggered a warning. Checking every five minutes is more than I need, so I changed the interval to that.

That’s odd. I’m currently monitoring 17 vms on one host along with a handful of physical devices. Nothing like the issues you’ve encountered has happened.