Server monitoring 101

I have been running a @yunohost server for ~5 years now, but there is one question I have never been able to answer: how loaded is my server? 👀

I know, I am a terrible sysad (actually, I am not a sysad, at all), because I have no idea how to determine:

  • if my server is running smoothly
  • if the server is under stress, and why
  • what applications are the heaviest
  • if there is the possibility of installing more apps
  • when peaks of stress are happening and what is causing them

In general, I would like to understand the fundamentals of server monitoring: what are the most critical metrics and what do they mean? What parameters do I have to keep an eye on?

I installed Prometheus and Grafana, but then I realized I have absolutely no idea what to do next… Do you have any suggestions?

I thought about watching some video tutorials, but I would not really know how they would relate to YunoHost installations.

Please, if possible reply in this thread of the YunoHost forum, so that we can keep track of this useful information also for others in the future. 🌻

Once I have learned the basics, I would be very happy to write some pointers about this in the documentation, or an essential YunoHost Monitoring tutorial.

#sysAd #YunoHost #askFedi #help #systemAdministration #serverMonitoring #server #selfHosting #selfHost #Linux

YunoHost Monitoring 101

YunoHost Forum

@tommi

I installed Prometheus and Grafana, but then I realized I have absolutely no idea what to do next… Do you have any suggestions?

Yes: https://prometheus.io/docs/guides/node-exporter/

Monitoring Linux host metrics with the Node Exporter | Prometheus

@etam Wooooah that looks like a lot. I’ll dive in later.

Thank you 🌻

@tommi

Since I am not registered in the YunoHost forum, I will answer here:

You can log in via ssh and install btop, then you already have a good overview of the system.

(Translated with DeepL)

@yunohost

@tommi @yunohost hi
I'm using netdata, which comes with a lot of charts, and also mail notifications.

@petitmote @yunohost I tried NetData, but:

  • I’d still like to understand metrics and charts a bit better
  • It is now partially closed source, and it invites me to login to their cloud service even from my own instance… It’s very not nice.

@tommi @yunohost yep, I really don't like this either. However, it's really powerful, and comes with a lot of app monitoring (for example, it monitors IP bans from fail2ban).

    @tommi @yunohost

    When you don't know what you're looking for, prometheus is overkill.

    (Prometheus is usually overkill.)

    Open a terminal window. SSH to your server. Install btop (Most Linux distros have it available.)

    Run btop.

    You now have graphs in your terminal to show you CPU load, CPU temperature, RAM use, Swap use, disk space use, disk I/O, and network I/O.

    If your RAM is full and your swap is in use, you don't have enough memory: either buy and install more or stop running so much stuff. There is a tree of processes and they can be ordered by CPU use or memory use.

    If your RAM is not full and some swap is in use, that's fine.

    If your CPUs are all busy all the time, the temperature is high, or your network usage seems high, those are all things to be concerned about.
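
The RAM and swap rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not something from the thread: the function names and the 90% "RAM is full" threshold are assumptions, and it relies on the `MemTotal`, `MemAvailable`, `SwapTotal` and `SwapFree` fields that Linux reports in `/proc/meminfo`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kiB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_verdict(info, ram_full_ratio=0.9):
    """Apply the rule of thumb: full RAM plus swap in use means trouble.

    ram_full_ratio is an assumed threshold, not a value from the thread.
    """
    ram_used = 1 - info["MemAvailable"] / info["MemTotal"]
    swap_used_kib = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    ram_full = ram_used >= ram_full_ratio
    if ram_full and swap_used_kib > 0:
        return "not enough memory"
    if ram_full:
        # Case not covered by the rule of thumb; flagged separately.
        return "RAM nearly full, no swap in use"
    if swap_used_kib > 0:
        return "fine (some swap in use, RAM not full)"
    return "fine"
```

On the server itself, something like `memory_verdict(parse_meminfo(open("/proc/meminfo").read()))` would give the verdict for the current moment, which is the same information btop shows in its memory panel.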

    Thank you, @dashdsrdash! Somebody else just mentioned btop, and I was already looking into it. This is a great start, but it seems to be limited to real-time monitoring… How about notifications of critical loads when I am not checking btop?

    @yunohost

    @tommi @yunohost

    Generally, you don't care.

    What you do care about is service availability:

    • do my boxes have network connectivity?
    • do daemons answer requests from the Internet?
    • does mail go through?
    • does a web request which needs a database to function, work properly?

    All of those questions are best answered by an external service which will send you an alert when the answer is "no". Many of these have limited free plans which will work for you.

    If the services are running, then performance can wait until you have time to sit down and look at it.

    Now, the more important these services are to you, the more you will be interested in advanced monitoring, performance statistics, and alerting -- but that comes later. Prometheus may be appropriate then; or it may still be overkill.
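
To make the availability questions above concrete, here is a minimal sketch of such an external check in Python, using only the standard library. It is an assumption-laden illustration rather than a replacement for a hosted service: the function names are made up, and how the alert gets sent is left out entirely.

```python
import urllib.error
import urllib.request


def response_ok(status, body, expected_text=None):
    """True if the status is 200 and the expected text (if any) is in the body."""
    if status != 200:
        return False
    return expected_text is None or expected_text in body


def check_url(url, expected_text=None, timeout=10):
    """Fetch url and judge the answer; any network or HTTP error counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return response_ok(resp.status, body, expected_text)
    except (urllib.error.URLError, OSError):
        return False
```

A call such as `check_url("https://example.org/", expected_text="Welcome")` (hypothetical URL and text) would be run on a schedule from a machine other than the server, since a check running on the server cannot report that the server itself is unreachable.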

    @dashdsrdash
    "Prometheus is usually overkill"
    Well, it somewhat depends. It's a somewhat different approach to monitoring. "Is my site working correctly" is usually "open http connection and see if I get a 200 code, and possibly a string in returned data". That's how nagios-age monitoring does it.
    Prometheus can also do that... But instead of such "black box monitoring" / behaviour based monitoring, it has the approach of "white box monitoring", i.e.
    [1/3] @tommi @yunohost
    each component exposes detailed information about its internal workings: number of requests and processing time of the web server, backend response times, database load, interrupts on CPU, latency on disk reads... This can get you very detailed information when you know what you're looking at/for, and be totally overwhelming when you don't.
    [2/3]
    @tommi @yunohost @dashdsrdash
    @tommi
    I would have to think whether I would recommend it at this point in the conversation, since it has caveats, security implications, etc, and I *like* Prometheus, but Zabbix exists, and may be easier to start with.
    [3/3]
    @yunohost @dashdsrdash
    @tommi
    Also I think YunoHost runs things in containers, which adds a lot of complexity, but can make it easier to see *what* is taking up resources.
    @yunohost @dashdsrdash
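
As a rough illustration of the "white box" idea described above, the sketch below renders a couple of host numbers in the plain-text format that Prometheus scrapes. The metric names (`demo_load1`, `demo_cpu_count`) and function names are invented for this example; real exporters such as the Node Exporter expose far more, under their own names.

```python
import os


def render_metrics(metrics):
    """Render {name: (help_text, value)} as Prometheus text-format gauges."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    """Collect two example host numbers (Unix only, hypothetical metric names)."""
    load1, _, _ = os.getloadavg()
    return {
        "demo_load1": ("1-minute load average", load1),
        "demo_cpu_count": ("Number of logical CPUs", os.cpu_count()),
    }
```

Serving the output of `render_metrics(collect())` over HTTP is, in essence, what an exporter does; Prometheus then fetches it at regular intervals and stores the values so they can be graphed over time.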

    @viq it does not use containers.

    Also, for your information, the Zabbix package is sadly kind of broken at the moment, and needs some love to test and improve the ongoing WIP update and fix.

    The Netdata and Grafana packages exist and are running fine, but apart from other issues (such as those mentioned elsewhere in the replies), they tend to put quite a high load on the server on their own, which is a pity :(
    @tommi @yunohost @dashdsrdash

    @viq @tommi @yunohost @dashdsrdash But Zabbix is overkill too, in this case…
    @breizh @tommi @dashdsrdash @yunohost uh, that heavily depends. From the stuff mentioned in the initial post, "server running smoothly/under stress", "when peaks of stress are happening", "is there possibility of installing more apps" are pretty easy graphs to generate (though "server running smoothly" is somewhat fractal in complexity). [1/n]
    @breizh @tommi @dashdsrdash @yunohost But "why is server under stress", "which applications are the heaviest", "what is causing peaks of stress" are again somewhat fractal in complexity. They require gathering *MUCH* more data, possibly including logs, a way to present both abstracted version and diving into details, and ability to search - and each of those requirements increases complexity and resource usage, and someone who knows what they're doing having set it up. [2/n]
    @breizh @tommi @dashdsrdash @yunohost Where "someone who knows what they're doing" can be the user, meaning they have to understand the system, understand the tools, and figure out how to get out of it all what's useful to them. Or it can be someone having set up such monitoring system, and made it generic enough for most cases - meaning it has all those features, which means it needs to be heavy enough to be able to support them. [3/n]
    @breizh @tommi @dashdsrdash @yunohost so for "does system usage look OK", simple graphite metrics from CPU and network card and disk usage will do that. Knowing *what* causes increased load, or even more so, what caused increased load 3 days ago, requires a lot of details, and increases complexity a lot.
    Though having read the description of monitorix that someone mentioned, if it's as lightweight as claimed and does what it claims, with proper configuration it may fit the request. [4/4]

    @viq @tommi @dashdsrdash @yunohost I’m a sysadmin and I love monitoring everything so I can know exactly what’s going on.

    But we're talking about a small personal server, managed with an easy-to-use management interface, by someone who seems to be just starting out.

    That's how I started out, too, and now you're getting into some issues that are certainly relevant to a more advanced use, but in my opinion far too advanced for the case we're interested in here. It just seems… off-topic to me.

    @breizh @tommi @dashdsrdash @yunohost being able to answer the "why / what causes it" in my view/experience does lead to the advanced use.
    I admit I may have skewed perception, but for a single host, prometheus and grafana should not generate noticeable load. I have a host on which docker reports those containers together use 150MB RAM.
    On yet another hand, on the list of applications I see Cockpit, which should make it possible to find out what was asked.
    @dashdsrdash @tommi @yunohost if you want pretty graphs arranged for you, you may want to check out netdata. I don't think it keeps history though, unless there's an option you need to set for it to do so.
    @dashdsrdash @tommi @yunohost there's also Sensu, which is an interesting mix of behavioural and metrics monitoring, but last I looked at it several years ago, it was getting closed and difficult to deploy, so I don't know how usable it is currently.
    @tommi I'm happy to see I'm not the only one struggling to find a good monitoring tool. It has been on my server to-do list for a long time now.
    I've currently settled on eZ Server Monitor but there is no history. I tested most apps in YunoHost but I find they are either too simple or too complicated.
    Can't wait to find a good tool!
    @tsgt @tommi Are Prometheus & Grafana not available for YunoHost?
    @jak2k @tommi They are, but it didn’t seem to work very well out of the box and it looked quite complicated. It was a quick test, I should probably dive into it a little more.
    @tommi Hey 👋 I finally had time to review people's recommendations and I settled on Beszel. There is no YunoHost package but the install is very simple and you can use the Redirect app to make it available. The UI is very simple, there is all the useful data (including temperatures), you can monitor it over time (1 hour to 30 days) and it has alerting. Already love it!
    @tommi @yunohost maybe netdata will help you.
    @tommi @yunohost so maybe on the command line, for instant monitoring, you can use btop (which I prefer) or htop
    @tommi @yunohost it monitors the hardware and processes
    @tommi @yunohost IIRC this is one of better dashboards for looking at node exporter data in prometheus: https://grafana.com/grafana/dashboards/1860-node-exporter-full/
    Node Exporter Full | Grafana Labs