RE: https://en.osm.town/@amapanda/116137039112062611

Any recommendations for how to turn my #logs into some #metrics that my #Prometheus server can ingest? Some alternative to #mtail? (this problem aside, it looked like a dead project for a while)

#o11y

I'm using #mtail to parse my logs and export that as a local #prometheus exporter server.

I got into a panic, because suddenly loads of production machines were showing 0 accesses, even though I could see they were in use.

Looks like mtail was outputting “total number of requests” in scientific notation, causing prometheus to essentially see nothing... 🤦🏻‍♀️🤦🏻‍♀️🤦🏻‍♀️

`accesses_total{prog="parse.mtail"} 2.991904e+06`

I tried calculating the max latency of an #Apache server for every #prometheus scrape period using the #grok exporter and summaries, to see exactly how bad our worst cases are. Unfortunately, it didn't work:
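One stdlib-only way to get "max latency per scrape period" is a gauge that resets itself on every scrape, so each scrape reports the worst case of its own window. This is a minimal sketch, not the grok exporter's actual mechanism; the metric name and the explicit float formatting are my choices (a real exporter would use the `prometheus_client` library instead of hand-rolling the exposition text):

```python
import threading


class MaxLatencyGauge:
    """Report the maximum latency observed since the last scrape,
    then reset, so each Prometheus scrape period gets its own worst case.
    Minimal stdlib sketch; metric name is hypothetical."""

    def __init__(self, name="apache_max_latency_seconds"):
        self.name = name
        self._lock = threading.Lock()
        self._max = 0.0

    def observe(self, seconds):
        # Called once per parsed log line.
        with self._lock:
            self._max = max(self._max, seconds)

    def scrape(self):
        """Render one exposition-format sample and reset the window."""
        with self._lock:
            value, self._max = self._max, 0.0
        # Format explicitly: repr()/%g can emit scientific notation for
        # large values, which is exactly the surprise described above.
        return f"# TYPE {self.name} gauge\n{self.name} {value:.6f}\n"
```

The catch, as the linked post explores, is that this only works with exactly one scraper and a reset that races with the scrape interval; with summaries alone you can't recover the true per-period maximum.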

https://www.grulic.org.ar/~mdione/glob/posts/monitoring-the-maximum-latency-for-a-scrape-period-with-prometheus/

#SRE #mtail

Update: I mention implementing a new exporter. #Python has all the pieces: `inotify` to reimplement `tail -F`, support for reading grok patterns, and web frameworks (#flask or #FastAPI) to serve the metrics. The first part seems the hardest, but I could just `popen('tail -F')` :)
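The `popen('tail -F')` shortcut plus a line-matching counter could look roughly like this. A hedged sketch: the regex is a hypothetical stand-in for a real grok pattern (here it just grabs an Apache `%D` latency field at the end of the line), and the HTTP-serving side via #flask/#FastAPI is left out:

```python
import re
import subprocess

# Hypothetical stand-in for a grok pattern like %{COMBINEDAPACHELOG}:
# match a trailing integer latency field (e.g. Apache's %D in LogFormat).
LATENCY_RE = re.compile(r"\s(?P<latency_us>\d+)$")


def follow(path):
    """Reimplement `tail -F` the lazy way: just popen tail itself
    and yield lines as they arrive."""
    proc = subprocess.Popen(
        ["tail", "-F", path],
        stdout=subprocess.PIPE, text=True, bufsize=1,
    )
    yield from proc.stdout


def count_accesses(lines):
    """Fold matching log lines into a counter, mtail-style.
    Takes any iterable of lines, so it works on follow(path) too."""
    total = 0
    for line in lines:
        if LATENCY_RE.search(line):
            total += 1
    return total
```

In a real exporter `count_accesses(follow("/var/log/apache2/access.log"))` would run in a thread, incrementing a shared counter that the web endpoint serializes on each scrape.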
