RE: https://en.osm.town/@amapanda/116137039112062611
Any recommendations for how to turn my #logs into some #metrics that my #Prometheus server can ingest? Some alternative to #mtail? (this problem aside, it looked like a dead project for a while)
I'm using #mtail to parse my logs and expose the results as a local #prometheus exporter.
I got into a panic, because suddenly loads of production machines were showing 0 accesses, even though I could see they were in use.
Looks like mtail was outputting “total number of requests” in scientific notation, causing Prometheus to essentially see nothing... 🤦🏻‍♀️🤦🏻‍♀️🤦🏻‍♀️
`accesses_total{prog="parse.mtail"} 2.991904e+06`
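When debugging a sample like the one above, one quick sanity check is to split it by hand into its series and value parts; the value is an ordinary float, and scientific notation parses cleanly (a minimal sketch, nothing mtail-specific):

```python
# Split an exposition-format sample into its series and value parts.
# Scientific notation is a valid float literal, so it parses cleanly.
sample = 'accesses_total{prog="parse.mtail"} 2.991904e+06'
series, value = sample.rsplit(" ", 1)
print(series)        # accesses_total{prog="parse.mtail"}
print(float(value))  # 2991904.0
```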
I tried calculating the max latency of an #Apache server for every #prometheus scrape period using the #grok exporter and summaries, to see exactly how bad our worst cases are. Unfortunately, it didn't work:
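For context, a grok_exporter attempt along these lines would use a config roughly like the following (a sketch from memory, not my actual file; the path, port, and the `response_time` capture are assumptions, so check the project's config docs for the exact field names):

```yaml
global:
  config_version: 2
input:
  type: file
  path: /var/log/apache2/access.log   # hypothetical path
grok:
  patterns_dir: ./patterns
metrics:
  - type: summary
    name: apache_http_response_time_seconds
    help: Apache response time, from a hypothetical response_time log field.
    match: '%{COMMONAPACHELOG} %{NUMBER:response_time}'
    value: '{{.response_time}}'
server:
  port: 9144
```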
Update: I mentioned implementing a new exporter. #Python has all the pieces: `inotify` to reimplement `tail -F`; reading grok patterns; and web serving (with #flask or #FastAPI). That first part seems the hardest, but I could just `popen('tail -F')` :)
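The pieces above can be sketched with just the standard library: a `subprocess`-based `tail -F` follower, a hand-written regex standing in for a grok pattern, and `http.server` instead of Flask/FastAPI (everything here is illustrative, not a real exporter):

```python
import re
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hand-written regex for the Apache common log format (a grok
# %{COMMONAPACHELOG} pattern would expand to something similar).
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

accesses_total = 0  # the one metric this sketch tracks

def parse_line(line):
    """Return the matched log fields as a dict, or None for unmatched lines."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def follow(path):
    """Yield log lines as they appear -- the popen('tail -F') shortcut."""
    proc = subprocess.Popen(
        ["tail", "-F", path], stdout=subprocess.PIPE, text=True
    )
    yield from proc.stdout

class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the counter in the Prometheus text exposition format."""
    def do_GET(self):
        body = f'accesses_total{{prog="sketch"}} {accesses_total}\n'.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)
```

Wiring it up would mean running `HTTPServer(("", 9999), MetricsHandler)` in a thread while the main loop iterates `follow(...)` and bumps the counter for each line `parse_line` accepts.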
Ever since I watched (several times) Gil Tene's talk about how we are measuring latency wrong, I've set out to try to get such values into a graph. His thesis is that we are using histograms and other simil