Listening to @dmagliola talking about happy Sidekiq queues

Could've used some of this a while ago 👀

#hazelAtSRECon #srecon

@dmagliola "no matter what we do the queue problem keeps happening"

Lol. Yes. Too real.

#hazelAtSRECon #srecon

@dmagliola answer to people sticking everything in the "important" queues: create queues for purpose, not for priority

#hazelAtSRECon #srecon

@dmagliola Why am I telling you a story about obvious mistakes? Because it happens!

A series of obvious steps can lead to things that look wrong in hindsight

#hazelAtSRECon #srecon

@dmagliola "queues are broken" really means:

A job didn't run...
... Yet (but I think it should've)
... So it's late (in my opinion)

Ergo, "broken"

Queues should themselves indicate what it means for them to perform as expected or not. The vocabulary we use traps us in locally bad decisions.

#hazelAtSRECon #srecon

@dmagliola
"The one thing we care about is latency"

Yes! We ran into this the hard way when figuring out what worked for hachyderm.

Latency or bust.

#hazelAtSRECon #srecon

@dmagliola name queues after their latency

within_X_time

And then keep that promise!

(This would make mastodon's queues so much easier to understand and optimize, omg)

#hazelAtSRECon #srecon
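A minimal sketch of what keeping that promise machine-checkable could look like. Everything here (the `within_X_unit` regex, the helper name) is hypothetical, not from the talk; the point is that a latency-named queue can be parsed back into a concrete deadline:

```ruby
# Hypothetical helper: turn a latency-named queue ("within_5_minutes")
# into the number of seconds that name promises.
UNITS = { "seconds" => 1, "minutes" => 60, "hours" => 3600 }.freeze

def promised_seconds(queue_name)
  match = queue_name.match(/\Awithin_(\d+)_(seconds|minutes|hours)\z/)
  raise ArgumentError, "#{queue_name} is not a latency-named queue" unless match

  Integer(match[1]) * UNITS[match[2]]
end

promised_seconds("within_5_minutes") # => 300
```

Once the deadline lives in the name, monitoring and enforcement can both read it from the same place.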

@dmagliola I particularly like the latency thing here because it ties directly into knowing how to build metrics and alerts around things. One thing hachyderm ran into was an inability to figure out (for a long time) how bad "bad" was for various queues.

We figured it out eventually, but we did it by black box testing, and it was awful

#hazelAtSRECon #srecon
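One way to make "how bad is bad" concrete without black-box testing: compare each queue's observed latency against the promise baked into its name. In Sidekiq the observed number could come from `Sidekiq::Queue.new(name).latency` (seconds since the oldest job was enqueued); the checker below is a hypothetical sketch that takes that number as an argument so it stays self-contained:

```ruby
# Hypothetical alert check: a latency-named queue is unhealthy when its
# observed latency exceeds the promise in its name. In Sidekiq, the
# observed value could come from Sidekiq::Queue.new(name).latency.
def queue_breaching?(queue_name, observed_latency_seconds)
  limit = queue_name[/within_(\d+)_minutes/, 1].to_i * 60
  return false unless limit.positive? # not a latency-named queue

  observed_latency_seconds > limit
end

queue_breaching?("within_5_minutes", 42)  # => false
queue_breaching?("within_5_minutes", 900) # => true
```

The threshold for the alert falls straight out of the queue name, so there's no separate "what's bad for this queue?" table to maintain.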

@dmagliola building queues around latency also lets you enforce the contract however you can, which also means evicting jobs whenever they violate the contract.

If a job wants to run very soon, it has to start up fast and complete fast.

#hazelAtSRECon #srecon
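The eviction idea could be sketched as a Sidekiq-style server middleware (the class name and eviction policy are my assumptions, not from the talk; the `call(worker, job, queue)` shape and the `enqueued_at` epoch field follow Sidekiq's server middleware conventions):

```ruby
# Sketch of a Sidekiq-style server middleware that drops a job when it
# has already blown its queue's latency promise instead of running late.
class LatencyContractMiddleware
  def call(_worker, job, queue)
    limit  = queue[/within_(\d+)_minutes/, 1].to_i * 60
    waited = Time.now.to_f - job["enqueued_at"].to_f

    # Contract already violated: evict (skip) rather than run late.
    # A real version would also emit a metric or dead-letter the job.
    return nil if limit.positive? && waited > limit

    yield # run the job normally
  end
end
```

Whether eviction (versus dead-lettering or alerting) is the right enforcement is a policy choice; the useful part is that the contract is enforceable at all.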

@dmagliola this would've been excellent for hachyderm as well. We would get queue latencies of hours because the database was slow and the file system was jacked. Having something like "jobs must finish in X time" would've given us upper bounds to judge system performance against and helped us narrow down more unknowns, particularly query times that were unreasonably long.

#hazelAtSRECon #srecon

GitHub - dmagliola/happy_queues: Companion information for my Rubyconf 2022 talk: "The secret to happy queues"

@hazelweakly I talk more about how to set up observability for your metrics in the talk repo, near the end. Hope that helps!

https://github.com/dmagliola/happy_queues

@dmagliola it will! I'll take a look at that :)