Earlier this week, I read "The Tail at Scale" paper by Jeffrey Dean and Luiz André Barroso.

I really liked the intuitive techniques described in it.

I wrote the blog post below to try to draw an analogy with a physical-world example, and to summarize my main takeaways from the paper on:

- What is tail latency
- Why should we care about it
- Why reducing component level variability is not sufficient
- Two classes of patterns to become tail-tolerant
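
To make the "why should we care" point concrete, here is a rough back-of-the-envelope sketch of how fan-out amplifies rare slowness. It assumes each server is independently slow with a fixed probability; the function name `prob_slow_request` and the 1% figure are illustrative assumptions, not measurements.

```python
def prob_slow_request(p_slow_per_server: float, fan_out: int) -> float:
    """Probability that at least one of `fan_out` parallel calls is slow.

    Assumes every server is slow independently with the same probability,
    and that the request must wait for all of them to answer.
    """
    return 1.0 - (1.0 - p_slow_per_server) ** fan_out


if __name__ == "__main__":
    # Illustrative assumption: each server is slow 1% of the time.
    for fan_out in (1, 10, 100):
        print(fan_out, round(prob_slow_request(0.01, fan_out), 3))
```

With a fan-out of 100, roughly 63% of requests hit at least one slow server, even though each individual server is slow only 1% of the time. That is why reducing per-component variability alone does not fix the tail.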

https://blog.techlanika.com/reducing-tail-latency-three-patterns-to-improve-responsiveness-of-large-scale-systems-47b5664baf61

Reducing Tail Latency: Three Patterns to Improve Responsiveness of Large-Scale Systems

@kalyanaj nice write up. Some of these techniques were used at Uber as well. A variant we used at Netflix is to use short timeouts and always retry against a different instance. It was built into the load balancer code (NetflixOSS Ribbon) that most systems used.
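
A minimal sketch of the "short timeout, retry against a different instance" idea, for illustration only. This is not Ribbon's API or implementation: `call_instance`, `call_with_retry`, the instance names, and the simulated latencies are all hypothetical stand-ins for a real RPC and a real load balancer.

```python
import asyncio


async def call_instance(instance: str, latency_s: float) -> str:
    """Hypothetical stand-in for an RPC; sleeps to simulate latency."""
    await asyncio.sleep(latency_s)
    return f"response from {instance}"


async def call_with_retry(instances, latencies, timeout_s=0.05):
    """Try each instance in order with a short timeout.

    A timed-out call is abandoned and the next attempt goes to a
    different instance, never back to the one that was slow.
    """
    last_error = None
    for instance in instances:
        try:
            return await asyncio.wait_for(
                call_instance(instance, latencies[instance]),
                timeout=timeout_s,
            )
        except asyncio.TimeoutError as exc:
            last_error = exc  # this instance was too slow; move on
    raise TimeoutError("all instances timed out") from last_error


if __name__ == "__main__":
    # Assumed latencies: instance "a" is having a slow moment, "b" is healthy.
    simulated = {"a": 0.5, "b": 0.01}
    print(asyncio.run(call_with_retry(["a", "b"], simulated)))
```

The design choice here is that the caller gives up on a slow instance quickly instead of waiting out its tail, on the assumption that the slowness is local to that instance and the request is safe to repeat.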

@adrianco Thanks for your feedback, and thanks for sharing the additional context.

Yes, that sounds similar to the hedged requests approach. I will check out the Ribbon client-side load balancer!
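
For comparison, a rough sketch of a hedged request as the paper describes it: send the request to one replica, and only if it has not answered within a short delay send a second copy to another replica, then take whichever answers first. The names `fetch` and `hedged_fetch`, the replica labels, and the latencies are hypothetical; the paper suggests a hedge delay around the 95th-percentile expected latency, while the values here are arbitrary.

```python
import asyncio


async def fetch(replica: str, latency_s: float) -> str:
    """Hypothetical stand-in for a request to one replica."""
    await asyncio.sleep(latency_s)
    return replica


async def hedged_fetch(primary, backup, hedge_delay_s):
    """Send to `primary`; if it is still pending after `hedge_delay_s`,
    also send to `backup` and return whichever finishes first.

    `primary` and `backup` are (replica_name, simulated_latency_s) pairs.
    """
    first = asyncio.create_task(fetch(*primary))
    done, _ = await asyncio.wait({first}, timeout=hedge_delay_s)
    if done:
        return first.result()  # fast path: no hedge was needed

    second = asyncio.create_task(fetch(*backup))
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the slower copy to limit wasted work
    return done.pop().result()


if __name__ == "__main__":
    print(asyncio.run(hedged_fetch(("slow", 0.5), ("fast", 0.01), 0.05)))
```

The difference from the timeout-and-retry variant is that the first request is not abandoned when the hedge fires; both copies race, so a primary that was just about to answer still wins.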