Service Reliability Math that Every Engineer Should Know

Aug 8, 2021

Uptime        Downtime (Yearly)
99.00000%     3d 15h 39m
99.90000%     8h 45m 56s
99.99000%     52m 35s
99.99900%     5m 15s
99.99990%     31s
99.99999%     3s

For a service to be up 99.99999% of the time, it can be down for at most about 3 seconds per year. Unfortunately, achieving that milestone is an arduous task, even for the most experienced site reliability engineering teams.
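The yearly figures can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes a year of 365.25 days, so results may differ from the table by a second or so depending on the year length and rounding used.

```python
# Assumption: an average year of 365.25 days (accounts for leap years).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def allowed_downtime_seconds(uptime_percent: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the downtime budget, in seconds, for a given uptime percentage."""
    return period_seconds * (100 - uptime_percent) / 100


def format_duration(seconds: float) -> str:
    """Render a duration as e.g. '8h 45m 57s', truncating fractional seconds."""
    total = int(seconds)
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)


if __name__ == "__main__":
    for uptime in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        budget = allowed_downtime_seconds(uptime)
        print(f"{uptime:.5f}%  {format_duration(budget)}")
```

Passing a different `period_seconds` (for example, 30 days) gives the budget for a monthly window instead of a yearly one.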

Visualizing service uptime is essential for all types of engineers. Know what your service can realistically deliver. Know what the customer requirements are. Each extra "9" shrinks the downtime budget by a factor of ten, while the cost of getting there can grow exponentially.

For the last 90 days, Stripe's API has had 99.999% uptime, or five 9's. That's a gold standard for many companies. Service-level agreements are more likely to count downtime on a quarterly or rolling basis than on a yearly one. A shorter window changes the absolute downtime budget and leaves a bit more leeway in how downtime is measured, but the magnitudes stay the same. Some will even exclude "planned maintenance" from the downtime calculation.
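To make the window and maintenance effects concrete, here is a minimal sketch of how a measured uptime figure might be computed over a 90-day window. The function names and the way planned maintenance is excluded (removed from both the downtime and the measurement window) are assumptions for illustration; real SLAs define these terms in their own ways.

```python
NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60


def downtime_budget_seconds(uptime_percent: float, window_seconds: float) -> float:
    """Downtime allowed by an uptime target over a given window."""
    return window_seconds * (100 - uptime_percent) / 100


def measured_uptime_percent(window_seconds: float,
                            unplanned_seconds: float,
                            planned_seconds: float = 0.0,
                            exclude_planned: bool = False) -> float:
    """Uptime over a window, optionally ignoring planned maintenance.

    Assumption: when planned maintenance is excluded, it is removed from
    both the counted downtime and the counted window.
    """
    if exclude_planned:
        counted_window = window_seconds - planned_seconds
        counted_downtime = unplanned_seconds
    else:
        counted_window = window_seconds
        counted_downtime = unplanned_seconds + planned_seconds
    if counted_window <= 0:
        raise ValueError("measurement window must be positive")
    return 100 * (1 - counted_downtime / counted_window)


if __name__ == "__main__":
    # Five 9's over 90 days leaves a budget of roughly 78 seconds.
    print(downtime_budget_seconds(99.999, NINETY_DAYS_SECONDS))

    # Hypothetical quarter: 60s of unplanned outage, 10 minutes of maintenance.
    print(measured_uptime_percent(NINETY_DAYS_SECONDS, 60, 600))
    print(measured_uptime_percent(NINETY_DAYS_SECONDS, 60, 600, exclude_planned=True))
```

In this hypothetical quarter, the same service falls short of five 9's when maintenance counts as downtime and clears it when maintenance is excluded, which is why the definition in the agreement matters as much as the headline number.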

I originally posted this on Twitter, and the response was overwhelming. Follow me on there for more valuable engineering snippets like this.