docs/non-functional-requirements/reliability.md
Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ Look into the difference between **snapshot** and **incremental** backups. A goo
## Target Uptime & Failing Gracefully
-It's a known fact that systems cannot target 100% uptime. There are too many factors in today's software systems to achieve this, many outside of our control. Even a service that never gets updated and is 100% bug free will fail. Upstream DNS servers have issues all the time. Hardware breaks. Power outages, backup generators fail. The world is chaotic. Good services target a number of "9's" of uptime. Ie, 99.99% uptime means that the system has a "budget" of 4 minutes and 22 seconds of downtime each month. Some months might achieve 100% uptime, which means that the budget gets rolled over to the next month. What uptime means is different for everybody, and it is up to the service to define.
+It's a known fact that systems cannot target 100% uptime. There are too many factors in today's software systems to achieve this, many outside of our control. Even a service that never gets updated and is 100% bug free will fail. Upstream DNS servers have issues all the time. Hardware breaks. Power outages, backup generators fail. The world is chaotic. Good services target a number of "9's" of uptime. i.e., 99.99% uptime means that the system has a "budget" of 4 minutes and 22 seconds of downtime each month. Some months might achieve 100% uptime, which means that the budget gets rolled over to the next month. What uptime means is different for everybody, and it is up to the service to define.
A good practice is to use any leftover budget at the end of the period (ie, year, quarter), to intentionally take that service down, and ensure that the rest of your systems fail as expected. Often, other engineers and services come to rely on that additional achieved availability, and it can be healthy to ensure that systems fail gracefully.
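
The downtime "budget" described in this section is simple arithmetic: the length of the period multiplied by the fraction of time the service is allowed to be down. The sketch below is only an illustration of that calculation, not something taken from the document. The names `AVERAGE_MONTH`, `downtime_budget`, and `remaining_budget` are hypothetical, and the month length (365.25 / 12 days) is an assumption; a different calendar convention shifts the result by a second or so.

```python
from datetime import timedelta

# Assumption: an "average" month of 365.25 / 12 days (about 30.44 days).
AVERAGE_MONTH = timedelta(days=365.25 / 12)


def downtime_budget(target: float, period: timedelta = AVERAGE_MONTH) -> timedelta:
    """Allowed downtime for an uptime target given as a fraction, e.g. 0.9999."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    return timedelta(seconds=period.total_seconds() * (1.0 - target))


def remaining_budget(
    target: float, period: timedelta, downtime_so_far: timedelta
) -> timedelta:
    """Budget left in the period; a negative value means the target was missed."""
    return downtime_budget(target, period) - downtime_so_far


if __name__ == "__main__":
    # Budget for "four nines" over one average month.
    print(downtime_budget(0.9999))
    # Budget left after one minute of downtime has already been recorded.
    print(remaining_budget(0.9999, AVERAGE_MONTH, timedelta(minutes=1)))
```

With a 99.99% target and the assumed month length, the budget comes out at a little under four and a half minutes, in line with the figure quoted in the section. Whatever `remaining_budget` reports at the end of a period is the time available for an intentional outage of the kind the paragraph above recommends.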