Commit 1812a0b

Merge pull request #635 from mariash/rfc-readiness-healthchecks-mariash
Update readiness health checks RFC per comments in the discussion
2 parents d24c093 + 7024625 commit 1812a0b

1 file changed (+42, -15 lines)

toc/rfc/rfc-draft-readiness-healthchecks.md

Lines changed: 42 additions & 15 deletions
````diff
@@ -10,15 +10,15 @@
 ## Summary
 
 Add a readiness healthcheck option for apps. When the readiness healthcheck
-passes, the app is marked "ready" and the app will be routable. When the
-readiness healthcheck fails, the app is marked as "not ready" and its route will
-be removed from gorouter's route table.
+passes, the app instance (AI) is marked "ready" and the AI will be routable.
+When the readiness healthcheck fails, the AI is marked as "not ready" and its
+route will be removed from gorouter's route table.
 
 ## Problem
 
 With the current implementation of application healthchecks, when the
-application healthcheck detects that an app instance (AI) is unhealthy, then
-Diego will stop the AI, delete the AI, and reschedule a new AI.
+application healthcheck detects that an AI is unhealthy, then Diego will stop
+the AI, delete the AI, and reschedule a new AI.
 
 This is too aggressive for some apps. There could be many reasons why a single
 request could fail, but the app is actually running fine. Additionally, many
````
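The contrast between the existing (liveness) healthcheck and the proposed readiness healthcheck can be sketched as a mapping from a check result to the platform's reaction. This is a minimal Go sketch under stated assumptions: `CheckKind`, `Action`, and `Outcome` are hypothetical names for illustration, not Diego or Cloud Controller APIs.

```go
package main

import "fmt"

// CheckKind distinguishes the existing liveness healthcheck from the proposed
// readiness healthcheck. Hypothetical type for illustration only.
type CheckKind int

const (
	Liveness CheckKind = iota
	Readiness
)

// Action describes how the platform reacts to a single check result.
type Action string

const (
	KeepRunning  Action = "keep the AI running"
	Reschedule   Action = "stop, delete, and reschedule the AI"
	MarkReady    Action = "mark the AI ready (routable)"
	MarkNotReady Action = "mark the AI not ready (route removed)"
)

// Outcome maps a check result to the reaction described in the RFC: a failed
// liveness check replaces the AI, while a failed readiness check keeps the AI
// running and only removes its route from gorouter's route table.
func Outcome(kind CheckKind, passed bool) Action {
	switch {
	case kind == Liveness && passed:
		return KeepRunning
	case kind == Liveness:
		return Reschedule
	case passed:
		return MarkReady
	default:
		return MarkNotReady
	}
}

func main() {
	for _, kind := range []CheckKind{Liveness, Readiness} {
		for _, passed := range []bool{true, false} {
			fmt.Printf("kind=%d passed=%t -> %s\n", kind, passed, Outcome(kind, passed))
		}
	}
}
```

The only row that differs from today's behavior is a failed readiness check: the AI stays alive and merely stops receiving traffic.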
````diff
@@ -33,11 +33,25 @@ the app should be kept alive, but in a non-routable state.
 We intend to support readiness healthchecks. (This was requested previously in
 this [issue](https://github.com/cloudfoundry/cloud_controller_ng/issues/1706).)
 This would be an additional healthcheck that app developers could configure.
-When the readiness healthcheck passes, the app is marked "ready" and the app
-will be routable. When the readiness healthcheck fails, the app is marked as
-"not ready" and its route will be removed from gorouter's route table.
-This new readiness healthcheck will give users a healthcheck option that is less
-drastic than the current option.
+When the readiness healthcheck passes, the AI is marked "ready" and the AI will
+be routable. When the readiness healthcheck fails, the AI is marked as "not
+ready" and its route will be removed from gorouter's route table. This new
+readiness healthcheck will give users a healthcheck option that is less drastic
+than the current option.
+
+## Types of readiness healthcheck
+
+A readiness healthcheck can be either "http" or "tcp" type. The type format is
+[similar to that of the liveness
+healthcheck](https://docs.cloudfoundry.org/devguide/deploy-apps/healthchecks.html).
+The "process" healthcheck type will not be supported because a "process"
+readiness healthcheck would be redundant: once any defined process exits, the
+AI is already marked as crashed.
+
+## Rolling deploys
+
+Rolling deploys should take the AI's routable status into account. An old AI
+should be replaced by a new one only once the new AI is running and routable.
 
 ### Architecture Overview
 This feature will require changes in the following releases
````
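The type restriction and the rolling-deploy rule in this hunk can be sketched in Go as two small predicates. `ValidateReadinessType`, `Instance`, and `CanRetireOld` are hypothetical names, and the error wording is an assumption; the RFC specifies only which types are allowed and when an old AI may be replaced.

```go
package main

import "fmt"

// ValidateReadinessType accepts the readiness healthcheck types the RFC
// proposes ("http" and "tcp") and rejects everything else, including
// "process". Hypothetical function; the error text is illustrative.
func ValidateReadinessType(t string) error {
	switch t {
	case "http", "tcp":
		return nil
	case "process":
		return fmt.Errorf("readiness healthcheck type %q is not supported: a process exit already marks the AI as crashed", t)
	default:
		return fmt.Errorf("unknown readiness healthcheck type %q", t)
	}
}

// Instance is a hypothetical view of an AI during a rolling deploy.
type Instance struct {
	Running  bool
	Routable bool // true once the AI's readiness healthcheck has passed
}

// CanRetireOld reports whether an old AI may be replaced, following the RFC:
// only once the new AI is both running and routable.
func CanRetireOld(newAI Instance) bool {
	return newAI.Running && newAI.Routable
}

func main() {
	for _, t := range []string{"http", "tcp", "process"} {
		fmt.Printf("%s: %v\n", t, ValidateReadinessType(t))
	}
	fmt.Println(CanRetireOld(Instance{Running: true}))
	fmt.Println(CanRetireOld(Instance{Running: true, Routable: true}))
}
```

Gating on `Routable` rather than `Running` alone is what keeps a rolling deploy from removing a working old AI before its replacement can actually take traffic.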
````diff
@@ -52,15 +66,15 @@ This feature will require changes in the following releases
 2. The Diego executor will see these new readiness healthchecks on the desired
 LRP and will run the healthchecker binary in the app container with the
 provided configuration.
-3. When the readiness healthcheck succeeds, the container will be marked as
-"ready". When the readiness healthcheck fails, the container will be marked
+3. When the readiness healthcheck succeeds, the actual LRP will be marked as
+"ready". When the readiness healthcheck fails, the actual LRP will be marked
 as "not ready".
 4. When the route emitter gets route information, it will inspect whether the AI
 is ready or not ready. It will emit registration or unregistration messages as
-appropriate for the gorouter to consume
+appropriate for the gorouter to consume.
 
 ### CC Design
-Users will be able to set the healthcheck via the app manifest.
+Users will be able to set the readiness healthcheck via the app manifest.
 
 ```
 applications:
````
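Steps 3 and 4 of the architecture overview can be sketched as a route emitter that decides, per AI, whether to emit a registration or an unregistration. In this Go sketch, `ActualLRP` is a simplified, hypothetical struct rather than the real Diego BBS model, and `RouteMessages` is an illustrative name; skipping non-running instances is an assumption made to keep the example small.

```go
package main

import "fmt"

// ActualLRP is a simplified, hypothetical stand-in for Diego's actual LRP
// record; only the fields this sketch needs are modeled.
type ActualLRP struct {
	InstanceGUID string
	Running      bool
	Ready        bool // set from the readiness healthcheck result (step 3)
}

// RouteMessages splits instances into route registrations and
// unregistrations, mirroring step 4: a running AI is registered only while
// it is ready, and a running but not-ready AI is unregistered.
func RouteMessages(lrps []ActualLRP) (register, unregister []string) {
	for _, lrp := range lrps {
		if !lrp.Running {
			continue // non-running instances are outside this sketch
		}
		if lrp.Ready {
			register = append(register, lrp.InstanceGUID)
		} else {
			unregister = append(unregister, lrp.InstanceGUID)
		}
	}
	return register, unregister
}

func main() {
	reg, unreg := RouteMessages([]ActualLRP{
		{InstanceGUID: "ai-0", Running: true, Ready: true},
		{InstanceGUID: "ai-1", Running: true, Ready: false},
	})
	fmt.Println("register:", reg, "unregister:", unreg)
}
```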
````diff
@@ -103,9 +117,22 @@ The readiness healthcheck data will be a part of the desired LRP object.
 },
 ```
 
+### Logging and Metrics
+
+#### App logs
+
+When an AI's readiness healthcheck succeeds, a log line is printed to the AI's
+logs: "Container became ready". When the readiness healthcheck fails, a log
+line is printed to the AI's logs: "Container became not ready".
+
+#### App events
+
+When an AI's readiness healthcheck succeeds, a new application event is
+emitted: "app.ready". When the readiness healthcheck fails, a new application
+event is emitted: "app.notready".
 
 ### Open Questions
-* What logging and metrics would be helpful for app devs and operators?
+* What metrics would be helpful for app devs and operators?
 
 This work is ongoing. All comments and concerns from the community are welcome.
 Either add a comment here or reach out on Slack in #wg-app-runtime-platform.
````
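The app log lines and events in the last hunk can be sketched as a small reporter that emits them when an AI's readiness changes. The strings "Container became ready", "Container became not ready", "app.ready", and "app.notready" come from the RFC; the `readinessReporter` type, and the assumptions that each AI starts out not ready and that output is emitted only on a state change rather than on every probe, belong to this sketch and are not stated in the RFC.

```go
package main

import "fmt"

// Transition is what this sketch emits when an AI's readiness changes: one
// app log line and one application event name, both taken from the RFC.
type Transition struct {
	LogLine string
	Event   string
}

// readinessReporter remembers the last readiness result per AI so that
// output is produced only on a state change. Hypothetical type.
type readinessReporter struct {
	last map[string]bool
}

func newReadinessReporter() *readinessReporter {
	return &readinessReporter{last: make(map[string]bool)}
}

// Observe records one readiness healthcheck result for an AI and returns the
// transition to emit, or false if the AI's readiness did not change.
func (r *readinessReporter) Observe(instance string, ready bool) (Transition, bool) {
	prev := r.last[instance] // zero value false: an unseen AI is assumed not ready
	r.last[instance] = ready
	if prev == ready {
		return Transition{}, false
	}
	if ready {
		return Transition{LogLine: "Container became ready", Event: "app.ready"}, true
	}
	return Transition{LogLine: "Container became not ready", Event: "app.notready"}, true
}

func main() {
	r := newReadinessReporter()
	for _, ready := range []bool{false, true, true, false} {
		if t, ok := r.Observe("ai-0", ready); ok {
			fmt.Printf("log %q, event %s\n", t.LogLine, t.Event)
		}
	}
}
```

Reporting only on transitions keeps a frequently polled readiness check from flooding app logs and the event stream with identical entries.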
