## Summary

Add a readiness healthcheck option for apps. When the readiness healthcheck
- passes, the app is marked "ready" and the app will be routable. When the
- readiness healthcheck fails, the app is marked as "not ready" and its route will
- be removed from gorouter's route table.
+ passes, the app instance (AI) is marked "ready" and the AI will be routable.
+ When the readiness healthcheck fails, the AI is marked as "not ready" and its
+ route will be removed from gorouter's route table.

## Problem

With the current implementation of application healthchecks, when the
- application healthcheck detects that an app instance (AI) is unhealthy, then
- Diego will stop the AI, delete the AI, and reschedule a new AI.
+ application healthcheck detects that an AI is unhealthy, then Diego will stop
+ the AI, delete the AI, and reschedule a new AI.

This is too aggressive for some apps. A single request can fail for many reasons
while the app is actually running fine. Additionally, many
@@ -33,11 +33,25 @@ the app should be kept alive, but in a non-routable state.
We intend to support readiness healthchecks. (This was requested previously in
this [issue](https://github.com/cloudfoundry/cloud_controller_ng/issues/1706).)
This would be an additional healthcheck that app developers could configure.
- When the readiness healthcheck passes, the app is marked "ready" and the app
- will be routable. When the readiness healthcheck fails, the app is marked as
- "not ready" and its route will be removed from gorouter's route table.
- This new readiness healthcheck will give users a healthcheck option that is less
- drastic than the current option.
+ When the readiness healthcheck passes, the AI is marked "ready" and the AI will
+ be routable. When the readiness healthcheck fails, the AI is marked as "not
+ ready" and its route will be removed from gorouter's route table. This new
+ readiness healthcheck will give users a healthcheck option that is less drastic
+ than the current option.
+
+ ## Types of readiness healthcheck
+
+ A readiness healthcheck can be of either the "http" or the "tcp" type. The format
+ of the healthcheck type is [similar to that of the liveness
+ healthcheck](https://docs.cloudfoundry.org/devguide/deploy-apps/healthchecks.html).
+ The "process" healthcheck type will not be supported, since a "process"
+ readiness healthcheck would not be meaningful: once any defined process exits,
+ the AI is marked as crashed.
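To illustrate the two supported types, here is a minimal sketch of what a "tcp" and an "http" readiness probe could look like. It is not the Diego healthchecker implementation; the function names `tcp_ready` and `http_ready`, the one-second timeout, and the rule that only an HTTP 200 response counts as ready are assumptions made for this example.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_ready(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical "tcp" probe: ready if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def http_ready(url: str, timeout: float = 1.0) -> bool:
    """Hypothetical "http" probe: ready if a GET to url answers with 200.

    Treating only status 200 as success is an assumption of this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Non-2xx responses raise HTTPError (a URLError); connection
        # problems and malformed URLs are also treated as "not ready".
        return False
```

A failing probe here only yields `False`; deciding what to do with a "not ready" AI (removing its route rather than restarting it) is left to the caller.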
+
+ ## Rolling deploys
+
+ Rolling deploys should take into account the AI routable status. An old AI should
+ be replaced by a new AI only once the new AI is running and routable.

### Architecture Overview
This feature will require changes in the following releases
@@ -52,15 +66,15 @@ This feature will require changes in the following releases
2. The Diego executor will see these new readiness healthchecks on the desired
LRP and will run the healthchecker binary in the app container with
configuration provided.
- 3. When the readiness healthcheck succeeds, the container will be marked as
- "ready". When the readiness healthcheck fails, the container will be marked
+ 3. When the readiness healthcheck succeeds, the actual LRP will be marked as
+ "ready". When the readiness healthcheck fails, the actual LRP will be marked
as "not ready".
4. When the route emitter gets route information, it will inspect if the AI is
ready or not ready. It will emit registration or unregistration messages as
- appropriate for the gorouter to consume
+ appropriate for the gorouter to consume.
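The decision described in steps 3 and 4 can be sketched as follows. This is only an illustration of the idea, not the route emitter's code: the `ActualLRP` fields, the `route_message` function, and the shape of the returned message are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActualLRP:
    """Hypothetical, simplified view of an actual LRP for this sketch."""
    instance_guid: str
    address: str  # host:port of the AI backend
    ready: bool   # set from the readiness healthcheck result (step 3)


def route_message(lrp: ActualLRP, route: str) -> dict:
    """Pick the message the route emitter would send for one AI (step 4).

    A ready AI gets a registration message; a not-ready AI gets an
    unregistration message, so gorouter stops routing to it.
    """
    action = "register" if lrp.ready else "unregister"
    return {"action": action, "route": route, "backend": lrp.address}
```

For example, `route_message(ActualLRP("guid-1", "10.0.0.5:8080", False), "app.example.com")` yields a message whose `action` is `"unregister"`, while the AI itself keeps running.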

### CC Design
- Users will be able to set the healthcheck via the app manifest.
+ Users will be able to set the readiness healthcheck via the app manifest.

```
applications:
@@ -103,9 +117,22 @@ The readiness healthcheck data will be apart of the desired LRP object.
},
```

+ ### Logging and Metrics
+
+ #### App logs
+
+ When an AI's readiness healthcheck succeeds, a log line is printed to the AI
+ logs: "Container became ready". When an AI's readiness healthcheck fails, a log
+ line is printed to the AI logs: "Container became not ready".
+
+ #### App events
+
+ When an AI's readiness healthcheck succeeds, a new application event is emitted:
+ "app.ready". When an AI's readiness healthcheck fails, a new event is emitted:
+ "app.notready".
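As a rough sketch of how the log lines and events above relate to readiness changes, the function below maps a state change to the log line and event name quoted in this section. The function name `readiness_signals` is hypothetical, and emitting only when the state actually changes (rather than on every check) is an assumption suggested by the word "became".

```python
from typing import Optional, Tuple


def readiness_signals(
    was_ready: Optional[bool], is_ready: bool
) -> Optional[Tuple[str, str]]:
    """Return (log line, event name) for a readiness change, else None.

    was_ready is None when the AI has not been checked before, so the
    first observed result always produces a signal in this sketch.
    """
    if was_ready == is_ready:
        return None  # no transition, nothing to log or emit
    if is_ready:
        return ("Container became ready", "app.ready")
    return ("Container became not ready", "app.notready")
```

With this shape, repeated successful checks on an already ready AI stay silent, which keeps the AI logs from filling up with identical lines.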

### Open Questions
- * What logging and metrics would be helpful for app devs and operators?
+ * What metrics would be helpful for app devs and operators?

This work is ongoing. All comments and concerns are welcome from the community.
Either add a comment here or reach out in Slack in #wg-app-runtime-platform.