Skip to content

Commit e2efc34

Browse files
wilybracejkatz
authored andcommitted
prevents misleading error-level logs
The apiserver and nsqd processes do not gracefully handle the empty TCP connections generated by kubelet as part of probe execution, causing misleading error messages to accumulate in the logs at the probe polling intervals. For the nsqd process, there exists an HTTP endpoint that can be used for liveness testing, but it not useful for readiness testing as the `/ping` endpoint does not reflect readiness for connections on port 4150. Thus, the liveness test is moved to an HTTP test of the `/ping` endpoint and the readiness test is removed. Furthermore, each access to `/ping` is logged as an INFO level log in the nsqd log files. While this is not as misleading as ERROR level logging, it still represents unnecessary logging and the log level for the nsqd process has been moved up to WARNING. For the pgo-apiserver, a new lightweight route is added to the conventional location of `/healthz` with a simple `ok` response with the HTTP 200. As a result, NOAUTH_ROUTES no longer optionally activates deferred certificate checking - it is required to support `/healthz` as kubelet does not present a certificate when performing HTTPS checks. NOAUTH_ROUTES may still be used to enable unauthenticated access to the existing whitelist of routes. Fixes missing probes in Ansible template [ch6429] [ch6453]
1 parent fabdad1 commit e2efc34

File tree

7 files changed

+59
-23
lines changed

7 files changed

+59
-23
lines changed

ansible/roles/pgo-operator/templates/deployment.json.j2

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,24 @@
3030
"containerPort": {{ pgo_apiserver_port }}
3131
}
3232
],
33+
"readinessProbe": {
34+
"httpGet": {
35+
"path": "/healthz",
36+
"port": {{ pgo_apiserver_port }},
37+
"scheme": "HTTPS"
38+
},
39+
"initialDelaySeconds": 15,
40+
"periodSeconds": 5
41+
},
42+
"livenessProbe": {
43+
"httpGet": {
44+
"path": "/healthz",
45+
"port": {{ pgo_apiserver_port }},
46+
"scheme": "HTTPS"
47+
},
48+
"initialDelaySeconds": 15,
49+
"periodSeconds": 5
50+
},
3351
"env": [
3452
{
3553
"name": "CRUNCHY_DEBUG",
@@ -175,6 +193,14 @@
175193
"image": "{% if pgo_event_image | default('') != '' %}{{ pgo_event_image }}
176194
{%- else %}{{ pgo_image_prefix }}/pgo-event:{{ pgo_image_tag }}
177195
{%- endif %}",
196+
"livenessProbe": {
197+
"httpGet": {
198+
"path": "/ping",
199+
"port": 4151,
200+
},
201+
"initialDelaySeconds": 15,
202+
"periodSeconds": 5
203+
},
178204
"env": [
179205
{
180206
"name": "TIMEOUT",

apiserver.go

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -109,6 +109,7 @@ func main() {
109109
r := mux.NewRouter()
110110
r.HandleFunc("/version", versionservice.VersionHandler)
111111
r.HandleFunc("/health", versionservice.HealthHandler)
112+
r.HandleFunc("/healthz", versionservice.HealthyHandler)
112113
r.HandleFunc("/policies", policyservice.CreatePolicyHandler)
113114
r.HandleFunc("/showpolicies", policyservice.ShowPolicyHandler).Methods("POST")
114115
//here
@@ -185,11 +186,15 @@ func main() {
185186
r.HandleFunc("/benchmarkdelete", benchmarkservice.DeleteBenchmarkHandler).Methods("POST")
186187
r.HandleFunc("/benchmarkshow", benchmarkservice.ShowBenchmarkHandler).Methods("POST")
187188

188-
// Optionally lighten mTLS requirement if some paths don't require certs
189-
certsVerify := tls.RequireAndVerifyClientCert
190-
if !disableTLS && len(skipAuthRoutes) > 0 {
191-
certsVerify = tls.VerifyClientCertIfGiven
192-
optCertEnforcer, err := apiserver.NewCertEnforcer(strings.Split(skipAuthRoutes, ","))
189+
certsVerify := tls.VerifyClientCertIfGiven
190+
skipAuth := []string{
191+
"/healthz", // Required for kube probes
192+
}
193+
if !disableTLS {
194+
if len(skipAuthRoutes) > 0 {
195+
skipAuth = append(skipAuth, strings.Split(skipAuthRoutes, ",")...)
196+
}
197+
optCertEnforcer, err := apiserver.NewCertEnforcer(skipAuth)
193198
if err != nil {
194199
log.Fatalf("NOAUTH_ROUTES configured incorrectly: %s", err)
195200
os.Exit(2)

apiserver/middleware.go

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -33,7 +33,8 @@ type certEnforcer struct {
3333
func NewCertEnforcer(reqRoutes []string) (*certEnforcer, error) {
3434
allowed := map[string]struct{}{
3535
// List of allowed routes is part of the published documentation
36-
"/health": struct{}{},
36+
"/health": struct{}{},
37+
"/healthz": struct{}{},
3738
}
3839

3940
ce := &certEnforcer{

apiserver/versionservice/versionservice.go

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,3 +51,10 @@ func HealthHandler(w http.ResponseWriter, r *http.Request) {
5151

5252
json.NewEncoder(w).Encode(resp)
5353
}
54+
55+
// HealthyHandler follows the health endpoint convention of HTTP/200 and
56+
// body "ok" used by other cloud services, typically on /healthz
57+
func HealthyHandler(w http.ResponseWriter, r *http.Request) {
58+
w.WriteHeader(http.StatusOK)
59+
w.Write([]byte("ok"))
60+
}

bin/pgo-event/pgo-event.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,7 @@ sleep 3
3131

3232
echo "pgo-event starting nsqd"
3333

34-
/usr/local/bin/nsqd --data-path=/tmp --http-address=0.0.0.0:4151 --tcp-address=0.0.0.0:4150 &
34+
/usr/local/bin/nsqd --data-path=/tmp --http-address=0.0.0.0:4151 --tcp-address=0.0.0.0:4150 --log-level=warn &
3535

3636
echo "pgo-event waiting till sigterm"
3737

deploy/deployment.json

Lines changed: 11 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -29,15 +29,19 @@
2929
}
3030
],
3131
"readinessProbe": {
32-
"tcpSocket": {
33-
"port": $PGO_APISERVER_PORT
32+
"httpGet": {
33+
"path": "/healthz",
34+
"port": $PGO_APISERVER_PORT,
35+
"scheme": "HTTPS"
3436
},
3537
"initialDelaySeconds": 15,
3638
"periodSeconds": 5
3739
},
3840
"livenessProbe": {
39-
"tcpSocket": {
40-
"port": $PGO_APISERVER_PORT
41+
"httpGet": {
42+
"path": "/healthz",
43+
"port": $PGO_APISERVER_PORT,
44+
"scheme": "HTTPS"
4145
},
4246
"initialDelaySeconds": 15,
4347
"periodSeconds": 5
@@ -183,16 +187,10 @@
183187
{
184188
"name": "event",
185189
"image": "$PGO_IMAGE_PREFIX/pgo-event:$PGO_IMAGE_TAG",
186-
"readinessProbe": {
187-
"tcpSocket": {
188-
"port": 4150
189-
},
190-
"initialDelaySeconds": 15,
191-
"periodSeconds": 5
192-
},
193190
"livenessProbe": {
194-
"tcpSocket": {
195-
"port": 4150
191+
"httpGet": {
192+
"path": "/ping",
193+
"port": 4151
196194
},
197195
"initialDelaySeconds": 15,
198196
"periodSeconds": 5

hugo/content/Configuration/configuration.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -96,9 +96,8 @@ this setting:
9696
/health
9797
```
9898

99-
If the health route has its authentication disabled, the existing readiness
100-
and liveness probes for the apiserver container could be enhanced by
101-
configuring them with HTTP-based checks against the health route.
99+
The `/healthz` route is used by kubernetes probes and has its authentication
100+
disabed without requiring NOAUTH_ROUTES.
102101

103102

104103
## Security

0 commit comments

Comments
 (0)