Commit e72ea72

KEP-2133: add kubelet metrics for credential provider plugins
Signed-off-by: Andrew Sy Kim <[email protected]>
1 parent 2e8fb5a commit e72ea72

File tree

1 file changed

+13
-7
lines changed
  • keps/sig-node/2133-kubelet-credential-providers

keps/sig-node/2133-kubelet-credential-providers/README.md

Lines changed: 13 additions & 7 deletions
@@ -13,6 +13,7 @@
 - [Credential Provider Request API](#credential-provider-request-api)
 - [Credential Provider Response API](#credential-provider-response-api)
 - [Caching Credentials](#caching-credentials)
+- [Metrics](#metrics)
 - [Test Plan](#test-plan)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
@@ -312,6 +313,12 @@ The plugin can signal to the kubelet how it should cache a given response. There
 2. Registry: the kubelet should cache and use this response only for future images with the same registry hostname (and port if included).
 3. Image: the kubelet should cache and use this response only for future images that match the image exactly.

+### Metrics
+
+Two kubelet metrics will be added:
+* `kubelet_credential_provider_plugin_errors`: this will track the number of errors that occurred from invoking an exec plugin.
+* `kubelet_credential_provider_plugin_duration`: this will track the duration of execution by plugins.
+
 ### Test Plan

 Alpha:
@@ -330,6 +337,7 @@ can be achieved using the exec plugin.

 * integration or e2e tests.
 * at least one working plugin implementation.
+* kubelet metrics for failed calls to exec plugins.
 * improvements to concurrency and caching:
 - use `singleflight.Group` to ensure only a single call per image. Today the kubelet holds a single lock for every call to `Provide`.
 See [this](https://github.com/kubernetes/kubernetes/pull/94196#discussion_r517805701) and [this](https://github.com/kubernetes/kubernetes/pull/94196#discussion_r518487386) discussion.
@@ -381,7 +389,7 @@ _This section must be completed when targeting beta graduation to a release._

 * **What specific metrics should inform a rollback?**

-This feature does not have metrics.
+High error rates from `kubelet_credential_provider_plugin_errors` and long durations from `kubelet_credential_provider_plugin_duration`.

 * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

@@ -403,10 +411,9 @@ _This section must be completed when targeting beta graduation to a release._

 * **What are the SLIs (Service Level Indicators) an operator can use to determine
 the health of the service?**
-- [ ] Metrics
-- Metric name:
-- [Optional] Aggregation method:
-- Components exposing the metric:
+- [X] Metrics
+- Metric name: `kubelet_credential_provider_plugin_errors`, `kubelet_credential_provider_plugin_duration`
+- Components exposing the metric: kubelet
 - [X] Other (treat as last resort)
 - Details: the kubelet has several error-level logs for when exec plugins time out or return a non-zero exit code.

@@ -420,8 +427,7 @@ the health of the service?**
 * **Are there any missing metrics that would be useful to have to improve observability
 of this feature?**

-Possibly. We could add a metric for failed calls to exec plugins.
-
+No.

 ### Dependencies
