Skip to content

Commit 4b5e2f9

Browse files
authored
Merge pull request kubernetes#2275 from ankeesler/exec-plugin-metrics
exec credential provider: prep for 1.21 (first pass at metrics design, PRR updates)
2 parents 4db4ee9 + a764f28 commit 4b5e2f9

File tree

3 files changed

+121
-9
lines changed

3 files changed

+121
-9
lines changed

keps/prod-readiness/sig-auth/541.yaml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
kep-number: 541
2+
beta:
3+
approver: "@deads2k"

keps/sig-auth/541-external-credential-providers/README.md

Lines changed: 116 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
- [Provider configuration](#provider-configuration)
1414
- [Provider input format](#provider-input-format)
1515
- [Provider output format](#provider-output-format)
16+
- [Metrics](#metrics)
1617
- [Risks and Mitigations](#risks-and-mitigations)
1718
- [Client authentication to the binary](#client-authentication-to-the-binary)
1819
- [Invalid credentials before cache expiry](#invalid-credentials-before-cache-expiry)
@@ -493,6 +494,102 @@ credentials).
493494
PEM format. The certificate must be valid at the time of execution. These
494495
credentials are used for mTLS handshakes.
495496

497+
### Metrics
498+
499+
As discussed [below](#rollout-upgrade-and-rollback-planning), there are 3
500+
primary metrics used by this feature set.
501+
502+
```golang
503+
var (
504+
execPluginCertTTL = k8smetrics.NewGaugeFunc(
505+
k8smetrics.GaugeOpts{
506+
Name: "rest_client_exec_plugin_ttl_seconds",
507+
Help: "Gauge of the shortest TTL (time-to-live) of the client " +
508+
"certificate(s) managed by the auth exec plugin. The value " +
509+
"is in seconds until certificate expiry (negative if " +
510+
"already expired). If auth exec plugins are unused or manage no " +
511+
"TLS certificates, the value will be +INF.",
512+
},
513+
func() float64 {
514+
if execPluginCertTTLAdapter.e == nil {
515+
return math.Inf(1)
516+
}
517+
return execPluginCertTTLAdapter.e.Sub(time.Now()).Seconds()
518+
},
519+
)
520+
521+
execPluginCertRotation = k8smetrics.NewHistogram(
522+
&k8smetrics.HistogramOpts{
523+
Name: "rest_client_exec_plugin_certificate_rotation_age",
524+
Help: "Histogram of the number of seconds the last auth exec " +
525+
"plugin client certificate lived before being rotated. " +
526+
"If auth exec plugin client certificates are unused, " +
527+
"histogram will contain no data.",
528+
// There are three sets of ranges these buckets intend to capture:
529+
// - 10-60 minutes: captures a rotation cadence which is
530+
// happening too quickly.
531+
// - 4 hours - 1 month: captures an ideal rotation cadence.
532+
// - 3 months - 4 years: captures a rotation cadence which is
533+
// is probably too slow or much too slow.
534+
Buckets: []float64{
535+
600, // 10 minutes
536+
1800, // 30 minutes
537+
3600, // 1 hour
538+
14400, // 4 hours
539+
86400, // 1 day
540+
604800, // 1 week
541+
2592000, // 1 month
542+
7776000, // 3 months
543+
15552000, // 6 months
544+
31104000, // 1 year
545+
124416000, // 4 years
546+
},
547+
},
548+
)
549+
550+
execPluginCalls = k8smetrics.NewCounterVec(
551+
&k8smetrics.CounterOpts{
552+
Name: "rest_client_exec_plugin_call_total",
553+
Help: "Number of calls to an exec plugin, partitioned by exit code.",
554+
},
555+
[]string{"code"},
556+
)
557+
)
558+
```
559+
560+
As is common practice, these labels will be hidden behind abstract global
561+
variables that will be called by the exec plugin code.
562+
```golang
563+
// DurationMetric is a measurement of some amount of time.
564+
type DurationMetric interface {
565+
Observe(duration time.Duration)
566+
}
567+
568+
// ExpiryMetric sets some time of expiry. If nil, assume not relevant.
569+
type ExpiryMetric interface {
570+
Set(expiry *time.Time)
571+
}
572+
573+
// CallsMetric counts calls that take place for a specific exec plugin.
574+
type CallsMetric interface {
575+
// Increment increments a counter per exitCode.
576+
Increment(exitCode int)
577+
}
578+
579+
var (
580+
// ClientCertExpiry is the expiry time of a client certificate
581+
ClientCertExpiry ExpiryMetric = noopExpiry{}
582+
// ClientCertRotationAge is the age of a certificate that has just been rotated.
583+
ClientCertRotationAge DurationMetric = noopDuration{}
584+
// ExecPluginCalls is the number of calls made to an exec plugin, partitioned by
585+
// exit code.
586+
ExecPluginCalls CallsMetric = noopCalls{}
587+
)
588+
```
589+
590+
The `"code"` label of these metrics is an attempt to elucidate the exec plugin
591+
failure mode to the user.
592+
496593
### Risks and Mitigations
497594

498595
#### Client authentication to the binary
@@ -532,6 +629,7 @@ Unit tests to confirm:
532629
that structs are kept up to date
533630
- Helper methods properly create `"k8s.io/client-go/rest".Config` from
534631
`"k8s.io/client-go/pkg/apis/clientauthentication".Cluster` and vice versa
632+
- Metrics are reported as they should
535633

536634
Integration (or e2e CLI) tests to confirm:
537635

@@ -546,6 +644,7 @@ Integration (or e2e CLI) tests to confirm:
546644
+ Cert based auth
547645
- Interactive login flows work
548646
+ TTY forwarding between client and executable works
647+
- Metrics are reported as they should
549648

550649
### Graduation Criteria
551650

@@ -565,6 +664,7 @@ Feature is already in Beta.
565664
- Address known bugs and add tests to prevent regressions
566665
- Docs are up-to-date with latest version of APIs
567666
- Docs describe set of best practices (i.e. do not mutate `kubeconfig`)
667+
- Sufficient metrics
568668

569669
Note: this feature set does not need conformance tests because it is inherently
570670
opt-in on the client-side and it relies on an extra binary to be present.
@@ -674,7 +774,10 @@ The downsides of this approach compared to exec model are:
674774
authenticator could be behaving incorrectly. For example, if the certificate
675775
expiration time is constantly increasing upon every authentication to the API, then
676776
perhaps the exec plugin authenticator is refreshing the certificate credential too
677-
often.
777+
often. Furthermore, the certificate's age (i.e., the time since the certificate's
778+
`NotBefore` field) will be emitted as a metric. If this value is frequently much smaller
779+
than the certificate's expected lifetime, then the exec plugin authenticator may be
780+
rotating credentials too quickly which may point to a bug.
678781
- The total number of calls to the exec plugin would also be helpful to obtain. This
679782
metric should increase each time a credential is refreshed (see previous bullet point
680783
for when this happens). If this number increases rapidly, then the exec plugin
@@ -710,16 +813,22 @@ _This section must be completed when targeting beta graduation to a release._
710813

711814
* **What are the SLIs (Service Level Indicators) an operator can use to
712815
determine the health of the service?**
713-
- [ ] Metrics
714-
- Metric name:
715-
- [Optional] Aggregation method:
716-
- Components exposing the metric:
717-
- [x] Other (treat as last resort)
816+
- [X] Metrics
817+
- Metric name: `rest_client_exec_plugin_ttl_seconds`, `rest_client_exec_plugin_certificate_rotation_age`,
818+
`rest_client_exec_plugin_call_total`
819+
- Components exposing the metric: client-go
820+
- [ ] Other (treat as last resort)
718821
- Details:
719822
- This feature set operates on the client-side.
720823

721824
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
722-
- This feature set operates on the client-side.
825+
- We target certificate rotations to happen within 1% of a certificate's
826+
lifetime. This is measured by
827+
`rest_client_exec_plugin_certificate_rotation_age` and
828+
`rest_client_exec_plugin_ttl_seconds`.
829+
- We target 0.01% unsuccessful calls to the exec plugin in a moving 24h
830+
window. This is measured by
831+
`rest_client_exec_plugin_call_total`.
723832

724833
* **Are there any missing metrics that would be useful to have to improve
725834
observability if this feature?**

keps/sig-auth/541-external-credential-providers/kep.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,8 @@ approvers:
1818
prr-approvers:
1919
- "@deads2k"
2020
stage: beta
21-
latest-milestone: "v1.20"
21+
latest-milestone: "v1.21"
2222
milestone:
2323
alpha: "v1.10"
2424
beta: "v1.11"
25-
stable: "v1.21"
25+
stable: "v1.22"

0 commit comments

Comments
 (0)