# KEP-3130: KMS Observability

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [GA](#ga)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Currently, it is not possible to correlate, in logs, the sequence of calls involved in an envelope operation: kube-apiserver->kms-plugin->KMS. This KEP proposes extending the kms-plugin interface to include a transaction ID, generated by the kube-apiserver, which the kms-plugin can pass on to the KMS.

## Motivation

Today, the only way to correlate a successful or failed envelope operation is to match the approximate timestamp of the operation against events in the kube-apiserver, the kms-plugin, and the KMS. There is no guarantee that the timestamp of the operation matches the timestamp of the corresponding event in the KMS, so this correlation is unreliable. With a transaction ID generated by the kube-apiserver, the kube-apiserver logs the ID together with non-sensitive metadata, such as the name and namespace of the secret involved in the envelope operation. The kms-plugin logs the same ID and can optionally pass it on to the KMS.

### Goals

- Add a transaction ID to the kms-plugin interface
- Update the logging in the kube-apiserver to include the transaction ID and non-sensitive metadata, such as the secret name and namespace, for envelope operations

### Non-Goals

- Using this transaction ID for audit logging

## Proposal

- Generate a new UID, which serves as the transaction ID, for each envelope operation in the kube-apiserver.
- Add a new UID field to the envelope operation requests in the kms-plugin interface.

## Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

This design is centered on generating a new UID for each envelope operation, similar to how a UID is generated for admission review requests in [admissionreview.go](https://github.com/kubernetes/kubernetes/blob/e9e669aa6037c380469b45200e59cff9b52d6d68/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/request/admissionreview.go#L137).
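
For illustration only, the sketch below shows what "a new UID per envelope operation" amounts to using just the Go standard library: each call returns a fresh random RFC 4122 version 4 UUID. In the kube-apiserver this would presumably be `uuid.NewUUID()` from `k8s.io/apimachinery/pkg/util/uuid`, which the linked admission code appears to use. The helper `newEnvelopeUID` is a hypothetical stand-in, not part of any Kubernetes API.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newEnvelopeUID returns a random RFC 4122 version 4 UUID string.
// Hypothetical stand-in for uuid.NewUUID() from k8s.io/apimachinery.
func newEnvelopeUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4 (random)
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	// Two envelope operations, two distinct UIDs.
	for i := 0; i < 2; i++ {
		uid, err := newEnvelopeUID()
		if err != nil {
			panic(err)
		}
		fmt.Println("envelope operation uid:", uid)
	}
}
```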

A new `UID` field will be added to the `EncryptRequest` and `DecryptRequest` structs in the kms-plugin interface. The field is a pointer to a string. When the `KMSUID` feature gate is disabled, the field is nil and is therefore not serialized, so the data on the wire is byte-equivalent to what a 1.23 API server sends.

```go
type EncryptRequest struct {
	// Version of the KMS plugin API.
	Version string `protobuf:"bytes,1,opt,name=version,proto3" json:"version,omitempty"`
	// The data to be encrypted.
	Plain []byte `protobuf:"bytes,2,opt,name=plain,proto3" json:"plain,omitempty"`
	// UID is a unique identifier for the request. It is nil when the
	// KMSUID feature gate is disabled, so field 3 is not serialized.
	UID                  *string  `protobuf:"bytes,3,opt,name=uid,proto3" json:"uid,omitempty"`
	XXX_NoUnkeyedLiteral struct{} `json:"-"`
	XXX_unrecognized     []byte   `json:"-"`
	XXX_sizecache        int32    `json:"-"`
}
```

```go
type DecryptRequest struct {
	// Version of the KMS plugin API.
	Version string `protobuf:"bytes,1,opt,name=version,proto3" json:"version,omitempty"`
	// The data to be decrypted.
	Cipher []byte `protobuf:"bytes,2,opt,name=cipher,proto3" json:"cipher,omitempty"`
	// UID is a unique identifier for the request. It is nil when the
	// KMSUID feature gate is disabled, so field 3 is not serialized.
	UID                  *string  `protobuf:"bytes,3,opt,name=uid,proto3" json:"uid,omitempty"`
	XXX_NoUnkeyedLiteral struct{} `json:"-"`
	XXX_unrecognized     []byte   `json:"-"`
	XXX_sizecache        int32    `json:"-"`
}
```
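
A minimal sketch of how the kube-apiserver might populate the new field, assuming a boolean that reflects the state of the `KMSUID` gate. The `encryptRequest` type is a simplified, hypothetical mirror of `EncryptRequest` without the protobuf bookkeeping fields, `newEncryptRequest` is an illustrative helper, and the version string is only an example value. `DecryptRequest` would be handled the same way.

```go
package main

import "fmt"

// encryptRequest is a simplified, hypothetical mirror of EncryptRequest
// with the protobuf bookkeeping (XXX_*) fields left out.
type encryptRequest struct {
	Version string
	Plain   []byte
	UID     *string // nil when the KMSUID feature gate is disabled
}

// newEncryptRequest attaches the UID only when kmsUIDEnabled is true.
// kmsUIDEnabled stands in for a real feature-gate lookup.
func newEncryptRequest(kmsUIDEnabled bool, version string, plain []byte, uid string) *encryptRequest {
	req := &encryptRequest{Version: version, Plain: plain}
	if kmsUIDEnabled {
		req.UID = &uid
	}
	return req
}

func main() {
	on := newEncryptRequest(true, "v1beta1", []byte("dek"), "uid-1")
	off := newEncryptRequest(false, "v1beta1", []byte("dek"), "uid-1")
	fmt.Println("gate enabled, UID:", *on.UID)
	fmt.Println("gate disabled, UID is nil:", off.UID == nil)
}
```

Leaving the pointer nil, rather than pointing it at an empty string, is what keeps the gate-disabled request free of field 3.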

The UID generated in the kube-apiserver will be:

1. Used for logging in the kube-apiserver. Every envelope operation sent to the kms-plugin will be logged with its UID.
   1. The UID will be logged through a wrapper in the kube-apiserver, which ensures that it is always logged, and always in the same format.
   2. In addition to the UID, the kube-apiserver will log non-sensitive metadata, such as the name, namespace, and GroupVersionResource of the object that triggered the envelope operation.
2. Sent to the kms-plugin as part of the `EncryptRequest` and `DecryptRequest` structs.

### Test Plan

Unit tests covering:

1. Generation of a UID for each envelope operation

Integration tests covering:

1. Logging of the UID in the kube-apiserver
2. The UID being set in `EncryptRequest` and `DecryptRequest`
3. The UID being nil in `EncryptRequest` and `DecryptRequest` when the feature gate is disabled
   1. Confirm that this results in byte-equivalent data on the wire when compared to a 1.23 API server.

### Graduation Criteria

#### Alpha

- Feature implemented behind a feature gate
- Initial unit and integration tests completed and enabled

#### Beta

- Gather feedback from providers using the feature
- Any known bugs fixed

#### GA

- The feature is part of the KMS reference implementation

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- Feature gate
  - Feature gate name: `KMSUID`
  - Components depending on the feature gate:
    - kube-apiserver

```go
FeatureSpec{
	Default:       false,
	LockToDefault: false,
	PreRelease:    featuregate.Alpha,
}
```

###### Does enabling the feature change any default behavior?

Yes. With the feature enabled, the kube-apiserver sends a UID as part of each envelope operation, which is a change in the default behavior. The change is backward compatible: the new field is optional, and a kms-plugin built against the previous API ignores it as an unknown protobuf field.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, via the `KMSUID` feature gate. Disabling the gate causes the kube-apiserver to stop sending the UID as part of the `Encrypt` and `Decrypt` envelope operations.

### Monitoring Requirements

###### How can someone using this feature know that it is working for their instance?

- [x] Other (treat as last resort)
  - Details: Log entries for envelope operations in the kube-apiserver, the kms-plugin, and (when the plugin passes it on) the KMS include the corresponding UID.
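
Once each component writes the UID, correlating an envelope operation across components reduces to grouping log lines by that value. The sketch below assumes, purely for illustration, that every component emits a `uid=<value>` token; real kms-plugin and KMS log formats will differ, and `uidOf` and `correlate` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// uidOf returns the value of the first "uid=" token in a log line.
// The key=value format is an assumption made for this example.
func uidOf(line string) (string, bool) {
	for _, field := range strings.Fields(line) {
		if strings.HasPrefix(field, "uid=") {
			return strings.TrimPrefix(field, "uid="), true
		}
	}
	return "", false
}

// correlate groups log lines by UID, keeping their original order.
func correlate(lines []string) map[string][]string {
	byUID := make(map[string][]string)
	for _, line := range lines {
		if uid, ok := uidOf(line); ok {
			byUID[uid] = append(byUID[uid], line)
		}
	}
	return byUID
}

func main() {
	logs := []string{
		"kube-apiserver: envelope operation=encrypt uid=a1 name=db-creds namespace=prod",
		"kube-apiserver: envelope operation=decrypt uid=b2 name=tls namespace=web",
		"kms-plugin: encrypt uid=a1",
		"kms: encrypt uid=a1",
	}
	for _, line := range correlate(logs)["a1"] {
		fmt.Println(line)
	}
}
```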

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

This change is not expected to affect any existing SLOs.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [x] Other (treat as last resort)
  - Details: Log entries for envelope operations in the kube-apiserver, the kms-plugin, and (when the plugin passes it on) the KMS include the corresponding UID.

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

No.

### Scalability

###### Will enabling / using this feature result in any new API calls?

No.

###### Will enabling / using this feature result in introducing new API types?

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

This proposal adds a new `UID` field to the gRPC request messages used for envelope operations. When the feature is enabled, each `EncryptRequest` and `DecryptRequest` grows by the size of the UID.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

- Encryption of etcd data through the external kms-plugin is unavailable

## Implementation History

<!--
Major milestones in the lifecycle of a KEP should be tracked in this section.
Major milestones might include:
- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
- the `Proposal` section being merged, signaling agreement on a proposed design
- the date implementation started
- the first Kubernetes release where an initial version of the KEP was available
- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded
-->

## Alternatives

<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->

We considered using the AuditID of the kube-apiserver request that triggered the envelope operation. This approach has the following drawbacks:

1. The AuditID can be set by the user with the `Audit-ID` header on the API server request, so multiple requests can be sent to the kube-apiserver with the same Audit-ID.
2. Not every API server request generates an envelope operation. The kube-apiserver caches DEKs, and when the required DEK is already in the cache, no envelope operation is generated.
3. Not every call to the KMS corresponds to an audit log entry, so the audit ID is not sufficient for correlating calls from kube-apiserver->kms-plugin->KMS.
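
A toy model makes drawbacks 1 and 2 concrete. The sketch below assumes a simplified DEK cache; the `dekCache` type and its counter-based UIDs are hypothetical and are not the kube-apiserver's cache. Two requests that share one client-supplied Audit-ID and need the same DEK produce only one envelope operation, so an audit ID maps to neither a unique nor a guaranteed KMS call, while a UID minted per envelope operation tracks exactly the calls that happened.

```go
package main

import "fmt"

// dekCache is a toy model of a DEK cache; it is not the kube-apiserver's
// actual implementation.
type dekCache struct {
	entries  map[string][]byte // encrypted DEK -> plaintext DEK
	kmsCalls int               // envelope operations sent to the kms-plugin
	nextUID  int               // counter standing in for a real UID generator
}

// decryptDEK returns the plaintext DEK and the UID of the envelope
// operation it triggered, or "" when the cache served the request.
func (c *dekCache) decryptDEK(encDEK string) ([]byte, string) {
	if plain, ok := c.entries[encDEK]; ok {
		return plain, "" // cache hit: no envelope operation, no UID
	}
	c.nextUID++
	uid := fmt.Sprintf("uid-%d", c.nextUID)
	c.kmsCalls++
	plain := []byte("plain-" + encDEK) // toy stand-in for the plugin's Decrypt
	c.entries[encDEK] = plain
	return plain, uid
}

func main() {
	c := &dekCache{entries: map[string][]byte{}}
	// Two API requests share one client-supplied Audit-ID and need the same DEK.
	for _, auditID := range []string{"audit-42", "audit-42"} {
		_, uid := c.decryptDEK("dek-A")
		fmt.Printf("auditID=%s envelopeUID=%q\n", auditID, uid)
	}
	fmt.Println("envelope operations:", c.kmsCalls)
}
```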