Commit 80579de

Merge pull request kubernetes#2260 from zshihang/master
prepare CSIServiceAccountToken for beta
2 parents ca22dc3 + fe2e748 commit 80579de

3 files changed: +78 -3 lines changed
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 1855
+beta:
+  approver: "@deads2k"

keps/sig-storage/1855-csi-driver-service-account-token/README.md

Lines changed: 71 additions & 0 deletions
@@ -3,6 +3,7 @@
 ## Table of Contents
 
 <!-- toc -->
+
 - [Summary](#summary)
 - [Motivation](#motivation)
 - [User stories](#user-stories)
@@ -19,7 +20,11 @@
 - [Beta->GA](#beta-ga)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
 - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+- [Monitoring Requirements](#monitoring-requirements)
+- [Dependencies](#dependencies)
 - [Scalability](#scalability)
+- [Troubleshooting](#troubleshooting)
 - [Alternatives](#alternatives)
 - [Implementation History](#implementation-history)
 <!-- /toc -->
@@ -215,6 +220,53 @@ Option 1 is adopted. See discussion
 - **Are there any tests for feature enablement/disablement?** yes, unit tests
 will cover this.
 
+### Rollout, Upgrade and Rollback Planning
+
+- **How can a rollout fail? Can it impact already running workloads?**
+Rollout will not fail because this change only exposes an extra field in CSIDriverSpec.
+
+* **What specific metrics should inform a rollback?**
+
+- `storage_operation_duration_seconds`: if the corresponding csi plugin has
+high error rates by aggregating on `status`.
+
+* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
+No. When downgrade happens where kube-apiserver doesn't have the added fields,
+the existing volumes will continue to work as long as it doesn't rely on the
+acquired token being valid.
+
+* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
+fields of API types, flags, etc.?**
+No.
+
+### Monitoring Requirements
+
+- **How can an operator determine if the feature is in use by workloads?**
+run `kubectl get CSIDriver` to see whether `tokenRequests` or `requiresRepublish`
+is specified.
+
+- **What are the SLIs (Service Level Indicators) an operator can use to determine
+the health of the service?**
+
+- [x] Metrics
+- Metric name: `storage_operation_duration_seconds`
+- Aggregation method: volume_plugin, operation_name, status
+- Components exposing the metric: kubelet
+
+* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+for the particular csi plugin, per-day percentage of failed storage operations
+<= 1%
+
+- **Are there any missing metrics that would be useful to have to improve observability
+of this feature?**
+None
+
+### Dependencies
+
+- **Does this feature depend on any specific services running in the cluster?**
+
+There are no new components required, but requires kubelets >= 1.12
+
 ### Scalability
 
 - **Will enabling / using this feature result in any new API calls?**
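
Note on the Monitoring Requirements hunk above: the stated SLO (per-day percentage of failed storage operations <= 1% for a given CSI plugin) can be checked with a small calculation over the metric's per-day counts. The following is a minimal sketch, not part of the commit. It assumes the counts have already been extracted as `(volume_plugin, status, count)` tuples, and that any `status` value other than `"success"` is a failure; the function names and the plugin names in the usage note are hypothetical.

```python
from collections import defaultdict

# Assumption: any `status` label value other than "success" counts as a failure.
SUCCESS_STATUS = "success"


def failure_ratios(samples):
    """Return {volume_plugin: failed / total} for one day's operation counts.

    `samples` is an iterable of (volume_plugin, status, count) tuples, for
    example the per-day increase of the metric's count series per label set.
    """
    totals = defaultdict(float)
    failures = defaultdict(float)
    for plugin, status, count in samples:
        totals[plugin] += count
        if status != SUCCESS_STATUS:
            failures[plugin] += count
    # Plugins with no recorded operations are skipped to avoid dividing by zero.
    return {p: failures[p] / totals[p] for p in totals if totals[p] > 0}


def plugins_breaching_slo(samples, threshold=0.01):
    """List plugins whose failure ratio exceeds the threshold (1% by default)."""
    ratios = failure_ratios(samples)
    return sorted(p for p, r in ratios.items() if r > threshold)
```

For example, a plugin with 95 successful and 5 failed operations in a day has a 5% failure ratio and would be reported by `plugins_breaching_slo`, while one with 995 successes and 5 failures (0.5%) would not.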
@@ -245,6 +297,25 @@ Option 1 is adopted. See discussion
 - **Will enabling / using this feature result in non-negligible increase of
 resource usage (CPU, RAM, disk, IO, ...) in any components?** no.
 
+### Troubleshooting
+
+- **How does this feature react if the API server and/or etcd is unavailable?**
+`RequiresRepublish` will continue to function but `TokenRequests` will fail.
+
+- **What are other known failure modes?**
+
+- Failed to fetch token
+
+- Detection: Check mount failure in Pod events or kubelet log.
+- Mitigations: Set `TokenRequests=[]`, subsequent `NodePublishVolume` will
+not have tokens in volume attributes. Tokens retrieved before will
+eventually expire.
+- Diagnostics: Search "mounter.SetUpAt failed to get service accoount token attributes"
+- Testing: E2E test
+
+- **What steps should be taken if SLOs are not being met to determine the problem?**
+None.
+
 ## Alternatives
 
 1. Instead of fetching tokens in kubelet, CSI drivers will be granted
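
Note on the README hunks above: both the "is the feature in use" check (`tokenRequests` or `requiresRepublish` set on a CSIDriver) and the `TokenRequests=[]` mitigation can be verified by inspecting CSIDriver specs. The following is a minimal sketch, not part of the commit. It assumes the input is the JSON list produced by `kubectl get csidriver -o json`; the function name and the driver names in the usage note are hypothetical.

```python
import json


def drivers_using_feature(csidriver_list_json):
    """Map each CSIDriver name to the feature fields it sets.

    A driver is reported if `spec.tokenRequests` is a non-empty list or
    `spec.requiresRepublish` is true; drivers setting neither are omitted.
    """
    doc = json.loads(csidriver_list_json)
    in_use = {}
    for item in doc.get("items", []):
        spec = item.get("spec", {})
        fields = []
        # An empty list (the mitigation described above) counts as "not set".
        if spec.get("tokenRequests"):
            fields.append("tokenRequests")
        if spec.get("requiresRepublish"):
            fields.append("requiresRepublish")
        if fields:
            in_use[item["metadata"]["name"]] = fields
    return in_use
```

Given a list containing a driver `a.example.com` with one token request and a driver `b.example.com` with `tokenRequests: []`, only `a.example.com` appears in the result, which is one way to confirm that the mitigation has been applied to `b.example.com`.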

keps/sig-storage/1855-csi-driver-service-account-token/kep.yaml

Lines changed: 4 additions & 3 deletions
@@ -15,12 +15,13 @@ approvers:
   - "@msau42"
   - "@mikedanese"
 creation-date: 2020-06-09
-last-updated: 2020-11-12
+last-updated: 2021-02-04
 status: implementable
-stage: alpha
-latest-milestone: "v1.20"
+stage: beta
+latest-milestone: "v1.21"
 milestone:
   alpha: "v1.20"
+  beta: "v1.21"
 feature-gates:
   - name: CSIServiceAccountToken
     components:
