Skip to content

Commit ba220af

Browse files
committed
Move log sanitization to implementable and test plan and PRR review
1 parent eb0150d commit ba220af

File tree

2 files changed

+177
-11
lines changed

2 files changed

+177
-11
lines changed

keps/sig-instrumentation/1753-logs-sanitization/README.md

Lines changed: 171 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -27,18 +27,25 @@
2727
- [Theoretical](#theoretical)
2828
- [Practical](#practical)
2929
- [Strengths and Weaknesses of Static and Dynamic Analyses ([3])](#strengths-and-weaknesses-of-static-and-dynamic-analyses-3)
30+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
31+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
32+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
33+
- [Monitoring Requirements](#monitoring-requirements)
34+
- [Dependencies](#dependencies)
35+
- [Scalability](#scalability)
36+
- [Troubleshooting](#troubleshooting)
3037
<!-- /toc -->
3138

3239
## Release Signoff Checklist
3340

3441
Items marked with (R) are required *prior to targeting to a milestone / release*.
3542

36-
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
37-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
38-
- [ ] (R) Design details are appropriately documented
39-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
40-
- [ ] (R) Graduation criteria is in place
41-
- [ ] (R) Production readiness review completed
43+
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
44+
- [x] (R) KEP approvers have approved the KEP status as `implementable`
45+
- [x] (R) Design details are appropriately documented
46+
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
47+
- [x] (R) Graduation criteria is in place
48+
- [x] (R) Production readiness review completed
4249
- [ ] Production readiness review approved
4350
- [ ] "Implementation History" section is up-to-date for milestone
4451
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -114,8 +121,7 @@ This risk can be mitigated by:
114121
}
115122
```
116123

117-
- implementations of this interface could be auto generated using a dedicated code generator similar to deep-copy generator or manually implemented when needed,
118-
caching negative inspection results for parameter types which does not have any references to types which may contain sensitive data.
124+
- implementations of this interface could be auto generated using a dedicated code generator similar to deep-copy generator or manually implemented when needed, caching negative inspection results for parameter types which does not have any references to types which may contain sensitive data.
119125

120126
Which of those methods will be used and to what extent will be decided after running performance tests.
121127

@@ -198,6 +204,16 @@ To allow configuring if logs should be sanitized we will introduce a new logging
198204

199205
### Test Plan
200206

207+
Tests should cover two things:
208+
* Test feature log sanitization works e2e
209+
* Test Kubernetes components don't leak sensitive data
210+
211+
To address them we propose:
212+
* Add e2e tests that enabling log sanitization on test component (simple binary using `k8s.io/components-base`)
213+
prevents leaking sensitive data
214+
* Add periodic running standard k8s e2e tests with log sanitization enabled and analyse logs to ensure no log message
215+
was redacted.
216+
201217
### Graduation Criteria
202218

203219
#### Alpha (1.20)
@@ -213,6 +229,9 @@ To allow configuring if logs should be sanitized we will introduce a new logging
213229

214230
## Implementation History
215231

232+
* 2020-05-08 - Original Proposal
233+
* 2020-08-07 - Merged as provisional
234+
216235
## Drawbacks
217236

218237
## Alternatives
@@ -246,3 +265,147 @@ Static analysis, with its whitebox visibility, is certainly the more thorough ap
246265
Therefore static and dynamic analysis should not be considered as disjoint alternatives but rather as a complementary solutions and in the end we should have both implemented in Kubernetes.
247266

248267
[3] "Static Testing vs. Dynamic Testing - Veracode." 3 Dec. 2013, https://www.veracode.com/blog/2013/12/static-testing-vs-dynamic-testing. Accessed 15 May. 2020.
268+
269+
## Production Readiness Review Questionnaire
270+
271+
### Feature Enablement and Rollback
272+
273+
_This section must be completed when targeting alpha to a release._
274+
275+
* **How can this feature be enabled / disabled in a live cluster?**
276+
- [ ] Feature gate (also fill in values in `kep.yaml`)
277+
- Feature gate name:
278+
- Components depending on the feature gate:
279+
- [x] Other
280+
- Describe the mechanism:
281+
282+
New flag `--logging-sanitization` will be used to enable sanitization of for components.
283+
This flag will become standard flag for all k8s components and will be added to
284+
`k8s.io/component-base`. For alpha we are planning to add it to: `kube-apiserver`,
285+
`kube-scheduler`, `kubelet`, `kube-controller-manager`
286+
287+
- Will enabling / disabling the feature require downtime of the control
288+
plane?
289+
290+
Enabling log sanitization will require restarting control plane.
291+
292+
- Will enabling / disabling the feature require downtime or reprovisioning
293+
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
294+
295+
Enabling / disabling sanitization will require restarting system components.
296+
297+
* **Does enabling the feature change any default behavior?**
298+
Any change of default behavior may be surprising to users or break existing
299+
automations, so be extremely careful here.
300+
301+
Enabling sanitization should not change logs in normally functioning cluster. Only changes
302+
could be caused if bug in kubernetes code caused possibility of leaking sensitive data. In
303+
that case that log lines will be obfuscated. This should not be surprising to users as they
304+
should not depend on those buggy log lines.
305+
306+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
307+
the enablement)?**
308+
Also set `disable-supported` to `true` or `false` in `kep.yaml`.
309+
Describe the consequences on existing workloads (e.g., if this is a runtime
310+
feature, can it break the existing applications?).
311+
312+
Yes.
313+
314+
* **What happens if we reenable the feature if it was previously rolled back?**
315+
316+
Feature will start working again.
317+
318+
* **Are there any tests for feature enablement/disablement?**
319+
The e2e framework does not currently support enabling or disabling feature
320+
gates. However, unit tests in each component dealing with managing data, created
321+
with and without the feature, are necessary. At the very least, think about
322+
conversion tests if API types are being modified.
323+
324+
We will run manual tests before enabling the feature by default
325+
326+
### Rollout, Upgrade and Rollback Planning
327+
328+
_This section must be completed when targeting beta graduation to a release._
329+
330+
### Monitoring Requirements
331+
332+
_This section must be completed when targeting beta graduation to a release._
333+
334+
### Dependencies
335+
336+
_This section must be completed when targeting beta graduation to a release._
337+
338+
### Scalability
339+
340+
_For alpha, this section is encouraged: reviewers should consider these questions
341+
and attempt to answer them._
342+
343+
_For beta, this section is required: reviewers must answer these questions._
344+
345+
_For GA, this section is required: approvers should be able to confirm the
346+
previous answers based on experience in the field._
347+
348+
* **Will enabling / using this feature result in any new API calls?**
349+
Describe them, providing:
350+
- API call type (e.g. PATCH pods)
351+
- estimated throughput
352+
- originating component(s) (e.g. Kubelet, Feature-X-controller)
353+
focusing mostly on:
354+
- components listing and/or watching resources they didn't before
355+
- API calls that may be triggered by changes of some Kubernetes resources
356+
(e.g. update of object X triggers new updates of object Y)
357+
- periodic API calls to reconcile state (e.g. periodic fetching state,
358+
heartbeats, leader election, etc.)
359+
360+
No
361+
362+
* **Will enabling / using this feature result in introducing new API types?**
363+
Describe them, providing:
364+
- API type
365+
- Supported number of objects per cluster
366+
- Supported number of objects per namespace (for namespace-scoped objects)
367+
368+
No
369+
370+
* **Will enabling / using this feature result in any new calls to the cloud
371+
provider?**
372+
373+
No
374+
375+
* **Will enabling / using this feature result in increasing size or count of
376+
the existing API objects?**
377+
Describe them, providing:
378+
- API type(s):
379+
- Estimated increase in size: (e.g., new annotation of size 32B)
380+
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
381+
382+
No
383+
384+
* **Will enabling / using this feature result in increasing time taken by any
385+
operations covered by [existing SLIs/SLOs]?**
386+
Think about adding additional work or introducing new steps in between
387+
(e.g. need to do X to start a container), etc. Please describe the details.
388+
389+
For log calls that are synchronous there is a low risk of increasing time taken by operations covered by SLIs/SLOs.
390+
Feature will be disabled by default until we are able to address performance overhead in Beta.
391+
More detail in [#Performance overhead] section of KEP.
392+
393+
* **Will enabling / using this feature result in non-negligible increase of
394+
resource usage (CPU, RAM, disk, IO, ...) in any components?**
395+
Things to keep in mind include: additional in-memory state, additional
396+
non-trivial computations, excessive access to disks (including increased log
397+
volume), significant amount of data sent and/or received over network, etc.
398+
This through this both in small and large cases, again with respect to the
399+
[supported limits].
400+
401+
Sanitizing logs will require additional computation for performance sensitive logging calls. Feature will be disabled
402+
by default until we are able to address performance overhead in Beta.
403+
More detail in [#Performance overhead] section of KEP.
404+
405+
### Troubleshooting
406+
407+
The Troubleshooting section currently serves the `Playbook` role. We may consider
408+
splitting it into a dedicated `Playbook` document (potentially with some monitoring
409+
details). For now, we leave it here.
410+
411+
_This section must be completed when targeting beta graduation to a release._

keps/sig-instrumentation/1753-logs-sanitization/kep.yaml

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,9 +5,12 @@ authors:
55
- "@immutableT"
66
- "@PurelyApplied"
77
owning-sig: sig-instrumentation
8-
status: provisional
8+
status: implementable
99
creation-date: 2020-05-07
1010
reviewers:
11-
- TBD
11+
- "@ehashman"
1212
approvers:
13-
- TBD
13+
- "@dashpole"
14+
prr-approvers:
15+
- "@wojtek-t"
16+
disable-supported: true

0 commit comments

Comments
 (0)