| 1 | +# KEP-1753: Kubernetes system components logs sanitization |
| 2 | + |
| 3 | +<!-- toc --> |
| 4 | +- [Release Signoff Checklist](#release-signoff-checklist) |
| 5 | +- [Summary](#summary) |
| 6 | +- [Motivation](#motivation) |
| 7 | + - [Goals](#goals) |
| 8 | + - [Non-Goals](#non-goals) |
| 9 | +- [Proposal](#proposal) |
| 10 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 11 | + - [Performance overhead](#performance-overhead) |
| 12 | +- [Design Details](#design-details) |
| 13 | + - [Source code tags](#source-code-tags) |
| 14 | + - [datapolicy verification library](#datapolicy-verification-library) |
| 15 | + - [klog integration](#klog-integration) |
| 16 | + - [Logging configuration](#logging-configuration) |
| 17 | + - [Test Plan](#test-plan) |
| 18 | + - [Graduation Criteria](#graduation-criteria) |
| 19 | + - [Alpha (1.20)](#alpha-120) |
| 20 | + - [Beta (1.21)](#beta-121) |
| 21 | + - [GA (1.22)](#ga-122) |
| 22 | +- [Implementation History](#implementation-history) |
| 23 | +- [Drawbacks](#drawbacks) |
| 24 | +- [Alternatives](#alternatives) |
| 25 | + - [Static code analysis](#static-code-analysis) |
| 26 | + - [Limitations of Static Analysis](#limitations-of-static-analysis) |
| 27 | + - [Theoretical](#theoretical) |
| 28 | + - [Practical](#practical) |
| 29 | + - [Strengths and Weaknesses of Static and Dynamic Analyses ([3])](#strengths-and-weaknesses-of-static-and-dynamic-analyses-3) |
| 30 | +<!-- /toc --> |
| 31 | + |
| 32 | +## Release Signoff Checklist |
| 33 | + |
| 34 | +Items marked with (R) are required *prior to targeting to a milestone / release*. |
| 35 | + |
| 36 | +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
| 37 | +- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
| 38 | +- [ ] (R) Design details are appropriately documented |
| 39 | +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input |
| 40 | +- [ ] (R) Graduation criteria is in place |
| 41 | +- [ ] (R) Production readiness review completed |
| 42 | +- [ ] Production readiness review approved |
| 43 | +- [ ] "Implementation History" section is up-to-date for milestone |
| 44 | +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
| 45 | +- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes |
| 46 | + |
| 47 | +<!-- |
| 48 | +**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone. |
| 49 | +--> |
| 50 | + |
| 51 | +[kubernetes.io]: https://kubernetes.io/ |
| 52 | +[kubernetes/enhancements]: https://git.k8s.io/enhancements |
| 53 | +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes |
| 54 | +[kubernetes/website]: https://git.k8s.io/website |
| 55 | + |
| 56 | +## Summary |
| 57 | + |
| 58 | +This KEP proposes the introduction of a logging filter which could be applied to all Kubernetes system components logs to prevent various types of sensitive information from leaking via logs. |
| 59 | + |
| 60 | +## Motivation |
| 61 | + |
| 62 | +One of the outcomes of the [Kubernetes Security Audit](https://www.cncf.io/blog/2019/08/06/open-sourcing-the-kubernetes-security-audit/) was identification of two vulnerabilities which were directly related to sensitive data like tokens or passwords being written to Kubernetes system components logs: |
| 63 | + |
| 64 | +- *6.* Bearer tokens are revealed in logs |
| 65 | + |
| 66 | +- *22.* iSCSI volume storage cleartext secrets in logs |
| 67 | + |
| 68 | +To address this problem, the audit authors suggested the following: |
| 69 | + |
| 70 | +_**Ensure that sensitive data cannot be trivially stored in logs.** Prevent dangerous logging actions with improved code review policies. Redact sensitive information with logging filters. Together, these actions can help to prevent sensitive data from being exposed in the logs_ |
| 71 | + |
| 72 | +Taking into account the size of the Kubernetes source code and the pace at which it changes, it is very likely that many similar vulnerabilities exist in the source code and have not yet been identified. These pose a significant threat to the security of Kubernetes clusters whenever logs from Kubernetes system components are exposed to users. |
| 73 | + |
| 74 | +This KEP directly addresses the _“Redact sensitive information with logging filters”_ part of the audit recommendation. It proposes a dynamic filter, integrated with the Kubernetes logging library, which inspects all log entry parameters before they are used to build a log message and redacts those log entries which might contain sensitive data. |
| 75 | + |
| 76 | + |
| 77 | +### Goals |
| 78 | + |
| 79 | +- Prevent commonly known types of sensitive data from being accidentally logged. |
| 80 | +- Make it easy to extend sanitization logic by adding new sources of sensitive data which should never be logged. |
| 81 | +- Limit the performance overhead related to logs sanitization to an acceptable level. |
| 82 | + |
| 83 | +### Non-Goals |
| 84 | + |
| 85 | +- Completely eliminate the risk of exposing any security-sensitive data via Kubernetes system components logs. |
| 86 | +- Identify all places in the Kubernetes source code which store sensitive data. |
| 87 | +- Provide a generic set of source code tags which go beyond what is needed for this KEP. |
| 88 | + |
| 89 | +## Proposal |
| 90 | + |
| 91 | +We propose to define a set of standard Kubernetes [Go struct tags](https://golang.org/ref/spec#Struct_types) which can be used to mark fields that contain sensitive information and should not be logged because of security risks. |
| 92 | + |
| 93 | +Adding those tags to the source code will be a manual process which we want to initiate with this KEP. Finding places where they should be added can be aided by grepping the Kubernetes source code for common phrases like password, securitykey or token. |
| 94 | + |
| 95 | +For standard Go types and third-party libraries used in Kubernetes which cannot be changed, we propose listing them in one place together with the type of sensitive data they contain. |
| 96 | + |
| 97 | +We also propose to implement a small library which uses the above information to verify whether any of the provided values contains a reference to sensitive data. |
| 98 | + |
| 99 | +Finally, we propose to integrate this library with the klog logging library used by Kubernetes so that, when enabled, log entries which contain information marked as sensitive are redacted from the logs. |
| 100 | + |
| 101 | +### Risks and Mitigations |
| 102 | + |
| 103 | +#### Performance overhead |
| 104 | + |
| 105 | +Inspection of log parameters can be time consuming and it can impose significant performance overhead on each log library invocation. |
| 106 | + |
| 107 | +This risk can be mitigated by: |
| 108 | +- running inspection only on log entry parameters which are actually going to be logged, that is, running inspection after the log v-level has been evaluated, |
| 109 | +- introducing a dedicated Go interface with a single method returning information about the types of sensitive data which a given value contains, like: |
| 110 | + |
| 111 | + ```go |
| 112 | + type DatapolicyDetector interface { |
| 113 | + ContainsDatapolicyValues() []string |
| 114 | + } |
| 115 | + ``` |
| 116 | + |
| 117 | +- generating implementations of this interface with a dedicated code generator similar to the deep-copy generator, or implementing them manually when needed, |
| 118 | +- caching negative inspection results for parameter types which do not have any references to types which may contain sensitive data. |
| 119 | + |
| 120 | +Which of those methods will be used and to what extent will be decided after running performance tests. |
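The last mitigation, caching negative inspection results, could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: `MayContainSensitive`, `scanType`, `typeCache`, `loginEvent` and `plainEvent` are hypothetical names introduced here. The sketch decides once per Go type whether a `datapolicy`-tagged field is reachable without following pointers, and remembers the answer so later calls for the same type can skip reflection over the type.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// loginEvent and plainEvent are hypothetical types used only in this sketch.
type loginEvent struct {
	User     string
	Password string `datapolicy:"password"`
}

type plainEvent struct {
	Name  string
	Count int
}

// typeCache stores one bool per reflect.Type: true if the type can reach a
// datapolicy-tagged field, false if it cannot.
var typeCache sync.Map

// MayContainSensitive reports whether the type of value can reach a field
// tagged with datapolicy. A false result lets the caller skip inspection.
func MayContainSensitive(value interface{}) bool {
	t := reflect.TypeOf(value)
	if t == nil {
		return false // untyped nil carries no data
	}
	if cached, ok := typeCache.Load(t); ok {
		return cached.(bool)
	}
	result := scanType(t, map[reflect.Type]bool{})
	typeCache.Store(t, result)
	return result
}

// scanType walks a type without following pointers, mirroring the rule that
// recursion stops on pointers.
func scanType(t reflect.Type, seen map[reflect.Type]bool) bool {
	if seen[t] {
		return false // already visited during this scan
	}
	seen[t] = true
	switch t.Kind() {
	case reflect.Struct:
		for i := 0; i < t.NumField(); i++ {
			f := t.Field(i)
			if _, ok := f.Tag.Lookup("datapolicy"); ok {
				return true
			}
			if scanType(f.Type, seen) {
				return true
			}
		}
	case reflect.Slice, reflect.Array, reflect.Map:
		return scanType(t.Elem(), seen)
	case reflect.Interface:
		return true // dynamic type is unknown, so the value must be inspected
	}
	return false
}

func main() {
	fmt.Println(MayContainSensitive(plainEvent{Name: "sync"}))      // false
	fmt.Println(MayContainSensitive([]loginEvent{{User: "alice"}})) // true
}
```

Interface-typed fields are treated conservatively as possibly sensitive, because their dynamic type is only known at runtime. Whether this trade-off is acceptable would depend on the performance tests mentioned above.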
| 121 | + |
| 122 | + |
| 123 | +## Design Details |
| 124 | + |
| 125 | +### Source code tags |
| 126 | + |
| 127 | +We propose to mark all struct fields which may contain sensitive data with a new datapolicy tag whose value is a list of predefined types of data. |
| 128 | + |
| 129 | +For now we propose the following types of data to be available as values: |
| 130 | +- password |
| 131 | +- token |
| 132 | +- security-key |
| 133 | + |
| 134 | +Example of using the datapolicy tag: |
| 135 | + |
| 136 | +```go |
| 137 | +type ConfigMap struct { |
| 138 | + Data map[string]string `json:"data,omitempty" datapolicy:"password,token,security-key"` |
| 139 | +} |
| 140 | +``` |
| 141 | + |
| 142 | +For external types which are not part of the Kubernetes source code, such as Go standard library types or third-party vendored libraries whose source code we cannot change, we will define a global function which maps them to the relevant datapolicy tag values: |
| 143 | + |
| 144 | +```go |
| 145 | +func GlobalDatapolicyMapping(val interface{}) []string |
| 146 | +``` |
| 147 | + |
| 148 | +### datapolicy verification library |
| 149 | + |
| 150 | +The datapolicy verification library will implement the logic for checking whether a provided value contains any sensitive data identified by the datapolicy tag. Verification will be performed using reflection. Recursion will stop on pointers, as the values they point to are usually not logged. Verification will depend only on the datapolicy tag values. Values of individual fields of primitive types like string will not be inspected, other than checking that a given field is not empty. |
| 151 | + |
| 152 | +```go |
| 153 | +package datapol |
| 154 | + |
| 155 | +// Verify returns a slice with types of sensitive data the given value contains. |
| 156 | +func Verify(value interface{}) []string |
| 157 | +``` |
| 158 | + |
| 159 | +When used, the datapolicy verification library will be initialized with the GlobalDatapolicyMapping function to take into account information about external types which may contain sensitive data. |
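To make the reflection-based check more concrete, here is a minimal sketch of how a `Verify` function along these lines might work. It is an assumption-laden illustration and not the library's actual code: the `collect` helper and the `Credentials` and `Request` types are hypothetical, the result is sorted and de-duplicated for convenience, and the `GlobalDatapolicyMapping` hook for external types is left out for brevity. Pointers are deliberately not followed, and a tagged field only counts when it is non-empty.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// Credentials and Request are hypothetical types used only for illustration.
type Credentials struct {
	User     string
	Password string `datapolicy:"password"`
	Token    string `datapolicy:"token"`
}

type Request struct {
	URL   string
	Creds Credentials
}

// Verify returns the sorted, de-duplicated datapolicy values found on
// non-empty tagged fields reachable from value without following pointers.
func Verify(value interface{}) []string {
	found := map[string]struct{}{}
	collect(reflect.ValueOf(value), found)
	out := make([]string, 0, len(found))
	for k := range found {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// collect walks structs, slices, arrays, maps and interfaces recursively.
func collect(v reflect.Value, found map[string]struct{}) {
	if !v.IsValid() {
		return
	}
	switch v.Kind() {
	case reflect.Struct:
		t := v.Type()
		for i := 0; i < v.NumField(); i++ {
			f := v.Field(i)
			// Only the tag and emptiness are checked, never the field content.
			if tag, ok := t.Field(i).Tag.Lookup("datapolicy"); ok && !f.IsZero() {
				for _, p := range strings.Split(tag, ",") {
					if p = strings.TrimSpace(p); p != "" {
						found[p] = struct{}{}
					}
				}
			}
			collect(f, found)
		}
	case reflect.Slice, reflect.Array:
		for i := 0; i < v.Len(); i++ {
			collect(v.Index(i), found)
		}
	case reflect.Map:
		iter := v.MapRange()
		for iter.Next() {
			collect(iter.Value(), found)
		}
	case reflect.Interface:
		collect(v.Elem(), found)
	}
	// reflect.Ptr is intentionally not handled: recursion stops on pointers.
}

func main() {
	req := Request{URL: "https://example.invalid", Creds: Credentials{User: "alice", Token: "abc"}}
	fmt.Println(Verify(req))
}
```

In this sketch the empty `Password` field does not contribute to the result, so only the `token` value is reported for the request above.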
| 160 | + |
| 161 | +### klog integration |
| 162 | + |
| 163 | +Currently the klog library used by Kubernetes does not provide any extension point which could be used for filtering logs. |
| 164 | + |
| 165 | +We propose to add a new interface to the klog library: |
| 166 | + |
| 167 | +```go |
| 168 | +type LogFilter interface { |
| 169 | + Filter(args []interface{}) []interface{} |
| 170 | + FilterF(format string, args []interface{}) (string, []interface{}) |
| 171 | + FilterS(msg string, keysAndValues []interface{}) (string, []interface{}) |
| 172 | +} |
| 173 | +``` |
| 174 | + |
| 175 | +and the global function: |
| 176 | + |
| 177 | +```go |
| 178 | +func SetLogFilter(filter LogFilter) |
| 179 | +``` |
| 180 | + |
| 181 | +which will cause the methods of the provided log filter to be called: |
| 182 | +- `Filter()` - for each invocation of `Info()`, `Infoln()`, `InfoDepth()` and related methods. |
| 183 | +- `FilterF()` - for each invocation of `Infof()` and related methods. |
| 184 | +- `FilterS()` - for each invocation of the `InfoS()` and `ErrorS()` methods. |
| 185 | + |
| 186 | +The datapolicy verification library will be integrated with the klog library so that, when enabled, log entries for which `datapol.Verify()` returns a non-empty value will be replaced with the following message: |
| 187 | + |
| 188 | +`"Log message has been redacted. Log argument #%d contains: %v"` |
| 189 | + |
| 190 | +where `%d` is the position of the log parameter which contains sensitive data and `%v` is the result of the `datapol.Verify()` function. |
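A filter built on top of this interface might look like the sketch below. It is a hedged illustration only: the `LogFilter` interface is redeclared locally so the example compiles without klog, `sanitizingFilter`, `stubVerify` and the `Secret` type are hypothetical, the `verify` field stands in for `datapol.Verify()`, and reporting a zero-based argument position is an assumption, since the text above does not specify the base.

```go
package main

import "fmt"

// LogFilter mirrors the interface proposed above. It is declared locally so
// that this sketch is self-contained.
type LogFilter interface {
	Filter(args []interface{}) []interface{}
	FilterF(format string, args []interface{}) (string, []interface{})
	FilterS(msg string, keysAndValues []interface{}) (string, []interface{})
}

const redactedFormat = "Log message has been redacted. Log argument #%d contains: %v"

// Secret is a hypothetical type that stubVerify treats as sensitive.
type Secret string

// stubVerify stands in for datapol.Verify in this sketch.
func stubVerify(value interface{}) []string {
	if _, ok := value.(Secret); ok {
		return []string{"token"}
	}
	return nil
}

// sanitizingFilter is a hypothetical LogFilter implementation.
type sanitizingFilter struct {
	verify func(value interface{}) []string
}

// firstSensitive returns the zero-based index of the first sensitive
// argument and its datapolicy values, or -1 if none is found.
func (f sanitizingFilter) firstSensitive(args []interface{}) (int, []string) {
	for i, a := range args {
		if kinds := f.verify(a); len(kinds) > 0 {
			return i, kinds
		}
	}
	return -1, nil
}

func (f sanitizingFilter) Filter(args []interface{}) []interface{} {
	if i, kinds := f.firstSensitive(args); i >= 0 {
		return []interface{}{fmt.Sprintf(redactedFormat, i, kinds)}
	}
	return args
}

func (f sanitizingFilter) FilterF(format string, args []interface{}) (string, []interface{}) {
	if i, kinds := f.firstSensitive(args); i >= 0 {
		return redactedFormat, []interface{}{i, kinds}
	}
	return format, args
}

func (f sanitizingFilter) FilterS(msg string, keysAndValues []interface{}) (string, []interface{}) {
	if i, kinds := f.firstSensitive(keysAndValues); i >= 0 {
		return fmt.Sprintf(redactedFormat, i, kinds), nil
	}
	return msg, keysAndValues
}

// Compile-time check that sanitizingFilter satisfies LogFilter.
var _ LogFilter = sanitizingFilter{}

func main() {
	f := sanitizingFilter{verify: stubVerify}
	format, args := f.FilterF("user=%s token=%s", []interface{}{"alice", Secret("abc")})
	fmt.Printf(format+"\n", args...)
}
```

The whole entry is replaced rather than just the offending argument, which matches the redaction message described above and avoids leaking partial context around the sensitive value.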
| 191 | + |
| 192 | +### Logging configuration |
| 193 | + |
| 194 | +To allow configuring whether logs should be sanitized, we will introduce a new logging configuration field shared by all Kubernetes components. |
| 195 | + |
| 196 | +The `--logging-sanitization` flag controls whether sanitization is enabled. Setting this flag to true enables it. |
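As a rough sketch of how a component could read this flag, the example below uses only the Go standard library `flag` package. The `parseSanitization` helper, the flag set name and the default value of false are assumptions made for illustration; Kubernetes components wire their flags through their own shared configuration code, which is not shown here.

```go
package main

import (
	"flag"
	"fmt"
)

// parseSanitization parses the --logging-sanitization flag from args.
// The helper name and the default of false are assumptions for this sketch.
func parseSanitization(args []string) (bool, error) {
	fs := flag.NewFlagSet("component", flag.ContinueOnError)
	enabled := fs.Bool("logging-sanitization", false, "redact log entries that contain sensitive data")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enabled, nil
}

func main() {
	enabled, err := parseSanitization([]string{"--logging-sanitization=true"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	// A real component would install the sanitizing log filter here when enabled.
	fmt.Println("sanitization enabled:", enabled)
}
```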
| 197 | + |
| 198 | + |
| 199 | +### Test Plan |
| 200 | + |
| 201 | +### Graduation Criteria |
| 202 | + |
| 203 | +#### Alpha (1.20) |
| 204 | +- All well-known places which contain sensitive data have been tagged. |
| 205 | +- All external well-known types are handled by the GlobalDatapolicyMapping function. |
| 206 | +- It is possible to turn on logs sanitization for all Kubernetes components including API Server, Scheduler, Controller manager and kubelet using the `--logging-sanitization` flag. |
| 207 | + |
| 208 | +#### Beta (1.21) |
| 209 | +- Performance overhead related to enabling dynamic logs sanitization has been reduced to no more than 50% of the time spent in klog library functions with this feature disabled. This should be verified using Kubernetes scalability tests. |
| 210 | + |
| 211 | +#### GA (1.22) |
| 212 | +- Logs sanitization is enabled by default. |
| 213 | + |
| 214 | +## Implementation History |
| 215 | + |
| 216 | +## Drawbacks |
| 217 | + |
| 218 | +## Alternatives |
| 219 | + |
| 220 | +### Static code analysis |
| 221 | + |
| 222 | +Instead of introducing optional dynamic filtering of logs at runtime we could use the same metadata to perform static code analysis. |
| 223 | + |
| 224 | +#### Limitations of Static Analysis |
| 225 | + |
| 226 | +##### Theoretical |
| 227 | + |
| 228 | +The major theoretical limitation of static analysis is imposed by results from [decidability theory](https://www.tutorialspoint.com/automata_theory/language_decidability.htm) - given an arbitrary program written in a general-purpose programming language (one capable of simulating a Turing machine), it is impossible to examine the program algorithmically and determine if an arbitrarily chosen statement in the program will be executed when the program operates on arbitrarily chosen input data ([1]). Furthermore, there is no algorithmic way to identify those programs for which the analysis is possible; otherwise, the halting problem for Turing machines would be solvable. |
| 229 | + |
| 230 | +One major ramification of this result concerns the distinction between syntactic and semantic control paths. Syntactic paths comprise all possible control paths through the flow graph. Semantic paths are those syntactic paths that can be executed. However, not all syntactic paths are executable paths. Thus, the semantic paths are a subset of the syntactic paths. In static analysis, it would be highly desirable to identify the semantic paths. However, decidability results state that there is no algorithmic way to detect the semantic paths through an arbitrary program written in a general-purpose programming language. |
| 231 | + |
| 232 | +[1] "Tutorial: Static Analysis and Dynamic Testing of Computer ...." https://ieeexplore.ieee.org/abstract/document/1646907/. Accessed 15 May. 2020. |
| 233 | + |
| 234 | +##### Practical |
| 235 | + |
| 236 | +A major practical limitation of static analysis concerns array references and evaluation of pointer variables. Array subscripts and pointers are mechanisms for selecting a data item at runtime, based on previous computations performed by the program. Static analysis cannot evaluate subscripts or pointers and, thus, is unable to distinguish between elements of an array or members of a list. Although it might be possible to analyze subscripts and pointers using techniques of symbolic execution, it is generally simpler and more efficient to use dynamic testing. |
| 237 | + |
| 238 | +The other practical limitation of static analysis is slow execution on large models of state ([2]). |
| 239 | + |
| 240 | +[2] "Static and dynamic analysis: synergy and duality - Computer ...." https://homes.cs.washington.edu/~mernst/pubs/staticdynamic-woda2003.pdf. Accessed 15 May. 2020. |
| 241 | + |
| 242 | +#### Strengths and Weaknesses of Static and Dynamic Analyses ([3]) |
| 243 | + |
| 244 | +Static analysis, with its whitebox visibility, is certainly the more thorough approach and may also prove more cost-efficient with the ability to detect bugs at an early phase of the software development life cycle. Static analysis can also unearth errors that would not emerge in a dynamic test. Dynamic analysis, on the other hand, is capable of exposing a subtle flaw or vulnerability too complicated for static analysis alone to reveal. A dynamic test, however, will only find defects in the part of the code that is actually executed. |
| 245 | + |
| 246 | +Therefore static and dynamic analysis should not be considered disjoint alternatives but rather complementary solutions, and in the end we should have both implemented in Kubernetes. |
| 247 | + |
| 248 | +[3] "Static Testing vs. Dynamic Testing - Veracode." 3 Dec. 2013, https://www.veracode.com/blog/2013/12/static-testing-vs-dynamic-testing. Accessed 15 May. 2020. |