Skip to content

Commit d749966

Browse files
committed
Iterate on feedback
1 parent b3db94e commit d749966

File tree

1 file changed

+53
-40
lines changed
  • keps/sig-api-machinery/3716-webhook-predicates

1 file changed

+53
-40
lines changed

keps/sig-api-machinery/3716-webhook-predicates/README.md

Lines changed: 53 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,8 @@
3434
- [Troubleshooting](#troubleshooting)
3535
- [Implementation History](#implementation-history)
3636
- [Drawbacks](#drawbacks)
37+
- [Future Work](#future-work)
38+
- [Cross-webhook match conditions](#cross-webhook-match-conditions)
3739
- [Alternatives](#alternatives)
3840
- [Exclusion Expressions](#exclusion-expressions)
3941
- [Resource Exclusions](#resource-exclusions)
@@ -85,12 +87,6 @@ avoid circular-dependencies and exclude critical system resources, a webhook out
8587
major impact on cluster availability. This proposal aims to mitigate (but not eliminate) these
8688
issues by allowing webhooks to be more narrowly scoped and targeted.
8789

88-
89-
The same benefits also apply to bootstrapping a new cluster, or in a disaster-recovery scenario
90-
where objects in a cluster need to be recreated from source code. Being able to define and deploy a
91-
webhook before any workload Pods are defined means that admins don't need to bypass security
92-
controls to fix things, and helps to ensure there's never a security gap when restoring service.
93-
9490
**Performance:** Admission webhooks sit in the critical request path for write-requests. Validating
9591
webhooks can be run in parallel, but Mutating webhooks must be run in serial (up to 2 times!). This
9692
makes webhooks extremely latency sensitive, and even a webhook that doesn't do any work still needs
@@ -129,8 +125,8 @@ rules:
129125
apiVersions: '*'
130126
resources: '*'
131127
matchConditions:
132-
# Exclude leases from the webhook
133-
- expression: '!(request.resource.group == "coordination.k8s.io" && resource.resource == "leases")'
128+
- name: 'exclude-leases'
129+
expression: '!(request.resource.group == "coordination.k8s.io" && resource.resource == "leases")'
134130
```
135131
136132
#### Exempt system users from security policy
@@ -146,22 +142,20 @@ With `matchConditions`, a managed cluster could append system-exclusion rules to
146142

147143
```yaml
148144
matchConditions:
149-
# Exclude node requests from the webhook
150-
- expression: '!("system:nodes" in request.userInfo.groups)'
145+
- name: 'exclude-kubelet-requests'
146+
expression: '!("system:nodes" in request.userInfo.groups)'
151147
```
152148

153149
Since the expression will be evaluated using a common Kubernetes CEL library, these expressions
154150
should also get automatic access to the secondary authorization check mechanism described in
155151
[KEP-3488: CEL for Admission Control](/keps/sig-api-machinery/3488-cel-admission-control#secondary-authz).
156152
In practice, this means that RBAC bindings can be used to opt-out privileged users from security policy:
157153

158-
_Note: The secondary authz mechanism has not yet been designed, and the example syntax here is just
159-
to illustrate how it might be used._
160-
161154
```yaml
162155
matchConditions:
163-
# Exclude users with the 'breakglass' permission on the 'security-policy' webhook.
164-
- expression: '!authorized(request.userInfo, "breakglass", "validatingwebhookconfigurations.admissionregistration.k8s.io/security-policy")'
156+
# Requests by users without breakglass should be included.
157+
- name: 'breakglass'
158+
expression: 'authorizer.resource('admissionregistration.k8s.io', 'validatingwebhookconfigurations', '*').name('security-policy').check('breakglass').denied()'
165159
```
166160

167161
#### Scope an NFS access management webhook to Pods mounting NFS volumes
@@ -191,7 +185,8 @@ rules:
191185
resources: 'pods'
192186
matchConditions:
193187
# Only include pods with an NFS volume.
194-
- expression: 'object.spec.volumes.exists(v, v.has(nfs))'
188+
- name: 'nfs-volume-present'
189+
expression: 'object.spec.volumes.exists(v, v.has(nfs))'
195190
```
196191

197192
### Goals
@@ -219,6 +214,8 @@ type ValidatingWebhook struct {
219214
// for a request to be sent to this webhook. All conditions in the list must evaluate to TRUE for
220215
// the request to be matched.
221216
// +optional
217+
// +patchMergeKey=name
218+
// +patchStrategy=merge
222219
MatchConditions []MatchCondition `json:"matchConditions,omitempty"`
223220
}
224221

@@ -229,6 +226,12 @@ type MutatingWebhook struct {
229226

230227
// MatchCondition represents a condition which must by fulfilled for a request to be sent to a webhook.
231228
type MatchCondition struct {
229+
// Name is an identifier for this match condition, used for strategic merging of MatchConditions,
230+
// as well as providing an identifier for logging purposes. A good name should be descriptive of
231+
// the associated expression.
232+
// Name must be a valid RFC 1123 DNS subdomain, and unique in a set of MatchConditions.
233+
// Required.
234+
Name string `json:"name"`
232235
// NOTE: Placeholder documentation, to be replaced by https://github.com/kubernetes/website/issues/39089.
233236
//
234237
// Expression represents the expression which will be evaluated by CEL.
@@ -266,7 +269,7 @@ a CEL expression adds a potential failure point. This can be mitigated by testin
266269
ecosystem currently lacks some of the tools that would make this easier.
267270

268271
Of particular significance are match conditions tied to non-functional properties of an object, such
269-
as using labels to decide whether to opt an object out of a policy. Without additional admition
272+
as using labels to decide whether to opt an object out of a policy. Without additional admission
270273
controls on who can set those non-functional aspects, exempting the policy based on that could be a
271274
security vulnerability. In contrast, the
272275
[NFS example usecase](#scope-an-nfs-access-management-webhook-to-pods-mounting-nfs-volumes) exempts
@@ -449,32 +452,18 @@ in back-to-back releases.
449452

450453
### Upgrade / Downgrade Strategy
451454

452-
<!--
453-
If applicable, how will the component be upgraded and downgraded? Make sure
454-
this is in the test plan.
455-
456-
Consider the following in developing an upgrade/downgrade strategy for this
457-
enhancement:
458-
- What changes (in invocations, configurations, API use, etc.) is an existing
459-
cluster required to make on upgrade, in order to maintain previous behavior?
460-
- What changes (in invocations, configurations, API use, etc.) is an existing
461-
cluster required to make on upgrade, in order to make use of the enhancement?
462-
-->
455+
Downgrading in a way that disables match conditions after it is already in use can increase the
456+
scope of requests evaluated by a webhook. See
457+
[Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?](#can-the-feature-be-disabled-once-it-has-been-enabled-ie-can-we-roll-back-the-enablement)
458+
for more details
463459

464460
### Version Skew Strategy
465461

466-
<!--
467-
If applicable, how will the component handle version skew with other
468-
components? What are the guarantees? Make sure this is in the test plan.
469-
470-
Consider the following in developing a version skew strategy for this
471-
enhancement:
472-
- Does this enhancement involve coordinating behavior in the control plane and
473-
in the kubelet? How does an n-2 kubelet without this feature available behave
474-
when this feature is used?
475-
- Will any other components on the node change? For example, changes to CSI,
476-
CRI or CNI may require updating that component before the kubelet.
477-
-->
462+
The new field is only evaluated by the apiserver, so only HA apiserver version skew is relevant. In
463+
this case, if the feature is enabled in one apiserver and not another, a request could
464+
non-deterministically be sent to a webhook. Enabling match conditions without setting
465+
`matchConditions` on an webhooks is a no-op, so the version skew non-determinism is best avoided by
466+
waiting until it has been enabled in all apiservers before starting to use the new field.
478467

479468
## Production Readiness Review Questionnaire
480469

@@ -590,6 +579,14 @@ Even if applying deprecation policies, they may still surprise some users.
590579

591580
### Monitoring Requirements
592581

582+
A new per-webhook metric will measure the number of requests excluded by match conditions:
583+
584+
Metric name: `webhook_admission_match_condition_exclusions_total`
585+
Labels:
586+
- `name`: webhook name
587+
- `type`: `validate` or `admit`
588+
- `operation`: the admission operation
589+
593590
<!--
594591
This section must be completed when targeting beta to a release.
595592
@@ -762,6 +759,8 @@ This through this both in small and large cases, again with respect to the
762759

763760
### Troubleshooting
764761

762+
See [Debuggability](#debuggability).
763+
765764
<!--
766765
This section must be completed when targeting beta to a release.
767766
@@ -811,6 +810,20 @@ Major milestones might include:
811810
Why should this KEP _not_ be implemented?
812811
-->
813812

813+
## Future Work
814+
815+
### Cross-webhook match conditions
816+
817+
In the future, we should explore ways to apply common match conditions across multiple webhooks.
818+
819+
Example use cases:
820+
- Apply a [break-glass exemption](#exempt-system-users-from-security-policy) across many (or all) webhooks.
821+
- Managed cluster provider wants to exempt provider-managed resources from user-managed webhooks.
822+
823+
Considerations:
824+
- Access by managed cluster provider vs. cluster admin
825+
- Side effects & mutations
826+
814827
## Alternatives
815828

816829
### Exclusion Expressions

0 commit comments

Comments
 (0)