Commit 982c883

cici37 authored and jpbetz committed
Define beta graduation criteria and namespaced policy configuration capability.
1 parent a6e99a8 commit 982c883

3 files changed: +144 -70 lines changed
Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,5 @@
11
kep-number: 3488
22
alpha:
33
approver: "@johnbelamaric"
4+
beta:
5+
approver: "@johnbelamaric"

keps/sig-api-machinery/3488-cel-admission-control/README.md

Lines changed: 140 additions & 68 deletions
@@ -28,13 +28,13 @@
2828
- [Enforcement Actions](#enforcement-actions)
2929
- [Audit Annotations](#audit-annotations)
3030
- [Audit Events](#audit-events)
31-
- [Namespace scoped policy binding](#namespace-scoped-policy-binding)
31+
- [Per namespace policy params](#per-namespace-policy-params)
32+
- [Match Conditions](#match-conditions)
3233
- [CEL Expression Composition](#cel-expression-composition)
3334
- [Use Cases](#use-cases)
34-
- [Match Conditions](#match-conditions)
3535
- [Variables](#variables)
3636
- [Secondary Authz](#secondary-authz)
37-
- [Access to namespace metadata](#access-to-namespace-metadata)
37+
- [Access to namespace](#access-to-namespace)
3838
- [Transition rules](#transition-rules)
3939
- [Resource constraints](#resource-constraints)
4040
- [Safety Features](#safety-features)
@@ -43,6 +43,8 @@
4343
- [Audit Annotations](#audit-annotations-1)
4444
- [Client visibility](#client-visibility)
4545
- [Metrics](#metrics)
46+
- [Future Plan](#future-plan)
47+
- [Namespace scoped policy binding](#namespace-scoped-policy-binding)
4648
- [User Stories](#user-stories)
4749
- [Use Case: Singleton Policy](#use-case-singleton-policy)
4850
- [Use Case: Shared Parameter Resource](#use-case-shared-parameter-resource)
@@ -1186,48 +1188,42 @@ are included with the key provided. E.g.:
11861188
}
11871189
```
11881190

1189-
#### Namespace scoped policy binding
1190-
1191-
For phase 1, policy bindings were only allowed to be cluster scoped. We can
1192-
support namespace scoped policy bindings as follows:
1193-
1194-
- Add a `NamespacePolicyBinding` resource.
1195-
- If the parameter resource is namespace scoped, it implicitly matches
1196-
resources only in the namespace it is in, but may further constrain what
1197-
resources it matches with additional match criteria.
1198-
1199-
Benefits: Allows policy of a namespace to be controlled from within the
1200-
namespace. As an example, ResourceQuota works this way.
1201-
1202-
Details to consider:
1191+
#### Per namespace policy params
12031192

1204-
- Should a policy support both cluster scoped and namespace scoped binding? If
1205-
so how? It would need two different parameter CRDs (since a CRD must either be
1206-
cluster scoped or namespace scoped, not both).
1193+
Currently the policies and bindings are only allowed to be cluster scoped.
1194+
We want to support per namespace configuration with namespace scoped param resources.
12071195

1208-
#### CEL Expression Composition
1196+
(Thanks for the input from @dead2k)
1197+
This sort of mapping allows:
1198+
- A cluster-admin can write a single resource to say, “this is the policy I want in all my namespaces”.
1199+
- If namespace admins can read the param resources, but not write that resource, they can understand the limitations they currently have.
1200+
- A single lenient cluster policy and cluster policybinding can enforce the minimum constraint, and a single cluster policy with a cluster policybinding pointing to a namespace level param can further restrict.
12091201

1210-
##### Use Cases
1211-
1212-
###### Code re-use for complicated expressions
1202+
A new optional field `namespaceParamRef` could be added to ValidatingAdmissionPolicyBinding to support this use case.
1203+
In contrast, a namespace scoped policybinding will require creation and maintenance of both policybindings and parameters
1204+
in every namespace to enforce the policy itself, versus the single policybinding and many parameters.
12131205

1214-
A CEL expression may not be computationally expensive, but could still be
1215-
intricate enough that copy-pasting could prove to be a bad decision later on
1216-
in time. With the addition of the `messageExpression` field, more copy-pasting
1217-
is expected as well. If a sufficiently complex expression ended up copy-pasted everywhere,
1218-
and then needs to be updated somehow, it will need that update in every place
1219-
it was copy-pasted. A variable, on the other hand, will only need to be updated
1220-
in one place.
1206+
```yaml
1207+
apiVersion: admissionregistration.k8s.io/v1alpha1
1208+
kind: ValidatingAdmissionPolicyBinding
1209+
metadata:
1210+
name: "demo-binding-test.example.com"
1211+
spec:
1212+
policyName: "demo-policy.example.com"
1213+
namespaceParamRef:
1214+
name: "param-resource.example.com"
1215+
failAction: "allow"
1216+
validationActions: [Deny]
1217+
```
12211218

1222-
###### Reusing/memoizing an expensive computation
1219+
- A new optional field `namespaceParamRef` is added as a peer to `paramRef`. Users must choose one of the two for parameterization.
1220+
It allows users to configure param resource per namespace.
1221+
- `failAction` defines the behavior when the param resource cannot be found in the current namespace.
1222+
Setting it to `allow` skips the validation and lets the request through. Setting it to `deny` fails the validation if the specified param resource is not found.
1223+
- If the resource being validated is cluster scoped and `namespaceParamRef` is set, an error is returned.
1224+
- the existing behavior should not be affected.
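
To illustrate the behavior described above, per-namespace param resources might look like the following. This is a minimal sketch under stated assumptions: the `ReplicaLimit` kind, the `rules.example.com/v1` API group, the namespace names, and the `maxReplicas` field are hypothetical and not part of this proposal. Only the resource name is taken from the `namespaceParamRef.name` in the binding above.

```yaml
# Hypothetical param resources, one per namespace, sharing the name
# referenced by namespaceParamRef in the binding.
apiVersion: rules.example.com/v1   # assumed API group, for illustration only
kind: ReplicaLimit                 # assumed namespaced param kind
metadata:
  name: "param-resource.example.com"
  namespace: team-a
maxReplicas: 3
---
apiVersion: rules.example.com/v1
kind: ReplicaLimit
metadata:
  name: "param-resource.example.com"
  namespace: team-b
maxReplicas: 10
```

Under the proposed semantics, a request in `team-a` would be evaluated against the `team-a` params and a request in `team-b` against the `team-b` params, while a request in a namespace with no such resource would be handled according to `failAction`.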
12231225

1224-
For a CEL expression that runs in O(n^2) time or worse (or otherwise
1225-
takes a significant amount of time to execute), it would be nice to only run
1226-
it when necessary. For instance, if multiple validation expressions used the
1227-
same expensive expression, that expression could be refactored out into a
1228-
variable.
1229-
1230-
##### Match Conditions
1226+
#### Match Conditions
12311227

12321228
Note that the syntax of the `matchConditions` resource is intended to
12331229
align with the [Admission Webhook Match Conditions KEP #3716](https://github.com/kubernetes/enhancements/pull/3717),
@@ -1283,6 +1279,27 @@ Note that `matchConditions` and `validations` look similar, but `matchConditions
12831279
* All match conditions must be satisfied (evaluate to `true`) before `validations` are tested
12841280
* If there is an error executing a match condition, the failure policy for the (definition, binding) tuple is invoked
12851281

1282+
#### CEL Expression Composition
1283+
1284+
##### Use Cases
1285+
1286+
###### Code re-use for complicated expressions
1287+
1288+
A CEL expression may not be computationally expensive, but could still be
1289+
intricate enough that copy-pasting could prove to be a bad decision later on
1290+
in time. With the addition of the `messageExpression` field, more copy-pasting
1291+
is expected as well. If a sufficiently complex expression ended up copy-pasted everywhere,
1292+
and then needs to be updated somehow, it will need that update in every place
1293+
it was copy-pasted. A variable, on the other hand, will only need to be updated
1294+
in one place.
1295+
1296+
###### Reusing/memoizing an expensive computation
1297+
1298+
For a CEL expression that runs in O(n^2) time or worse (or otherwise
1299+
takes a significant amount of time to execute), it would be nice to only run
1300+
it when necessary. For instance, if multiple validation expressions used the
1301+
same expensive expression, that expression could be refactored out into a
1302+
variable.
12861303

12871304
##### Variables
12881305

@@ -1480,16 +1497,17 @@ If we were to offer a way to lookup arbitrary other resources, or even if
14801497
we provided selective access to just some resources, this might become
14811498
easier. This can explored as future work.
14821499

1483-
#### Access to namespace metadata
1500+
#### Access to namespace
14841501

1485-
We have general agreement to include this as a feature, but need to provide
1486-
a concrete design.
1502+
We have general agreement to grant CEL expressions access to the admission object's namespace through a newly added CEL variable `namespaceObject`.
1503+
If the resource is cluster scoped, `namespaceObject` will be null.
14871504

1488-
- Namespace labels and annotations are the most commonly needed fields not
1489-
already available in the resource being validated. Note that
1490-
namespaceSelectors already allow matches to examine namespace levels, but we
1491-
also have use cases that need to be able to inspects the fields in CEL
1492-
expressions.
1505+
`namespaceObject` will provide access to all existing fields under namespace metadata, namespace spec and namespace status except for metadata.managedFields and metadata.ownerReferences.
1506+
Fields can be accessed directly through the `namespaceObject` variable, e.g. `namespaceObject.metadata.name` or `namespaceObject.status.phase`.
1507+
1508+
Namespace labels and annotations are the most commonly needed fields not already available in the resource being validated.
1509+
Labels and annotations can be accessed through `namespaceObject.metadata.labels` and `namespaceObject.metadata.annotations`, for example `namespaceObject.metadata.labels.env`.
1510+
We recommend checking that the specific label or annotation exists before validating it: `'env' in namespaceObject.metadata.labels`.
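
The access pattern above could be used in a policy roughly as follows. This is a sketch only: it assumes the proposed `namespaceObject` variable is available, the policy name, the `env` label, and the replica rule are hypothetical, and the `apiVersion` simply mirrors the binding example elsewhere in this document.

```yaml
# Sketch: require at least 2 replicas for Deployments in namespaces labeled env=prod.
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-namespace-env.example.com"   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  # The existence check comes first, so the label is only read when present.
  - expression: >-
      !('env' in namespaceObject.metadata.labels) ||
      namespaceObject.metadata.labels.env != 'prod' ||
      object.spec.replicas >= 2
    message: "Deployments in namespaces labeled env=prod must have at least 2 replicas."
```

Because Deployments are namespaced, `namespaceObject` would not be null here; a policy that also matches cluster scoped resources would need to handle that case.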
14931511

14941512
#### Transition rules
14951513

@@ -1618,6 +1636,32 @@ the number of biindings can become quite large, so let's limit it to
16181636
- xref: [Metrics Provided by OPA Gatekeeper](https://open-policy-agent.github.io/gatekeeper/website/docs/metrics/)
16191637
- xref: [Admission Webhook Metrics](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhook-metrics)
16201638

1639+
### Future Plan
1640+
1641+
#### Namespace scoped policy binding
1642+
1643+
**Note**
1644+
The namespace scoped policy binding will require a new API in place.
1645+
It will be planned separately and will not affect the existing ValidatingAdmissionPolicy behavior.
1646+
1647+
For phase 1, policy bindings were only allowed to be cluster scoped. We can
1648+
support namespace scoped policy bindings as follows:
1649+
1650+
- Add a `NamespacePolicyBinding` resource.
1651+
- If the parameter resource is namespace scoped, it implicitly matches
1652+
resources only in the namespace it is in, but may further constrain what
1653+
resources it matches with additional match criteria.
1654+
1655+
Benefits: Allows policy of a namespace to be controlled from within the
1656+
namespace. As an example, ResourceQuota works this way.
1657+
1658+
Details to consider:
1659+
1660+
- Should a policy support both cluster scoped and namespace scoped binding? If
1661+
so how? It would need two different parameter CRDs (since a CRD must either be
1662+
cluster scoped or namespace scoped, not both).
1663+
1664+
16211665
### User Stories
16221666

16231667
In addition to "User Stores", see below "Potential Applications" for a list of
@@ -2312,6 +2356,11 @@ in back-to-back releases.
23122356
If multiple admission policies require the same conversion, convert only once.
23132357
From @liggitt: "webhook code loops up one level, first accumulates all the validation webhooks we'll run, then converts to the versions needed by those webhooks then evaluates in parallel"
23142358
- authz check to the specific resource referenced in the policy's paramKind. ([comment](https://github.com/kubernetes/kubernetes/pull/113314#discussion_r1013135860))
2359+
- complete feature of access to namespace metadata
2360+
- complete type check for CRD
2361+
- add controlled rollout strategy to support future CEL library/function/variable changes
2362+
- [Quantity](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/api/resource/quantity.go#L100) support in CEL expressions, properly tested
2363+
- support the list of features mentioned under phase 2
23152364

23162365
### Upgrade / Downgrade Strategy
23172366

@@ -2387,15 +2436,9 @@ well as the [existing list] of feature gates.
23872436
[existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
23882437
-->
23892438

2390-
- [ ] Feature gate (also fill in values in `kep.yaml`)
2391-
- Feature gate name: CELValidatingAdmission
2439+
- [X] Feature gate (also fill in values in `kep.yaml`)
2440+
- Feature gate name: ValidatingAdmissionPolicy
23922441
- Components depending on the feature gate: kube-apiserver
2393-
- [ ] Other
2394-
- Describe the mechanism:
2395-
- Will enabling / disabling the feature require downtime of the control
2396-
plane?
2397-
- Will enabling / disabling the feature require downtime or reprovisioning
2398-
of a node?
23992442

24002443
###### Does enabling the feature change any default behavior?
24012444

@@ -2506,6 +2549,9 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
25062549
checking if there are objects with field X set) may be a last resort. Avoid
25072550
logs or events for this purpose.
25082551
-->
2552+
The following metrics could be used to see if the feature is in use:
2553+
- validating_admission_policy/check_total
2554+
- validating_admission_policy/definition_total
25092555

25102556
###### How can someone using this feature know that it is working for their instance?
25112557

@@ -2518,13 +2564,10 @@ and operation of this feature.
25182564
Recall that end users cannot usually observe component logs or access metrics.
25192565
-->
25202566

2521-
- [ ] Events
2522-
- Event Reason:
2523-
- [ ] API .status
2524-
- Condition name:
2525-
- Other field:
2526-
- [ ] Other (treat as last resort)
2527-
- Details:
2567+
- Metrics like `validating_admission_policy/check_total` can be used to check how many validations were applied in total
2568+
- Audit mode can be used to check audit events, following [this documentation](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/#audit-annotations)
2569+
- ValidatingAdmissionPolicy.Status can be used to see if typechecking performed as expected
2570+
- User can also verify if the admission request is rejected or a warning is shown as expected based on how validationAction is set.
25282571

25292572
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
25302573

@@ -2542,26 +2585,28 @@ high level (needs more precise definitions) those may be things like:
25422585
These goals will help you determine what you need to measure (SLIs) in the next
25432586
question.
25442587
-->
2588+
No impact on admission request latency when no ValidatingAdmissionPolicy objects are present.
2589+
2590+
Performance when ValidatingAdmissionPolicy objects are in use will need to be measured and optimized.
25452591

25462592
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
25472593

25482594
<!--
25492595
Pick one more of these and delete the rest.
25502596
-->
25512597

2552-
- [ ] Metrics
2553-
- Metric name:
2554-
- [Optional] Aggregation method:
2555-
- Components exposing the metric:
2556-
- [ ] Other (treat as last resort)
2557-
- Details:
2598+
- [ ] The Metrics below could be used:
2599+
- validating_admission_policy/check_total
2600+
- validating_admission_policy/definition_total
2601+
- validating_admission_policy/check_duration_seconds
25582602

25592603
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
25602604

25612605
<!--
25622606
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
25632607
implementation difficulties, etc.).
25642608
-->
2609+
No. We are open to input.
25652610

25662611
### Dependencies
25672612

@@ -2585,6 +2630,7 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
25852630
- Impact of its outage on the feature:
25862631
- Impact of its degraded performance or high-error rates on the feature:
25872632
-->
2633+
No.
25882634

25892635
### Scalability
25902636

@@ -2612,6 +2658,7 @@ Focusing mostly on:
26122658
- periodic API calls to reconcile state (e.g. periodic fetching state,
26132659
heartbeats, leader election, etc.)
26142660
-->
2661+
Yes. A new API group is introduced which will be used for this feature.
26152662

26162663
###### Will enabling / using this feature result in introducing new API types?
26172664

@@ -2621,6 +2668,7 @@ Describe them, providing:
26212668
- Supported number of objects per cluster
26222669
- Supported number of objects per namespace (for namespace-scoped objects)
26232670
-->
2671+
Yes. We introduced two new kinds for this feature: ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding as described in [this doc](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/)
26242672

26252673
###### Will enabling / using this feature result in any new calls to the cloud provider?
26262674

@@ -2629,6 +2677,7 @@ Describe them, providing:
26292677
- Which API(s):
26302678
- Estimated increase:
26312679
-->
2680+
No.
26322681

26332682
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
26342683

@@ -2638,6 +2687,7 @@ Describe them, providing:
26382687
- Estimated increase in size: (e.g., new annotation of size 32B)
26392688
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
26402689
-->
2690+
No.
26412691

26422692
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
26432693

@@ -2649,6 +2699,7 @@ Think about adding additional work or introducing new steps in between
26492699

26502700
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
26512701
-->
2702+
The existing admission request latency might be affected when the feature is used. We expect this to be negligible and will measure it before GA.
26522703

26532704
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
26542705

@@ -2661,6 +2712,20 @@ This through this both in small and large cases, again with respect to the
26612712

26622713
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
26632714
-->
2715+
We don't expect it to. In particular, compared to the existing methods of achieving the same goal, using this feature should not result in a non-negligible increase in resource usage.
2716+
2717+
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
2718+
2719+
<!--
2720+
Focus not just on happy cases, but primarily on more pathological cases
2721+
(e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
2722+
If any of the resources can be exhausted, how this is mitigated with the existing limits
2723+
(e.g. pods per node) or new limits added by this KEP?
2724+
2725+
Are there any tests that were run/should be run to understand performance characteristics better
2726+
and validate the declared limits?
2727+
-->
2728+
No.
26642729

26652730
### Troubleshooting
26662731

@@ -2676,6 +2741,7 @@ details). For now, we leave it here.
26762741
-->
26772742

26782743
###### How does this feature react if the API server and/or etcd is unavailable?
2744+
Same as without this feature.
26792745

26802746
###### What are other known failure modes?
26812747

@@ -2691,9 +2757,15 @@ For each of them, fill in the following information by copying the below templat
26912757
Not required until feature graduated to beta.
26922758
- Testing: Are there any tests for failure mode? If not, describe why.
26932759
-->
2760+
N/A
26942761

26952762
###### What steps should be taken if SLOs are not being met to determine the problem?
26962763

2764+
- If the performance impact is not tolerable, the feature can be disabled by disabling the API or setting the feature gate to false.
2765+
- Try to run the validations separately to see which rule is slow
2766+
- Remove the problematic rules or update the rules to meet the requirement
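
For the first step, the feature gate is set on kube-apiserver with `--feature-gates`, and an API version can be switched off with `--runtime-config`. The fragment below is a sketch under assumptions: it presumes kube-apiserver runs from a static Pod manifest, and the `v1alpha1` version is taken from the binding example in this document, so the version to disable depends on the release in use.

```yaml
# Fragment of a kube-apiserver static Pod manifest (assumed deployment style).
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Turn the feature off entirely.
    - --feature-gates=ValidatingAdmissionPolicy=false
    # Alternatively, stop serving the API version that carries the policy types.
    - --runtime-config=admissionregistration.k8s.io/v1alpha1=false
```

Either setting requires restarting kube-apiserver; existing policy objects stop being enforced while the feature is off.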
2767+
2768+
26972769
## Implementation History
26982770

26992771
<!--

keps/sig-api-machinery/3488-cel-admission-control/kep.yaml

Lines changed: 2 additions & 2 deletions
@@ -33,12 +33,12 @@ see-also:
3333
- "/keps/sig-api-machinery/2876-crd-validation-expression-language"
3434

3535
# The target maturity stage in the current dev cycle for this KEP.
36-
stage: alpha
36+
stage: beta
3737

3838
# The most recent milestone for which work toward delivery of this KEP has been
3939
# done. This can be the current (upcoming) milestone, if it is being actively
4040
# worked on.
41-
latest-milestone: "v1.27"
41+
latest-milestone: "v1.28"
4242

4343
# The milestone at which this feature was, or is targeted to be, at each stage.
4444
milestone:
