@@ -598,31 +598,67 @@ coverage of unit tests.

 ### Monitoring

-A single metric will be added to track policy evaluations against pods and [templated pods].
-[Namespace evaluations](#namespace-policy-update-warnings) are not counted.
+Three metrics will be introduced:

 ```
 pod_security_evaluations_total
 ```

+This metric will be added to track policy evaluations against pods and [templated pods].
+[Namespace evaluations](#namespace-policy-update-warnings) are not counted.
+The metric will only be incremented when the policy check is actually performed. In other words,
+this metric will not be incremented for any of the following:
+
+- Ignored resource types, subresources, or workload resources without a pod template
+- Update requests that are out of scope (see [Updates](#updates) above)
+- Exempt requests (these are reported in the `pod_security_exemptions_total` metric instead)
+- Errors that make policy evaluation impossible (these are reported in the `pod_security_errors_total` metric instead)
+
 The metric will use the following labels:

-1. `decision {exempt, allow, deny, error}` - The policy decision. Error is reserved for panics or
-   other errors in policy evaluation. Update requests that are out of scope (see [Updates](#updates)
-   above) are not counted.
+1. `decision {allow, deny}` - The policy decision. `allow` is only recorded with `enforce` mode.
 3. `policy_level {privileged, baseline, restricted}` - The policy level that the request was
    evaluated against.
 4. `policy_version {v1.X, v1.Y, latest, future}` - The policy version that was used for the evaluation.
    Explicit versions less than or equal to the build of the API server or webhook are recorded in the form `v1.x` (e.g. `v1.22`).
    Explicit versions greater than the build of the API server or webhook (which are evaluated as `latest`) are recorded as `future`.
    Explicit use of the `latest` version or implicit use by omitting a version or specifying an unparseable version will be recorded as `latest`.
 5. `mode {enforce, warn, audit}` - The type of evaluation mode being recorded. Note that a single
-   request can increment this metric 3 times, once for each mode. If this admission controller is
-   enabled, every every create request and in-scope update request will at least increment the
-   `enforce` total. Privileged evaluations for warn and audit modes are not counted.
+   request can increment this metric 3 times, once for each mode. `audit` and `warn` mode metrics
+   are only incremented for violations. If this admission controller is enabled, every
+   evaluated request will at least increment the `enforce` total.
 6. `request_operation {create, update}` - The operation of the request being checked.
 7. `resource {pod, controller}` - Whether the request object is a Pod, or a [templated
    pod](#podtemplate-resources) resource.
+8. `subresource {ephemeralcontainers}` - The subresource, when relevant & in scope.
+
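Reviewer note: the recording rules for `pod_security_evaluations_total` described in this hunk can be sketched as follows. This is only an illustration in Python, not the admission controller's code. The function names `policy_version_label` and `record_evaluations`, the `v1.N` parsing regex, and the use of `decision=deny` for `warn` and `audit` violations are assumptions inferred from the label descriptions above.

```python
import re
from collections import Counter

MODES = ("enforce", "warn", "audit")


def policy_version_label(requested, build_minor):
    """Map a requested policy version to its `policy_version` label value.

    `requested` is the raw version string (possibly empty) and `build_minor`
    is the minor version of the API server or webhook build. Both parameter
    names are hypothetical.
    """
    match = re.fullmatch(r"v1\.(\d+)", requested or "")
    if match is None:
        # Explicit "latest", an omitted version, or an unparseable version.
        return "latest"
    minor = int(match.group(1))
    # Versions newer than the build are evaluated as latest, recorded as future.
    return f"v1.{minor}" if minor <= build_minor else "future"


def record_evaluations(counter, violations, labels):
    """Increment the counter once per mode that should be recorded.

    `violations` maps each mode to True when the pod violates that mode's
    policy; `labels` maps each mode to a tuple of the remaining label values.
    """
    for mode in MODES:
        if violations[mode]:
            decision = "deny"  # assumption: violations are labeled deny in every mode
        elif mode == "enforce":
            decision = "allow"  # allow is only recorded with enforce mode
        else:
            continue  # warn and audit are only incremented for violations
        counter[(decision, mode) + labels[mode]] += 1


evaluations = Counter()
labels = {mode: ("restricted", policy_version_label("v1.22", 23)) for mode in MODES}
record_evaluations(evaluations, {"enforce": False, "warn": True, "audit": False}, labels)
print(evaluations)
```

In this example a pod that passes `enforce` but violates `warn` increments two series, and the passing `audit` check increments none.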
+```
+pod_security_exemptions_total
+```
+
+This metric will be added to track requests that are considered exempt. Ignored resources and
+out-of-scope requests do not count towards the total. Errors encountered before the exemption
+logic will not be counted as exempt.
+
+The metric will use the following labels. Their definitions match the label definitions above.
+
+1. `request_operation {create, update}`
+2. `resource {pod, controller}`
+3. `subresource {ephemeralcontainers}`
+
+```
+pod_security_errors_total
+```
+
+This metric will be added to track errors encountered during request evaluation.
+
+The metric will use the following labels. Apart from `fatal`, their definitions match the label
+definitions above.
+
+1. `fatal {true, false}` - Whether the error prevented evaluation (short-circuit deny). If
+   `fatal=false` then the latest restricted profile may be used to evaluate the pod.
+2. `request_operation {create, update}`
+3. `resource {pod, controller}`
+4. `subresource {ephemeralcontainers}`
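Reviewer note: one way to read how the three metrics partition requests is the sketch below. It is a hedged illustration in Python, not the implementation. The function `metrics_for_request` and its parameters are hypothetical, and the ordering of the checks (fatal errors handled before the exemption logic) is an assumption that the text only partly specifies.

```python
def metrics_for_request(in_scope, error=None, exempt=False):
    """Return the metric series a single request would increment.

    `error` is None, "fatal", or "non-fatal". All parameter names are
    hypothetical and exist only for this illustration.
    """
    if not in_scope:
        # Ignored resources and out-of-scope updates touch no metric.
        return []
    if error == "fatal":
        # Assumption: the fatal error occurs before the exemption logic,
        # so the request is not counted as exempt.
        return ["pod_security_errors_total{fatal=true}"]
    if exempt:
        return ["pod_security_exemptions_total"]
    if error == "non-fatal":
        # Evaluation still happens, possibly against the latest restricted profile.
        return ["pod_security_errors_total{fatal=false}",
                "pod_security_evaluations_total"]
    return ["pod_security_evaluations_total"]


for case in ({"in_scope": False},
             {"in_scope": True, "exempt": True},
             {"in_scope": True, "error": "non-fatal"}):
    print(case, "->", metrics_for_request(**case))
```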

 ### Audit Annotations

@@ -810,7 +846,7 @@ _This section must be completed when targeting alpha to a release._
 of the following metrics mean the feature is not working as expected:

 * `pod_security_evaluations_total{decision=deny,mode=enforce}`
-* `pod_security_evaluations_total{decision=error,mode=enforce}`
+* `pod_security_errors_total`

 * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

@@ -831,15 +867,21 @@ _This section must be completed when targeting alpha to a release._

 * **What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**
   - [x] Metrics
-    - Metric name: `pod_security_evaluations_total`
+    - Metric name: `pod_security_evaluations_total`, `pod_security_errors_total`
     - Components exposing the metric: `kube-apiserver`

 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
-  - `pod_security_evaluations_total{decision=error}`
+  - `pod_security_errors_total`
     - any rising count of these metrics indicates an unexpected problem evaluating the policy
-  - `pod_security_evaluations_total{decision=error,mode=enforce}`
+  - `pod_security_errors_total{fatal=true}`
     - any rising count of these metrics indicates an unexpected problem evaluating the policy that
       is preventing pod write requests
+  - `pod_security_errors_total{fatal=false}`,
+    `pod_security_evaluations_total{decision=deny,mode=enforce,policy_level=restricted,policy_version=latest}`
+    - a rising count of non-fatal errors indicates an error resolving namespace policies, which
+      causes PodSecurity to default to enforcing `restricted:latest`
+    - a corresponding rise in `restricted:latest` denials may indicate that these errors are
+      preventing pod write requests
   - `pod_security_evaluations_total{decision=deny,mode=enforce}`
     - a rising count indicates that the policy is preventing pod creation as intended, but is
       preventing a user or controller from successfully writing pods
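Reviewer note: the "rising count" conditions in these SLOs can be illustrated by comparing two scrapes of the counters. The sketch below is Python under stated assumptions. The tuple keys, the `rising` and `slo_findings` helpers, and the dictionary representation of a scrape are all hypothetical, and counter resets are ignored. A real deployment would more likely express this in its monitoring system's own rule language.

```python
# Hypothetical keys identifying the counter series discussed above.
FATAL_ERRORS = ("errors", "fatal=true")
NONFATAL_ERRORS = ("errors", "fatal=false")
RESTRICTED_LATEST_DENIALS = ("evaluations", "deny", "enforce", "restricted", "latest")


def rising(previous, current, series):
    """True when the counter for `series` increased between two scrapes."""
    return current.get(series, 0) > previous.get(series, 0)


def slo_findings(previous, current):
    """Translate counter increases into the interpretations listed above."""
    findings = []
    if rising(previous, current, FATAL_ERRORS):
        findings.append("policy evaluation errors are preventing pod write requests")
    if rising(previous, current, NONFATAL_ERRORS):
        findings.append("namespace policy resolution errors: defaulting to restricted:latest")
        if rising(previous, current, RESTRICTED_LATEST_DENIALS):
            findings.append("non-fatal errors may be preventing pod write requests")
    return findings


before = {FATAL_ERRORS: 0, NONFATAL_ERRORS: 2, RESTRICTED_LATEST_DENIALS: 5}
after = {FATAL_ERRORS: 0, NONFATAL_ERRORS: 3, RESTRICTED_LATEST_DENIALS: 7}
for finding in slo_findings(before, after):
    print(finding)
```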
@@ -922,8 +964,8 @@ details). For now, we leave it here.
   - Testing: unit testing on configuration validation

 - Enforce mode rejects pods because invalid level/version defaulted to `restricted` level
-  - Detection: rising `pod_security_evaluations_total{decision=error,mode=enforce}` metric counts
-  - Mitigations:
+  - Detection: rising `pod_security_errors_total{fatal=false}` metric counts
+  - Mitigations: fix the malformed labels
   - Diagnostics:
     - Locate audit logs containing `pod-security.kubernetes.io/error` annotations on affected requests
     - Locate namespaces with malformed level labels: