-
Notifications
You must be signed in to change notification settings - Fork 218
NE-1953: Add Validating Admission Policy for Gateway API CRDs #1192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| apiVersion: admissionregistration.k8s.io/v1 | ||
| kind: ValidatingAdmissionPolicyBinding | ||
| metadata: | ||
| name: openshift-ingress-operator-gatewayapi-crd-admission | ||
| annotations: | ||
| capability.openshift.io/name: Ingress | ||
| include.release.openshift.io/ibm-cloud-managed: "true" | ||
| include.release.openshift.io/self-managed-high-availability: "true" | ||
| include.release.openshift.io/single-node-developer: "true" | ||
| release.openshift.io/feature-set: "TechPreviewNoUpgrade,CustomNoUpgrade" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't use the feature-set annotation in other manifest yaml. Is it temporary and will be removed before GA?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, it will be included in GA.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, wait, I thought we could associate the manifest with a specific featuregate, which is what we really want to do.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After reviewing https://github.com/openshift/enhancements/blob/master/enhancements/update/cvo-techpreview-manifests.md (and looking through the CVO code), I see that there isn't an option to associate a manifest with a featuregate rather than a featureset. It looks like the purpose of this annotation was to make it easier to introduce an entire operator as tech preview. Our use of it, to introduce a specific resource for a tech-preview feature implemented in a GA operator, isn't exactly what the annotation was intended for. That said, it seems like a valid use. However, it does mean that we will need to remove the annotation at the same time that we graduate the featuregate to GA. (This seems to me like a major design flaw of the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, that's a flaw (in CVO design) which prevents the atomic promotion and even worse it can block the promotion. If there are some tests for the feature gate in origin repository. @JoelSpeed started a thread for this with CVO people. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As Andrey points out, this is on our todo, but for now there's a slightly awkward dance to promote the manifest to GA separately from the gate being promoted
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding a comment regarding the workaround would be nice.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Can you explain which workaround you mean? For the moment, CVO does not have a way to set a specific feature gate for a manifest. |
||
| spec: | ||
| policyName: openshift-ingress-operator-gatewayapi-crd-admission | ||
| validationActions: [Deny] | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,36 @@ | ||||||
| apiVersion: admissionregistration.k8s.io/v1 | ||||||
| kind: ValidatingAdmissionPolicy | ||||||
| metadata: | ||||||
| name: openshift-ingress-operator-gatewayapi-crd-admission | ||||||
| annotations: | ||||||
| capability.openshift.io/name: Ingress | ||||||
| include.release.openshift.io/ibm-cloud-managed: "true" | ||||||
| include.release.openshift.io/self-managed-high-availability: "true" | ||||||
| include.release.openshift.io/single-node-developer: "true" | ||||||
| release.openshift.io/feature-set: "TechPreviewNoUpgrade,CustomNoUpgrade" | ||||||
| spec: | ||||||
| matchConstraints: | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The godocs on matchConstraints are not very good.
I looked into the Kubernetes reference and found in https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/#type-checking, this:
Does it apply to us? No issues with adding this VAP?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that "matched CRD types" refers to the type that is defined by the CRD, not to the CRD type itself. That is, I expect that the VAP can block updates to, say, the gatewayclasses CRD, but it cannot at this time block updates to an arbitrary gatewayclass CR.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The type checking will be missing for this VAP. Meaning that when the VAP will be updated and if it will contain any syntax mistakes in validation expressions (or variables) the update of VAP will not be blocked and the syntax error will not be reported in the VAP's status. I don't think it's a blocking gap because we will validate our VAP before it will hit any cluster (local testing, integration testing, QE testing).
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
You mean the part about MutatingAdmissionPolicy? Indeed, it's unclear but should not impact us in anyway as we are adding VAP.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does that other part even mean? I can't understand the grammar.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This part means that it's not possible to create ValidatingAdmissionPolicy which validates the admission of another ValidatingAdmissionPolicy or ValidatingAdmissionPolicyBinding. |
||||||
| # Consider only requests to CRD resources. | ||||||
| resourceRules: | ||||||
| - apiGroups: ["apiextensions.k8s.io"] | ||||||
| apiVersions: ["v1"] | ||||||
| operations: ["CREATE","UPDATE","DELETE"] | ||||||
| resources: ["customresourcedefinitions"] | ||||||
| matchConditions: | ||||||
| # Consider only request to Gateway API CRDs. | ||||||
| - name: "check-only-gateway-api-crds" | ||||||
| # When the operation is DELETE, the "object" variable is null. | ||||||
| expression: "(request.operation == 'DELETE' ? oldObject : object).spec.group == 'gateway.networking.k8s.io'" | ||||||
| # Validations are evaluated in the the order of their declaration. | ||||||
| validations: | ||||||
| # Verify that the request was sent by the ingress operator's service account. | ||||||
| - expression: "has(request.userInfo.username) && (request.userInfo.username == 'system:serviceaccount:openshift-ingress-operator:ingress-operator')" | ||||||
| message: "Gateway API Custom Resource Definitions are managed by the Ingress Operator and may not be modified" | ||||||
| reason: Forbidden | ||||||
| # Verify that the request was sent from a pod. The presence of both "node" and "pod" claims implies that the token is bound to a pod object. | ||||||
| # Ref: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#schema-for-service-account-private-claims. | ||||||
| - expression: "has(request.userInfo.extra) && ('authentication.kubernetes.io/node-name' in request.userInfo.extra) && ('authentication.kubernetes.io/pod-name' in request.userInfo.extra)" | ||||||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would the node claim alone be enough? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suspect so, but I think this is extra safe and I don't see a need to remove the pod name check |
||||||
| message: "this user must have both \"authentication.kubernetes.io/node-name\" and \"authentication.kubernetes.io/pod-name\" claims" | ||||||
| reason: Forbidden | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's debatable. I followed the logic of the corresponding HTTP codes where
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is debatable but probably not worth it. |
||||||
| # Fail the admission if any validation evaluates to false. | ||||||
| failurePolicy: Fail | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -83,7 +83,6 @@ func TestAll(t *testing.T) { | |
| t.Run("TestSetRouteResponseHeaders", TestSetRouteResponseHeaders) | ||
| t.Run("TestReconcileInternalService", TestReconcileInternalService) | ||
| t.Run("TestConnectTimeout", TestConnectTimeout) | ||
| t.Run("TestGatewayAPI", TestGatewayAPI) | ||
| t.Run("TestAWSLBSubnets", TestAWSLBSubnets) | ||
| t.Run("TestUnmanagedAWSLBSubnets", TestUnmanagedAWSLBSubnets) | ||
| t.Run("TestAWSEIPAllocationsForNLB", TestAWSEIPAllocationsForNLB) | ||
|
|
@@ -126,5 +125,16 @@ func TestAll(t *testing.T) { | |
| t.Run("TestRouteHardStopAfterEnableOnIngressControllerHasPriorityOverIngressConfig", TestRouteHardStopAfterEnableOnIngressControllerHasPriorityOverIngressConfig) | ||
| t.Run("TestHostNetworkPortBinding", TestHostNetworkPortBinding) | ||
| t.Run("TestDashboardCreation", TestDashboardCreation) | ||
| // TestGatewayAPI creates a test ServiceMeshControlPlane (SMCP) resource, | ||
| // which triggers the creation of a mutating webhook for all pods in the cluster. | ||
| // This introduces a race condition where any pod creation request between | ||
| // the webhook's creation and the SMCP pod becoming ready can fail with: | ||
| // | ||
| // failed calling webhook "sidecar-injector.istio.io": failed to call webhook: | ||
| // no endpoints available for service "istiod-openshift-gateway" | ||
| // | ||
| // Serializing the test ensures it runs in isolation with other tests, | ||
| // preventing any impact of the mutating webhook on pod creation in the cluster | ||
| t.Run("TestGatewayAPI", TestGatewayAPI) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We discussed that this test should be ok to be run parallel.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. SMCP is going away immediately after bumping to 3.0. If we have this same issue in 3.0, what is the game plan?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For the test? Make the test serial as I did in this PR.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I still believe we should investigate and resolve this sooner rather than later, but I was outvoted. Please don't forget to follow up.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://issues.redhat.com/browse/NE-1982 tracks the follow-up. @alebedev87, I assigned myself to follow up once the OSSM 3 bump merges. If you want to take it, feel free to do so. |
||
| }) | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,14 +10,20 @@ import ( | |
| "testing" | ||
|
|
||
| "github.com/openshift/api/features" | ||
| operatorclient "github.com/openshift/cluster-ingress-operator/pkg/operator/client" | ||
| operatorcontroller "github.com/openshift/cluster-ingress-operator/pkg/operator/controller" | ||
| "github.com/openshift/cluster-ingress-operator/pkg/operator/controller/gatewayclass" | ||
|
|
||
| corev1 "k8s.io/api/core/v1" | ||
| apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1" | ||
| "k8s.io/apimachinery/pkg/api/errors" | ||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| "k8s.io/apimachinery/pkg/types" | ||
| "k8s.io/apiserver/pkg/storage/names" | ||
| "k8s.io/client-go/rest" | ||
|
|
||
| "sigs.k8s.io/controller-runtime/pkg/client" | ||
| "sigs.k8s.io/controller-runtime/pkg/client/config" | ||
| gatewayapiv1 "sigs.k8s.io/gateway-api/apis/v1" | ||
| ) | ||
|
|
||
|
|
@@ -30,6 +36,8 @@ const ( | |
| expectedCatalogSourceNamespace = "openshift-marketplace" | ||
| // The test gateway name used in multiple places. | ||
| testGatewayName = "test-gateway" | ||
| // gwapiCRDVAPName is the name of the ingress operator's Validating Admission Policy (VAP). | ||
| gwapiCRDVAPName = "openshift-ingress-operator-gatewayapi-crd-admission" | ||
| ) | ||
|
|
||
| var crdNames = []string{ | ||
|
|
@@ -51,8 +59,6 @@ var defaultRoutename = "" | |
| // feature gate is still in effect, preface the test names with "TestGatewayAPI" | ||
| // so that they run via the openshift/release test configuration. | ||
| func TestGatewayAPI(t *testing.T) { | ||
| t.Parallel() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We discussed in slack that making this serial should not be necessary.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| // Skip if feature is not enabled | ||
| if gatewayAPIEnabled, err := isFeatureGateEnabled(features.FeatureGateGatewayAPI); err != nil { | ||
| t.Fatalf("error checking feature gate enabled status: %v", err) | ||
|
|
@@ -84,6 +90,7 @@ func TestGatewayAPI(t *testing.T) { | |
| } else { | ||
| t.Log("Gateway API Controller not enabled, skipping testGatewayAPIObjects and testGatewayAPIIstioInstallation") | ||
| } | ||
| t.Run("testGatewayAPIResourcesProtection", testGatewayAPIResourcesProtection) | ||
| } | ||
|
|
||
| // testGatewayAPIResources tests that Gateway API Custom Resource Definitions are available. | ||
|
|
@@ -167,6 +174,93 @@ func testGatewayAPIObjects(t *testing.T) { | |
| } | ||
| } | ||
|
|
||
| // testGatewayAPIResourcesProtection verifies that the ingress operator's Validating Admission Policy | ||
| // denies admission requests attempting to modify Gateway API CRDs on behalf of a user | ||
| // who is not the ingress operator's service account. | ||
| func testGatewayAPIResourcesProtection(t *testing.T) { | ||
| t.Helper() | ||
|
|
||
| // Get kube client which impersonates ingress operator's service account. | ||
| kubeConfig, err := config.GetConfig() | ||
| if err != nil { | ||
| t.Fatalf("failed to get kube config: %v", err) | ||
| } | ||
| kubeConfig.Impersonate = rest.ImpersonationConfig{ | ||
| UserName: "system:serviceaccount:openshift-ingress-operator:ingress-operator", | ||
| } | ||
| kubeClient, err := operatorclient.NewClient(kubeConfig) | ||
| if err != nil { | ||
| t.Fatalf("failed to to create kube client: %v", err) | ||
| } | ||
|
|
||
| // Create test CRDs. | ||
| var testCRDs []*apiextensionsv1.CustomResourceDefinition | ||
| for _, name := range crdNames { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Need to update
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I gave this thing another thought. I think that it's better to abstain from using the actual "running" code in the tests. We should explicitly set the list of CRDs we are testing. Otherwise we risk to duplicate (and hide) errors we have in the code.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm going to hold the rest of my reviews until after #1152 is merged. Based on the number of changes dependent on it, I feel like I don't want to have to review this again. |
||
| testCRDs = append(testCRDs, buildGWAPICRDFromName(name)) | ||
| } | ||
|
|
||
| testCases := []struct { | ||
| name string | ||
| kclient client.Client | ||
| expectedErrMsg string | ||
| }{ | ||
| { | ||
| name: "Ingress operator service account required", | ||
| kclient: kclient, | ||
| expectedErrMsg: "Gateway API Custom Resource Definitions are managed by the Ingress Operator and may not be modified", | ||
| }, | ||
| { | ||
| name: "Pod binding required", | ||
| kclient: kubeClient, | ||
| expectedErrMsg: "this user must have both \"authentication.kubernetes.io/node-name\" and \"authentication.kubernetes.io/pod-name\" claims", | ||
| }, | ||
| } | ||
|
|
||
| for _, tc := range testCases { | ||
| t.Run(tc.name, func(t *testing.T) { | ||
| // Verify that GatewayAPI CRD creation is forbidden. | ||
| for i := range testCRDs { | ||
| if err := tc.kclient.Create(context.Background(), testCRDs[i]); err != nil { | ||
| if !strings.Contains(err.Error(), tc.expectedErrMsg) { | ||
| t.Errorf("unexpected error received while creating CRD %q: %v", testCRDs[i].Name, err) | ||
| } | ||
| } else { | ||
| t.Errorf("admission error is expected while creating CRD %q but not received", testCRDs[i].Name) | ||
| } | ||
| } | ||
|
|
||
| // Verify that GatewayAPI CRD update is forbidden. | ||
| for i := range testCRDs { | ||
| crdName := types.NamespacedName{Name: testCRDs[i].Name} | ||
| crd := &apiextensionsv1.CustomResourceDefinition{} | ||
| if err := tc.kclient.Get(context.Background(), crdName, crd); err != nil { | ||
| t.Errorf("failed to get %q CRD: %v", crdName.Name, err) | ||
| continue | ||
| } | ||
| crd.Spec = testCRDs[i].Spec | ||
| if err := tc.kclient.Update(context.Background(), crd); err != nil { | ||
| if !strings.Contains(err.Error(), tc.expectedErrMsg) { | ||
| t.Errorf("unexpected error received while updating CRD %q: %v", testCRDs[i].Name, err) | ||
| } | ||
| } else { | ||
| t.Errorf("admission error is expected while updating CRD %q but not received", testCRDs[i].Name) | ||
| } | ||
| } | ||
|
|
||
| // Verify that GatewayAPI CRD deletion is forbidden. | ||
| for i := range testCRDs { | ||
| if err := tc.kclient.Delete(context.Background(), testCRDs[i]); err != nil { | ||
| if !strings.Contains(err.Error(), tc.expectedErrMsg) { | ||
| t.Errorf("unexpected error received while deleting CRD %q: %v", testCRDs[i].Name, err) | ||
| } | ||
| } else { | ||
| t.Errorf("admission error is expected while deleting CRD %q but not received", testCRDs[i].Name) | ||
| } | ||
| } | ||
| }) | ||
| } | ||
| } | ||
|
|
||
| // ensureCRDs tests that the Gateway API custom resource definitions exist. | ||
| func ensureCRDs(t *testing.T) { | ||
| t.Helper() | ||
|
|
@@ -182,6 +276,19 @@ func ensureCRDs(t *testing.T) { | |
| // deleteCRDs deletes Gateway API custom resource definitions. | ||
| func deleteCRDs(t *testing.T) { | ||
| t.Helper() | ||
|
|
||
| vm := newVAPManager(t, gwapiCRDVAPName) | ||
| // Remove the ingress operator's Validating Admission Policy (VAP) | ||
| // which prevents modifications of Gateway API CRDs | ||
| // by anything other than the ingress operator. | ||
| if err, recoverFn := vm.disable(); err != nil { | ||
| defer recoverFn() | ||
| t.Fatalf("failed to disable vap: %v", err) | ||
| } | ||
| // Put back the VAP to ensure that it does not prevent | ||
| // the ingress operator from managing Gateway API CRDs. | ||
| defer vm.enable() | ||
|
|
||
| for _, crdName := range crdNames { | ||
| err := deleteExistingCRD(t, crdName) | ||
| if err != nil { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's investigate more about why this is needed since we decided it should be ok to run the tests in parallel.