Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ release-local:

.PHONY: test-e2e
test-e2e: generate
CGO_ENABLED=1 $(GO) test -timeout 1h -count 1 -v -tags e2e -run "$(TEST)" ./test/e2e
CGO_ENABLED=1 $(GO) test -timeout 1.5h -count 1 -v -tags e2e -run "$(TEST)" ./test/e2e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's investigate more about why this is needed since we decided it should be ok to run the tests in parallel.


.PHONY: test-e2e-list
test-e2e-list: generate
Expand Down
13 changes: 13 additions & 0 deletions manifests/01-validating-admission-policy-binding.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
name: openshift-ingress-operator-gatewayapi-crd-admission
annotations:
capability.openshift.io/name: Ingress
include.release.openshift.io/ibm-cloud-managed: "true"
include.release.openshift.io/self-managed-high-availability: "true"
include.release.openshift.io/single-node-developer: "true"
release.openshift.io/feature-set: "TechPreviewNoUpgrade,CustomNoUpgrade"
Copy link
Contributor

@candita candita Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use the feature-set annotation in other manifest yaml. Is it temporary and will be removed before GA?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it will be included in GA.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, I thought we could associate the manifest with a specific featuregate, which is what we really want to do.

Copy link
Contributor

@Miciah Miciah Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing https://github.com/openshift/enhancements/blob/master/enhancements/update/cvo-techpreview-manifests.md (and looking through the CVO code), I see that there isn't an option to associate a manifest with a featuregate rather than a featureset.

It looks like the purpose of this annotation was to make it easier to introduce an entire operator as tech preview. Our use of it, to introduce a specific resource for a tech-preview feature implemented in a GA operator, isn't exactly what the annotation was intended for.

That said, it seems like a valid use. However, it does mean that we will need to remove the annotation at the same time that we graduate the featuregate to GA. (This seems to me like a major design flaw of the feature-set annotation because it means that the feature cannot be atomically promoted and cannot be completely turned off once it goes into the default featureset, but ¯\_(ツ)_/¯.)

Copy link
Contributor Author

@alebedev87 alebedev87 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's a flaw (in CVO design) which prevents the atomic promotion and even worse it can block the promotion. If there are some tests for the feature gate in origin repository. @JoelSpeed started a thread for this with CVO people.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Andrey points out, this is on our todo, but for now there's a slightly awkward dance to promote the manifest to GA separately from the gate being promoted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a comment regarding the workaround would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a comment regarding the workaround would be nice.

Can you explain which workaround you mean? For the moment, CVO does not have a way to set a specific feature gate for a manifest.

spec:
policyName: openshift-ingress-operator-gatewayapi-crd-admission
validationActions: [Deny]
36 changes: 36 additions & 0 deletions manifests/01-validating-admission-policy.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
name: openshift-ingress-operator-gatewayapi-crd-admission
annotations:
capability.openshift.io/name: Ingress
include.release.openshift.io/ibm-cloud-managed: "true"
include.release.openshift.io/self-managed-high-availability: "true"
include.release.openshift.io/single-node-developer: "true"
release.openshift.io/feature-set: "TechPreviewNoUpgrade,CustomNoUpgrade"
spec:
matchConstraints:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The godocs on matchConstraints are not very good.

// The MutatingAdmissionPolicy cares about a request if it matches all Constraint.
// However, in order to prevent clusters from being put into an unstable state that cannot be recovered from via the API
// ValidatingAdmissionPolicy cannot match ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding.

I looked into the Kubernetes reference and found in https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/#type-checking, this:

Type Checking does not apply to CRDs, including matched CRD types and reference of paramKind. The support for CRDs will come in future release.

Does it apply to us? No issues with adding this VAP?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that "matched CRD types" refers to the type that is defined by the CRD, not to the CRD type itself. That is, I expect that the VAP can block updates to, say, the gatewayclasses CRD, but it cannot at this time block updates to an arbitrary gatewayclass CR.

Copy link
Contributor Author

@alebedev87 alebedev87 Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it apply to us? No issues with adding this VAP?

The type checking will be missing for this VAP. Meaning that when the VAP will be updated and if it will contain any syntax mistakes in validation expressions (or variables) the update of VAP will not be blocked and the syntax error will not be reported in the VAP's status. I don't think it's a blocking gap because we will validate our VAP before it will hit any cluster (local testing, integration testing, QE testing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The godocs on matchConstraints are not very good.

// The MutatingAdmissionPolicy cares about a request if it matches all Constraint.
// However, in order to prevent clusters from being put into an unstable state that cannot be recovered from via the API
// ValidatingAdmissionPolicy cannot match ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding.

You mean the part about MutatingAdmissionPolicy? Indeed, it's unclear but should not impact us in anyway as we are adding VAP.
The otherpart about the ValidatingAdmissionPolicy is a known constraint which does not impact us either because we don't need to match VAPs or VAP bindings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does that other part even mean? I can't understand the grammar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ValidatingAdmissionPolicy cannot match ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding.

This part means that it's not possible to create ValidatingAdmissionPolicy which validates the admission of another ValidatingAdmissionPolicy or ValidatingAdmissionPolicyBinding.

# Consider only requests to CRD resources.
resourceRules:
- apiGroups: ["apiextensions.k8s.io"]
apiVersions: ["v1"]
operations: ["CREATE","UPDATE","DELETE"]
resources: ["customresourcedefinitions"]
matchConditions:
# Consider only request to Gateway API CRDs.
- name: "check-only-gateway-api-crds"
# When the operation is DELETE, the "object" variable is null.
expression: "(request.operation == 'DELETE' ? oldObject : object).spec.group == 'gateway.networking.k8s.io'"
# Validations are evaluated in the the order of their declaration.
validations:
# Verify that the request was sent by the ingress operator's service account.
- expression: "has(request.userInfo.username) && (request.userInfo.username == 'system:serviceaccount:openshift-ingress-operator:ingress-operator')"
message: "Gateway API Custom Resource Definitions are managed by the Ingress Operator and may not be modified"
reason: Forbidden
# Verify that the request was sent from a pod. The presence of both "node" and "pod" claims implies that the token is bound to a pod object.
# Ref: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#schema-for-service-account-private-claims.
- expression: "has(request.userInfo.extra) && ('authentication.kubernetes.io/node-name' in request.userInfo.extra) && ('authentication.kubernetes.io/pod-name' in request.userInfo.extra)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the node claim alone be enough?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect so, but I think this is extra safe and I don't see a need to remove the pod name check

message: "this user must have both \"authentication.kubernetes.io/node-name\" and \"authentication.kubernetes.io/pod-name\" claims"
reason: Forbidden
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
reason: Forbidden
reason: Unauthorized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's debatable. I followed the logic of the corresponding HTTP codes where 401 Unauthorized means more I don’t recognize you while 403 Forbidden means more I know who you are, but you're not allowed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is debatable but probably not worth it.

# Fail the admission if any validation evaluates to false.
failurePolicy: Fail
12 changes: 11 additions & 1 deletion test/e2e/all_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,6 @@ func TestAll(t *testing.T) {
t.Run("TestSetRouteResponseHeaders", TestSetRouteResponseHeaders)
t.Run("TestReconcileInternalService", TestReconcileInternalService)
t.Run("TestConnectTimeout", TestConnectTimeout)
t.Run("TestGatewayAPI", TestGatewayAPI)
t.Run("TestAWSLBSubnets", TestAWSLBSubnets)
t.Run("TestUnmanagedAWSLBSubnets", TestUnmanagedAWSLBSubnets)
t.Run("TestAWSEIPAllocationsForNLB", TestAWSEIPAllocationsForNLB)
Expand Down Expand Up @@ -126,5 +125,16 @@ func TestAll(t *testing.T) {
t.Run("TestRouteHardStopAfterEnableOnIngressControllerHasPriorityOverIngressConfig", TestRouteHardStopAfterEnableOnIngressControllerHasPriorityOverIngressConfig)
t.Run("TestHostNetworkPortBinding", TestHostNetworkPortBinding)
t.Run("TestDashboardCreation", TestDashboardCreation)
// TestGatewayAPI creates a test ServiceMeshControlPlane (SMCP) resource,
// which triggers the creation of a mutating webhook for all pods in the cluster.
// This introduces a race condition where any pod creation request between
// the webhook's creation and the SMCP pod becoming ready can fail with:
//
// failed calling webhook "sidecar-injector.istio.io": failed to call webhook:
// no endpoints available for service "istiod-openshift-gateway"
//
// Serializing the test ensures it runs in isolation with other tests,
// preventing any impact of the mutating webhook on pod creation in the cluster
t.Run("TestGatewayAPI", TestGatewayAPI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed that this test should be ok to be run parallel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SMCP is going away immediately after bumping to 3.0. If we have this same issue in 3.0, what is the game plan?

Copy link
Contributor Author

@alebedev87 alebedev87 Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have this same issue in 3.0, what is the game plan?

For the test? Make the test serial as I did in this PR.
Overall for the feature? We should find (or implement if it doesn't exist) a knob to disable the creation of any webhook configuration related to the sidecar injection. Once this is done we can move the test back to Parallel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still believe we should investigate and resolve this sooner rather than later, but I was outvoted. Please don't forget to follow up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://issues.redhat.com/browse/NE-1982 tracks the follow-up. @alebedev87, I assigned myself to follow up once the OSSM 3 bump merges. If you want to take it, feel free to do so.

})
}
111 changes: 109 additions & 2 deletions test/e2e/gateway_api_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,20 @@ import (
"testing"

"github.com/openshift/api/features"
operatorclient "github.com/openshift/cluster-ingress-operator/pkg/operator/client"
operatorcontroller "github.com/openshift/cluster-ingress-operator/pkg/operator/controller"
"github.com/openshift/cluster-ingress-operator/pkg/operator/controller/gatewayclass"

corev1 "k8s.io/api/core/v1"
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apiserver/pkg/storage/names"
"k8s.io/client-go/rest"

"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/config"
gatewayapiv1 "sigs.k8s.io/gateway-api/apis/v1"
)

Expand All @@ -30,6 +36,8 @@ const (
expectedCatalogSourceNamespace = "openshift-marketplace"
// The test gateway name used in multiple places.
testGatewayName = "test-gateway"
// gwapiCRDVAPName is the name of the ingress operator's Validating Admission Policy (VAP).
gwapiCRDVAPName = "openshift-ingress-operator-gatewayapi-crd-admission"
)

var crdNames = []string{
Expand All @@ -51,8 +59,6 @@ var defaultRoutename = ""
// feature gate is still in effect, preface the test names with "TestGatewayAPI"
// so that they run via the openshift/release test configuration.
func TestGatewayAPI(t *testing.T) {
t.Parallel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed in slack that making this serial should not be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// Skip if feature is not enabled
if gatewayAPIEnabled, err := isFeatureGateEnabled(features.FeatureGateGatewayAPI); err != nil {
t.Fatalf("error checking feature gate enabled status: %v", err)
Expand Down Expand Up @@ -84,6 +90,7 @@ func TestGatewayAPI(t *testing.T) {
} else {
t.Log("Gateway API Controller not enabled, skipping testGatewayAPIObjects and testGatewayAPIIstioInstallation")
}
t.Run("testGatewayAPIResourcesProtection", testGatewayAPIResourcesProtection)
}

// testGatewayAPIResources tests that Gateway API Custom Resource Definitions are available.
Expand Down Expand Up @@ -167,6 +174,93 @@ func testGatewayAPIObjects(t *testing.T) {
}
}

// testGatewayAPIResourcesProtection verifies that the ingress operator's Validating Admission Policy
// denies admission requests attempting to modify Gateway API CRDs on behalf of a user
// who is not the ingress operator's service account.
func testGatewayAPIResourcesProtection(t *testing.T) {
t.Helper()

// Get kube client which impersonates ingress operator's service account.
kubeConfig, err := config.GetConfig()
if err != nil {
t.Fatalf("failed to get kube config: %v", err)
}
kubeConfig.Impersonate = rest.ImpersonationConfig{
UserName: "system:serviceaccount:openshift-ingress-operator:ingress-operator",
}
kubeClient, err := operatorclient.NewClient(kubeConfig)
if err != nil {
t.Fatalf("failed to to create kube client: %v", err)
}

// Create test CRDs.
var testCRDs []*apiextensionsv1.CustomResourceDefinition
for _, name := range crdNames {
Copy link
Contributor

@candita candita Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update crdNames with GRPCRoute. It would be beneficial to put the crdNames declaration in a shared location, or at least in util_gatewayapi_test. cc @Thealisyed @grzpiotrowski

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GRPCRoute will be added as part of #1152. It's not yet mentioned anywhere. However, I can do a helper function in pkg/manifests/assets/gatewayapi/ package which would return all "desired" CRDs. This way we can use a single function in all controllers and tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I can do a helper function in pkg/manifests/assets/gatewayapi/ package which would return all "desired" CRDs. This way we can use a single function in all controllers and tests.

I gave this thing another thought. I think that it's better to abstain from using the actual "running" code in the tests. We should explicitly set the list of CRDs we are testing. Otherwise we risk to duplicate (and hide) errors we have in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to hold the rest of my reviews until after #1152 is merged. Based on the number of changes dependent on it, I feel like I don't want to have to review this again.

testCRDs = append(testCRDs, buildGWAPICRDFromName(name))
}

testCases := []struct {
name string
kclient client.Client
expectedErrMsg string
}{
{
name: "Ingress operator service account required",
kclient: kclient,
expectedErrMsg: "Gateway API Custom Resource Definitions are managed by the Ingress Operator and may not be modified",
},
{
name: "Pod binding required",
kclient: kubeClient,
expectedErrMsg: "this user must have both \"authentication.kubernetes.io/node-name\" and \"authentication.kubernetes.io/pod-name\" claims",
},
}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
// Verify that GatewayAPI CRD creation is forbidden.
for i := range testCRDs {
if err := tc.kclient.Create(context.Background(), testCRDs[i]); err != nil {
if !strings.Contains(err.Error(), tc.expectedErrMsg) {
t.Errorf("unexpected error received while creating CRD %q: %v", testCRDs[i].Name, err)
}
} else {
t.Errorf("admission error is expected while creating CRD %q but not received", testCRDs[i].Name)
}
}

// Verify that GatewayAPI CRD update is forbidden.
for i := range testCRDs {
crdName := types.NamespacedName{Name: testCRDs[i].Name}
crd := &apiextensionsv1.CustomResourceDefinition{}
if err := tc.kclient.Get(context.Background(), crdName, crd); err != nil {
t.Errorf("failed to get %q CRD: %v", crdName.Name, err)
continue
}
crd.Spec = testCRDs[i].Spec
if err := tc.kclient.Update(context.Background(), crd); err != nil {
if !strings.Contains(err.Error(), tc.expectedErrMsg) {
t.Errorf("unexpected error received while updating CRD %q: %v", testCRDs[i].Name, err)
}
} else {
t.Errorf("admission error is expected while updating CRD %q but not received", testCRDs[i].Name)
}
}

// Verify that GatewayAPI CRD deletion is forbidden.
for i := range testCRDs {
if err := tc.kclient.Delete(context.Background(), testCRDs[i]); err != nil {
if !strings.Contains(err.Error(), tc.expectedErrMsg) {
t.Errorf("unexpected error received while deleting CRD %q: %v", testCRDs[i].Name, err)
}
} else {
t.Errorf("admission error is expected while deleting CRD %q but not received", testCRDs[i].Name)
}
}
})
}
}

// ensureCRDs tests that the Gateway API custom resource definitions exist.
func ensureCRDs(t *testing.T) {
t.Helper()
Expand All @@ -182,6 +276,19 @@ func ensureCRDs(t *testing.T) {
// deleteCRDs deletes Gateway API custom resource definitions.
func deleteCRDs(t *testing.T) {
t.Helper()

vm := newVAPManager(t, gwapiCRDVAPName)
// Remove the ingress operator's Validating Admission Policy (VAP)
// which prevents modifications of Gateway API CRDs
// by anything other than the ingress operator.
if err, recoverFn := vm.disable(); err != nil {
defer recoverFn()
t.Fatalf("failed to disable vap: %v", err)
}
// Put back the VAP to ensure that it does not prevent
// the ingress operator from managing Gateway API CRDs.
defer vm.enable()

for _, crdName := range crdNames {
err := deleteExistingCRD(t, crdName)
if err != nil {
Expand Down
Loading