added github action to run the e2e tests in a kind cluster#3250
added github action to run the e2e tests in a kind cluster#3250CFSNM wants to merge 1 commit intoopendatahub-io:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new GitHub Actions workflow at .github/workflows/test-kind-odh-e2e.yaml to run KinD-based end-to-end tests on PRs to main and via manual dispatch. Updates tests under tests/e2e: adds two boolean fields to TestContextConfig (dependantOperatorsManagementTest, dscManagementTest), binds new command-line flags/env vars, conditionally runs two test suites (including a new dependantOperatorsManagementTestSuite that initializes test context and sequences DSC-related test cases). Adjusts test helper error propagation in pkg/utils/test/testf and adds NoMatch handling in tests/e2e/test_context_test.go. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-kind-odh-e2e.yaml:
- Around line 42-44: Keep the existing "Verify ODH Deployment" kubectl wait
precondition (the kubectl wait --for=condition=Available ... line) but add a
subsequent step that runs the repository's actual end-to-end test entrypoint so
the job exercises E2E flows rather than just a smoke check; update the workflow
to add a new run step after the Verify ODH Deployment block that invokes the
repo E2E entrypoint (replace with the correct command for your project, e.g.,
the Make target or script that triggers the full E2E suite).
- Around line 30-40: The workflow builds the operator image with make
image-build but never makes that local image available to the KinD cluster
before make deploy rewrites manifests to $(IMG); add a step after the Build
Operator Image step to either rebuild with buildx --load (adjust make
image-build to pass --load) or, more straightforwardly, run kind load
docker-image ${{ env.IMG }} (or kind load docker-image "$(IMG)" for the same
named cluster) so the locally-built image is present in the KinD nodes before
running make deploy; update the CI steps to call the chosen approach after
image-build and before make deploy so manifests referencing $(IMG) use the
just-built image.
- Around line 10-18: The job kind-odh-e2e-tests runs PR code with full
GITHUB_TOKEN permissions; add a job-level permissions block under
kind-odh-e2e-tests to restrict the token to least privilege (for example set
permissions: contents: read and any other minimal scopes your workflow actually
needs) so the job no longer inherits broad repository defaults; update the
.github/workflows/test-kind-odh-e2e.yaml job definition to include this
permissions stanza for the kind-odh-e2e-tests job.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: adfe5100-83e7-4137-b16d-89a982942082
📒 Files selected for processing (1)
.github/workflows/test-kind-odh-e2e.yaml
c89e4e2 to
4e9a966
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/e2e/controller_test.go (1)
429-432: Renamedependanttodependentbefore this flag ships.These lines add a new user-facing CLI/env contract with a spelling mistake. If this lands as-is, fixing it later means carrying a permanent compatibility alias across flags, env vars, and code paths. Prefer
test-dependent-operators-management/E2E_TEST_DEPENDENT_OPERATORS_MANAGEMENTand rename the matching symbols in this PR.Also applies to: 496-497
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/controller_test.go` around lines 429 - 432, The CLI flag and env binding use the misspelled "dependant" in pflag.Bool("test-dependant-operators-management", ...) and viper.BindEnv(...+"_DEPENDANT_OPERATORS_MANAGEMENT"); change both to the correct spelling "dependent" (i.e. pflag.Bool("test-dependent-operators-management", ...) and viper.BindEnv(...+"_DEPENDENT_OPERATORS_MANAGEMENT")) and update the matching symbol names elsewhere in this PR (the other occurrences of the same flag/env binding) so the flag, env var, and any code paths consistently use "dependent" rather than "dependant".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-kind-odh-e2e.yaml:
- Around line 49-51: The KinD E2E step currently runs "make e2e-test" with no
explicit DSC management flag, so tests/e2e/controller_test.go defaults tag to
All and causes dscManagementTestSuite (Tier3-only) to run; update the Run E2E
Tests step where the command invokes "make e2e-test" and add the environment
variable -e E2E_TEST_DSC_MANAGEMENT=false (or set a narrower tag) so the
dscManagementTestSuite is excluded—look for the "Run E2E Tests" block and the
make e2e-test invocation to apply the change.
- Around line 28-31: The Create KinD Cluster step uses
helm/kind-action@ef37e7f... but doesn't pin the KinD node image; update that
step to include the node_image input and set it to a specific node image digest
(sha256) to fix supply-chain drift and ensure reproducible CI — locate the
"Create KinD Cluster" step and add the node_image field with the full
image@sha256:... value.
---
Nitpick comments:
In `@tests/e2e/controller_test.go`:
- Around line 429-432: The CLI flag and env binding use the misspelled
"dependant" in pflag.Bool("test-dependant-operators-management", ...) and
viper.BindEnv(...+"_DEPENDANT_OPERATORS_MANAGEMENT"); change both to the correct
spelling "dependent" (i.e. pflag.Bool("test-dependent-operators-management",
...) and viper.BindEnv(...+"_DEPENDENT_OPERATORS_MANAGEMENT")) and update the
matching symbol names elsewhere in this PR (the other occurrences of the same
flag/env binding) so the flag, env var, and any code paths consistently use
"dependent" rather than "dependant".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 7ed27dc0-7342-4be6-a67a-22b1b078992b
📒 Files selected for processing (3)
.github/workflows/test-kind-odh-e2e.yamltests/e2e/controller_test.gotests/e2e/creation_test.go
82ae74a to
14fd4b8
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (5)
tests/e2e/controller_test.go (1)
86-87: Minor: Consider consistent spelling "dependent" vs "dependant"."Dependant" (British) vs "dependent" (American) spelling. Both are valid, but the codebase should be consistent. Most Go projects use American English conventions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/controller_test.go` around lines 86 - 87, Rename the variable dependantOperatorsManagementTest to use American spelling dependentOperatorsManagementTest across the test file and any places that reference it (e.g., flag declarations, struct fields, helper functions, and test checks) to keep naming consistent; update all usages of dependantOperatorsManagementTest in tests/e2e/controller_test.go and related test files so the identifier matches the new name and run `go test` to ensure no references remain.tests/e2e/creation_test.go (1)
34-58: Consider extracting common test suite setup to reduce duplication.Both
dependantOperatorsManagementTestSuiteanddscManagementTestSuiteshare identical boilerplate:
skipUnless(t, Tier3)NewTestContext(t)with error handlingDSCTestCtxcreationThis pattern is repeated 4+ times across the file (also in
dscValidationTestSuite,dscWebhookTestSuite).♻️ Helper function to reduce duplication
func newDSCTestCtx(t *testing.T) *DSCTestCtx { t.Helper() tc, err := NewTestContext(t) require.NoError(t, err, "Failed to initialize test context") return &DSCTestCtx{TestContext: tc} }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/creation_test.go` around lines 34 - 58, Multiple test suites (dependantOperatorsManagementTestSuite, dscManagementTestSuite, dscValidationTestSuite, dscWebhookTestSuite) duplicate the same setup (skipUnless(t, Tier3); NewTestContext(t) error handling; DSCTestCtx creation); extract that into a helper like newDSCTestCtx(t *testing.T) that calls t.Helper(), performs skipUnless(t, Tier3) if needed (or keep skip in each suite), calls NewTestContext(t) with require.NoError, and returns DSCTestCtx (or *DSCTestCtx) so each suite replaces the duplicated block with a single call to newDSCTestCtx to reduce repetition and centralize initialization.pkg/utils/test/testf/testf_witht.go (1)
128-131: Removing StopErr from List() changes retry behavior.Previously, any error from
client.List()would wrap withStopErr, causingEventuallyto fail immediately. Now all errors trigger retries until timeout.For genuine failures (e.g., RBAC issues, API server unavailable), tests will now wait for the full Eventually timeout instead of failing fast. This could significantly increase CI feedback time on infrastructure failures.
Consider wrapping only transient/retryable errors and using
StopErrfor permanent failures likemeta.IsNoMatchErrorork8serr.IsForbidden.♻️ Suggested approach
err := t.Client().List(ctx, &items, option...) if err != nil { - return nil, err + // Stop immediately for permanent errors; retry for transient ones + if k8serr.IsForbidden(err) || k8serr.IsUnauthorized(err) { + return nil, StopErr(err, "permanent error listing resources: %s", gvk) + } + return nil, err // Retry on transient errors }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/utils/test/testf/testf_witht.go` around lines 128 - 131, The change removed StopErr wrapping from the t.Client().List call which causes Eventually to retry on all errors; instead detect permanent errors (e.g. meta.IsNoMatchError, k8serr.IsForbidden, API server unreachable) and wrap only those in StopErr so Eventually fails fast. Update the error handling around t.Client().List(ctx, &items, option...) in testf_witht.go to classify the returned err (use helpers or inline checks for meta.IsNoMatchError and k8serr.IsForbidden and any other non-transient conditions your infra considers permanent) and return StopErr(err) for those; return the raw err (so it can be retried) for transient/network errors.tests/e2e/test_context_test.go (2)
621-624: NoMatchError handling silently returns nil, losing error context.Returning
(nil, nil)forNoMatchErroris semantically identical to "resource not found" (subU == nil). Callers likeisOperatorHealthy(line 933) cannot distinguish between:
- OLM Subscription CRD missing (configuration/environment issue)
- Subscription resource simply doesn't exist yet
Consider returning a sentinel error or logging when NoMatchError occurs so environment misconfigurations are detectable during debugging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/test_context_test.go` around lines 621 - 624, The current meta.IsNoMatchError(err) branch swallows the error by returning (nil, nil); change it to preserve context so callers like isOperatorHealthy can distinguish missing CRD vs absent resource — either return a sentinel error (e.g., ErrCRDMissing) or wrap and return the original error (e.g., fmt.Errorf("CRD missing: %w", err)), and/or emit a log entry before returning; update callers/tests to check for the sentinel/wrapped error instead of treating nil as "not found".
1311-1314: Consistent with FetchActualSubscription, but same caveat applies.Same concern as the Subscription variant above. Environment misconfiguration (missing OLM CRDs) will appear as "CSV not found" rather than a distinct error, potentially causing confusion in CI debugging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/test_context_test.go` around lines 1311 - 1314, The current error handling swallows meta.IsNoMatchError by returning nil,nil which hides environment misconfiguration (missing OLM CRDs); change the branch in the CSV/CSV-fetching code that checks meta.IsNoMatchError(err) so it does not return a silent nil but instead returns a distinct, descriptive error (e.g. wrap the original err with a message like "OLM CRDs not installed or API resource missing") so callers can distinguish "CSV not found" from a cluster CRD discovery problem; update the branch that currently uses meta.IsNoMatchError(err) to return a wrapped error (preserve the original err) rather than nil,nil, referencing the meta.IsNoMatchError check in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/utils/test/testf/testf_witht.go`:
- Around line 128-131: The change removed StopErr wrapping from the
t.Client().List call which causes Eventually to retry on all errors; instead
detect permanent errors (e.g. meta.IsNoMatchError, k8serr.IsForbidden, API
server unreachable) and wrap only those in StopErr so Eventually fails fast.
Update the error handling around t.Client().List(ctx, &items, option...) in
testf_witht.go to classify the returned err (use helpers or inline checks for
meta.IsNoMatchError and k8serr.IsForbidden and any other non-transient
conditions your infra considers permanent) and return StopErr(err) for those;
return the raw err (so it can be retried) for transient/network errors.
In `@tests/e2e/controller_test.go`:
- Around line 86-87: Rename the variable dependantOperatorsManagementTest to use
American spelling dependentOperatorsManagementTest across the test file and any
places that reference it (e.g., flag declarations, struct fields, helper
functions, and test checks) to keep naming consistent; update all usages of
dependantOperatorsManagementTest in tests/e2e/controller_test.go and related
test files so the identifier matches the new name and run `go test` to ensure no
references remain.
In `@tests/e2e/creation_test.go`:
- Around line 34-58: Multiple test suites
(dependantOperatorsManagementTestSuite, dscManagementTestSuite,
dscValidationTestSuite, dscWebhookTestSuite) duplicate the same setup
(skipUnless(t, Tier3); NewTestContext(t) error handling; DSCTestCtx creation);
extract that into a helper like newDSCTestCtx(t *testing.T) that calls
t.Helper(), performs skipUnless(t, Tier3) if needed (or keep skip in each
suite), calls NewTestContext(t) with require.NoError, and returns DSCTestCtx (or
*DSCTestCtx) so each suite replaces the duplicated block with a single call to
newDSCTestCtx to reduce repetition and centralize initialization.
In `@tests/e2e/test_context_test.go`:
- Around line 621-624: The current meta.IsNoMatchError(err) branch swallows the
error by returning (nil, nil); change it to preserve context so callers like
isOperatorHealthy can distinguish missing CRD vs absent resource — either return
a sentinel error (e.g., ErrCRDMissing) or wrap and return the original error
(e.g., fmt.Errorf("CRD missing: %w", err)), and/or emit a log entry before
returning; update callers/tests to check for the sentinel/wrapped error instead
of treating nil as "not found".
- Around line 1311-1314: The current error handling swallows meta.IsNoMatchError
by returning nil,nil which hides environment misconfiguration (missing OLM
CRDs); change the branch in the CSV/CSV-fetching code that checks
meta.IsNoMatchError(err) so it does not return a silent nil but instead returns
a distinct, descriptive error (e.g. wrap the original err with a message like
"OLM CRDs not installed or API resource missing") so callers can distinguish
"CSV not found" from a cluster CRD discovery problem; update the branch that
currently uses meta.IsNoMatchError(err) to return a wrapped error (preserve the
original err) rather than nil,nil, referencing the meta.IsNoMatchError check in
this file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 7510ee22-ebbf-4ae4-a4bc-55c7cb840fc3
📒 Files selected for processing (5)
.github/workflows/test-kind-odh-e2e.yamlpkg/utils/test/testf/testf_witht.gotests/e2e/controller_test.gotests/e2e/creation_test.gotests/e2e/test_context_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/test-kind-odh-e2e.yaml
291907e to
dd90c18
Compare
|
will this ever work? kind is xKS not ocp |
|
also the node provisioned by github has enough resource to run operator e2e? |
|
@zdtsw this is a new feature Platform team is working on to support ODH installation in non-OCP clusters. This is still a PR in development to support running the e2e tests in a kinD cluster |
8c97ea2 to
0c7eb27
Compare
0c7eb27 to
9730540
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
0a728ac to
ffc64b2
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3250 +/- ##
=======================================
Coverage 50.76% 50.76%
=======================================
Files 192 192
Lines 14027 14027
=======================================
Hits 7121 7121
Misses 6162 6162
Partials 744 744 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
valdar
left a comment
There was a problem hiding this comment.
I think that running the whole e2e test suit against a kind cluster won't work, a new way to deploy the operator in "xks mode" along with the cloud controller manager is needed ?
|
@CFSNM can you rebase on latest main? |
ffc64b2 to
0b81de7
Compare
|
@valdar the idea is just running the kserve tests in kinD, and we may need a new way to deploy it in KinD as well |
0b81de7 to
874f26f
Compare
874f26f to
eeac1e4
Compare
|
@CFSNM: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Description
How Has This Been Tested?
Screenshot or short clip
Merge criteria
E2E test suite update requirement
When bringing new changes to the operator code, such changes are by default required to be accompanied by extending and/or updating the E2E test suite accordingly.
To opt-out of this requirement:
E2E update requirement opt-out justificationsection belowE2E update requirement opt-out justification
Summary by CodeRabbit
Chores
Tests
Bug Fixes