feat: add support for WVA#3243
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds workload-variant-autoscaler (WVA) support across the operator: introduces WVASpec and Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Security and Code Quality Issues
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/components/modelcontroller/modelcontroller_actions.go (1)
63-65:⚠️ Potential issue | 🟡 MinorWVA manifests appended at line 48 bypass parameter substitution.
ApplyParamsprocesses onlyrr.Manifests[0](the base modelcontroller manifest). WVA manifests appended at line 48 becomerr.Manifests[1]and are never passed toApplyParams. While WVA manifests currently don't includeparams.env, this creates an architectural gap: if future versions require parameter substitution, they'll silently skip processing since the extraParamsMap is only applied to index 0.Apply
ApplyParamsto all manifest entries in the loop, not just the first one.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/components/modelcontroller/modelcontroller_actions.go` around lines 63 - 65, The code only calls odhdeploy.ApplyParams on rr.Manifests[0], so any additional manifests in rr.Manifests (e.g., WVA entries) never get parameter substitution; change the logic to iterate over rr.Manifests and call odhdeploy.ApplyParams for each manifest string, passing "params.env" and extraParamsMap for every entry (use the same signature currently used for rr.Manifests[0]); ensure errors from ApplyParams on any manifest are handled/returned the same way as the current single-call case so failures on later manifests surface correctly.
🧹 Nitpick comments (2)
internal/controller/components/kserve/kserve_controller_actions_test.go (1)
313-354: Add a degraded-operator test for the new WVA path.These cases only cover CMA Subscription presence/absence. The new WVA behavior also depends on the operator/controller health path, so a present-but-degraded CMA/KEDA controller branch should be covered here as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/components/kserve/kserve_controller_actions_test.go` around lines 313 - 354, Add a new subtest that covers the "present-but-degraded" operator path for WVA: create kserveWithWVA and include cmaOperatorSubscription plus a corresponding ClusterServiceVersion (or other operator health object) whose status indicates degraded (e.g., CSV.Status.Phase = "Failed" or a condition showing not available), then call checkPreConditions with that ReconciliationRequest and assert rr.Conditions.GetCondition(LLMWorkVariantAutoscallingDepedencies) yields the degraded/false result; use the same setup style as the existing tests (createKserveCR, fakeclient.New, conditions.NewManager) and reference cmaOperatorSubscription, checkPreConditions, and LLMWorkVariantAutoscallingDepedencies to locate where to add the test.internal/controller/components/kserve/kserve_controller_actions.go (1)
195-195: Typos in constant name: "Autoscalling" → "Autoscaling", "Depedencies" → "Dependencies".The constant
LLMWorkVariantAutoscallingDepedencies(defined inkserve.go:42) contains two spelling errors. Consider correcting toLLMWorkVariantAutoscalingDependenciesfor consistency withLLMInferenceServiceDependencies.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/components/kserve/kserve_controller_actions.go` at line 195, The constant name LLMWorkVariantAutoscallingDepedencies contains typos; rename its declaration to LLMWorkVariantAutoscalingDependencies and update every usage (e.g., the call rr.Conditions.MarkUnknown(LLMWorkVariantAutoscallingDepedencies) in kserve_controller_actions.go and any other references) to the corrected identifier so the code compiles and matches the existing naming pattern like LLMInferenceServiceDependencies; ensure you update the original constant definition (formerly in kserve.go) and any tests or comments that reference the old misspelled name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/components/kserve/kserve_controller_actions.go`:
- Around line 267-289: Fix the typo in the comment: change "escalte" to
"escalate" on the line describing Severity. Also avoid allocating a new
dependency action on every reconciliation by extracting the instantiation of
dependency.NewAction(dependency.MonitorOperator(...)) into a package-level
variable (e.g., preconstructedAction) and have monitorCMAOperator call that
variable; reference the symbols monitorCMAOperator, dependency.NewAction,
dependency.MonitorOperator, gvk.CustomMetricsAutoscalerV1Alpha1,
common.ConditionSeverityError and cmaConditionFilter when locating the code to
change.
In `@internal/controller/components/kserve/kserve.go`:
- Around line 42-44: Update UpdateDSCStatus() to also propagate the new KServe
dependency condition LLMWorkVariantAutoscallingDepedencies into the DSC status
the same way LLMInferenceServiceDependencies and
LLMInferenceServiceWideEPDependencies are copied: when reading the KServe CR
conditions, look up the LLMWorkVariantAutoscallingDepedencies condition and set
or update the corresponding DSC condition (preserving status, reason, message,
and timestamps) so degraded CMA/WVA state becomes visible in the DSC conditions.
In `@README.md`:
- Around line 510-511: Add a prerequisite note to the example explaining that
setting "wva: Managed" requires the external CMA/KEDA controller path to be
installed and healthy; update the README example where "wva: Managed" appears to
include a short bullet or inline sentence stating the dependency, how to verify
the CMA/KEDA controllers are installed and running (e.g., check controller
pods/CRDs/health endpoints), and a brief remediation step if not present (switch
to Removed or install/enable the controller). Ensure the text references the
exact field name "wva: Managed" and the external components "CMA/KEDA
controller" so users see the dependency next to the example.
In `@tests/e2e/modelcontroller_test.go`:
- Around line 97-113: The test currently only flips
`.spec.components.kserve.wva.managementState` and then asserts the WVA
Deployment exists, which makes the outcome depend on prior suite state; change
the test to explicitly establish prerequisites by first ensuring the KServe
component is present and set to Managed (use the same DataScienceCluster object
mutated via tc.EventuallyResourcePatched with
WithMinimalObject(gvk.DataScienceCluster, tc.DataScienceClusterNamespacedName)
and WithMutateFunc using testf.TransformPipeline/testf.Transform to set
`.spec.components.kserve.managementState = "Managed"` or otherwise
create/configure the KServe component), then flip
`.spec.components.kserve.wva.managementState` and finally call
EnsureResourceExists for the Deployment as before; reference the existing
helpers EventuallyResourcePatched, WithMutateFunc, TransformPipeline, Transform,
and EnsureResourceExists to implement the setup so the test is self-contained.
---
Outside diff comments:
In `@internal/controller/components/modelcontroller/modelcontroller_actions.go`:
- Around line 63-65: The code only calls odhdeploy.ApplyParams on
rr.Manifests[0], so any additional manifests in rr.Manifests (e.g., WVA entries)
never get parameter substitution; change the logic to iterate over rr.Manifests
and call odhdeploy.ApplyParams for each manifest string, passing "params.env"
and extraParamsMap for every entry (use the same signature currently used for
rr.Manifests[0]); ensure errors from ApplyParams on any manifest are
handled/returned the same way as the current single-call case so failures on
later manifests surface correctly.
---
Nitpick comments:
In `@internal/controller/components/kserve/kserve_controller_actions_test.go`:
- Around line 313-354: Add a new subtest that covers the "present-but-degraded"
operator path for WVA: create kserveWithWVA and include cmaOperatorSubscription
plus a corresponding ClusterServiceVersion (or other operator health object)
whose status indicates degraded (e.g., CSV.Status.Phase = "Failed" or a
condition showing not available), then call checkPreConditions with that
ReconciliationRequest and assert
rr.Conditions.GetCondition(LLMWorkVariantAutoscallingDepedencies) yields the
degraded/false result; use the same setup style as the existing tests
(createKserveCR, fakeclient.New, conditions.NewManager) and reference
cmaOperatorSubscription, checkPreConditions, and
LLMWorkVariantAutoscallingDepedencies to locate where to add the test.
In `@internal/controller/components/kserve/kserve_controller_actions.go`:
- Line 195: The constant name LLMWorkVariantAutoscallingDepedencies contains
typos; rename its declaration to LLMWorkVariantAutoscalingDependencies and
update every usage (e.g., the call
rr.Conditions.MarkUnknown(LLMWorkVariantAutoscallingDepedencies) in
kserve_controller_actions.go and any other references) to the corrected
identifier so the code compiles and matches the existing naming pattern like
LLMInferenceServiceDependencies; ensure you update the original constant
definition (formerly in kserve.go) and any tests or comments that reference the
old misspelled name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 774b3326-9d66-4213-9518-84280710b4c1
📒 Files selected for processing (18)
README.mdapi/components/v1alpha1/kserve_types.goapi/components/v1alpha1/modelcontroller_types.goapi/components/v1alpha1/zz_generated.deepcopy.goconfig/rhoai/samples/datasciencecluster_v2_datasciencecluster.yamlconfig/samples/datasciencecluster_v2_datasciencecluster.yamldocs/DESIGN.mdget_all_manifests.shinternal/controller/components/kserve/kserve.gointernal/controller/components/kserve/kserve_controller.gointernal/controller/components/kserve/kserve_controller_actions.gointernal/controller/components/kserve/kserve_controller_actions_test.gointernal/controller/components/kserve/kserve_support.gointernal/controller/components/modelcontroller/modelcontroller.gointernal/controller/components/modelcontroller/modelcontroller_actions.gointernal/controller/components/modelcontroller/modelcontroller_support.gopkg/cluster/gvk/gvk.gotests/e2e/modelcontroller_test.go
|
|
||
| // monitorCMAOperator conditionally monitors CMA operator health only when WVA is enabled. | ||
| func monitorCMAOperator(ctx context.Context, rr *odhtypes.ReconciliationRequest) error { | ||
| k, ok := rr.Instance.(*componentApi.Kserve) | ||
| if !ok { | ||
| return fmt.Errorf("resource instance %v is not a componentApi.Kserve", rr.Instance) | ||
| } | ||
|
|
||
| if k.Spec.WVA.ManagementState != operatorv1.Managed { | ||
| return nil | ||
| } | ||
|
|
||
| // Reuse existing dependency monitoring infrastructure | ||
| monitorAction := dependency.NewAction( | ||
| dependency.MonitorOperator(dependency.OperatorConfig{ | ||
| OperatorGVK: gvk.CustomMetricsAutoscalerV1Alpha1, | ||
| Severity: common.ConditionSeverityError, // cause Kserve Ready: False and escalte to DSC | ||
| Filter: cmaConditionFilter, | ||
| }), | ||
| ) | ||
|
|
||
| return monitorAction(ctx, rr) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify cmaConditionFilter is defined
rg -n 'cmaConditionFilter' --type goRepository: opendatahub-io/opendatahub-operator
Length of output: 460
Fix typo on line 283 and consider extracting action instantiation.
-
Line 283: "escalte" → "escalate"
-
Unlike LWS monitoring (configured statically in builder), this creates a new
dependency.NewActioninstance per reconciliation. While functionally correct, consider extracting to a package-level variable if reconciliation frequency makes allocation overhead measurable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/components/kserve/kserve_controller_actions.go` around
lines 267 - 289, Fix the typo in the comment: change "escalte" to "escalate" on
the line describing Severity. Also avoid allocating a new dependency action on
every reconciliation by extracting the instantiation of
dependency.NewAction(dependency.MonitorOperator(...)) into a package-level
variable (e.g., preconstructedAction) and have monitorCMAOperator call that
variable; reference the symbols monitorCMAOperator, dependency.NewAction,
dependency.MonitorOperator, gvk.CustomMetricsAutoscalerV1Alpha1,
common.ConditionSeverityError and cmaConditionFilter when locating the code to
change.
| wva: | ||
| managementState: Removed |
There was a problem hiding this comment.
Document the prerequisite for wva: Managed.
This is the first user-facing example of the new field, but it does not mention that enabling WVA depends on the external CMA/KEDA controller path being installed and healthy. As written, users can flip this to Managed and get a broken rollout with no hint why.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 510 - 511, Add a prerequisite note to the example
explaining that setting "wva: Managed" requires the external CMA/KEDA controller
path to be installed and healthy; update the README example where "wva: Managed"
appears to include a short bullet or inline sentence stating the dependency, how
to verify the CMA/KEDA controllers are installed and running (e.g., check
controller pods/CRDs/health endpoints), and a brief remediation step if not
present (switch to Removed or install/enable the controller). Ensure the text
references the exact field name "wva: Managed" and the external components
"CMA/KEDA controller" so users see the dependency next to the example.
This PR can't be merged just yet 😢Please run For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/22965964127 |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
internal/controller/components/modelcontroller/modelcontroller_support.go (1)
49-55: Optional: Extract "wva" to a constant for pattern consistency.The string literal "wva" appears only once in the codebase, so extracting it to a constant won't prevent typos elsewhere. However,
manifestsPath()uses theComponentNameconstant forContextDir, while this function uses a hardcoded string. Adopting a uniform pattern would improve readability.♻️ Optional refactor
const ( ComponentName = componentApi.ModelControllerComponentName ReadyConditionType = componentApi.ModelControllerKind + status.ReadySuffix // LegacyComponentName is the name of the component that is assigned to deployments // via Kustomize. Since a deployment selector is immutable, we can't upgrade existing // deployment to the new component name, so keep it around till we figure out a solution. LegacyComponentName = "odh-model-controller" + + // WVAContextDir is the context directory for WVA manifests. + WVAContextDir = "wva" )func wvaManifestsPath() types.ManifestInfo { return types.ManifestInfo{ Path: odhdeploy.DefaultManifestPath, - ContextDir: "wva", + ContextDir: WVAContextDir, SourcePath: "default", } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/components/modelcontroller/modelcontroller_support.go` around lines 49 - 55, Extract the literal "wva" used as the ContextDir inside wvaManifestsPath() into a shared constant (e.g., WvaComponentName) to match the pattern used by manifestsPath() which uses the ComponentName constant; update wvaManifestsPath() to reference that constant instead of the hardcoded "wva" and add the new constant near other component constants so the codebase is consistent.pkg/cluster/gvk/gvk.go (1)
450-455: Rename this GVK constant to match the resource kind.The exported name says
CustomMetricsAutoscaler..., but the actual resource here iskeda.sh/v1alpha1,Kind: KedaController. That mixes the operator/product name with the CR kind and makes later call sites and logs easy to misread. Use a symbol name that matches the API object, e.g.KedaControllerV1Alpha1, and keep the comment for the operator/product alias if needed. (docs.okd.io)Suggested diff
- // CustomMetricsAutoscaler uses KEDA's CRD. - CustomMetricsAutoscalerV1Alpha1 = schema.GroupVersionKind{ + // KedaController is the CR exposed by the Custom Metrics Autoscaler / KEDA operator. + KedaControllerV1Alpha1 = schema.GroupVersionKind{ Group: "keda.sh", Version: "v1alpha1", Kind: "KedaController", }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/cluster/gvk/gvk.go` around lines 450 - 455, Rename the exported constant CustomMetricsAutoscalerV1Alpha1 to match the API Kind by changing its identifier to KedaControllerV1Alpha1 and update the preceding comment to mention KEDA as an operator/product alias (keep the original comment text if useful). Ensure the schema.GroupVersionKind value stays the same (Group "keda.sh", Version "v1alpha1", Kind "KedaController") and update all references/usages of CustomMetricsAutoscalerV1Alpha1 throughout the codebase to the new symbol name KedaControllerV1Alpha1 to avoid mismatched logs and call sites.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/api-overview.md`:
- Line 193: Update the docs entry for `wva`/`WVASpec` to explicitly state
ownership semantics: clarify that the "Managed" status means the operator
deploys and monitors the WVA integration path (installation, health of the
integration components) but does NOT claim ownership or watch
`VariantAutoscaling` resources — those remain user-managed and must be
created/edited by users; update the descriptive text wherever `wva`, `WVASpec`,
or `VariantAutoscaling` are mentioned (including the other listed occurrences)
so it warns readers that troubleshooting autoscaling should be done in the
user's control plane/VariantAutoscaling resources, not assumed to be controlled
by the operator.
- Line 1266: Rewrite the ModelControllerKerveSpec description to be a clear,
grammatical sentence that explicitly states this spec is a minimal subset of
DSCKserve preserving only the public API fields: management, NIM, and WVA
specifications; update the text for ModelControllerKerveSpec to list those
preserved fields (e.g., "management", "nim", "wva") and clarify that other
DSCKserve fields are omitted while these retained fields keep their original
semantics and public API surface.
In `@internal/controller/components/kserve/kserve.go`:
- Around line 42-44: Rename the exported constant
LLMWorkVariantAutoscallingDepedencies to the correctly spelled
LLMWorkVariantAutoscalingDependencies and update every reference (definition
plus all usages in production and test code) to the new identifier; ensure the
corrected constant value is used where it forms condition/type strings so
user-visible status conditions reflect the fix, update imports/usages in kserve
controller functions and tests that reference the old name, and run tests/build
to validate no remaining references to the misspelled symbol.
---
Nitpick comments:
In `@internal/controller/components/modelcontroller/modelcontroller_support.go`:
- Around line 49-55: Extract the literal "wva" used as the ContextDir inside
wvaManifestsPath() into a shared constant (e.g., WvaComponentName) to match the
pattern used by manifestsPath() which uses the ComponentName constant; update
wvaManifestsPath() to reference that constant instead of the hardcoded "wva" and
add the new constant near other component constants so the codebase is
consistent.
In `@pkg/cluster/gvk/gvk.go`:
- Around line 450-455: Rename the exported constant
CustomMetricsAutoscalerV1Alpha1 to match the API Kind by changing its identifier
to KedaControllerV1Alpha1 and update the preceding comment to mention KEDA as an
operator/product alias (keep the original comment text if useful). Ensure the
schema.GroupVersionKind value stays the same (Group "keda.sh", Version
"v1alpha1", Kind "KedaController") and update all references/usages of
CustomMetricsAutoscalerV1Alpha1 throughout the codebase to the new symbol name
KedaControllerV1Alpha1 to avoid mismatched logs and call sites.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 5586afc7-8203-4d5d-bf86-f6cb4afa7617
📒 Files selected for processing (19)
README.mdapi/components/v1alpha1/kserve_types.goapi/components/v1alpha1/modelcontroller_types.goapi/components/v1alpha1/zz_generated.deepcopy.goconfig/rhoai/samples/datasciencecluster_v2_datasciencecluster.yamlconfig/samples/datasciencecluster_v2_datasciencecluster.yamldocs/DESIGN.mddocs/api-overview.mdget_all_manifests.shinternal/controller/components/kserve/kserve.gointernal/controller/components/kserve/kserve_controller.gointernal/controller/components/kserve/kserve_controller_actions.gointernal/controller/components/kserve/kserve_controller_actions_test.gointernal/controller/components/kserve/kserve_support.gointernal/controller/components/modelcontroller/modelcontroller.gointernal/controller/components/modelcontroller/modelcontroller_actions.gointernal/controller/components/modelcontroller/modelcontroller_support.gopkg/cluster/gvk/gvk.gotests/e2e/modelcontroller_test.go
🚧 Files skipped from review as they are similar to previous changes (8)
- internal/controller/components/kserve/kserve_controller_actions_test.go
- internal/controller/components/kserve/kserve_controller.go
- internal/controller/components/kserve/kserve_controller_actions.go
- api/components/v1alpha1/modelcontroller_types.go
- internal/controller/components/modelcontroller/modelcontroller.go
- internal/controller/components/modelcontroller/modelcontroller_actions.go
- config/rhoai/samples/datasciencecluster_v2_datasciencecluster.yaml
- get_all_manifests.sh
| | `rawDeploymentServiceConfig` _[RawServiceConfig](#rawserviceconfig)_ | Configures the type of service that is created for InferenceServices using RawDeployment.<br />The values for RawDeploymentServiceConfig can be "Headless" (default value) or "Headed".<br />Headless: to set "ServiceClusterIPNone = true" in the 'inferenceservice-config' configmap for Kserve.<br />Headed: to set "ServiceClusterIPNone = false" in the 'inferenceservice-config' configmap for Kserve. | Headless | Enum: [Headless Headed] <br /> | | ||
| | `nim` _[NimSpec](#nimspec)_ | Configures and enables NVIDIA NIM integration | | | | ||
| | `modelsAsService` _[DSCModelsAsServiceSpec](#dscmodelsasservicespec)_ | Configures and enables Models as a Service integration | | | | ||
| | `wva` _[WVASpec](#wvaspec)_ | Configures and enables workload-variant-autoscaler (WVA) integration | | | |
There was a problem hiding this comment.
Document WVA ownership semantics explicitly.
These descriptions currently read as if Managed means the operator fully manages WVA, but this PR’s behavior is narrower: the operator deploys/monitors the integration path while VariantAutoscaling resources remain user-managed and are not watched or owned. That distinction is important for supportability and will otherwise send users to the wrong control plane when autoscaling does not behave as expected.
Suggested doc fix
-| `wva` _[WVASpec](`#wvaspec`)_ | Configures and enables workload-variant-autoscaler (WVA) integration | | |
+| `wva` _[WVASpec](`#wvaspec`)_ | Configures workload-variant-autoscaler (WVA) integration. When managed, the operator deploys and monitors the integration prerequisites, but `VariantAutoscaling` resources remain user-managed and are not watched or owned by the operator. | | |-WVASpec enables workload-variant-autoscaler integration
+WVASpec configures workload-variant-autoscaler integration.
+When `managementState` is `Managed`, the operator manages the WVA integration path and related health checks.
+`VariantAutoscaling` resources remain user-managed and are not watched or owned by the operator.Also applies to: 888-888, 924-924, 1277-1277, 1983-2000
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/api-overview.md` at line 193, Update the docs entry for `wva`/`WVASpec`
to explicitly state ownership semantics: clarify that the "Managed" status means
the operator deploys and monitors the WVA integration path (installation, health
of the integration components) but does NOT claim ownership or watch
`VariantAutoscaling` resources — those remain user-managed and must be
created/edited by users; update the descriptive text wherever `wva`, `WVASpec`,
or `VariantAutoscaling` are mentioned (including the other listed occurrences)
so it warns readers that troubleshooting autoscaling should be done in the
user's control plane/VariantAutoscaling resources, not assumed to be controlled
by the operator.
| // For WVA. | ||
| LLMWorkVariantAutoscallingDepedencies = componentApi.KserveKind + "LLMWorkVariantAutoscallingDepedencies" | ||
| cmaOperatorSubscription = "openshift-custom-metrics-autoscaler-operator" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n "LLMWorkVariantAutoscall" --type goRepository: opendatahub-io/opendatahub-operator
Length of output: 1255
Typo in exported constant propagates to 8 locations.
LLMWorkVariantAutoscallingDepedencies contains two spelling errors:
- "Autoscalling" → "Autoscaling" (double 'l')
- "Depedencies" → "Dependencies" (missing 'n')
The constant appears in 8 locations (1 definition + 7 usages in kserve_controller_actions.go, kserve_controller_actions_test.go). Since this is exported and used for condition types, the misspelling propagates to user-visible status conditions. All locations require synchronized fixes.
Proposed fix
- LLMWorkVariantAutoscallingDepedencies = componentApi.KserveKind + "LLMWorkVariantAutoscallingDepedencies"
+ LLMWorkVariantAutoscalingDependencies = componentApi.KserveKind + "LLMWorkVariantAutoscalingDependencies"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/components/kserve/kserve.go` around lines 42 - 44, Rename
the exported constant LLMWorkVariantAutoscallingDepedencies to the correctly
spelled LLMWorkVariantAutoscalingDependencies and update every reference
(definition plus all usages in production and test code) to the new identifier;
ensure the corrected constant value is used where it forms condition/type
strings so user-visible status conditions reflect the fix, update imports/usages
in kserve controller functions and tests that reference the old name, and run
tests/build to validate no remaining references to the misspelled symbol.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tests/e2e/modelcontroller_test.go (1)
95-111:⚠️ Potential issue | 🟡 MinorMake this test establish KServe as a prerequisite itself.
This still only flips
.spec.components.kserve.wva.managementState, so the result depends on some earlier case having already left KServeManaged. If this case runs alone or after a cleanup path, it can fail without exercising WVA at all.Suggested change
tc.EventuallyResourcePatched( WithMinimalObject(gvk.DataScienceCluster, tc.DataScienceClusterNamespacedName), WithMutateFunc( testf.TransformPipeline( + testf.Transform(`.spec.components.%s.managementState = "%s"`, componentApi.KserveComponentName, operatorv1.Managed), testf.Transform(`.spec.components.%s.wva.managementState = "%s"`, componentApi.KserveComponentName, operatorv1.Managed), ), ), ) + + tc.verifyResourcesDeployed() // Ensure WVA deployment exists. tc.EnsureResourceExists(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/modelcontroller_test.go` around lines 95 - 111, The test currently only flips `.spec.components.kserve.wva.managementState` (via tc.EventuallyResourcePatched and componentApi.KserveComponentName) so it depends on KServe already being Managed; change the test to explicitly establish KServe as a prerequisite by also patching `.spec.components.kserve.managementState = "Managed"` on the DataScienceCluster (add a second Transform in testf.TransformPipeline or extend the existing Transform to set both `.spec.components.%s.managementState` for the KServe component and `.spec.components.%s.wva.managementState`), and ensure the KServe resource is present before asserting the WVA deployment (use tc.EnsureResourceExists for the KServe resource if necessary) so the WVA deployment check (wvaDeploymentName, tc.AppsNamespace) does not depend on prior tests.
🧹 Nitpick comments (1)
tests/e2e/modelcontroller_test.go (1)
105-111: Assert rollout health, not just object creation.
EnsureResourceExistspasses as soon as the Deployment is present, so this test will miss broken rollouts where WVA is created but never becomes available. Gate this on Deployment status to prove the feature is actually usable.Suggested change
tc.EnsureResourceExists( WithMinimalObject(gvk.Deployment, types.NamespacedName{ Name: wvaDeploymentName, Namespace: tc.AppsNamespace, }), + WithCondition( + And( + jq.Match(`.status.availableReplicas >= 1`), + jq.Match(`(.status.unavailableReplicas // 0) == 0`), + ), + ), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/modelcontroller_test.go` around lines 105 - 111, The test currently only ensures the Deployment object exists via EnsureResourceExists (WithMinimalObject, gvk.Deployment, wvaDeploymentName, tc.AppsNamespace) but does not verify the rollout completed; update the test to wait for the Deployment to become healthy by asserting its status rather than presence: after ensuring the object exists, poll the Deployment status (e.g., check Status.AvailableReplicas equals Spec.Replicas and/or the Available condition is True) until it reports available or timeout, or use an existing helper like WaitForDeploymentAvailable/EnsureDeploymentRollout if present; fail the test if the rollout never becomes healthy within the timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/e2e/modelcontroller_test.go`:
- Around line 95-111: The test currently only flips
`.spec.components.kserve.wva.managementState` (via tc.EventuallyResourcePatched
and componentApi.KserveComponentName) so it depends on KServe already being
Managed; change the test to explicitly establish KServe as a prerequisite by
also patching `.spec.components.kserve.managementState = "Managed"` on the
DataScienceCluster (add a second Transform in testf.TransformPipeline or extend
the existing Transform to set both `.spec.components.%s.managementState` for the
KServe component and `.spec.components.%s.wva.managementState`), and ensure the
KServe resource is present before asserting the WVA deployment (use
tc.EnsureResourceExists for the KServe resource if necessary) so the WVA
deployment check (wvaDeploymentName, tc.AppsNamespace) does not depend on prior
tests.
---
Nitpick comments:
In `@tests/e2e/modelcontroller_test.go`:
- Around line 105-111: The test currently only ensures the Deployment object
exists via EnsureResourceExists (WithMinimalObject, gvk.Deployment,
wvaDeploymentName, tc.AppsNamespace) but does not verify the rollout completed;
update the test to wait for the Deployment to become healthy by asserting its
status rather than presence: after ensuring the object exists, poll the
Deployment status (e.g., check Status.AvailableReplicas equals Spec.Replicas
and/or the Available condition is True) until it reports available or timeout,
or use an existing helper like
WaitForDeploymentAvailable/EnsureDeploymentRollout if present; fail the test if
the rollout never becomes healthy within the timeout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 6ac5cb1e-f1a5-4427-947e-cbdb42c2582b
📒 Files selected for processing (2)
internal/controller/components/modelcontroller/modelcontroller_support.gotests/e2e/modelcontroller_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/controller/components/modelcontroller/modelcontroller_support.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@build/operands-map.yaml`:
- Around line 236-237: The image reference for
RELATED_IMAGE_ODH_WORKLOAD_VARIANT_AUTOSCALER_CONTROLLER_IMAGE uses an invalid
`@v0.5.1` tag; replace this with the immutable published digest for
workload-variant-autoscaler v0.5.1 (e.g., change the value from
`quay.io/opendatahub/workload-variant-autoscaler@v0.5.1` to
`quay.io/opendatahub/workload-variant-autoscaler@sha256:<published-digest>`),
ensuring the entry uses the sha256 digest like the other entries so OCI syntax
is correct and supply-chain pinning is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 949edb48-1cd7-4c4d-9bac-aeeb6f206047
📒 Files selected for processing (2)
build/manifests-config.yamlbuild/operands-map.yaml
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
tests/e2e/modelcontroller_test.go (1)
96-104:⚠️ Potential issue | 🟠 MajorMake each WVA test establish KServe/WVA state explicitly.
Both cases still depend on a previous test having already driven KServe to
Managed. With WVA defaulting toRemoved, these assertions are order-dependent and can fail or hang when run in isolation. Set both.spec.components.kserve.managementStateand.spec.components.kserve.wva.managementStatein each WVA-specific test before checking the Deployment or ConfigMap.Also applies to: 134-143
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/modelcontroller_test.go` around lines 96 - 104, Tests that assert WVA behavior are order-dependent because they only set .spec.components.kserve.wva.managementState and assume kserve is already Managed; update each WVA-specific test (the blocks using tc.EventuallyResourcePatched with WithMinimalObject/WithMutateFunc and testf.TransformPipeline/testf.Transform) to explicitly set both .spec.components.kserve.managementState = "Managed" and .spec.components.kserve.wva.managementState = "<desiredState>" before proceeding to check Deployments/ConfigMaps (apply the same change to the other occurrence noted at lines 134-143).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/modelcontroller_test.go`:
- Around line 128-153: The test currently hardcodes queueSpareTriggerOriginal =
"3" which will break if the cluster default changes; instead read the live
ConfigMap value and derive the original string before patching. Update the test
to GET the ConfigMap named wvaConfigMapName (use the test helper that retrieves
resources, e.g. tc.GetResource or tc.KubeClient.Get) and read cm.Data["default"]
to compute the current queueSpareTriggerOriginal value, then call
tc.EventuallyResourcePatched with testf.TransformPipeline/testf.Transform using
that derived original value and the desired queueSpareTriggerModified to perform
the substitution; keep the rest of the assertions (jq.Match, wvaConfigMapName,
queueSpareTriggerModified) the same.
- Around line 145-181: Register the ConfigMap rollback before performing the
first mutation: move or invoke the existing restore call (the
tc.EventuallyResourcePatched block that uses WithMinimalObject(gvk.ConfigMap,
namespaced name wvaConfigMapName) and the WithMutateFunc that transforms
queueSpareTriggerModified back to queueSpareTriggerOriginal) so it is installed
prior to the initial tc.EventuallyResourcePatched that sets
queueSpareTriggerOriginal→queueSpareTriggerModified (or register it as a cleanup
via tc.RegisterCleanup/defer), ensuring the restore will run on every failure
path.
---
Duplicate comments:
In `@tests/e2e/modelcontroller_test.go`:
- Around line 96-104: Tests that assert WVA behavior are order-dependent because
they only set .spec.components.kserve.wva.managementState and assume kserve is
already Managed; update each WVA-specific test (the blocks using
tc.EventuallyResourcePatched with WithMinimalObject/WithMutateFunc and
testf.TransformPipeline/testf.Transform) to explicitly set both
.spec.components.kserve.managementState = "Managed" and
.spec.components.kserve.wva.managementState = "<desiredState>" before proceeding
to check Deployments/ConfigMaps (apply the same change to the other occurrence
noted at lines 134-143).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 1de8b2e2-39b8-441c-bbeb-90c188d5450e
📒 Files selected for processing (2)
build/operands-map.yamltests/e2e/modelcontroller_test.go
✅ Files skipped from review due to trivial changes (1)
- build/operands-map.yaml
- add WVA under DSC kserve - WVA will be deployment via modelcontroller - checks on WVA e.g CMA (we dont take Keda as its operator is deprecated) only when WVA is Managed: 1. subscritption: if not exist, we warn user to install it - build: some new konflux build related config - test: add more test for WVA also check CM can be user configable - update: add missing owns() on secret Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- wont be any degradate status on kedacontroller - only show subscription miss warning when wva is Managed - some updates on the e2e tests Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- the check should be done on the resource recreated after deletion not the one before gets deleted - since no call is on this check but only if the resource gets recreated or not, it is not gonna break behvaior Signed-off-by: Wen Zhou <wenzhou@redhat.com>
|
/override "Red Hat Konflux / odh-operator-ci-e2e-its-manual / odh-operator-ci" |
|
@davidebianchi: Overrode contexts on behalf of davidebianchi: Red Hat Konflux / odh-operator-ci-e2e-its-manual / odh-operator-ci, Red Hat Konflux / odh-operator-ci-on-pull-request DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test opendatahub-operator-rhoai-e2e |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidebianchi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/override ci/prow/opendatahub-operator-rhoai-e2e as this failed only in RHOAI and on another PR, manifest pointed by branch might caused issue from kserve |
|
@zdtsw: Overrode contexts on behalf of zdtsw: ci/prow/opendatahub-operator-rhoai-e2e DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3243 +/- ##
==========================================
- Coverage 50.78% 50.59% -0.20%
==========================================
Files 194 194
Lines 14179 14235 +56
==========================================
+ Hits 7201 7202 +1
- Misses 6241 6295 +54
- Partials 737 738 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4af9290
into
opendatahub-io:main
DO NOT MERGE TILL opendatahub-io/workload-variant-autoscaler#36 IS IN, this will unblocked the new e2e test on configable configmapDescription
ref :https://issues.redhat.com/browse/RHOAIENG-52198
How Has This Been Tested?
Screenshot or short clip
Merge criteria
E2E test suite update requirement
When bringing new changes to the operator code, such changes are by default required to be accompanied by extending and/or updating the E2E test suite accordingly.
To opt-out of this requirement:
E2E update requirement opt-out justificationsection belowE2E update requirement opt-out justification
Summary by CodeRabbit
New Features
Documentation
Tests
Chores