-
Notifications
You must be signed in to change notification settings - Fork 51
Prevent autoscaler panic when panicThresholdPercent is unset #731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Prevent autoscaler panic when panicThresholdPercent is unset #731
Conversation
Signed-off-by: WHOIM1205 <[email protected]>
Summary of ChangesHello @WHOIM1205, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical stability issue in the autoscaler controller by resolving a nil pointer dereference. Previously, the heterogeneous autoscaler optimizer could crash if the optional Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
/assign @hzxuzhonghu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively resolves a potential nil pointer dereference in the heterogeneous autoscaler optimizer. By adding a nil check for the optional panicThresholdPercent field, it prevents the autoscaler controller from crashing with valid user configurations. The fix is minimal and well-targeted. I have one suggestion to improve the readability and robustness of the conditional check by using an idiomatic Go pattern and preventing potential integer overflows.
| return nil, nil | ||
| } | ||
| if recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) { | ||
| if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better readability and to avoid repeating the long expression, you can use a short-lived variable within the if statement. This is a common Go idiom.
Additionally, to prevent potential integer overflow when multiplying int32 values, it's safer to cast the operands to int64 before multiplication. While an overflow is unlikely with typical replica counts, this change makes the code more robust against large numbers of instances.
| if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) { | |
| if p := autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent; p != nil && int64(recommendedInstances)*100 >= int64(instancesCountSum)*int64(*p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion
for this pr I’d prefer to keep the change strictly scoped to fixing the nil dereference and align with the existing pattern used in the homogeneous scaler introducing additional variables or type changes would broaden the scope beyond the original issue
happy to revisit readability or overflow hardening separately if needed
Regenerate helm-chart-values.md to match helm-docs output. Local sub-chart dependencies don't have repository URLs. Signed-off-by: WHOIM1205 <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
The test/e2e/utils, test/e2e/framework, and test/e2e/router/context packages contain helper code but no _test.go files, causing go test to fail on empty packages. Exclude these from the e2e test execution. Signed-off-by: WHOIM1205 <[email protected]>
255552a to
ea1f7d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets wait for review of maintainers
| | | networking | 1.0.0 | | ||
| | | workload | 1.0.0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdum by this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was only to fix a gen check failure due to a markdown table column mismatch no semantic change intended
| @./test/e2e/setup.sh | ||
| @echo "Running E2E tests sequentially..." | ||
| @KUBECONFIG=/tmp/kubeconfig-e2e go test -p 1 $$(go list ./... | grep /test/e2e) -v -timeout=15m | ||
| @KUBECONFIG=/tmp/kubeconfig-e2e go test -p 1 $$(go list ./... | grep /test/e2e | grep -v /utils | grep -v /framework | grep -v /context) -v -timeout=15m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it to exclude auxiliary packages to avoid executing go test on packages without test packages?
just optional i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly these packages only contain helper code and no test.go files excluding them avoids go test failing on empty packages while keeping all actual e2e test packages executed
| return nil, nil | ||
| } | ||
| if recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) { | ||
| if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= instancesCountSum*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
acceptable. Scaler is more robust
| if autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent != nil && recommendedInstances*100 >= currentInstancesCount*(*autoscalePolicy.Spec.Behavior.ScaleUp.PanicPolicy.PanicThresholdPercent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for confirming
Fix nil pointer panic in heterogeneous autoscaler optimizer
Description
This PR fixes a nil pointer dereference in the heterogeneous autoscaler optimizer that could crash the autoscaler controller when an optional field is not configured.
panicThresholdPercentis defined as an optional field in theAutoscalingPolicyAPI. While the homogeneous scaler already handles this correctly, the heterogeneous optimizer assumed the field was always set and dereferenced it unconditionally.This mismatch could cause the autoscaler to panic at runtime for valid user configurations.
What’s Fixed
panicThresholdPercentin the heterogeneous optimizerImpact
panicThresholdPercenttruly optional, as intended by the APICode Changes
pkg/autoscaler/autoscaler/optimizer.go(*Optimizer).OptimizePanicThresholdPercentThe fix is intentionally minimal and localized to avoid any unintended side effects.
Test Verification
No new tests were added.
Existing autoscaler tests already cover both scenarios:
panicThresholdPercentconfiguredpanicThresholdPercentis omittedPreviously, the latter case could panic at runtime. With this change, the same code paths execute safely without altering behavior.