-
Notifications
You must be signed in to change notification settings - Fork 498
[FLINK-36863][autoscaler] Use the maximum parallelism in the past scale-down.interval window when scaling down #922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
345af37 to
db578c8
Compare
| while (!recommendedParallelisms.isEmpty() | ||
| && recommendedParallelisms.peekLast().getParallelism() <= parallelism) { | ||
| recommendedParallelisms.pollLast(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good optimization 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gyfora for the comment in advance, I'm still adding more tests and comments for this PR.
I have some other works to do this week, so I expect this PR will be ready next week.
ec694b0 to
4166c97
Compare
20d8552 to
49478e4
Compare
| List.of( | ||
| 790, 200, 200, 200, 200, 200, 790, 200, 200, 200, 200, 200, 790, 200, 200, | ||
| 200, 200, 200, 790, 200, 200, 200, 200, 200, 790, 200, 200, 200, 200, 200, | ||
| 790), | ||
| List.of(790, 200, 790, 200, 790, 200, 790, 200, 790, 200, 790, 200, 790, 200, 790)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the RecommendedParallelism class does not include outsideUtilizationBound, this test will be failed for the last 2 inputs.
The scale down interval is 60 minutes, and metric window is 10 minutes, so the recommended parallelism will be 790 for the last 2 inputs. When we check outsideUtilizationBound, we need to check the outsideUtilizationBound corresponding to the parallelism of 790.
That's why RecommendedParallelism class includes outsideUtilizationBound. If it doesn't include outsideUtilizationBound, the latest outsideUtilizationBound may come from parallelism 200
| if (parallelismChange.isOutsideUtilizationBound()) { | ||
| anyVertexOutsideBound.set(true); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We check the OutsideUtilizationBound of the final recommended parallelism instead of the OutsideUtilizationBound of latest recommended parallelism.
If the Utilization of all tasks is within range, we can skip scaling.
1996fanrui
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy new year @gyfora @mxm , this PR is ready for review. Please help review in your free time, thank you in advance~
I introduced the DelayedScaleDownEndToEndTest in the last commit to test delayed scale down in detail, so the code changes are more than 1 K lines, but about 80% of code changes are tests.
gyfora
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@1996fanrui it may make sense to merge #921 first and then adjust and merge this one. As it may need a rebase as you also refactored some utilization related bits |
Thanks @gyfora for the review and reminder! I will merge #921 tomorrow if no more comments~ |
…le-down.interval window when scaling down
49478e4 to
c6151f3
Compare
c6151f3 to
486330a
Compare
huyuanfeng2018
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
Thanks for the review, merging~ |
What is the purpose of the change
[FLINK-36863][autoscaler] Use the maximum parallelism in the past scale-down.interval window when scaling down instead of in the entire past
See more details from FLINK-36863.
Brief change log
VertexDelayedScaleDownInfo maintain all recommended parallelisms at each time within the past scale-down.interval window period.
Note1 : It is a scenario that calculates the max value within a sliding window.
Note2: Check the
OutsideUtilizationBoundof the final recommended parallelism instead of theOutsideUtilizationBoundof latest recommended parallelismSee more details from these 2 comments: #922 (comment) and #922 (comment)
Verifying this change
DelayedScaleDownTestScalingExecutorTest#testUtilizationBoundariesbased on the latest logicDelayedScaleDownEndToEndTestto test delayed scale down in detailDoes this pull request potentially affect one of the following parts:
CustomResourceDescriptors: noDocumentation