[FLINK-36527][autoscaler] Introduce a parameter to support autoscaler adopt a more radical strategy when source vertex or upstream shuffle is keyBy #904
Conversation
SamBarker
left a comment
I'm lacking context for the change, so my comments could be missing the mark, but I think that means I represent a lot of people trying to consume the proposed config: if the naming confuses me, it's likely to confuse others.
<td>Time interval to resend the identical event</td>
</tr>
<tr>
<td><h5>job.autoscaler.scaling.radical.enabled</h5></td>
Coming at this cold, it's not at all clear to me what radical means. While the description goes some way towards clarifying the intent, it doesn't feel like a great term; additionally, following through the JIRA links, radical feels like a very odd term for a default (assuming I'm following properly). I wonder if job.autoscaler.scaling.maximizeUtilisation.enabled would make things more explicit?
(scalingRadical
        && numKeyGroupsOrPartitions / p
                < numKeyGroupsOrPartitions / newParallelism)) {
I think extracting this as a method, canMaximiseUtilisation, would make the intent of the condition easier to understand when working through the code.
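A minimal sketch of what that extraction might look like. The method name follows the suggestion above; the class name, signature, and the 199-partition numbers in main are illustrative, not taken from the PR.

```java
// Hypothetical sketch of the suggested extraction; names are illustrative.
public class CanMaximiseUtilisationSketch {

    // True when picking parallelism p (even though it is not a clean divisor)
    // still lowers numKeyGroupsOrPartitions / parallelism compared to the
    // originally computed newParallelism.
    static boolean canMaximiseUtilisation(
            boolean scalingRadical,
            int numKeyGroupsOrPartitions,
            int p,
            int newParallelism) {
        return scalingRadical
                && numKeyGroupsOrPartitions / p
                        < numKeyGroupsOrPartitions / newParallelism;
    }

    public static void main(String[] args) {
        // 199 partitions, computed parallelism 96 (199 / 96 = 2):
        System.out.println(canMaximiseUtilisation(true, 199, 100, 96)); // true  (199 / 100 = 1)
        System.out.println(canMaximiseUtilisation(true, 199, 99, 96));  // false (199 / 99 = 2)
    }
}
```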
// When adjust the parallelism after rounding up cannot be
// find the right degree of parallelism to meet requirements,
// Try to find the smallest parallelism that can satisfy the current consumption rate.
nits
Suggested change:
- // When adjust the parallelism after rounding up cannot be
- // find the right degree of parallelism to meet requirements,
- // Try to find the smallest parallelism that can satisfy the current consumption rate.
+ // When adjusting the parallelism after rounding up cannot
+ // find the right degree of parallelism to meet requirements.
+ // Try to find the smallest parallelism that can satisfy the current consumption rate.
.withFallbackKeys(oldOperatorConfigKey("scaling.radical.enabled"))
.withDescription(
        "If this option is enabled, The determination of parallelism will be more radical, which"
                + " will maximize resource utilization, but may also cause data skew in some vertex.");
I think it might be helpful to give consumers/users some more context as to how/why it would potentially cause skew.
@SamBarker Thank you very much for your review.
context));

assertEquals(
        32,
It would be nice to tie this answer to something so it was clear why it was picked.
e.g.
Suggested change:
- 32,
+ EXPECTED_KEY_GROUPS,
context));

assertEquals(
        199,
Suggested change:
- 199,
+ PARTITION_COUNT,
AutoScalerOptions.SCALING_KEY_GROUP_PARTITIONS_ADJUST_MODE,
NumKeyGroupsOrPartitionsParallelismAdjuster.Mode.MAXIMIZE_UTILISATION);
assertEquals(
        100,
Suggested change:
- 100,
+ MAXIMUM_UTILISATION_PARALLELISM,
It's not the number that matters but what it means.
SamBarker
left a comment
LGTM
Thanks for listening @huyuanfeng2018
Hey @1996fanrui @mxm, would you help review this PR?
1996fanrui
left a comment
Thanks @huyuanfeng2018 for the ping, I will review it this week.
mxm
left a comment
Thanks for the PR! This is an interesting idea. See the comments inline. I would prefer to do the refactoring separately, as it distracts from the actual changes at hand (which are relatively small). There is also some value in preserving the Git history of the changes.
(mode == MAXIMIZE_UTILISATION
        && numKeyGroupsOrPartitions / p
                < numKeyGroupsOrPartitions / newParallelism)) {
From what I can tell, this is the only change in this PR, apart from the refactoring. The assumption here is that for cases where a parallelism such that numKeyGroupsOrPartitions % parallelism == 0 cannot be found, we at least pick a parallelism which leads to fewer state/partition imbalance.
I understand the idea behind this, but I wonder whether it has the desired effect. The autoscaling algorithm isn't aware of any state/partition imbalance; it assumes linear scaling through state/partition balance can be achieved. A slight adjustment to the parallelism won't drastically improve the situation.
It looks like this change could help in situations where the number of partitions / key groups does not have many divisors, but it's also kind of hard to reason about.
Sorry for replying so late.
I understand the idea behind this, but I wonder whether it has the desired effect. The autoscaling algorithm isn't aware of any state/partition imbalance; it assumes linear scaling through state/partition balance can be achieved. A slight adjustment to the parallelism won't drastically improve the situation.

When the partitions or key groups are evenly distributed and the count has relatively many divisors, such as 60, 120, or 720, we can indeed easily achieve linear scaling through state/partition balancing, which is the most ideal situation.
If the number of partitions is awkward, such as 35, we still hope that adjusting the parallelism will linearly increase the consumption rate. Although it is not possible to consume evenly, it is still meaningful.
Imagine a situation: when running with a parallelism of 7, the busy value of the operator is high. No matter what the scaleFactor is, the autoscaler can be expected to expand the parallelism to 35, which may leave the operator very idle. But we know the linear assumption may not hold completely when changing 7->35; we may not get a 5x (35/7) increase in processing speed (this depends on many factors), so the next cycle may trigger a scale down again, and then the cycle repeats.
In addition, due to the limitations of the scale-down.max-factor parameter, we may never be able to reduce the degree of parallelism.
If we use the MAXIMIZE_UTILISATION mode, this phenomenon can be significantly improved.

It looks like this change could help in situations where the number of partitions / key groups does not have many divisors, but it's also kind of hard to reason about.
Any suggestions?
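The two modes discussed above can be modeled with a small standalone sketch. This is a simplified, hypothetical version of the search loop quoted in the diff: the class and method names are illustrative, and the real JobVertexScaler handles the not-found fallback and bounds differently.

```java
// Simplified, hypothetical model of the two adjustment modes discussed above.
public class AdjustModeSketch {

    enum Mode { EVENLY_SPREAD, MAXIMIZE_UTILISATION }

    // Search upwards from newParallelism for a parallelism that either divides
    // the key group / partition count evenly, or (in MAXIMIZE_UTILISATION mode)
    // at least lowers numKeyGroupsOrPartitions / parallelism.
    static int adjust(int numKeyGroupsOrPartitions, int newParallelism, int upperBound, Mode mode) {
        int upper = Math.min(numKeyGroupsOrPartitions, upperBound);
        for (int p = newParallelism; p <= upper; p++) {
            if (numKeyGroupsOrPartitions % p == 0
                    || (mode == Mode.MAXIMIZE_UTILISATION
                            && numKeyGroupsOrPartitions / p
                                    < numKeyGroupsOrPartitions / newParallelism)) {
                return p;
            }
        }
        return newParallelism; // fall back if nothing better is found
    }

    public static void main(String[] args) {
        // 35 partitions, required parallelism 9:
        System.out.println(adjust(35, 9, 720, Mode.EVENLY_SPREAD));        // 35
        System.out.println(adjust(35, 9, 720, Mode.MAXIMIZE_UTILISATION)); // 12
    }
}
```

With 35 partitions and a required parallelism of 9, EVENLY_SPREAD jumps all the way to 35 (the only divisor of 35 that is >= 9), while MAXIMIZE_UTILISATION stops at 12, where each subtask handles at most 3 partitions instead of 4.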
1996fanrui
left a comment
I would prefer to do the refactoring separately
I agree with @mxm, refactoring should at least be a separate commit.
OK, I will revert the changes.
a9b7555 to 5570270
5570270 to 0aa6795
mxm
left a comment
Looks good to me overall, but I would like to ask you to drop the refactoring commit. The reason is that this code won't be reused via an extra class. It isn't more easily testable. It also doesn't make the code easier to read, and it removes the valuable Git history from the JobVertexScaler file. I hope that makes sense.
/** The mode of the key group or parallelism adjustment. */
public enum KeyGroupOrPartitionsAdjustMode implements DescribedEnum {
    ABSOLUTELY_EVENLY_DISTRIBUTION(
The name is confusing because this doesn't guarantee absolutely even distribution.
How about EVENLY_SPREAD?
The name is confusing because this doesn't guarantee absolutely even distribution. How about EVENLY_SPREAD?

Looks like EVENLY_SPREAD is more reasonable; I'm fine with that.
@1996fanrui What do you think about this?
Sounds good to me.
Thanks for the review. Regarding this part of the refactoring, I originally thought there might be some other logic to modify the parallelism in the future, just like the fine-tuning based on numpartitionOrKeygroup here, so I refactored it. But I think your concerns are valid (this would change the git history), so I have no problem with cancelling this part of the refactoring.
Thanks! Let's get the PR ready to be merged then.
… adopt a more radical strategy when source vertex or upstream shuffle is keyBy
0aa6795 to
a339203
Compare
var upperBoundForAlignment =
        Math.min(
                // Optimize the case where newParallelism <= maxParallelism / 2
                newParallelism > numKeyGroupsOrPartitions / 2
                        ? numKeyGroupsOrPartitions
                        : numKeyGroupsOrPartitions / 2,
                upperBound);

KeyGroupOrPartitionsAdjustMode mode =
        context.getConfiguration().get(SCALING_KEY_GROUP_PARTITIONS_ADJUST_MODE);

var upperBoundForAlignment = Math.min(numKeyGroupsOrPartitions, upperBound);
Why is the upperBoundForAlignment logic updated? Would you mind sharing one case?
The optimization to use half of the keyspace when the parallelism is less than or equal to half the keyspace doesn't work for the new parallelism adjustment mode. It was anyway just a shortcut to avoid checking all divisors up to the maximum. No harm removing it.
Why is the upperBoundForAlignment logic updated? Would you mind sharing one case?

When I ran the test cases, I found this logic is wrong when the key groups or partition count are not even numbers.
Let me give you an example:
partition=199, newParallelism=96
With the original logic, upperBoundForAlignment will be calculated as 99,
but under EVENLY_SPREAD I think the expected result is 199,
and under MAXIMIZE_UTILISATION the expected result is 100.
Due to this wrong logic, we cannot get the expected results. This is a bug. If it is not appropriate to fix it in this PR, should it be fixed in another PR?
It seems the logic is wrong when numKeyGroupsOrPartitions is an odd number.
Thanks for the clarification! This change makes sense to me.
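The 199-partition example above can be checked with a small standalone sketch contrasting the old and new upperBoundForAlignment computations quoted in the diff. The class and method names are illustrative, and upperBound = 200 is an assumed vertex max parallelism, not a value from the PR.

```java
// Hypothetical sketch of the old vs. new upper-bound computation.
public class UpperBoundSketch {

    static int oldUpper(int numKeyGroupsOrPartitions, int newParallelism, int upperBound) {
        return Math.min(
                // Old shortcut: only search half the key space for small parallelisms
                newParallelism > numKeyGroupsOrPartitions / 2
                        ? numKeyGroupsOrPartitions
                        : numKeyGroupsOrPartitions / 2,
                upperBound);
    }

    static int newUpper(int numKeyGroupsOrPartitions, int upperBound) {
        return Math.min(numKeyGroupsOrPartitions, upperBound);
    }

    public static void main(String[] args) {
        // partition = 199, newParallelism = 96: since 96 <= 199 / 2 = 99, the old
        // logic caps the search at 99, so neither 100 (MAXIMIZE_UTILISATION)
        // nor 199 (EVENLY_SPREAD) can ever be reached.
        System.out.println(oldUpper(199, 96, 200)); // 99
        System.out.println(newUpper(199, 200));     // 199
    }
}
```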
…ition remains unchanged
5c543f1 to f18634a
1996fanrui
left a comment
Thanks @huyuanfeng2018 for the update!
LGTM
Thanks @huyuanfeng2018!
Thanks for the review @mxm @1996fanrui @SamBarker |
What is the purpose of the change
Introduce a parameter to let the autoscaler adopt a more radical strategy when the source vertex or upstream shuffle is keyBy.
Brief change log
Introduce scaling.key-group.partitions.adjust.mode
Verifying this change
In org.apache.flink.autoscaler.JobVertexScalerTest, testParallelismComputationWithAdjustment and testNumPartitionsAdjustment add logic to test.
Does this pull request potentially affect one of the following parts:
CustomResourceDescriptors: (yes / no)
Documentation