-
Notifications
You must be signed in to change notification settings - Fork 141
rework on poller auto scaler #1411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rework on poller auto scaler #1411
Conversation
82e6bb6
to
636c433
Compare
Codecov ReportAttention: Patch coverage is
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
to stick it in here too: overall looks pretty good. simpler and the overall goal (and why it achieves it) is clearer too. seems like just minor tweaks (many optional) and it's probably good to go |
8438696
to
69f7e3f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, but I left some nits
internal/internal_worker_base.go
Outdated
|
||
if bw.pollerAutoScaler != nil { | ||
if bw.concurrencyAutoScaler != nil { | ||
if pErr := bw.concurrency.PollerPermit.Acquire(bw.limiterContext); pErr == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this looks like a leaking abstraction. This should be handled inside concurrencyAutoScaler.
I suggest moving all
concurrencyAutoScaler != nil checks inside methods where it is required.
This code should be simpler. Just calling methods on autoscaler. If it is nil, do nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are guarding calls againts bw.concurrency
based on nilness of bw.concurrencyAutoScaler
which indicates that these two should be abstracted behind a single interface to avoid additional complexity in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've removed this check in all places. Regarding the comment hese two should be abstracted behind a single interface to avoid additional complexity in this file
, I still think this is two separate entities. Client still needs concurrency whether autoscaler is enabled or not.
a9d3781
to
52dd229
Compare
1d733a5
to
e7776be
Compare
internal/internal_worker_base.go
Outdated
|
||
if bw.pollerAutoScaler != nil { | ||
if bw.concurrencyAutoScaler != nil { | ||
if pErr := bw.concurrency.PollerPermit.Acquire(bw.limiterContext); pErr == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are guarding calls againts bw.concurrency
based on nilness of bw.concurrencyAutoScaler
which indicates that these two should be abstracted behind a single interface to avoid additional complexity in this file
internal/internal_worker_base.go
Outdated
return t.autoConfigHint | ||
default: | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this switch case (which is not future proof), we can cast the task to autoConfigHintAwareTask
interface and get the auto config hint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this to use autoConfigHintAwareTask
lowerPollerWaitTime = 16 * time.Millisecond | ||
upperPollerWaitTime = 256 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like we would want to iterate on these to adjust sensitivity. consider exposing these to worker config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The poller wait time is an invariant. User doesn't need to tune it. The sensitivity control (time-to-react) is actually controlled by the Cooldown which is already in the parameter
}, | ||
}, | ||
{ | ||
"idl pollers waiting for tasks", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo idle. same in other cases below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
coverage failed due to deprecation changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropping notes for now, while reading tests carefully 👍
overall looks pretty good I think - fairly easy to follow, behavior looks good (e.g. up to 4x growth when "instant", 0.5x shrink when slow, one scale change every 10 seconds sounds reasonable), everything's pretty close.
so just a small pile of minor stuff, some nits some not.
autoScalerEventStart autoScalerEvent = "auto-scaler-start" | ||
autoScalerEventStop autoScalerEvent = "auto-scaler-stop" | ||
autoScalerEventLogMsg string = "concurrency auto scaler event" | ||
testTimeFormat string = "15:04:05" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testTimeFormat string = "15:04:05" |
"busy pollers, scale up to maximum", | ||
[]*shared.AutoConfigHint{ | ||
{common.PtrOf(true), common.PtrOf(int64(0))}, // <- tick, in cool down | ||
{common.PtrOf(true), common.PtrOf(int64(0))}, // <- tick, scale down to minimum | ||
}, | ||
[]eventLog{ | ||
{autoScalerEventStart, false, 100, "00:00:00"}, | ||
{autoScalerEventEnable, true, 100, "00:00:00"}, | ||
{autoScalerEventPollerSkipUpdateCooldown, true, 100, "00:00:01"}, | ||
{autoScalerEventPollerScaleUp, true, 200, "00:00:02"}, | ||
{autoScalerEventStop, true, 200, "00:00:02"}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be easier to follow the actual behavior of this one with a less-than-1/2-maximum set of values, e.g. start with 10 rather than 100 -> it won't scale to maximum, it'll scale to 42.
kinda similar for others below, e.g. pollers, scale up and down multiple times
becomes:
{autoScalerEventStart, false, 10, "00:00:00"},
{autoScalerEventEnable, true, 10, "00:00:00"},
{autoScalerEventPollerSkipUpdateCooldown, true, 10, "00:00:01"},
{autoScalerEventPollerScaleUp, true, 42, "00:00:02"},
{autoScalerEventPollerSkipUpdateCooldown, true, 42, "00:00:03"},
{autoScalerEventPollerScaleDown, true, 25, "00:00:04"},
{autoScalerEventPollerSkipUpdateCooldown, true, 25, "00:00:05"},
{autoScalerEventPollerScaleUp, true, 104, "00:00:06"},
{autoScalerEventPollerSkipUpdateCooldown, true, 104, "00:00:07"},
{autoScalerEventPollerScaleDown, true, 63, "00:00:08"},
{autoScalerEventStop, true, 63, "00:00:08"},
which seems a bit more informative than "to max, down, back to max, back to same down value"
…ep backward compatibility and fix test cases and other comments
eda2abf
to
7895fd4
Compare
Detailed Description
Improve performance of poller auto scaler by using more accurate scaling signals and several implementation changes.
Changes
AutoScalerOptions
is introduced.Impact Analysis
Testing Plan
Rollout Plan
What is the rollout plan?
For Uber services, standard client release steps
For OSS users, turn off autoscaler feature first before the client upgrade.
Does the order of deployment matter? No
Is it safe to rollback? Does the order of rollback matter? Yes
Is there a kill switch to mitigate the impact immediately? Yes, the new autoscaler feature is an opt-in feature.