
@shijiesheng (Member) commented Jul 17, 2025

What changed?

  • Rename pollerCount to the less confusing name pollerCountWithoutAutoscaling.
  • Ensure enough poller goroutines are started when the autoscaler is enabled.
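The selection logic described in the two bullets can be sketched as follows. This is a minimal illustration, not the Cadence client's actual code: pollerOptions, goroutinesToStart, and initialPermits are hypothetical names, and only pollerCountWithoutAutoscaling comes from this PR. The point is that with autoscaling on, the goroutine count is the autoscaler's max, while the initial value only sets how many may poll at first.

```go
package main

import "fmt"

// pollerOptions is a hypothetical stand-in for the worker's poller settings;
// the field names are illustrative, not the client's real API.
type pollerOptions struct {
	pollerCountWithoutAutoscaling int // fixed poller count when autoscaling is off
	autoscalerEnabled             bool
	autoscalerMin                 int
	autoscalerInitial             int
	autoscalerMax                 int
}

// goroutinesToStart returns how many poller goroutines to launch up front.
// With autoscaling, max goroutines must exist so the autoscaler can later
// raise the permit quota all the way to max.
func goroutinesToStart(o pollerOptions) int {
	if o.autoscalerEnabled {
		return o.autoscalerMax
	}
	return o.pollerCountWithoutAutoscaling
}

// initialPermits returns how many of those goroutines may poll at startup.
func initialPermits(o pollerOptions) int {
	if o.autoscalerEnabled {
		return o.autoscalerInitial
	}
	return o.pollerCountWithoutAutoscaling
}

func main() {
	o := pollerOptions{autoscalerEnabled: true, autoscalerMin: 2, autoscalerInitial: 10, autoscalerMax: 40}
	fmt.Println(goroutinesToStart(o), initialPermits(o)) // prints "40 10"
}
```

Keeping the non-autoscaling count in its own explicitly named field avoids the ambiguity the rename addresses: a single "pollerCount" reads as the number of goroutines in both modes, which is only true without autoscaling.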

Why?

Consider a case where the autoscaler is enabled with max 40, initial 10, and min 2 pollers. We need to start all 40 poller goroutines up front and allow only 10 of them to poll initially.

This is achieved by launching 40 goroutines with go bw.runPoller() and setting PollerPermit to 10.

The current bug is that we start only 10 goroutines at the beginning, so no more than 10 pollers can ever be active and the autoscaler cannot scale up, even when it raises the permit.
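The interaction between goroutine count and permit quota can be simulated with a small sketch, assuming PollerPermit behaves like a resizable counting semaphore. The permit type, its acquire/setQuota methods, and activeAfterScaleUp are hypothetical, and acquire stands in for go bw.runPoller(); each simulated poller holds its permit forever, like a long poll that never returns. Raising the quota from 10 to 40 activates 40 pollers only if 40 goroutines were started; with 10 goroutines it stays at 10.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// permit is a hypothetical resizable counting semaphore standing in for
// PollerPermit; it is not the Cadence client's actual type.
type permit struct {
	mu      sync.Mutex
	cond    *sync.Cond
	quota   int // pollers allowed to poll concurrently
	inUse   int // permits currently held
	waiting int // goroutines blocked in acquire
}

func newPermit(quota int) *permit {
	p := &permit{quota: quota}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// acquire blocks until a permit is free under the current quota.
func (p *permit) acquire() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.waiting++
	for p.inUse >= p.quota {
		p.cond.Wait()
	}
	p.waiting--
	p.inUse++
}

// setQuota is what an autoscaler would call to resize the pool.
func (p *permit) setQuota(q int) {
	p.mu.Lock()
	p.quota = q
	p.mu.Unlock()
	p.cond.Broadcast() // wake waiters so they re-check the new quota
}

// settledInUse waits until every goroutine has either taken a permit or is
// blocked with no permit available, then returns the permits in use.
func (p *permit) settledInUse(goroutines int) int {
	for {
		p.mu.Lock()
		all := p.inUse+p.waiting == goroutines
		stuck := p.waiting == 0 || p.inUse >= p.quota
		n := p.inUse
		p.mu.Unlock()
		if all && stuck {
			return n
		}
		time.Sleep(time.Millisecond)
	}
}

// activeAfterScaleUp starts goroutines gated by an initial quota, raises
// the quota to newQuota, and reports how many pollers end up active.
func activeAfterScaleUp(goroutines, initial, newQuota int) int {
	p := newPermit(initial)
	for i := 0; i < goroutines; i++ {
		go p.acquire() // stand-in for go bw.runPoller()
	}
	p.settledInUse(goroutines)
	p.setQuota(newQuota)
	return p.settledInUse(goroutines)
}

func main() {
	fmt.Println("start 40:", activeAfterScaleUp(40, 10, 40)) // fixed: 40 active
	fmt.Println("start 10:", activeAfterScaleUp(10, 10, 40)) // buggy: stuck at 10
}
```

Starting the extra goroutines is cheap in this model because they sit parked in cond.Wait until the quota allows them through.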

How did you test it?

Bench test with stable tasklist traffic.

Pollers start at 2 initially, across 20 instances with the autoscaler enabled.
After scaling the instances down from 20 to 10 and then from 10 to 2, we saw the poller quota increase to cover the lost hosts, and schedule-to-start latency was maintained.

[Screenshot: bench test results, 2025-07-17 4:04:02 PM]

Potential risks

Additional goroutines will be created, but this is acceptable because we already create up to 10k goroutines by default for sticky executions.

@shijiesheng merged commit 1785f78 into cadence-workflow:master on Jul 18, 2025
9 of 10 checks passed