Description
When updating Tokio from 1.45 to 1.47 (the problem was later bisected to PR #7342), I experienced a very large performance regression in a custom thread pool designed to process CPU-heavy tasks.
The thread pool's code is fully available here.
A high-level overview (a rough sketch of the loop is shown after the list):
- The thread pool manages N threads (`std::thread`)
- Tasks between threads are shared with a flume mpmc channel, which is bounded to `N * 2`
- Each thread invokes Tokio's `Runtime::block_on`
- Each thread runs an async task which:
  - Polls a list of tasks stored in a `FuturesUnordered` until the `FuturesUnordered` returns `Pending`
  - Polls a task from the flume channel
  - Inserts the task into the `FuturesUnordered` and starts again
 
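For context, here is a minimal sketch of the loop described above. This is not the actual code from the linked repository: names like `Task` and `spawn_pool` are illustrative, and `tokio::select!` with `biased` stands in for the "drain the set, then receive from the channel" ordering.

```rust
use std::{future::Future, pin::Pin, thread};

use futures::{stream::FuturesUnordered, StreamExt};
use tokio::runtime::Builder;

// Boxed task type accepted by the pool (hypothetical; the real pool's task
// type lives in the linked repository).
type Task = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

// Spawn `num_threads` workers sharing one bounded flume channel.
fn spawn_pool(num_threads: usize) -> flume::Sender<Task> {
    // Bounded MPMC channel shared by all worker threads, sized N * 2.
    let (tx, rx) = flume::bounded::<Task>(num_threads * 2);

    for _ in 0..num_threads {
        let rx = rx.clone();
        thread::spawn(move || {
            // One current-thread runtime per worker; block_on drives the loop.
            let rt = Builder::new_current_thread().enable_all().build().unwrap();
            rt.block_on(async move {
                let mut tasks = FuturesUnordered::new();
                loop {
                    tokio::select! {
                        biased;
                        // Drain ready tasks first; this branch is disabled when
                        // the set is empty so `next()` never yields `None`.
                        Some(()) = tasks.next(), if !tasks.is_empty() => {}
                        // Once the set returns Pending, pull the next task from
                        // the channel and insert it into the set.
                        msg = rx.recv_async() => match msg {
                            Ok(task) => tasks.push(task),
                            Err(_) => break, // all senders dropped: shut down
                        },
                    }
                }
            });
        });
    }

    tx
}
```

With this shape, submitting work is roughly `tx.send(Box::pin(async { /* CPU-heavy work */ }))`, and the regression shows up as the workers burning CPU in this loop even when both the channel and the `FuturesUnordered` should be idle.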
Profiles taken (with perf):
- Good (1.45): https://share.firefox.dev/3INrj08
- Bad (1.47): https://share.firefox.dev/3WK0MUw
Unfortunately, I did not manage to produce a small reproduction; instead, I bisected Tokio commits and load tested the full binary.
A rough guess: there are now spurious (or constant) wake-ups. The pool still appears to get all of its work done, but when it should be idle it is burning through CPU.
Real Data
Bisected Tokio commits (noted in the commit messages) and the commit hashes of the associated builds:


Transition from one version to the next:

Internal service and thread pool utilization (measured as time spent polling tasks); the top two lines are the affected thread pools:

Thread pool utilization only (again measured as each thread's time spent in poll over time), for the two thread pools:

Average time spent per task in the bigger thread pool does not change (ignore the small section where it increases; that is normal). The conclusion from this is that the extra time spent is not due to the tasks themselves changing:

