It appears that async-executor spends a large amount of time shuffling tasks between different queues. It would be interesting to find ways to make stealing half of a queue faster, perhaps by making such a function part of this crate so that it can use knowledge of the internals to speed up these "mass moves".
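To make the idea concrete, here is a rough sketch of the difference between stealing item by item through a public push/pop API and a bulk move that synchronizes once per queue. Everything in it is an assumption for illustration: `Queue` is a mutex-guarded `VecDeque` stand-in (not this crate's lock-free implementation), and `steal_one_by_one` and `steal_half_into` are hypothetical names, not existing functions.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical stand-in for a concurrent queue: a mutex-guarded VecDeque.
// This is NOT the crate's lock-free type; it only illustrates the API shape.
struct Queue<T>(Mutex<VecDeque<T>>);

impl<T> Queue<T> {
    fn new() -> Self {
        Queue(Mutex::new(VecDeque::new()))
    }

    fn len(&self) -> usize {
        self.0.lock().unwrap().len()
    }

    fn push(&self, item: T) {
        self.0.lock().unwrap().push_back(item);
    }

    fn pop(&self) -> Option<T> {
        self.0.lock().unwrap().pop_front()
    }

    // Hypothetical bulk operation: move the front half of `self` (rounded up)
    // into `dest`, synchronizing once on each queue instead of once per item.
    fn steal_half_into(&self, dest: &Queue<T>) -> usize {
        // Take the whole batch under a single lock of the source...
        let batch: Vec<T> = {
            let mut src = self.0.lock().unwrap();
            let count = (src.len() + 1) / 2;
            let taken: Vec<T> = src.drain(..count).collect();
            taken
        };
        let moved = batch.len();
        // ...then append it under a single lock of the destination.
        // The two locks are never held at the same time.
        dest.0.lock().unwrap().extend(batch);
        moved
    }
}

// Item-by-item stealing using only push/pop: every moved item pays for
// two separate synchronizations (one pop, one push).
fn steal_one_by_one<T>(src: &Queue<T>, dest: &Queue<T>) -> usize {
    let count = (src.len() + 1) / 2;
    let mut moved = 0;
    for _ in 0..count {
        match src.pop() {
            Some(item) => {
                dest.push(item);
                moved += 1;
            }
            // Another consumer may have emptied the source in the meantime.
            None => break,
        }
    }
    moved
}
```

In a lock-free queue the bulk version would presumably claim a range of slots in one step rather than take a lock, but the shape of the saving is the same: the per-item synchronization cost is paid once per batch.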