(Moved to swarm from sociomantic-tsunami/dhtproto#183.)
An application recently converted from legacy Put to DHT neo Put is reportedly performing noticeably worse than before.
- The app works by assigning a large batch of Put requests once, periodically.
- In the legacy client, these would be pushed into the request queues (which were configured to be exceptionally large).
- A Task is associated with each Put request, leading to a large pool of tasks.
- When using neo Put, the application's Task pool overflows.
Tests with the affected app indicate that the task scheduler's number of worker fibers is the key factor: it was set high, and reducing it to 10 or 20 greatly reduces the time per Put request. The likely reason is that with too many worker fibers, the send and receive fibers of the neo swarm client are rarely resumed, so data throughput drops sharply.
A simple test that spawns a Task for each request confirms this: the number of worker fibers definitely impacts the speed at which neo requests complete:
- 5 worker fibers: average Put completion time ~85μs.
- 10 worker fibers: average Put completion time ~175μs.
- 20 worker fibers: average Put completion time ~400μs.
- 100 worker fibers: average Put completion time ~2,000μs.
- 1,000 worker fibers: average Put completion time ~18,000μs.
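The roughly linear growth in these measurements is what a naive round-robin model predicts: if the client's send/receive fiber shares the scheduler's run queue equally with N worker fibers, it is resumed only once every N+1 resumes, so the time to drain pending requests grows with N. A minimal Python sketch of that model (a simulation only, with hypothetical names, not swarm or ocean code):

```python
def simulate(num_workers, requests_per_worker=10):
    """Round-robin cooperative scheduler model: one I/O fiber shares
    the run queue with num_workers worker fibers. Each scheduler tick
    resumes exactly one fiber; the I/O fiber completes one pending
    request per resume, while each worker submits one request per
    resume until its quota is used up."""
    pending = 0                          # requests submitted but not completed
    submitted = [0] * num_workers        # per-worker submission counts
    completed = 0
    total = num_workers * requests_per_worker
    fibers = num_workers + 1             # workers + the client's I/O fiber
    ticks = 0
    i = 0
    while completed < total:
        ticks += 1
        f = i % fibers                   # round-robin choice of fiber
        i += 1
        if f == num_workers:             # the I/O fiber's turn
            if pending:
                pending -= 1
                completed += 1
        elif submitted[f] < requests_per_worker:
            submitted[f] += 1            # a worker submits one request
            pending += 1
    return ticks / total                 # average scheduler ticks per request

for n in (5, 10, 20, 100):
    print(n, simulate(n))                # cost per request grows with n
```

In this model the average cost per request comes out to exactly num_workers + 1 ticks, since the I/O fiber gets only one resume per round. That linear scaling is consistent with the measurements above, where going from 5 to 100 worker fibers increases the per-Put time by roughly the same factor.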
This problem is not specific to the DHT or to Put requests, of course; it points to a general interaction between the connection send/receive fibers in swarm and the scheduler's worker fibers.
In practice, this only has a significant impact in test systems where the number of nodes in the DHT (hence the number of connections and send/receive fibers) is small relative to the number of worker fibers.