Suppose you have a CPU-bound task: "Count how many times you can calculate a square root in 10 seconds."
When I tried a serial implementation, it was faster than running the same task with 2, 3, or 4 workers. I did not try more workers.
Is this expected? I assumed I would see a linear speedup: the workers do not send a message until the 10 seconds have passed, so I expected almost zero overhead.
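
For concreteness, here is a minimal sketch of the kind of experiment I mean. The language and worker API are an assumption for illustration only (Python with `multiprocessing`), and the names `count_sqrts`, `run_parallel`, and `speedup` are hypothetical. Each worker counts on its own and reports back only once, when its time is up.

```python
import math
import time
from multiprocessing import Pool


def count_sqrts(duration_s: float) -> int:
    """Count how many square roots this process computes in duration_s seconds."""
    deadline = time.perf_counter() + duration_s
    count = 0
    while time.perf_counter() < deadline:
        math.sqrt(2.0)
        count += 1
    return count


def run_parallel(n_workers: int, duration_s: float) -> list[int]:
    """Run one counting task per worker process and collect the per-worker counts.

    Assumption: a process pool stands in for the "workers"; each worker
    returns a single result at the end, so nothing is sent while it counts.
    """
    with Pool(processes=n_workers) as pool:
        return pool.map(count_sqrts, [duration_s] * n_workers)


def speedup(serial_count: int, worker_counts: list[int]) -> float:
    """Combined worker throughput relative to the serial run.

    A linear speedup with n workers would give a value close to n.
    """
    return sum(worker_counts) / serial_count


if __name__ == "__main__":
    DURATION_S = 10.0  # the 10-second window from the task description

    serial = count_sqrts(DURATION_S)
    print(f"serial: {serial}")

    for n in (2, 3, 4):
        counts = run_parallel(n, DURATION_S)
        print(f"{n} workers: per-worker={counts} speedup={speedup(serial, counts):.2f}")
```

Comparing the summed worker counts against the serial count, rather than a single worker's count, is what shows whether the speedup is actually linear.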