Instead of fixing the local transfer thread pool and bookkeeper thread pool at 4096 threads each, they should be sized dynamically based on the formula that @stagraqubole outlined here:
rubix.pool.size.max=P
number-of-nodes=N
max-threads=P*N
So in a 100-node cluster with rubix.pool.size.max=4, you can lower this value to 400.
You could introduce a config instead that expresses a percentage increase/decrease from this dynamically calculated size.
Having two thread pools of 4096 threads each, on top of the work already being done by a worker node, leads to worker nodes becoming unresponsive.
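
To make the proposed sizing concrete, here is a rough sketch of the calculation in Python. It is not taken from the Rubix code base: the function name `max_threads` and the `adjust_percent` parameter are hypothetical, standing in for whatever config would express the percentage increase/decrease.

```python
def max_threads(pool_size_max: int, number_of_nodes: int, adjust_percent: int = 0) -> int:
    """Return P * N, optionally scaled by a percentage adjustment.

    pool_size_max   -- P, the value of rubix.pool.size.max
    number_of_nodes -- N, the number of nodes in the cluster
    adjust_percent  -- hypothetical config: +25 means 25% more threads,
                       -10 means 10% fewer (not an existing Rubix option)
    """
    if pool_size_max < 1 or number_of_nodes < 1:
        raise ValueError("pool_size_max and number_of_nodes must be >= 1")

    base = pool_size_max * number_of_nodes  # max-threads = P * N

    # Integer arithmetic avoids floating-point rounding; round up so a
    # fractional result never undersizes the pool.
    scaled = -(-(base * (100 + adjust_percent)) // 100)

    # Never return fewer than one thread, even for a large negative adjustment.
    return max(1, scaled)


if __name__ == "__main__":
    print(max_threads(4, 100))       # 100-node cluster, rubix.pool.size.max=4
    print(max_threads(4, 100, 25))   # same cluster, 25% headroom
    print(max_threads(4, 100, -10))  # same cluster, 10% fewer threads
```

With the numbers from the example above (P=4, N=100), this gives 400 threads per pool instead of 4096, and the percentage knob scales that value up or down without hardcoding an absolute size.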