--raw --num-iovs=2 uccl_benchmark_readwrite.py --async-api --num-iovs=2 uccl_benchmark_readwrite.py
uccl/p2p/benchmarks/benchmark_ray_p2p.py, lines 142 to 172 in b4247a1
For each iteration in uccl/p2p/benchmarks/benchmark_ray_p2p.py (lines 79 to 197 in b4247a1), the server:
Nice work @zhongjiechen! Was this benchmark run on AMD servers? Is the performance gap (compared to 50G) due to out-of-band communication?
Yes, AMD servers. The gap comes from the OOB communication used to exchange descriptors and synchronize iterations. And it should also contain the overhead of
Got you! Is it possible to also measure the pure RDMA transfer performance in this benchmark? |
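One way to separate the two costs discussed above is to time the out-of-band steps and the data transfer as distinct phases, then report bandwidth both end to end and over the transfer phase alone. The following is a minimal sketch, not the benchmark's actual code: `PhaseTimer`, `run_iterations`, and the three callables (`exchange_descriptors`, `transfer`, `sync`) are hypothetical names standing in for whatever the benchmark does in each iteration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock seconds per named phase (hypothetical helper)."""

    def __init__(self):
        self.seconds = defaultdict(float)

    @contextmanager
    def phase(self, name):
        # Time the enclosed block and add it to the running total for `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.seconds[name] += time.perf_counter() - start

    def gbps(self, total_bytes, phases):
        """Bandwidth in Gbit/s over the summed time of the given phases."""
        elapsed = sum(self.seconds[p] for p in phases)
        if elapsed <= 0:
            return float("inf")  # nothing measurable was recorded
        return total_bytes * 8 / elapsed / 1e9


def run_iterations(timer, num_iters, exchange_descriptors, transfer, sync):
    """Hypothetical loop: the callables are placeholders for the real steps.

    `transfer(desc)` is assumed to return the number of bytes it moved.
    """
    total_bytes = 0
    for _ in range(num_iters):
        with timer.phase("oob"):
            desc = exchange_descriptors()
        with timer.phase("transfer"):
            total_bytes += transfer(desc)
        with timer.phase("oob"):
            sync()
    return total_bytes


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any RDMA hardware.
    timer = PhaseTimer()
    moved = run_iterations(
        timer,
        num_iters=5,
        exchange_descriptors=lambda: time.sleep(0.002) or 1 << 20,
        transfer=lambda nbytes: time.sleep(0.001) or nbytes,
        sync=lambda: time.sleep(0.002),
    )
    print("end-to-end Gbit/s:", timer.gbps(moved, ["oob", "transfer"]))
    print("transfer-only Gbit/s:", timer.gbps(moved, ["transfer"]))
```

With asynchronous transfers the "transfer" phase would need to include waiting for completion, otherwise it only measures the time to post the request.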
Description
Integration with Ray
Type of Change
How Has This Been Tested?
Include any tests here.
Checklist
format.sh
build_and_install.sh to verify compilation.