Hi,
while working with ib_write_bw, I noticed that it fails to reach high throughput when the message size (given by -s) is not a multiple of a certain small power of 2.
In my final tests I used these parameters on both server and client: --rate_limit 60 --rate_limit_type SW -D 20 --burst_size 1 -s 65600. With -s 65600 (a multiple of 64), it reaches 60 Gbit/s, but with -s 65568 (a multiple of 32 but not of 64) the throughput is consistently lower, at around 53 Gbit/s. With -s 65600 on the server and -s 65568 on the client, it also reaches 60 Gbit/s, even though the messages on the wire are still 65568 bytes.
After looking through the code, the cause seems to be that the RDMA receive memory address is not cache-line aligned. In perftest_communication.c:914, the sge vaddr given to the client points into the second half of ctx->buf, but it is computed without the cache-line-size alignment that other parts of the code apply. Changing

    my_dest[i].vaddr = (uintptr_t)ctx->buf[0] + (user_param->num_of_qps + i)*BUFF_SIZE(ctx->size,ctx->cycle_buffer);

to

    my_dest[i].vaddr = (uintptr_t)ctx->buf[0] + (user_param->num_of_qps + i)*INC(BUFF_SIZE(ctx->size,ctx->cycle_buffer), ctx->cache_line_size);

resolves the issue. I'm not very familiar with the code, so there are probably other places where this alignment is missing that I'm not aware of.
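To illustrate the arithmetic behind the numbers above, here is a minimal standalone sketch. It does not use perftest's actual macros: `round_up_to_line`, `packed_offset` and `aligned_offset` are hypothetical helpers, and it assumes a 64-byte cache line, a single QP, and a base buffer that is itself cache-line aligned.

```c
#include <stddef.h>

/* Hypothetical helper (not perftest's INC macro): round `size` up to the
 * next multiple of `line`. Assumes line > 0. */
static size_t round_up_to_line(size_t size, size_t line)
{
    return ((size + line - 1) / line) * line;
}

/* Byte offset of slot `index` when slots of `slot_size` bytes are packed
 * back to back, as in the unpatched vaddr computation. */
static size_t packed_offset(size_t index, size_t slot_size)
{
    return index * slot_size;
}

/* Byte offset of slot `index` when every slot is first padded to a
 * multiple of the cache line size, as in the proposed change. */
static size_t aligned_offset(size_t index, size_t slot_size, size_t line)
{
    return index * round_up_to_line(slot_size, line);
}
```

Under these assumptions, `packed_offset(1, 65568) % 64` is 32, so the receive address starts in the middle of a cache line, while `aligned_offset(1, 65568, 64)` is 65600, a multiple of 64. That is the same offset the server ends up with when it is started with -s 65600, which would be consistent with the mixed-size run reaching full throughput.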
Have a nice day!