Rebase and merge min_bw from downstream by liayan · Pull Request #4 · coreweave/perftest

liayan · 2025-08-21T14:39:53Z

No description provided.

…itialization" This reverts commit e4cd0f7.

…shment with specific interface

Issue: When the send_rcredit flag is set to 1, credit_mr, ctrl_buf, ctrl_sge_list and ctrl_wr are allocated. However, these entities are deallocated only under additional conditions, such as work_rdma_cm == ON. That condition does not always hold: ib_send_bw can be run without RDMA CM, which leads to an error message during PD deallocation: "Failed to deallocate PD - Device or resource busy". Fix: if the send_rcredit flag was used when creating these entities, it is also used during deallocation, without additional conditions.
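The fix boils down to making teardown use exactly the same condition as setup. A minimal sketch of that symmetry follows. The struct and the `ctrl_alloc`/`ctrl_free` helpers are hypothetical, and plain heap buffers stand in for the real verbs objects (MR, SGE list, WR), so this only illustrates the pattern, not perftest's code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the control-path resources allocated when
 * send_rcredit is set. Plain heap buffers replace the verbs objects. */
struct ctrl_resources {
	void *credit_mr;
	void *ctrl_buf;
	void *ctrl_sge_list;
	void *ctrl_wr;
};

/* Allocate the control resources only when send_rcredit is set.
 * On partial failure, the caller should still call ctrl_free(). */
int ctrl_alloc(struct ctrl_resources *r, int send_rcredit)
{
	if (!send_rcredit)
		return 0;
	r->credit_mr = malloc(64);
	r->ctrl_buf = malloc(64);
	r->ctrl_sge_list = malloc(64);
	r->ctrl_wr = malloc(64);
	if (!r->credit_mr || !r->ctrl_buf || !r->ctrl_sge_list || !r->ctrl_wr)
		return -1;
	return 0;
}

/* Free under exactly the same condition used for allocation, with no
 * extra checks such as "is RDMA CM in use". In the real code, a leaked
 * MR is what keeps the PD busy. */
void ctrl_free(struct ctrl_resources *r, int send_rcredit)
{
	if (!send_rcredit)
		return;
	free(r->credit_mr);
	free(r->ctrl_buf);
	free(r->ctrl_sge_list);
	free(r->ctrl_wr);
	r->credit_mr = r->ctrl_buf = r->ctrl_sge_list = r->ctrl_wr = NULL;
}
```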

Modify --source_ip to --bind_source_ip to fix initial connection establishment with a specific interface

Revert "Perftest: replace rand() with getrandom() during MR buffer initialization"

Fix issue with PD deallocation.

When using the send verb with a shared receive queue (SRQ) in perftest, the WQE length is set to the message size, which is harmful to the scatter entry caching feature. This commit aligns the WQE length to the MTU, which enhances caching and results in better performance. Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
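A minimal sketch of MTU alignment, assuming "align" means rounding the message size up to the next multiple of the path MTU; the commit may compute the length differently, and `align_to_mtu` is a hypothetical helper name.

```c
#include <stdint.h>

/* Round a message size up to the next multiple of the MTU.
 * Assumes mtu > 0. 64-bit arithmetic avoids overflow for sizes near
 * UINT32_MAX. A size of 0 stays 0. */
uint64_t align_to_mtu(uint64_t msg_size, uint64_t mtu)
{
	return ((msg_size + mtu - 1) / mtu) * mtu;
}
```

With a 4096-byte MTU, a 1-byte and a 4096-byte message both map to 4096, while a 4097-byte message maps to 8192, so receive WQEs end up with only a few distinct lengths.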

Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

Currently the initial negotiation is performed over IPv4, which is not suitable for modern IPv6-only topologies. This patch allows specifying which address family to use; the default behaviour is unchanged. New option: --ipv6-addr
Usage examples:
./ib_write_bw -d mlx5_0 --ipv6-addr
./ib_write_bw -d mlx5_0 --ipv6-addr 2a02:6b8:c0e:97f:0:441d:9fbd:3f1e
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com> Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
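Choosing the address family for the out-of-band TCP exchange can be sketched with POSIX `getaddrinfo()` by setting `ai_family` from the option. This is not the patch's code. `resolve_oob_family` is a hypothetical helper, and it assumes numeric addresses so that no DNS lookup is involved.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve a numeric host for the out-of-band connection, restricted to
 * IPv6 when use_ipv6 is set and to IPv4 otherwise (the old default).
 * Returns the resolved family (AF_INET or AF_INET6), or -1 on failure. */
int resolve_oob_family(const char *host, const char *port, int use_ipv6)
{
	struct addrinfo hints, *res = NULL;
	int family;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = use_ipv6 ? AF_INET6 : AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

	if (getaddrinfo(host, port, &hints, &res) != 0)
		return -1;
	family = res->ai_family;
	freeaddrinfo(res);
	return family;
}
```

An IPv4 literal passed with `use_ipv6` set (or vice versa) fails to resolve instead of silently falling back, which keeps the chosen family explicit.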

Fix and optimize some code sections in initial communication functions. Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

Add ipv6 address support for initial communication.

When rdmacm is not used, ctx_close_connection() does a handshake and then sends "done" over the socket before closing it. This is racy and can lead to a spurious error:

    HOST A                              HOST B
    ctx_close_connection()              ctx_close_connection()
      ctx_hand_shake() [ succeeds ]       ctx_hand_shake() [ succeeds ]
                                          write(sockfd, "done")
                                          close(sockfd)
      write(sockfd, "done")  <-- fails since HOST B has closed the socket
                                 and replies with a TCP RST

Fix this simply by deleting the write(). The ctx_hand_shake() already ensures the two sides are in sync and can proceed to the close(). Signed-off-by: Roland Dreier <roland@enfabrica.net>
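The shape of the fixed shutdown path can be sketched as "synchronize, then close, with no trailing write". Here `hand_shake_then_close` is hypothetical and uses a one-byte exchange rather than perftest's actual ctx_hand_shake() message format.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical simplified shutdown: send one sync byte, wait for the
 * peer's byte, then close. Once both sides have received each other's
 * byte they are in sync, so no extra "done" message is needed; such a
 * write could fail if the peer had already closed and answered with RST. */
int hand_shake_then_close(int sockfd)
{
	char out = 'S', in = 0;
	int rc = 0;

	if (send(sockfd, &out, 1, 0) != 1 || recv(sockfd, &in, 1, 0) != 1)
		rc = -1;
	close(sockfd);
	return rc;
}
```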

Neuron introduced an API for exporting DMA buffers for an allocated tensor address. The API was introduced in Neuron runtime library 2.13.6. Add Neuron dmabuf support and an additional flag to specify whether to use DMA buffers or peer-to-peer communication. Signed-off-by: Yonatan Nachum <ynachum@amazon.com>

The Neuron and Habana HW accelerator flags don't appear in the man page; add them. Signed-off-by: Yonatan Nachum <ynachum@amazon.com>

Fix ib_send_bw bidir duration mode case to check if the bandwidth really crossed limit_bw. Fixes: 3528004 Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

Add support for Neuron HW accelerator DMA buffers

Fix race in non-rdmacm ctx_close_connection()

Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

For RoCE, the UDP source port (sport) is randomly selected when flow_label is 0, so traffic can take a different path each time the test is run. Add a flow_label option so that the same UDP sport, and therefore the same path, is used when running the test. Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>

perftest: support set flow_label through env variable FLOW_LABEL
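Reading the flow label from the environment can be sketched as below. `flow_label_from_env` is a hypothetical helper, not perftest's parser. It relies only on the fact that the IPv6/GRH flow label is a 20-bit field, and it treats an unset variable as 0, which is the default that leaves sport selection random.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* The IPv6/GRH flow label is a 20-bit field. */
#define FLOW_LABEL_MAX 0xFFFFFu

/* Read an optional flow label from the FLOW_LABEL environment variable.
 * Returns 0 and stores the value in *out when the variable is unset
 * (*out = 0) or holds a valid 20-bit number. Returns -1 when it is
 * malformed or out of range. strtoul base 0 accepts decimal, 0x-hex and
 * 0-prefixed octal. */
int flow_label_from_env(uint32_t *out)
{
	const char *s = getenv("FLOW_LABEL");
	char *end = NULL;
	unsigned long v;

	*out = 0;
	if (s == NULL || *s == '\0')
		return 0;
	errno = 0;
	v = strtoul(s, &end, 0);
	if (errno != 0 || *end != '\0' || v > FLOW_LABEL_MAX)
		return -1;
	*out = (uint32_t)v;
	return 0;
}
```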

Update cuda_memory_init to error out if CuDeviceGetByPCIBusId fails. Otherwise, it silently picks device 0 (i.e., the value taken from perftest_parameters, which is initially memset to 0).

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>

Error messages should not be printed to stdout. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>

Error out if CuDeviceGetByPCIBusId fails

avoid unnecessary memory allocation when "mr_per_qp" flag is not set

Fix output of error messages

Update perftest_resources.c

In perform_warm_up mode, if the length of post_list is 1 and the message size is less than or equal to 8192, all send_flags in the WRs are 0, so no CQEs are generated because IBV_SEND_SIGNALED is not set. As a result, the perform_warm_up process gets stuck in an infinite CQ-polling loop. Set IBV_SEND_SIGNALED in this case to request a CQE, and clear the flag after post_send_method to avoid affecting subsequent tests. Fixes: 56d025e ("Allow overriding CQ moderation on post list mode (linux-rdma#58)") Signed-off-by: Guofeng Yue <yueguofeng@h-partners.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
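The set-then-clear pattern from this fix can be sketched without libibverbs. `SKETCH_SEND_SIGNALED`, `struct sketch_wr` and `sketch_post_send` are hypothetical stand-ins for `IBV_SEND_SIGNALED`, `struct ibv_send_wr` and the real post routine, and the 8192-byte threshold is taken from the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-in for IBV_SEND_SIGNALED from <infiniband/verbs.h>. */
#define SKETCH_SEND_SIGNALED (1u << 1)

struct sketch_wr {
	uint32_t send_flags;
};

/* Hypothetical post routine: only reports whether the WR would
 * generate a CQE. */
static int sketch_post_send(const struct sketch_wr *wr)
{
	return (wr->send_flags & SKETCH_SEND_SIGNALED) != 0;
}

/* Warm-up post: with post_list == 1 and a small message, CQ moderation
 * leaves send_flags at 0, so force a signaled WR to give the warm-up loop
 * a CQE to poll for, then restore the flags so later tests are unaffected.
 * Returns 1 if a CQE would be generated. */
int warm_up_post(struct sketch_wr *wr, int post_list, uint32_t size)
{
	int forced = 0, signaled;

	if (post_list == 1 && size <= 8192 &&
	    !(wr->send_flags & SKETCH_SEND_SIGNALED)) {
		wr->send_flags |= SKETCH_SEND_SIGNALED;
		forced = 1;
	}
	signaled = sketch_post_send(wr);
	if (forced)
		wr->send_flags &= ~SKETCH_SEND_SIGNALED;
	return signaled;
}
```

Clearing only a flag that was forced, rather than clearing unconditionally, avoids stripping `IBV_SEND_SIGNALED` from a WR that already had it for other reasons.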

Two bugfixes

Perftest: Do not align SRQ recv length to MTU for hns

Cuda: Use pcie mapping regardless of data direct

To build with MLU DMA-BUF support, use:
./configure --enable-mlu --with-mlu=</usr/local/neuware>
To run with MLU DMA-BUF enabled, use:
ib_write_bw --use_mlu=<device id> --use_mlu_dmabuf
Signed-off-by: hancheng <hancheng@cambricon.com>

Signed-off-by: hancheng <hancheng@cambricon.com>

Add GitHub Actions support. Add a test that builds perftest on Ubuntu 24.04 with CUDA 12.9 to verify the build process. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

Perftest: Add GitHub actions support

Currently perftest supports null-mr on the client side only (sender). This commit adds support on the server side (receiver). Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

The device IDs of some of our equipment overlap with Mellanox's, so we need to add the vendor ID and device types to tell the devices apart properly. Signed-off-by: tianx@yunsilicon.com
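Keying device identification on the (vendor ID, part ID) pair rather than on the part ID alone can be sketched with a small lookup table. 0x02c9 is the Mellanox OUI that `ibv_device_attr.vendor_id` reports and 4123 is the ConnectX-6 part ID. `OTHER_VENDOR_ID`, `DEV_OTHER_NIC` and `classify_device` are hypothetical placeholders, not Yunsilicon's real IDs or perftest's tables.

```c
#include <stddef.h>
#include <stdint.h>

#define MLNX_VENDOR_ID  0x02c9u /* Mellanox OUI */
#define OTHER_VENDOR_ID 0x1234u /* hypothetical second vendor */

enum dev_kind { DEV_UNKNOWN = 0, DEV_CONNECTX6, DEV_OTHER_NIC };

struct dev_entry {
	uint32_t vendor_id;
	uint32_t part_id;
	enum dev_kind kind;
};

/* The same part ID appears under two vendors; only the pair is unique. */
static const struct dev_entry dev_table[] = {
	{ MLNX_VENDOR_ID, 4123, DEV_CONNECTX6 },
	{ OTHER_VENDOR_ID, 4123, DEV_OTHER_NIC }, /* hypothetical entry */
};

/* Match on both vendor and part ID; unknown pairs fall through. */
enum dev_kind classify_device(uint32_t vendor_id, uint32_t part_id)
{
	for (size_t i = 0; i < sizeof(dev_table) / sizeof(dev_table[0]); i++)
		if (dev_table[i].vendor_id == vendor_id &&
		    dev_table[i].part_id == part_id)
			return dev_table[i].kind;
	return DEV_UNKNOWN;
}
```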

Add support for DMA-buffers in Cambricon devices

Added TCU support

add Yunsilicon dev types

The mlx5 driver enables the SCATTER2CQE feature by default for payloads of up to 64 B, meaning that messages of up to 64 B can be scattered into the CQE on the responder side. Peer memory in general, and GPUDirect in particular, doesn't work with this feature, so it must be disabled. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

Enabling the HAVE_MLX5DV and HAVE_OOO_RECV_WRS flags during perftest compilation leads to an issue when testing non-Mellanox devices. Specifically, the create_qp function invokes mlx5dv_query_device, which is intended for Mellanox devices. This call causes the test to terminate prematurely, as third-party devices do not support the mlx5dv interface. This commit bypasses mlx5dv_query_device when using a non-Mellanox device. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

Warn if blueflame is not supported, as it can impact latency. Adding the print to all latency tests. Signed-off-by: Maor Gottlieb <maorg@nvidia.com>

Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

This commit refactors the CUDA integration in Perftest by loading the CUDA library (`libcuda.so`) dynamically at run time instead of linking against it at build time. Changes include:
- Introduced `cuda_loader.c` to handle dynamic loading of CUDA functions.
- Modified `cuda_memory.c` to use dynamically loaded function pointers instead of direct CUDA API calls.
- Ensured proper cleanup of resources by introducing `unload_cuda_library()`.
- Find the CUDA header path automatically and set the related defines if it exists.
This change increases flexibility, allowing Perftest to be compiled on systems with CUDA and run on systems both with and without CUDA. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>
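The run-time loading approach can be sketched with `dlopen()`/`dlsym()`/`dlclose()`. This is not the contents of `cuda_loader.c`. `load_library`, `load_symbol` and `unload_library` are hypothetical names. For the CUDA driver API the soname would typically be `"libcuda.so.1"` and a symbol such as `"cuInit"`, but any shared library works the same way.

```c
#include <dlfcn.h>
#include <stdio.h>

/* One process-wide handle, in the spirit of a single CUDA loader. */
static void *lib_handle;

/* Open the library at run time. Returns 0 on success, -1 if it is not
 * installed, so the caller can disable the feature instead of failing
 * at program start. */
int load_library(const char *soname)
{
	lib_handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
	if (!lib_handle) {
		fprintf(stderr, "dlopen(%s) failed: %s\n", soname, dlerror());
		return -1;
	}
	return 0;
}

/* Look up one function; NULL if the library is not loaded or the
 * symbol is missing. */
void *load_symbol(const char *name)
{
	void *sym;

	if (!lib_handle)
		return NULL;
	dlerror(); /* clear any stale error */
	sym = dlsym(lib_handle, name);
	if (dlerror() != NULL)
		return NULL;
	return sym;
}

/* Counterpart of the cleanup step: release the handle once. */
void unload_library(void)
{
	if (lib_handle) {
		dlclose(lib_handle);
		lib_handle = NULL;
	}
}
```

On glibc older than 2.34 the program must be linked with `-ldl`. Newer glibc versions provide these functions in libc itself.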

…er X iterations

…ng on really slow results

liayan · 2025-08-22T15:27:00Z

Only the last five commits are the actual change; the others come from the new upstream perftest version.
It passed all ib tests in the RoCE environment.
Will test it on non-RoCE nodes later.

antgun42 · 2025-09-08T09:49:46Z

src/perftest_parameters.h

 #define USEC	"usec"
 /* The format of the results */

+#define RESULT_FMT		" #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]       BW min[MB/sec]"


This seems redundant

Huai-En, Tseng and others added 30 commits May 18, 2023 10:57

add printf

ba4580a

Revert "Perftest: replace rand() with getrandom() during MR buffer in…

4b9c639

…itialization" This reverts commit e4cd0f7.

modify --source_ip to --bind_source_ip to fix init connection establi…

8ff29c1

…shment with specific interface

Merge pull request linux-rdma#210 from w180112/master

9711d16

Modify --source_ip to --bind_source_ip to fix initial connection establishment with a specific interface

Merge pull request linux-rdma#213 from HassanKhadour/master

4ad453f

Revert "Perftest: replace rand() with getrandom() during MR buffer initialization"

Merge pull request linux-rdma#215 from pim-pesochek/master

0fc987c

Fix issue with PD deallocation.

Perftest: Version increase to 6.16

7d00c4b

Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

Perftest: Fix and optimize initial communication functions.

abf6dd9

Fix and optimize some code sections in initial communication functions. Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

Merge pull request linux-rdma#218 from HassanKhadour/master

56a53ec

Add ipv6 address support for initial communication.

Add missing HW accelerator flags to perftest's man

9c4f8ed

The Neuron and Habana HW accelerator flags don't appear in the man page; add them. Signed-off-by: Yonatan Nachum <ynachum@amazon.com>

Perftest: Fix limit_bw in ib_send_bw bidir traffic duration mode

48c0974

Fix ib_send_bw bidir duration mode case to check if the bandwidth really crossed limit_bw. Fixes: 3528004 Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

Merge pull request linux-rdma#222 from YonatanNachum/neuron

47a7de5

Add support for Neuron HW accelerator DMA buffers

Merge pull request linux-rdma#220 from rolandd/fix-shutdown

1e211ce

Fix race in non-rdmacm ctx_close_connection()

Perftest: Version increase to 6.17

c1f8d3b

Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

Perftest: changing spec file version to 23.10.0.

e3f0700

Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>

Merge pull request linux-rdma#224 from changchengx/flow_label

5856a7f

perftest: support set flow_label through env variable FLOW_LABEL

Error out if CuDeviceGetByPCIBusId fails

59fb505

Update cuda_memory_init to error out if CuDeviceGetByPCIBusId fails. Otherwise, it silently picks device 0 (i.e., the value taken from perftest_parameters, which is initially memset to 0).

Perftest: Add missing newline characters for error messages

5b7407d

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>

Perftest: Print an error message to stderr

0b5bfb6

Error messages should not be printed to stdout. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>

Minor update to error log

8c0f1cc

Merge pull request linux-rdma#227 from jithinjosepkl/patch-1

c1a401f

Error out if CuDeviceGetByPCIBusId fails

Update perftest_resources.c

8eef5cb

avoid unnecessary memory allocation when "mr_per_qp" flag is not set

Merge pull request linux-rdma#228 from ddmatsu/newlines

4d05679

Fix output of error messages

Merge pull request linux-rdma#229 from ecjtusbs/master

5607b0b

Update perftest_resources.c

Guofeng Yue and others added 27 commits May 20, 2025 19:13

Merge pull request linux-rdma#323 from hginjgerx/td

640b064

Two bugfixes

Merge pull request linux-rdma#324 from hginjgerx/srq

0b26bc3

Perftest: Do not align SRQ recv length to MTU for hns

Merge pull request linux-rdma#325 from dkkranz/use_pcie_mapping

abc99f2

Cuda: Use pcie mapping regardless of data direct

Add support for DMA-buffers in Cambricon devices

0bdfc10

To build with MLU DMA-BUF support, use:
./configure --enable-mlu --with-mlu=</usr/local/neuware>
To run with MLU DMA-BUF enabled, use:
ib_write_bw --use_mlu=<device id> --use_mlu_dmabuf
Signed-off-by: hancheng <hancheng@cambricon.com>

Perftest supports MLU latency tests with read/send verbs only

0490645

Signed-off-by: hancheng <hancheng@cambricon.com>

Perftest: Add GitHub actions support

9569a3f

Add GitHub Actions support. Add a test that builds perftest on Ubuntu 24.04 with CUDA 12.9 to verify the build process. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

Merge pull request linux-rdma#329 from sshaulnv/master

64330bb

Perftest: Add GitHub actions support

Perftest: Add null-mr support over server side

f961e40

Currently perftest supports null-mr on the client side only (sender). This commit adds support on the server side (receiver). Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

Added TCU support

ae68798

add Yunsilicon dev types

2af4110

The device IDs of some of our equipment overlap with Mellanox's, so we need to add the vendor ID and device types to tell the devices apart properly. Signed-off-by: tianx@yunsilicon.com

Merge pull request linux-rdma#320 from hc235280/support_ib_send_lat

4d645d4

Add support for DMA-buffers in Cambricon devices

Merge pull request linux-rdma#330 from iyangsj/master

ce4f20c

Added TCU support

Merge branch 'master' into master

e503581

Merge pull request linux-rdma#331 from tianx666/master

9600494

add Yunsilicon dev types

Perftest: Add print in case blueflame is not supported

5be2b4e

Warn if blueflame is not supported, as it can impact latency. Adding the print to all latency tests. Signed-off-by: Maor Gottlieb <maorg@nvidia.com>

Perftest: Version increase to 6.26

4cef9b5

Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

Perftest: changing spec file version to 25.07.0

14ae7a0

Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>

feat(perftest): add --report-min-bw=X to measure the min bandwidth ov…

89d0d54

…er X iterations

feat(perftest): report_min_bw_cycles must be uint64 to prevent wrappi…

c5c58ba

…ng on really slow results

fix(perftest): more simple and robust logic for measuring batch duration

97510f1

feat(perftest): add report-min-bw to man page

d21ca8d

feat(perftest): check dependencies for report-min-bw

0adb524

Merge branch 'master' into ly/merge-min-bw-from-downstream

c7e9508

liayan requested a review from antgun42 August 22, 2025 15:25

antgun42 requested changes Sep 8, 2025

liayan wants to merge 200 commits into master from ly/merge-min-bw-from-downstream


20 participants
