…itialization" This reverts commit e4cd0f7.
…shment with specific interface
Issue: When the send_rcredit flag is set to 1, credit_mr, ctrl_buf, ctrl_sge_list and ctrl_wr are allocated.
However, all these entities are deallocated only under additional conditions, such as work_rdma_cm == ON.
Those conditions do not always hold: ib_send_bw can be run without rdma cm, and that leads to an error message during PD deallocation:
"Failed to deallocate PD - Device or resource busy"
Fix: If the send_rcredit flag was used when the entities were created, it is also used during deallocation, without additional conditions.
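The fix described above comes down to freeing under exactly the condition that guarded allocation. A minimal sketch of that pattern follows; `struct demo_ctx`, `demo_alloc` and `demo_free` are hypothetical stand-ins invented for illustration, not perftest's real types or functions, and a single buffer stands in for credit_mr, ctrl_buf, ctrl_sge_list and ctrl_wr.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the test context; not perftest's real struct. */
struct demo_ctx {
    bool send_rcredit;  /* the flag that gates allocation */
    bool work_rdma_cm;  /* unrelated mode flag */
    void *ctrl_buf;     /* stands in for the credit resources */
};

/* Allocate the credit resources only when send_rcredit is set.
 * Returns 0 on success, -1 on allocation failure. */
static int demo_alloc(struct demo_ctx *ctx)
{
    if (ctx->send_rcredit) {
        ctx->ctrl_buf = calloc(1, 64);
        if (ctx->ctrl_buf == NULL)
            return -1;
    }
    return 0;
}

/* Free under the same condition as allocation.
 * work_rdma_cm is deliberately not consulted here. */
static void demo_free(struct demo_ctx *ctx)
{
    if (ctx->send_rcredit) {
        free(ctx->ctrl_buf);
        ctx->ctrl_buf = NULL;
    }
}
```

With this shape, a run that sets send_rcredit but not work_rdma_cm still releases everything it allocated, so nothing is left referencing the PD when it is deallocated.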
Modify --source_ip to --bind_source_ip to fix init connection establishment with a specific interface
Revert "Perftest: replace rand() with getrandom() during MR buffer initialization"
Fix issue with PD deallocation.
When the send verb is used with a shared receive queue in perftest, the WQE length is set to the message size, which is harmful to the scatter entry caching feature. This commit aligns the WQE length to the MTU, which improves caching and results in better performance. Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
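One plausible reading of "align WQE length to MTU" is rounding the length up to the next multiple of the MTU. The sketch below shows only that arithmetic; `demo_align_up` is a hypothetical helper, and whether the commit rounds up in exactly this way is an assumption.

```c
#include <stdint.h>

/* Round len up to the next multiple of mtu.
 * Hypothetical helper; an mtu of 0 leaves len unchanged. */
static uint64_t demo_align_up(uint64_t len, uint64_t mtu)
{
    if (mtu == 0)
        return len;
    return ((len + mtu - 1) / mtu) * mtu;
}
```

For example, with a 4096-byte MTU a 1-byte message would get a 4096-byte length and a 4097-byte message an 8192-byte length.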
Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
Currently the initial negotiation is performed over IPv4, which is not suitable for modern IPv6-only topologies. This patch allows specifying which address family to use; the default behaviour is unchanged. New option: --ipv6-addr Usage example: ./ib_write_bw -d mlx5_0 --ipv6-addr ./ib_write_bw -d mlx5_0 --ipv6-addr 2a02:6b8:c0e:97f:0:441d:9fbd:3f1e Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com> Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
Fix and optimize some code sections in initial communication functions. Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
Add ipv6 address support for initial communication.
When rdmacm is not used, ctx_close_connection() does a handshake and
then sends "done" over the socket before closing it. This is racy and
can lead to a spurious error:
    HOST A                          HOST B
    ctx_close_connection()
                                    ctx_close_connection()
    ctx_hand_shake() [ succeeds ]
                                    ctx_hand_shake() [ succeeds ]
                                    write(sockfd, "done")
                                    close(sockfd)
    write(sockfd, "done") <-- fails since HOST B has closed the
                              socket and replies with a TCP RST
Fix this simply by deleting the write(). The ctx_hand_shake() already
ensures the two sides are in sync and can proceed to the close().
Signed-off-by: Roland Dreier <roland@enfabrica.net>
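The fixed ordering (synchronize, then close, with no trailing write) can be sketched as follows. This is an illustration under stated assumptions: it uses an AF_UNIX socketpair in a single process instead of two hosts over TCP, and `demo_send_sync`, `demo_recv_sync` and `demo_close_both` are hypothetical stand-ins, not perftest's ctx_hand_shake() or ctx_close_connection().

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-ins for the two halves of a handshake:
 * each side sends one byte and then waits for the peer's byte. */
static int demo_send_sync(int fd)
{
    char c = 's';
    return write(fd, &c, 1) == 1 ? 0 : -1;
}

static int demo_recv_sync(int fd)
{
    char c;
    return read(fd, &c, 1) == 1 ? 0 : -1;
}

/* Play both endpoints: handshake, then close each side without
 * sending anything further. Returns 0 on success, -1 on error. */
static int demo_close_both(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Handshake: both sides send, then both sides receive. */
    if (demo_send_sync(sv[0]) != 0 || demo_send_sync(sv[1]) != 0 ||
        demo_recv_sync(sv[0]) != 0 || demo_recv_sync(sv[1]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Each side now knows the peer reached this point, so it can
     * close directly. No write follows the handshake, so neither
     * side can hit an already-closed peer. */
    if (close(sv[1]) != 0) {
        close(sv[0]);
        return -1;
    }
    return close(sv[0]) != 0 ? -1 : 0;
}
```

Because the last traffic on the socket is the handshake itself, the order in which the two sides reach close() no longer matters.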
Neuron introduced an API for exporting DMA buffers for an allocated tensor address. The API was introduced in Neuron runtime library 2.13.6. Add Neuron dmabuf support and an additional flag to specify whether to use DMA buffers or peer-to-peer communication. Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
The Neuron and Habana HW accelerator flags don't appear in the man page; add them. Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Fix ib_send_bw bidir duration mode case to check if the bandwidth really crossed limit_bw. Fixes: 3528004 Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
Add support for Neuron HW accelerator DMA buffers
Fix race in non-rdmacm ctx_close_connection()
Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
Signed-off-by: Hassan Khadour <hkhadour@nvidia.com>
For RoCE, the UDP source port is selected randomly when flow_label is 0, so traffic can take a different path on each test run. Add a flow_label option so that tests use the same UDP source port and therefore follow the same path. Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>
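To see why a fixed flow label pins the path, it helps to look at how a 20-bit flow label can be folded into a UDP source port. The sketch below is modeled on the folding used by the Linux kernel's RoCE v2 helper (rdma_flow_label_to_udp_sport, as far as I recall it); `demo_flow_label_to_sport` and `DEMO_ROCE_SPORT_MIN` are hypothetical names, and whether a particular driver or device derives the port exactly this way is an assumption.

```c
#include <stdint.h>

/* Assumed lower bound of the RoCE v2 UDP source-port range. */
#define DEMO_ROCE_SPORT_MIN 0xC000u

/* Fold a 20-bit flow label into a 14-bit value and place it in
 * the 0xC000..0xFFFF port range. Deterministic: the same label
 * always yields the same port. */
static uint16_t demo_flow_label_to_sport(uint32_t fl)
{
    uint32_t fl_low = fl & 0x03FFFu;   /* low 14 bits */
    uint32_t fl_high = fl & 0xFC000u;  /* high 6 bits */

    fl_low ^= fl_high >> 14;
    return (uint16_t)(fl_low | DEMO_ROCE_SPORT_MIN);
}
```

Since switches commonly hash on the UDP source port to pick among equal-cost paths, a stable label gives a stable port and hence a repeatable path between runs.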
perftest: support setting flow_label through the FLOW_LABEL environment variable
Update cuda_memory_init to error out if cuDeviceGetByPCIBusId fails. Otherwise, it will silently pick device 0 (i.e., the value taken from perftest_parameters, which is initially memset to 0).
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Error messages should not be printed to stdout. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Error out if cuDeviceGetByPCIBusId fails
avoid unnecessary memory allocation when "mr_per_qp" flag is not set
Fix output of error messages
Update perftest_resources.c
In perform_warm_up mode, if the length of post_list is 1 and the message size is less than or equal to 8192, all send_flags in the WRs are 0 and no CQEs are generated, since IBV_SEND_SIGNALED is not set. As a result, the perform_warm_up process gets stuck in an infinite poll-CQ loop. Set IBV_SEND_SIGNALED in this case to request a CQE, and clear the flag after post_send_method to avoid affecting subsequent tests. Fixes: 56d025e ("Allow overriding CQ moderation on post list mode (linux-rdma#58)") Signed-off-by: Guofeng Yue <yueguofeng@h-partners.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
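The set-then-clear pattern described above can be sketched without any verbs dependency. Everything here is a stand-in invented for illustration: `struct demo_wr` replaces struct ibv_send_wr, `DEMO_SEND_SIGNALED` replaces IBV_SEND_SIGNALED, and `demo_post` replaces the real post_send_method by simply reporting whether a completion would be generated.

```c
/* Stand-in for IBV_SEND_SIGNALED; the value is illustrative. */
#define DEMO_SEND_SIGNALED 0x2u

/* Stand-in for struct ibv_send_wr, reduced to the flags field. */
struct demo_wr {
    unsigned int send_flags;
};

/* Stand-in for the post call: returns how many completions the
 * work request would generate (1 if signaled, otherwise 0). */
static int demo_post(const struct demo_wr *wr)
{
    return (wr->send_flags & DEMO_SEND_SIGNALED) ? 1 : 0;
}

/* Warm-up post: force a completion so the poll loop can finish,
 * then restore the caller's original flag state. */
static int demo_warm_up_post(struct demo_wr *wr)
{
    unsigned int was_signaled = wr->send_flags & DEMO_SEND_SIGNALED;
    int cqes;

    wr->send_flags |= DEMO_SEND_SIGNALED;
    cqes = demo_post(wr);
    if (!was_signaled)
        wr->send_flags &= ~DEMO_SEND_SIGNALED;
    return cqes;
}
```

Restoring the flag only when it was originally clear keeps the warm-up from changing the signaling behaviour of the measured test that follows.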
Two bugfixes
Perftest: Do not align SRQ recv length to MTU for hns
Cuda: Use pcie mapping regardless of data direct
To build with MLU DMA-BUF support, use: ./configure --enable-mlu --with-mlu=</usr/local/neuware> To run with MLU DMA-BUF enabled, use: ib_write_bw --use_mlu=<device id> --use_mlu_dmabuf Signed-off-by: hancheng <hancheng@cambricon.com>
Signed-off-by: hancheng <hancheng@cambricon.com>
Add GitHub actions support. Add a test to build perftest on top of ubuntu24.04 and cuda12.9 to verify build process. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>
Perftest: Add GitHub actions support
Currently perftest supports null-mr on the client (sender) only. This commit adds support on the server side (receiver). Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>
The device IDs of some of our equipment overlap with Mellanox's, so we need to add vendor ID and dev types for proper differentiation. signed-off-by: tianx@yunsilicon.com
Add support for DMA-buffers in Cambricon devices
Added TCU support
add Yunsilicon dev types
The mlx5 driver enables the SCATTER2CQE feature by default for payloads of up to 64B, meaning messages of up to 64B can be scattered to the CQE on the responder side. Peer memory in general, and GPUDirect specifically, doesn't work with this feature, so it must be disabled. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>
Enabling the HAVE_MLX5DV and HAVE_OOO_RECV_WRS flags during perftest compilation leads to an issue when testing non-Mellanox devices. Specifically, the create_qp function invokes mlx5dv_query_device, which is intended for Mellanox devices. This call causes the test to terminate prematurely, as third-party devices do not support the mlx5dv interface. This commit bypasses mlx5dv_query_device when a non-Mellanox device is used. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>
Warn if blueflame is not supported, as it can impact latency. Add the warning to all latency tests. Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>
Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>
This commit refactors the CUDA integration in Perftest by dynamically loading the CUDA library (`libcuda.so`) instead of linking it statically. Changes include:
- Introduced `cuda_loader.c` to handle dynamic loading of CUDA functions.
- Modified `cuda_memory.c` to use dynamically loaded function pointers instead of direct CUDA API calls.
- Ensured proper cleanup of resources by introducing `unload_cuda_library()`.
- Find the CUDA header path automatically and set the related defines if it exists.
This change increases flexibility, allowing Perftest to be compiled on systems with CUDA and run on systems with or without CUDA. Signed-off-by: Shmuel Shaul <sshaul@nvidia.com>
…ng on really slow results
Only the last five commits are the actual change; the others come from the new perftest version.
antgun42
requested changes
Sep 8, 2025
#define USEC "usec"
/* The format of the results */

#define RESULT_FMT " #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps] BW min[MB/sec]"
No description provided.