Refactor and enhance RDMA implementation (verbs and tcp providers)#400
Merged
Refactor and enhance RDMA implementation (verbs and tcp providers)#400
Conversation
Collaborator
moleksy
commented
Jun 17, 2025
- Introduced Verbs RDMA provider
- User can choose provider and number of endpoints used for transmission
- Introduced a new test suite for libfabric development, including comprehensive tests for RDMA initialisation and teardown.
- Added custom fake functions for RDMA address info handling to improve test isolation and reliability.
- Created a new header file for libfabric development tests to encapsulate test setup and teardown logic.
- Implemented performance testing for RDMA receiver and transmitter, focusing on TTLB and bandwidth metrics across varying payload and queue sizes.
- Enhanced the mcm_rdma_args structure to include the provider and the number of endpoints for improved configuration flexibility.
- Updated existing tests to remove unnecessary calls to fi_getinfo, ensuring tests focus on relevant functionality.
- Improved memory management in tests to prevent leaks and ensure proper cleanup of allocated resources.
- Introduced a new test suite for libfabric development, including comprehensive tests for RDMA initialization and teardown. - Added custom fake functions for RDMA address info handling to improve test isolation and reliability. - Created a new header file for libfabric development tests to encapsulate test setup and teardown logic. - Implemented performance testing for RDMA receiver and transmitter, focusing on TTLB and bandwidth metrics across varying payload and queue sizes. - Enhanced the mcm_rdma_args structure to include provider and number of endpoints for better configuration flexibility. - Updated existing tests to remove unnecessary calls to fi_getinfo, ensuring tests focus on relevant functionality. - Improved memory management in tests to prevent leaks and ensure proper cleanup of allocated resources.
Contributor
There was a problem hiding this comment.
Pull Request Overview
Refactors and enhances the RDMA provider implementation, adds configurable providers and multiple endpoints support, and bolsters test coverage with new physical-RDMA tests.
- Extend
mcm_rdma_argswithproviderandnum_endpoints - Refactor
Rdmato support multiple endpoints, round-robin CQ handling, and both "verbs" and "tcp" providers - Add dedicated physical-RDMA TX/RX test executables and expand unit tests for endpoint and device initialization
Reviewed Changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/include/mcm_dp.h | Add provider and num_endpoints to RDMA arguments |
| media-proxy/src/mesh/conn_rdma.cc | Full rewrite of Rdma to handle multi-EP, providers, ports |
| media-proxy/src/mesh/conn_rdma_tx.cc | Update TX CQ/thread logic for multi-EP and error handling |
| media-proxy/src/mesh/conn_rdma_rx.cc | Update RX CQ/thread logic with reorder buffer and back-off |
| media-proxy/tests/{rdma_tx,rdma_rx}.cpp | New physical-RDMA TX/RX performance tests |
| media-proxy/tests/libfabric_* | Revise and consolidate libfabric initialization tests |
| media-proxy/tests/conn_rdma_tests.cc | Update unit tests for multi-EP behaviors |
| media-proxy/tests/CMakeLists.txt | Add executables for physical-RDMA tests |
| media-proxy/include/mesh/conn_rdma.h | Introduce fields for multi-EP support |
| media-proxy/include/libfabric_* | Extend device/context types for provider, ports, kind |
| media-proxy/src/libfabric_{dev,ep}.c | Update libfabric calls for fi_getinfo v2.0 and verbs, CMA |
Comments suppressed due to low confidence (1)
media-proxy/include/mesh/conn_rdma.h:17
- Header adds a
std::vector<ep_ctx_t*>member but does not#include <vector>, leading to compilation errors; please add#include <vector>.
#include <array>
tszumski
reviewed
Jun 18, 2025
tszumski
reviewed
Jun 18, 2025
tszumski
reviewed
Jun 18, 2025
tszumski
reviewed
Jun 18, 2025
tszumski
reviewed
Jun 18, 2025
tszumski
reviewed
Jun 23, 2025
tszumski
reviewed
Jun 23, 2025
ko80
reviewed
Jun 25, 2025
ko80
reviewed
Jun 25, 2025
ko80
reviewed
Jun 25, 2025
ko80
reviewed
Jun 25, 2025
ko80
reviewed
Jun 25, 2025
ko80
reviewed
Jun 25, 2025
ko80
reviewed
Jun 25, 2025
ko80
reviewed
Jun 25, 2025
ko80
reviewed
Jun 25, 2025
- Moved variables `next_tx_idx` and `next_rx_idx` from `Rdma` constructor to tx and rx respectively. - Updated default value for `rdma_num_eps` to 1 in `Rdma::configure` method. - Changed endpoint kind definitions from `KIND_RECEIVER` and `KIND_TRANSMITTER` to `FI_KIND_RECEIVER` and `FI_KIND_TRANSMITTER`. - Introduced cleanup helper function to manage memory for endpoint clones in `Rdma::on_establish`. - Improved error handling and resource cleanup during RDMA endpoint initialization. - Added `metrics.h` to define structures for wire headers and statistics messages. - Enhanced `PerfReceiver` class to track latency and throughput metrics. - Updated RDMA receiver and transmitter tests to incorporate new metrics and validate performance. - Implemented a new test structure for measuring latency and bandwidth across varying payload sizes and queue sizes.
…y in connection kind enumeration. Remove unnecessary double allocation of certain attributes in hints.
tszumski
approved these changes
Jul 1, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.