Refactor and enhance RDMA implementation (verbs and tcp providers) (#400)
* Refactor and enhance RDMA implementation (verbs and tcp providers)
- Introduced a new test suite for libfabric development, including comprehensive tests for RDMA initialization and teardown.
- Added custom fake functions for RDMA address info handling to improve test isolation and reliability.
- Created a new header file for libfabric development tests to encapsulate test setup and teardown logic.
- Implemented performance testing for RDMA receiver and transmitter, focusing on TTLB and bandwidth metrics across varying payload and queue sizes.
- Enhanced the mcm_rdma_args structure to include provider and number of endpoints for better configuration flexibility.
- Updated existing tests to remove unnecessary calls to fi_getinfo, ensuring tests focus on relevant functionality.
- Improved memory management in tests to prevent leaks and ensure proper cleanup of allocated resources.
* Remove unused reorder buffer and window size constants from RDMA processing threads
* Refactor RDMA connection handling and enhance performance metrics
- Moved the variables `next_tx_idx` and `next_rx_idx` out of the `Rdma` constructor and into tx and rx, respectively.
- Changed the default value of `rdma_num_eps` to 1 in the `Rdma::configure` method.
- Renamed the endpoint kind definitions `KIND_RECEIVER` and `KIND_TRANSMITTER` to `FI_KIND_RECEIVER` and `FI_KIND_TRANSMITTER`.
- Introduced a cleanup helper function that frees the endpoint clones in `Rdma::on_establish`.
- Improved error handling and resource cleanup during RDMA endpoint initialization.
- Added `metrics.h` to define structures for wire headers and statistics messages.
- Enhanced `PerfReceiver` class to track latency and throughput metrics.
- Updated RDMA receiver and transmitter tests to incorporate new metrics and validate performance.
- Implemented a new test structure for measuring latency and bandwidth across varying payload sizes and queue sizes.
* Fix memory leak by freeing duplicated RDMA provider string in create_bridge
* Rename KIND_UNDEFINED to FI_KIND_UNDEFINED for consistency and clarity in connection kind enumeration.
* Remove unnecessary double allocation of certain attributes in hints