Skip to content

Commit 12ce58c

Browse files
dstaay-fbmeta-codesync[bot]
authored andcommitted
support concurrency (#1943)
Summary: Pull Request resolved: #1943 TL;DR: BEFORE: controlled flow by requiring python caller to obtain a QP ownership and hold for duration of call (.read_from/.write_into) AFTER: now we can cheaply clone QPs, and just use atomics to generate wr_id, and rely on ibverbs internal locks (ibv_post_send is thread-safe). Complexity introduced by Work completion events which may be returned out of order and only delivered once, so need to store any WC in seperate cache. ### Atomic Counters in rdmaxcel_qp_t for Lock-Free Operations The rdmaxcel_qp_t wrapper uses atomic counters to enable concurrent, lock-free work request posting: ``` typedef struct rdmaxcel_qp { struct ibv_qp* ibv_qp; struct ibv_cq* send_cq; struct ibv_cq* recv_cq; // Atomic counters for lock-free concurrent access _Atomic uint64_t send_wqe_idx; // Next send WQE slot _Atomic uint64_t send_db_idx; // Last doorbell rung _Atomic uint64_t recv_wqe_idx; // Next recv WQE slot _Atomic uint64_t recv_db_idx; // Last recv doorbell _Atomic uint64_t rts_timestamp; // Ready-to-send timestamp // Completion caches for efficient polling completion_cache_t* send_completion_cache; completion_cache_t* recv_completion_cache; } rdmaxcel_qp_t; ``` Key Benefits: Multiple threads can post work requests concurrently using fetch_add on atomic indices No locks needed for the hot path (posting operations) Each thread gets a unique WQE slot atomically Completion polling uses cached results to avoid redundant CQ polls ### Mutex-Protected Queue Pair Creation While operations are lock-free, QP creation is serialized using Rust Arc<Mutex<HashSet>>: ``` pub struct RdmaManagerActor { // Track QPs currently being created to prevent duplicate creation pending_qp_creation: Arc<Mutex<HashSet<(String, ActorId, String)>>>, // ... } ``` Creation Flow: Thread checks if QP exists (lock-free read from HashMap) If not, acquires mutex and checks pending_qp_creation set If another thread is creating it, waits without holding lock Otherwise, inserts key into set, releases lock, and creates QP After creation, removes key from set This prevents race conditions where multiple threads try to create the same QP simultaneously while keeping the common path (using existing QPs) lock-free. ### Resource Lifecycle Management Simplified cleanup via rdmaxcel_qp_destroy: Previously: Rust manually destroyed ibv_qp and CQs separately (error-prone with concurrent access) Now: Single C function destroys all resources atomically Changed register_segments(pd, rdmaxcel_qp_t*) to work with wrapper instead of raw ibv_qp Reviewed By: casteryh Differential Revision: D87021168 fbshipit-source-id: c8e5fbca0c2a775801dc37e4e154b24daaddfa2a
1 parent b57ad38 commit 12ce58c

File tree

9 files changed

+1339
-544
lines changed

9 files changed

+1339
-544
lines changed

monarch_rdma/Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ edition = "2024"
1414
anyhow = "1.0.98"
1515
async-trait = "0.1.86"
1616
cuda-sys = { path = "../cuda-sys" }
17+
futures = { version = "0.3.31", features = ["async-await", "compat"] }
1718
hyperactor = { version = "0.0.0", path = "../hyperactor" }
1819
rand = { version = "0.8", features = ["small_rng"] }
1920
rdmaxcel-sys = { path = "../rdmaxcel-sys" }

monarch_rdma/src/rdma_components.rs

Lines changed: 294 additions & 217 deletions
Large diffs are not rendered by default.

monarch_rdma/src/rdma_manager_actor.rs

Lines changed: 244 additions & 207 deletions
Large diffs are not rendered by default.

0 commit comments

Comments
 (0)