Add EFA support to rust-ibverbs with working example.#81
Add EFA support to rust-ibverbs with working example.#81e-ngo wants to merge 12 commits intojonhoo:mainfrom
Conversation
57a1b57 to
65d3ba9
Compare
ibverbs/src/lib.rs
Outdated
| wr_id: u64, | ||
| imm_data: Option<u32>, | ||
| ) -> io::Result<()> { | ||
| let qp_ex = unsafe { ffi::ibv_qp_to_qp_ex(self.qp) }; |
There was a problem hiding this comment.
check that the result is not nullptr, then dereference once and use below, then you likely dont need the unsafe after this line
There was a problem hiding this comment.
hmm, not able to remove the full unsafe blocks., Since inner functions like wr_rdma_write_imm is unsafe as well.
ibverbs/src/lib.rs
Outdated
| _imm_data: Option<u32>, | ||
| ) -> io::Result<()> { | ||
|
|
||
| let qp_ex = unsafe { ffi::ibv_qp_to_qp_ex(self.qp) }; |
There was a problem hiding this comment.
same here, check this succeeded, then dereference
ibverbs/src/lib.rs
Outdated
| } else { | ||
| self.qp.ah | ||
| }; | ||
| self.qp.ah = ah; |
There was a problem hiding this comment.
seems a bit roundabout to store this with if else in a variable, why not just set
qp.ah = test_ah; in the inf branch, and delete the else branch and let statement?
There was a problem hiding this comment.
Gotcha. Yea i was trying to be too clever. Simplified it. Handshakes should only be called once per qp is the assumption i will make here. i think it's fair to assume?
ibverbs-sys/build.rs
Outdated
| println!("cargo:rustc-link-search=native={manifest_dir}/vendor/rdma-core/build/lib"); | ||
| println!("cargo:rustc-link-lib=ibverbs"); | ||
| // Only link EFA if the library exists (i.e., on EFA-capable hosts) | ||
| if Path::new("/opt/amazon/efa/lib/").exists() { |
There was a problem hiding this comment.
instead of relying on a path, maybe the right thing is to feature gate this?
| use std::time::Duration; | ||
|
|
||
| const PORT_NUM: u8 = 1; | ||
| const UD_QKEY: u32 = 0x12345678; |
There was a problem hiding this comment.
QKEY is used to validate communication between two QPs, specifically for UD QPs. In this case I am using a random u32.
ibverbs-sys/build.rs
Outdated
| println!("cargo:rustc-link-search=native={manifest_dir}/vendor/rdma-core/build/lib"); | ||
| println!("cargo:rustc-link-lib=ibverbs"); | ||
|
|
||
| if env::var("CARGO_FEATURE_EFA").is_ok() && Path::new("/opt/amazon/efa/lib/").exists() { |
There was a problem hiding this comment.
since this is now feature gated, i would remove:
Path::new("/opt/amazon/efa/lib/").exists()
| qp_type: self.qp_type, | ||
| sq_sig_all: 0, | ||
| }; | ||
| unsafe { ffi::ibv_create_qp(self.pd.pd, &mut attr as *mut _) } |
There was a problem hiding this comment.
I think these codepaths can be further unified by using Ibv_create_qp_ex in the non-efa case, then you can use the same ibv_qp_init_attr_ex for both cases
| eprintln!("run bindgen"); | ||
| let bindings = bindgen::Builder::default() | ||
| .header("vendor/rdma-core/libibverbs/verbs.h") | ||
| .header("vendor/rdma-core/providers/efa/efadv.h") |
There was a problem hiding this comment.
maybe the bindings for the efa headers and efa specific methods should only be created if the efa feature is enabled
| /// Remote RDMA write with EFA. | ||
| /// immediate data can be used to signal the completion of the write operation | ||
| /// the other side uses post_recv on a dummy buffer and get the imm data from the work completion | ||
| pub fn post_write_efa( |
There was a problem hiding this comment.
I read more about the ibv ex api documentation, this is actually not EFA specific. Is using the wr-api with function calls instead of the post-api a requirement for EFA? We might be able to unify the codepaths for EFA and non EFA to both use the wr-api since it seems to be the more efficient and powerful API. Is it supported on all normal hardware these days?
There was a problem hiding this comment.
Is using the wr-api with function calls instead of the post-api a requirement for EFA
I believe so, primarily we need to set the ud_addr in the wr_id which seems to only be available in the extended API.
Is it supported on all normal hardware these days?
I would assume so, the ibv_wr api seems to date back 7 years https://github.com/linux-rdma/rdma-core/blob/0d977c57c486bc13e3a6be0f17c29eace5657633/libibverbs/man/ibv_wr_post.3.md?plain=1#L29. I think that api should have backwards compatibility, as long as we try to make sure we're not setting too many other fields.
I can try to merge it. Do you mind if I do it in a follow up PR?
There was a problem hiding this comment.
ud_addr is actually part of ibv_send_wr (it's just called "ud" there), so we can use the existing ibv_post_send API in the implementation of post_{send,write,read}:
struct ibv_send_wr {
uint64_t wr_id;
...
union {
...
struct {
struct ibv_ah *ah;
uint32_t remote_qpn;
uint32_t remote_qkey;
} ud;
} wr;
ibverbs/src/lib.rs
Outdated
| (*qp_ex).wr_set_sge_list.unwrap()(qp_ex, local.len(), local.as_ptr() as *mut ffi::ibv_sge); | ||
| // TODO(Eric): Consider creating and caching AH for each remote_endpoint... | ||
| // This way a single QP can be shared for multiple remote endpoints. | ||
| (*qp_ex).wr_set_ud_addr.unwrap()(qp_ex, self.ah, remote_endpoint.num, UD_QKEY); |
There was a problem hiding this comment.
Since UD QueuePair works with any other UD QP, the handshake API does not make much sense for this. I'm wondering how we should change the API. Should we distinguish between ConnectedQueuePair and DatagramQueuePair?
There was a problem hiding this comment.
That's a good point. My initial reaction was that I wanted to merge the API points as much as possible so that there's less branching off in user code. What APIs would you reckon that would be different here or that we would want to distinguish?
Adding EFA Support to
rust-ibverbsTested on: AWS instances with EFA NICs
Key Parity Functions Added
build_efa()Mirrors
build(), but uses the extendedlibibverbsAPI to create an SRD QP.handshake_efa()Unlike RC QP handshakes, EFA is connectionless. This function:
INIT → RTR → RTSpost_write_efa()Uses extended API to build a work request.
ud_data.addrmust point to the target AH.These functions are all hidden behind a prop passed down from the QueuePairBuilder (QueuePairBuilder.enable_efa(true)). By enabling this, the proper efa version of the method will be called.
This keeps the user's interfaces nearly the exact same.