Skip to content

Add EFA support to rust-ibverbs with working example.#81

Open
e-ngo wants to merge 12 commits intojonhoo:mainfrom
e-ngo:erngo/add-efa-support
Open

Add EFA support to rust-ibverbs with working example.#81
e-ngo wants to merge 12 commits intojonhoo:mainfrom
e-ngo:erngo/add-efa-support

Conversation

@e-ngo
Copy link

@e-ngo e-ngo commented Nov 5, 2025

Adding EFA Support to rust-ibverbs

Tested on: AWS instances with EFA NICs

Key Parity Functions Added

  • build_efa()
    Mirrors build(), but uses the extended libibverbs API to create an SRD QP.

  • handshake_efa()
    Unlike RC QP handshakes, EFA is connectionless. This function:

    • Transitions QP state: INIT → RTR → RTS
    • Creates and binds the required Address Handle (AH) to authorize sends to the remote peer.
  • post_write_efa()
    Uses extended API to build a work request.
    ud_data.addr must point to the target AH.

These functions are all hidden behind a prop passed down from the QueuePairBuilder (QueuePairBuilder.enable_efa(true)). By enabling this, the proper efa version of the method will be called.

This keeps the user's interfaces nearly the exact same.

@e-ngo e-ngo force-pushed the erngo/add-efa-support branch from 57a1b57 to 65d3ba9 Compare November 5, 2025 23:59
wr_id: u64,
imm_data: Option<u32>,
) -> io::Result<()> {
let qp_ex = unsafe { ffi::ibv_qp_to_qp_ex(self.qp) };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check that the result is not nullptr, then dereference once and use below, then you likely dont need the unsafe after this line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, not able to remove the full unsafe blocks., Since inner functions like wr_rdma_write_imm is unsafe as well.

_imm_data: Option<u32>,
) -> io::Result<()> {

let qp_ex = unsafe { ffi::ibv_qp_to_qp_ex(self.qp) };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, check this succeeded, then dereference

} else {
self.qp.ah
};
self.qp.ah = ah;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems a bit roundabout to store this with if else in a variable, why not just set
qp.ah = test_ah; in the inf branch, and delete the else branch and let statement?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Yea i was trying to be too clever. Simplified it. Handshakes should only be called once per qp is the assumption i will make here. i think it's fair to assume?

println!("cargo:rustc-link-search=native={manifest_dir}/vendor/rdma-core/build/lib");
println!("cargo:rustc-link-lib=ibverbs");
// Only link EFA if the library exists (i.e., on EFA-capable hosts)
if Path::new("/opt/amazon/efa/lib/").exists() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of relying on a path, maybe the right thing is to feature gate this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, how does this look?

use std::time::Duration;

const PORT_NUM: u8 = 1;
const UD_QKEY: u32 = 0x12345678;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QKEY is used to validate communication between two QPs, specifically for UD QPs. In this case I am using a random u32.

println!("cargo:rustc-link-search=native={manifest_dir}/vendor/rdma-core/build/lib");
println!("cargo:rustc-link-lib=ibverbs");

if env::var("CARGO_FEATURE_EFA").is_ok() && Path::new("/opt/amazon/efa/lib/").exists() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is now feature gated, i would remove:
Path::new("/opt/amazon/efa/lib/").exists()

qp_type: self.qp_type,
sq_sig_all: 0,
};
unsafe { ffi::ibv_create_qp(self.pd.pd, &mut attr as *mut _) }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these codepaths can be further unified by using Ibv_create_qp_ex in the non-efa case, then you can use the same ibv_qp_init_attr_ex for both cases

eprintln!("run bindgen");
let bindings = bindgen::Builder::default()
.header("vendor/rdma-core/libibverbs/verbs.h")
.header("vendor/rdma-core/providers/efa/efadv.h")
Copy link
Collaborator

@xmakro xmakro Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the bindings for the efa headers and efa specific methods should only be created if the efa feature is enabled

/// Remote RDMA write with EFA.
/// immediate data can be used to signal the completion of the write operation
/// the other side uses post_recv on a dummy buffer and get the imm data from the work completion
pub fn post_write_efa(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read more about the ibv ex api documentation, this is actually not EFA specific. Is using the wr-api with function calls instead of the post-api a requirement for EFA? We might be able to unify the codepaths for EFA and non EFA to both use the wr-api since it seems to be the more efficient and powerful API. Is it supported on all normal hardware these days?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using the wr-api with function calls instead of the post-api a requirement for EFA
I believe so, primarily we need to set the ud_addr in the wr_id which seems to only be available in the extended API.

Is it supported on all normal hardware these days?
I would assume so, the ibv_wr api seems to date back 7 years https://github.com/linux-rdma/rdma-core/blob/0d977c57c486bc13e3a6be0f17c29eace5657633/libibverbs/man/ibv_wr_post.3.md?plain=1#L29. I think that api should have backwards compatibility, as long as we try to make sure we're not setting too many other fields.

I can try to merge it. Do you mind if I do it in a follow up PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ud_addr is actually part of ibv_send_wr (it's just called "ud" there), so we can use the existing ibv_post_send API in the implementation of post_{send,write,read}:

struct ibv_send_wr {
	uint64_t		wr_id;
    ...
	union {
        ...
		struct {
			struct ibv_ah  *ah;
			uint32_t	remote_qpn;
			uint32_t	remote_qkey;
		} ud;
	} wr;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will give it a try

(*qp_ex).wr_set_sge_list.unwrap()(qp_ex, local.len(), local.as_ptr() as *mut ffi::ibv_sge);
// TODO(Eric): Consider creating and caching AH for each remote_endpoint...
// This way a single QP can be shared for multiple remote endpoints.
(*qp_ex).wr_set_ud_addr.unwrap()(qp_ex, self.ah, remote_endpoint.num, UD_QKEY);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since UD QueuePair works with any other UD QP, the handshake API does not make much sense for this. I'm wondering how we should change the API. Should we distinguish between ConnectedQueuePair and DatagramQueuePair?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. My initial reaction was that I wanted to merge the API points as much as possible so that there's less branching off in user code. What APIs would you reckon that would be different here or that we would want to distinguish?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants