@bwbarrett

When the ob1 PML is not eligible for selection (for example, when the user sets --mca pml cm), the BML and BTL frameworks are not initialized, and the rdma osc component later fails because no BTLs are available. This patch resolves the issue by having the rdma osc component initialize the BML interface itself.

Making this change required two additional, related changes. First, since the BTLs use the modex, the rdma initialization must move before the modex point so that data the BTLs put in the modex is exchanged as expected. Second, BTLs can require loading the entire world during init (for example, usnic, or TCP when there are multiple threads and multiple NICs), so we extend the world loading checks to include OSC.

Since the other Portals4 components say that they require world loading, we assume the Portals4 osc component requires it as well.
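
The ordering constraint described above can be sketched with a small toy model. This is not Open MPI code: every name here (`runtime_t`, `bml_init`, `pml_init`, `osc_rdma_init`, `modex_exchange`, `startup`) is hypothetical and only stands in for the real frameworks. It assumes an idempotent BML setup step that publishes one modex entry and requests world loading, and it shows why the osc initialization has to run before the modex point when the cm PML is selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are Open MPI symbols. */

typedef enum { PML_OB1, PML_CM } pml_choice_t;

typedef struct {
    bool bml_ready;        /* BML/BTL layer has been initialized */
    bool modex_exchanged;  /* the modex point has already passed */
    int  modex_entries;    /* items published before the exchange */
    bool need_world;       /* some component asked for all procs to be loaded */
} runtime_t;

/* Idempotent BML setup: safe to call from both the PML and the osc component. */
static bool bml_init(runtime_t *rt)
{
    if (rt->bml_ready) {
        return true;
    }
    if (rt->modex_exchanged) {
        /* BTLs publish addressing data; too late once the exchange has run. */
        return false;
    }
    rt->modex_entries++;   /* stands in for a BTL putting data in the modex */
    rt->need_world = true; /* stands in for a BTL that needs every peer at init */
    rt->bml_ready = true;
    return true;
}

static bool pml_init(runtime_t *rt, pml_choice_t pml)
{
    /* Only ob1 drives the BML; cm uses an MTL and leaves the BML untouched. */
    return (PML_OB1 == pml) ? bml_init(rt) : true;
}

static bool osc_rdma_init(runtime_t *rt)
{
    /* The fix in this model: do not rely on the PML having set up the BML. */
    return bml_init(rt);
}

static void modex_exchange(runtime_t *rt)
{
    rt->modex_exchanged = true;
}

/* Returns true when the rdma osc component ends up with a usable BML. */
static bool startup(pml_choice_t pml, bool osc_before_modex, runtime_t *rt)
{
    assert(rt != NULL);
    *rt = (runtime_t){0};

    bool ok = pml_init(rt, pml);
    if (osc_before_modex) {
        ok = osc_rdma_init(rt) && ok;
        modex_exchange(rt);
    } else {
        modex_exchange(rt);
        ok = osc_rdma_init(rt) && ok;
    }
    return ok;
}
```

In this model, `startup(PML_CM, true, &rt)` succeeds and leaves `rt.need_world` set, while `startup(PML_CM, false, &rt)` fails because the BML setup is attempted after the modex point. With `PML_OB1` either order works, since the PML has already initialized the BML and the second call is a no-op.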

Signed-off-by: Brian Barrett <[email protected]>
@bwbarrett bwbarrett merged commit 6af82f0 into open-mpi:main Jun 30, 2025
17 of 19 checks passed
@bwbarrett bwbarrett deleted the bugfix/rdma-when-mtl-fix branch June 30, 2025 20:52