Skip to content

Conversation

@hjelmn
Copy link
Member

@hjelmn hjelmn commented Sep 19, 2025

… or failure

The MCA_PML_OB1_ADD_ACK_TO_PENDING method creates a mca_pml_ob1_pckt_pending_t to hold an ack to be sent later. This method builds the pending packet then puts it on the mca_pml_ob1.pckt_pending list for later transmission. It does not, however, set the required hdr_size field on the struct. This leads to issues when the packet is later sent because it could contain any value. With some btls this will lead to memory corruption (if the size is not checked against btl_max_send_size) or just allocation failure because the size is too big. In other situations it could lead to a truncated packet being send (if the size previously in hdr_size is smaller than an ack).

To fix the issue this commit gets rid of the macro entirely and replaces it with a new inline helper method that does the same thing. This helper uses the existing mca_pml_ob1_add_to_pending helper (which sets hdr_size) to reduce duplicated code.

Tested and verified this fixes a critical issue triggered on our hardware.

(cherry picked from commit 48490b9)

… or failure

The MCA_PML_OB1_ADD_ACK_TO_PENDING method creates a mca_pml_ob1_pckt_pending_t
to hold an ack to be sent later. This method builds the pending packet then puts
it on the mca_pml_ob1.pckt_pending list for later transmission. It does not,
however, set the required hdr_size field on the struct. This leads to issues
when the packet is later sent because it could contain any value. With some btls
this will lead to memory corruption (if the size is not checked against
btl_max_send_size) or just allocation failure because the size is too big. In
other situations it could lead to a truncated packet being send (if the size
previously in hdr_size is smaller than an ack).

To fix the issue this commit gets rid of the macro entirely and replaces it with
a new inline helper method that does the same thing. This helper uses the
existing mca_pml_ob1_add_to_pending helper (which sets hdr_size) to reduce
duplicated code.

Tested and verified this fixes a critical issue triggered on our hardware.

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit 48490b9)
@hjelmn hjelmn requested a review from bosilca September 19, 2025 21:27
@github-actions github-actions bot added this to the v5.0.8 milestone Sep 19, 2025
Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have been a single line change in MCA_PML_OB1_ADD_ACK_TO_PENDING (adding _pck->hdr_size = sizeof(hdr.hdr_ack);

@hjelmn
Copy link
Member Author

hjelmn commented Sep 23, 2025

True, but figured it would be ideal to reduce duplicated code in case something about pending list items change (not that I expect that will ever happen).

@janjust janjust merged commit 359a19f into open-mpi:v5.0.x Sep 23, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants