Skip to content

Conversation

@devreal
Copy link
Contributor

@devreal devreal commented Nov 5, 2019

Adds a variable whose declaration was missing in #6641.

Signed-off-by: Joseph Schuchart [email protected]

@awlauria
Copy link
Contributor

awlauria commented Nov 5, 2019

Note: it actually looks like ret was removed in this commit:

421a7fd#diff-ae1fde1f4c92b725fdcf638c9bd2b73b

@awlauria
Copy link
Contributor

awlauria commented Nov 5, 2019

It was this PR for reference:

#6693

Note that I am not quite sure how that passed the build checks...

@devreal
Copy link
Contributor Author

devreal commented Nov 5, 2019

@awlauria

Note that I am not quite sure how that passed the build checks...

It requires a version of UCX to be built with devel headers, see openucx/ucx#4388. Maybe such a UCX version can be added to the build checks?

@jsquyres
Copy link
Member

jsquyres commented Nov 5, 2019

@devreal I'm not sure I understand your comment. How are we building against UCX if we don't have the UCX devel headers installed / available?

@jsquyres
Copy link
Member

jsquyres commented Nov 5, 2019

@devreal Are you saying that we can't build Open MPI without internal UCX header files? (i.e., header files that are not normally installed)

That seems... odd...

@devreal
Copy link
Contributor Author

devreal commented Nov 6, 2019

@jsquyres The particular feature that was introduced with #6641 (shmemx_malloc_with_hint) requires the UCX devel headers to be installed. If the (internal) UCX function uct_ib_md_alloc_device_mem is not detected during configure the code in question is not compiled (guarded by HAVE_UCX_DEVICE_MEM). Unfortunately, this was not documented anywhere...

@yosefe
Copy link
Contributor

yosefe commented Nov 6, 2019

@jsquyres The particular feature that was introduced with #6641 (shmemx_malloc_with_hint) requires the UCX devel headers to be installed. If the (internal) UCX function uct_ib_md_alloc_device_mem is not detected during configure the code in question is not compiled (guarded by HAVE_UCX_DEVICE_MEM). Unfortunately, this was not documented anywhere...

Indeed, in order to get a performance benefit with shmemx_malloc_with_hint() running over spml/ucx component, one needs to configure UCX with installing internal header files.
@devreal what do you think is the best place to document this? configure.m4/spml_ucx/other.. ?

@devreal
Copy link
Contributor Author

devreal commented Nov 6, 2019

@yosefe You're right, my description was not exact. The function shmemx_malloc_with_hint itself is of course available even without the devel headers but the enhanced functionality behind it is only available with the UCX devel headers.

I am not sure what's the best place to document this dependency. As a start, maybe a comment in the source code is sufficient for developers? That won't help users though. Printing a warning for unsupported hints might be too intrusive. Adding the status of UCX devel headers in the configure summary could be helpful to point out that this dependency exists. Is there precedence in Open MPI, @jsquyres?

@rhc54
Copy link
Contributor

rhc54 commented Nov 6, 2019

Not to be snarky, but it seems a trifle uneven to criticize/complain about the btl/uct component's reliance on UCX internals (which it uses to gain performance) and then turn around and do the exact same thing here for the same reason. To a somewhat disinterested party, it feels like the UCX team has an issue that it should address regarding what it needs to publicly expose for adopters to optimize performance. Perhaps that is where this PR needs to start (i.e., delay adoption here until UCX resolves this recurring problem)?

@jsquyres
Copy link
Member

jsquyres commented Nov 6, 2019

@devreal No one reads configure output except us. 😄

Meaning: even if you put something there, no one will read it unless they realize they have a problem and (likely) we instructed them to go back and look at their configure output.

@rhc54 is right, though -- if this is an undocumented / unpublished interface from UCX, then it's pretty much in exactly the same situation as BTL/UCT. If this is something that the UCX community feels is an important optimization, then it should be part of the public interface and then this discussion becomes moot (i.e., there should be no need for internal UCX headers to be used by anyone).

@devreal
Copy link
Contributor Author

devreal commented Nov 7, 2019

Apologies if things got out of bounds here. I'm aware that the used internal UCX functions are an optimization that is not required for the interface implemented in #6641 to work but rather offers a potential optimization.

This PR merely fixes an issue that occurs if these UCX internal interfaces are available. I'm not arguing for or against using them, I think that is a broader argument to have elsewhere.

@rhc54
Copy link
Contributor

rhc54 commented Nov 7, 2019

I believe the question we are raising is: should the correct fix for the issue be to remove the use of the internal UCX functions?

I don't know where else in our code we rely on internal interfaces of a software package - certainly not in libevent, hwloc, PMIx, or libfabric. It feels a tad uncomfortable to be introducing such dependencies.

I'm not sure if UCT also falls in that category - that is a separate issue that should be investigated but is outside the scope of this discussion.

@awlauria awlauria added this to the v5.0.0 milestone Mar 19, 2020
@awlauria
Copy link
Contributor

Should we create an issue and migrate this discussion over there?

@awlauria awlauria merged commit c9d0517 into open-mpi:master Apr 8, 2020
@devreal devreal deleted the fix_oshmem_dev_mem branch October 3, 2022 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants