Skip to content

Conversation

@bosilca
Copy link
Member

@bosilca bosilca commented Feb 19, 2022

Fixes #10028

Signed-off-by: George Bosilca [email protected]

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, I don't have a problem with this PR. But is it mandated by the MPI spec, or is this an extension?

In the description for MPI_TYPE_GET_NAME in MPI-4.0 (p385 -- or anywhere in section 7.8, "Naming Object" starting on p381), it doesn't explicitly mention that you can pass invalid handles (like MPI_DATATYPE_NULL) to the MPI_*_GET_NAME functions. Indeed, MPI-4.0 p383:33-36 somewhat kinda sorta implies that you can't:

Thus, the names of MPI_COMM_WORLD, MPI_COMM_SELF, and the communicator returned by MPI_COMM_GET_PARENT (if not MPI_COMM_NULL) will have the default of "MPI_COMM_WORLD", "MPI_COMM_SELF", and "MPI_COMM_PARENT".

That being said, if this issue is ambiguous in the MPI standard text, I'm in favor of this functionality because it's one less check that the application program needs to make before calling MPI_*_GET_NAME.

What do we do for the other MPI_*_GET_NAME functions?

Do we know what MPICH does here?

@bosilca
Copy link
Member Author

bosilca commented Feb 20, 2022

The MPI standard define the name for all predefined datatypes, and MPI_DATATYPE_NULL is part of that group. Thus, I see no reason for the current in MPI_TYPE_GET_NAME that prevents the retrieval of its name.

@jsquyres
Copy link
Member

I guess I'm a little confused. Can you cite where in MPI-4.0 it says that invalid handles such as MPI_DATATYPE_NULL should have a name returned from MPI_TYPE_GET_NAME? The only description I see under MPI_TYPE_GET_NAME (MPI-4.0 p385:23-24) says:

Named predefined datatypes have the default names of the datatype name. For example, MPI_WCHAR has the default name of "MPI_WCHAR".

It doesn't say that invalid handles are accepted by MPI_TYPE_GET_NAME. What am I missing?

@bosilca
Copy link
Member Author

bosilca commented Feb 20, 2022

MPI_DATATYPE_NULL is not an invalid datatype, it is a NULL handle, and as such it is a predefined constant. Thus, I believe that the naming rule cited above applies.

Forcing a library writer to check for MPI_DATATYPE_NULL before calling MPI_TYPE_GET_NAME is as sensical having printf not allowing the printing of NULL pointers.

@bosilca
Copy link
Member Author

bosilca commented Feb 22, 2022

The other get_name functions do not suffer from the same problem. This PR is good to be merged.

@jsquyres
Copy link
Member

The other get_name functions do not suffer from the same problem. This PR is good to be merged.

Yes they do. MPI_COMM_GET_NAME calls:

if ( ompi_comm_invalid ( comm ) )

which is

/**
* Is this a valid communicator? This is a complicated question.
* :-)
*
* According to MPI-1:5.2.4 (p137):
*
* "The predefined constant MPI_COMM_NULL is the value used for
* invalid communicator handles."
*
* Hence, MPI_COMM_NULL is not valid. However, MPI-2:4.12.4 (p50)
* clearly states that the MPI_*_C2F and MPI_*_F2C functions
* should treat MPI_COMM_NULL as a valid communicator -- it
* distinctly differentiates between "invalid" handles and
* "MPI_*_NULL" handles. Some feel that the MPI-1 definition
* still holds for all other MPI functions; others feel that the
* MPi-2 definitions trump the MPI-1 definition. Regardless of
* who is right, there is ambiguity here. So we have left
* ompi_comm_invalid() as originally coded -- per the MPI-1
* definition, where MPI_COMM_NULL is an invalid communicator.
* The MPI_Comm_c2f() function, therefore, calls
* ompi_comm_invalid() but also explictily checks to see if the
* handle is MPI_COMM_NULL.
*/
static inline int ompi_comm_invalid (const ompi_communicator_t* comm)
{
if ((NULL == comm) || (MPI_COMM_NULL == comm) ||
(OMPI_COMM_IS_FREED(comm)) || (OMPI_COMM_IS_INVALID(comm)) )
return true;
else
return false;
}

which explicitly checks for MPI_COMM_NULL. There's a lengthy comment there explaining that there is ambiguity about MPI_COMM_NULL -- it is "invalid" or is it just a NULL (but valid) handle? It looks like that issue has not been resolved in the MPI standard.

Additionally, MPI_WIN_GET_NAME invokes ompi_win_invald():

if (ompi_win_invalid(win)) {

which is

ompi/ompi/win/win.h

Lines 152 to 164 in 580984b

/* Note that the defintion of an "invalid" window is closely related
to the defintion of an "invalid" communicator. See a big comment
in ompi/communicator/communicator.h about this. */
static inline int ompi_win_invalid(ompi_win_t *win) {
if (NULL == win ||
MPI_WIN_NULL == win ||
(OMPI_WIN_INVALID & win->w_flags) ||
(OMPI_WIN_FREED & win->w_flags)) {
return true;
} else {
return false;
}
}

and explicitly checks for MPI_WIN_NULL.

Just to be 100% sure, I tested this with a trivial program:

#include <stdio.h>
#include <mpi.h>

int main()
{
    int len;
    char name[MPI_MAX_OBJECT_NAME];

    MPI_Init(NULL, NULL);
#if 0
    MPI_Type_get_name(MPI_DATATYPE_NULL, name, &len);
#elif 0
    MPI_Comm_get_name(MPI_COMM_NULL, name, &len);
#else
    MPI_Win_get_name(MPI_WIN_NULL, name, &len);
#endif
    printf("Got name: %s\n", name);
    MPI_Finalize();
    return 0;
}

I ran it with all 3 possibilities and Open MPI -- and MPICH 4.0 -- generate an error for all 3 MPI_*_GET_NAME with MPI_*_NULL cases.

Given that there is some ambiguity here, I think we should have the same behavior for all of the MPI_*_GET_NAME functions: they should all either accept the MPI_*_NULL handles or they should all disallow the MPI_*_NULL handles. Currently, they do the latter. If we want to change to the former, that's probably fine, but a) we should be consistent, and b) we should probably also document this behavior.

@bosilca if you fix up all the MPI_*_GET_NAME functions, I'll fix up the docs.

This applies to MPI_DATATYPE_NULL, MPI_COMM_NULL and MPI_WIN_NULL.

MPI 4.0 states that "The predefined constant MPI_COMM_NULL is the
value used for invalid communicator handles." Hence, we might assume
that MPI_COMM_NULL is an invalid communicator, but that is only
partially true because in 4.12.4 it is clearly stated that the
translation function, MPI_*_C2F and MPI_*_F2C, should treat
MPI_COMM_NULL as a valiid communicator. So, the real role of these
MPI_*_NULL objects might be to point to valid MPI objects used as
bounds, but they represent valid MPI objects that cannot be used for
communications. In any case, as predefined handles they have an
associated name, and the user _must_ be able to get this name.

Signed-off-by: George Bosilca <[email protected]>
@bosilca
Copy link
Member Author

bosilca commented Feb 23, 2022

Our implementation of ompi_*_invalid is, well, invalid. There is a lengthy comment in communicator.h on why we're doing the wrong thing (according to MPI 2.0), but we're fine with it because we have implemented shortcuts in the few places where it was needed (basically f2c and c2f functions). Anyway, if we assume that invalid means, improper for communications, then we should be good for now.

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bosilca and I chatted on the phone about this this morning. Here's the conclusions we came to:

  1. We both agree that the MPI_*_GET_NAME functions should accept the MPI_*_NULL handles. It's just better for users.
  2. From that perspective, we should accept this PR.
  3. From the portability perspective, we should encourage MPICH to adopt this behavior as well. Because it's better for users.
  4. For simplicity (assuming this PR is accepted), let's bring it to v5.0. But let's not backport it (i.e., do not bring it to v4.1.x or v4.0.x).
  5. If the MPI Forum accepts mpi-forum/mpi-issues#544, 🎉 . If not, we can revert this PR.


/* Setup MPI_WIN_NULL */
OBJ_CONSTRUCT(&ompi_mpi_win_null.win, ompi_win_t);
ompi_mpi_win_null.win.w_flags = OMPI_WIN_INVALID;
Copy link
Member

@jsquyres jsquyres Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe to do? Are there other places that rely on this flag being set upon construction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only use I could find is in https://github.com/open-mpi/ompi/blob/master/ompi/win/win.h#L155 where the window is explicitly checked for MPI_WIN_NULL as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All functions check the validity of the window using ompi_win_invalid, a functions that specifically checks for MPI_WIN_NULL (in addition to checking the OMPI_WIN_INVALID flag). No other usages of OMPI_WIN_INVALID exists in the code.

It is in fact the same story with all the _INVALID flags so it would be legitimate to wonder why do we need the _INVALID flag at all, but I don't think we want to have this discussion in the context of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If OMPI_WIN_INVALID is only used in 1) the constructor and 2) in ompi_win_invalid(), and you just removed it out of the constructor, then we should also remove it out of ompi_win_invalid(), too. Otherwise it's a dangling check for something that can never be true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what I said above (and it is true for all uses of _INVALID). But I don't think it should be part of this PR.

@bwbarrett
Copy link
Member

I disagree with this PR. I think our current behavior is required by the standard and don't believe we should commit this without clarification from the MPI Forum. I think this is the desired behavior, but I also believe it violates the MPI standard (meaning we shouldn't do it without clarification on the standard).

MPI_COMM_GET_NAME (and friends) operate on a communicator. MPI_COMM_NULL is not a communicator; it's the NULL handle. The first sentence of MPI_COMM_GET_NAME is:

MPI_COMM_GET_NAME returns the last name which has previously been associated 22 with the given communicator.

Unless we all agree that you can call MPI_COMM_SET_NAME on MPI_COMM_NULL, and I would claim that the definition of MPI_COMM_SET_NAME:

MPI_COMM_SET_NAME allows a user to associate a name string with a communicator.

prohibits calling SET_NAME on MPI_COMM_NULL because, again, MPI_COMM_NULL is not a communicator, then we can't have GET_NAME handle the NULL handles and conform to the MPI specification.

The bit in MPI_COMM_GET_NAME about predefined communicators is also not ambiguous today. It says:

The three predefined communicators will have predefined names associated with them. Thus, the names of MPI_COMM_WORLD, MPI_COMM_SELF, and the communicator returned by MPI_COMM_GET_PARENT (if not MPI_COMM_NULL) will have the default of 35 "MPI_COMM_WORLD", "MPI_COMM_SELF", and "MPI_COMM_PARENT".

Note (1) the explicit callout of MPI_COMM_NULL as an exception and (2) that it says predefined communicators, not predefined communicator handles. Again, MPI_COMM_NULL is not a communicator, so this section would not apply.

The same argument applies to the invalid/valid flag. MPI_COMM_NULL is not a valid communicator, because it isn't a communicator. At one point, we used to mark objects (comm, win, etc.) invalid once they were freed, because the pointer to them was still "valid", from the sense that nothing prohibited a caller from using that pointer to a MPI_Send and it was likely still valid memory full of a semi-sane looking communicator that wouldn't actually work. George's comment makes me think that code got removed at some point. But I don't think removing the invalid is the right choice here, either.

@gpaulsen
Copy link
Member

Rescinded my review based on further MPI standard evidence presented above.

@jsquyres
Copy link
Member

Rescinded my review based on further MPI standard evidence presented above.

@gpaulsen Did you look at mpi-forum/mpi-issues#544?

@dalcinl
Copy link
Contributor

dalcinl commented Sep 28, 2025

@jsquyres @bosilca Could you please move forward this PR?

Note that #13276 fixed MPI_Comm_get_name.

MPI_Win_get_name should be fixed, too. Do you prefer to rebase the changes in this PR and do everything here?
Or would it be better to close this PR and start fresh with a new one?

@bosilca
Copy link
Member Author

bosilca commented Oct 2, 2025

Once rebased this shows little difference with main. I checked with the example above and all NULL objects return the correct name with main, this PR is unnecessary.

@bosilca bosilca closed this Oct 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

libompitrace does not check use of MPI_DATATYPE_NULL when calling PMPI_Type_get_name and causes MPI_ERR_TYPE

7 participants