
Commit 2dccccc

patched MPI max-tag inference
which used an incorrect method to obtain MPI_TAG_UB. We additionally added a fallback to use the standard-guaranteed minimum when there is trouble obtaining MPI_TAG_UB. Shout out to the Julia G's for helping discover the bug!

---------

Co-authored-by: Oliver Thomson Brown <[email protected]>
1 parent 9b39fe4 commit 2dccccc

3 files changed: +16 -12 lines changed


quest/src/comm/comm_routines.cpp

Lines changed: 16 additions & 5 deletions
@@ -6,6 +6,7 @@
  *
  * @author Tyson Jones
  * @author Jakub Adamski (sped-up large comm by asynch messages)
+ * @author Oliver Brown (patched max-message inference, consulted on AR and MPICH support)
  * @author Ania (Anna) Brown (developed QuEST v1 logic)
  */
 
@@ -26,6 +27,7 @@
 
 #include <vector>
 #include <array>
+#include <algorithm>
 
 using std::vector;
 
@@ -132,15 +134,24 @@ int getMaxNumMessages() {
     // the max supported tag value constrains the total number of messages
     // we can send in a round of communication, since we uniquely tag
     // each message in a round such that we do not rely upon message-order
-    // gaurantees and ergo can safely support UCX adaptive routing (AR)
-    int maxNumMsgs, isAttribSet;
+    // gaurantees and ergo can safely support UCX adaptive routing (AR).
+    // The MPI standard necessitates the max tag is always at least...
+    int minTagUpperBound = 32767;
 
-    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &maxNumMsgs, &isAttribSet);
+    // but we pedantically consult MPI in case we ever need to send MORE (smaller?)
+    // messages. Beware the max is obtained via a void pointer and might be unset...
+    void* tagUpperBoundPtr;
+    int isAttribSet;
+    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tagUpperBoundPtr, &isAttribSet);
 
+    // if something went wrong with obtaining the tag bound, return the safe minimum
     if (!isAttribSet)
-        error_commTagUpperBoundNotSet();
+        return minTagUpperBound;
 
-    return maxNumMsgs;
+    // otherwise return whichever is bigger of the found bound and the minimum
+    // (which is really just hiding an error; it should ALWAYS be that UB>=min)
+    int tagUpperBound = *(int*) tagUpperBoundPtr;
+    return std::max({minTagUpperBound, tagUpperBound});
 
 #else
     error_commButEnvNotDistributed();
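
The fix above hinges on a detail of the MPI C binding: for predefined attributes such as MPI_TAG_UB, MPI_Comm_get_attr does not write the integer itself into its third argument. It writes a pointer to that integer, which is why the old code (passing the address of an int) read a garbage value. The following is a minimal sketch of the patched logic, not QuEST's code: `resolveTagUpperBound` and `MIN_TAG_UPPER_BOUND` are hypothetical names, and the function takes the two outputs of MPI_Comm_get_attr as parameters so that it can be compiled and exercised without an MPI installation.

```cpp
#include <algorithm>

// Smallest value of MPI_TAG_UB which the MPI standard permits.
constexpr int MIN_TAG_UPPER_BOUND = 32767;

// Hypothetical helper (not part of QuEST) mirroring the patched logic.
//   attribPtr   - the pointer MPI_Comm_get_attr wrote into attribute_val;
//                 for MPI_TAG_UB it points to an int
//   isAttribSet - the flag MPI_Comm_get_attr wrote (zero when unavailable)
int resolveTagUpperBound(const void* attribPtr, int isAttribSet) {

    // never dereference the pointer unless MPI reported the attribute as set
    // (the null check is an extra precaution assumed here, not in the diff)
    if (!isAttribSet || attribPtr == nullptr)
        return MIN_TAG_UPPER_BOUND;

    // the attribute value is an int stored behind the returned pointer
    int tagUpperBound = *static_cast<const int*>(attribPtr);

    // clamp from below in case an implementation reports a too-small bound
    return std::max(MIN_TAG_UPPER_BOUND, tagUpperBound);
}
```

In real use the two arguments would come from a call like the one in the diff, for instance `resolveTagUpperBound(tagUpperBoundPtr, isAttribSet)` immediately after MPI_Comm_get_attr. Passing `nullptr` with a zero flag yields the standard minimum of 32767.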

quest/src/core/errors.cpp

Lines changed: 0 additions & 5 deletions
@@ -145,11 +145,6 @@ void error_commGivenInconsistentNumSubArraysANodes() {
     raiseInternalError("A distributed function was given a different number of per-node subarray lengths than exist nodes.");
 }
 
-void error_commTagUpperBoundNotSet() {
-
-    raiseInternalError("The MPI attribute MPI_TAG_UB was not set for communicator MPI_COMM_WORLD, such that the maximum number of messages per communication-round could not be determined.");
-}
-
 void error_commNumMessagesExceedTagMax() {
 
     raiseInternalError("A function attempted to communicate via more messages than permitted (since there would be more uniquely-tagged messages than the tag upperbound).");
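
The surviving error, error_commNumMessagesExceedTagMax, guards the constraint which the tag bound imposes: every message in a round carries a unique tag, so a round cannot contain more messages than there are permitted tags. The sketch below illustrates that check under stated assumptions and is not taken from QuEST. `countMessages` is a hypothetical helper, the chunking scheme is assumed, and, like getMaxNumMessages in this commit, it conservatively treats the tag upper bound itself as the message limit.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not part of QuEST): the number of uniquely-tagged
// messages needed to send numElems elements in chunks of at most
// maxElemsPerMsg, validated against the tag upper bound.
std::int64_t countMessages(std::int64_t numElems,
                           std::int64_t maxElemsPerMsg,
                           int tagUpperBound) {

    if (numElems < 0 || maxElemsPerMsg <= 0)
        throw std::invalid_argument("invalid element or chunk count");

    // ceiling division: a final partial chunk still needs its own message
    std::int64_t numMsgs = (numElems + maxElemsPerMsg - 1) / maxElemsPerMsg;

    // each message needs a distinct tag, so the bound caps the message count
    if (numMsgs > tagUpperBound)
        throw std::length_error("more messages than uniquely available tags");

    return numMsgs;
}
```

For example, sending 10 elements in chunks of at most 4 needs 3 messages, which fits within any standard-compliant bound, whereas 100 single-element messages with an illustrative bound of 10 would throw.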

quest/src/core/errors.hpp

Lines changed: 0 additions & 2 deletions
@@ -70,8 +70,6 @@ void error_commWithSameRank();
 
 void error_commGivenInconsistentNumSubArraysANodes();
 
-void error_commTagUpperBoundNotSet();
-
 void error_commNumMessagesExceedTagMax();
 
 void assert_commBoundsAreValid(Qureg qureg, qindex sendInd, qindex recvInd, qindex numAmps);
