Closed
Description
I'm trying to run the varlist tool developed by LLNL, which is supposed to list all available MPI_T variables. The code is available at https://github.com/LLNL/mpi-tools.
Running the tool ends in an error:
Found 20 performance variables
Found 20 performance variables with verbosity <= D/A-9
Variable VRB Class Type Bind R/O CNT ATM
-------------------------------------------------------------------------
mpool_hugepage_bytes_allocated U/A-3 SIZE ULONG n/a YES YES NO
ERROR: PVARINFO: MPI error code -18:
Thread 1 "varlist" hit Breakpoint 1, PMPI_Error_string (errorcode=-18, string=0x55555575a080 <errMsg> "MPI_ERR_OTHER: known error not in list",
resultlen=0x55555575a468 <errMsgLen>) at perror_string.c:44
44 OPAL_CR_NOOP_PROGRESS();
(gdb) bt
#0 PMPI_Error_string (errorcode=-18, string=0x55555575a080 <errMsg> "MPI_ERR_OTHER: known error not in list", resultlen=0x55555575a468 <errMsgLen>) at perror_string.c:44
#1 0x00005555555562a1 in list_pvars () at /home/joseph/src/mpi-tools/mpi_t/varlist/varlist.c:410
#2 0x0000555555557ca3 in main (argc=1, argv=0x7fffffffdcf8) at /home/joseph/src/mpi-tools/mpi_t/varlist/varlist.c:899
(gdb) f 1
#1 0x00005555555562a1 in list_pvars () at /home/joseph/src/mpi-tools/mpi_t/varlist/varlist.c:410
410 CHECKERR("PVARINFO",err);
(gdb) print err
$4 = -18
(gdb) list
405 for (i=0; i<num; i++)
406 {
407 namelen=maxnamelen;
408 desclen=maxdesclen;
409 err=MPI_T_pvar_get_info(i,name,&namelen,&verbos,&vc,&dt,&et,desc,&desclen,&bind,&ro,&ct,&at);
410 CHECKERR("PVARINFO",err);
411 if (verbos<=verbosity)
412 {
413 if (!longlist)
414 {
(gdb) print name
$5 = 0x55555593a630 "mpool_hugepage_bytes_allocated"
(gdb) c
Continuing.
[beryl:18517] *** An error occurred in MPI_Error_string
[beryl:18517] *** reported by process [3787063297,0]
[beryl:18517] *** on communicator MPI_COMM_WORLD
[beryl:18517] *** MPI_ERR_ARG: invalid argument of some other kind
[beryl:18517] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[beryl:18517] *** and potentially your MPI job)
[Thread 0x7fffed921700 (LWP 18523) exited]
[Thread 0x7ffff4f6c700 (LWP 18522) exited]
[Inferior 1 (process 18517) exited with code 015]
The definition of the CHECKERR macro is:
#define CHECKERR(errstr,err) if (err!=MPI_SUCCESS) { printf("ERROR: %s: MPI error code %i: \n",errstr,err); MPI_Error_string(err, errMsg, &errMsgLen); errMsg[errMsgLen]= 0; printf("%s\n", errMsg); /*usage(1);*/ }
The macro checks the error code and calls MPI_Error_string on the value returned by the previous call, here MPI_T_pvar_get_info. That value is -18, and passing it to MPI_Error_string triggers a fatal error (MPI_ERR_ARG) inside Open MPI.
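As a possible workaround on the tool side, the reporting path could refuse to hand an implausible code to MPI_Error_string. The following is only a minimal sketch: describe_error and stub_error_string are hypothetical names that are not part of varlist or Open MPI, and it assumes that a negative value is never a valid MPI error code in the implementation at hand. The lookup function is passed as a pointer with the same signature as MPI_Error_string so that the sketch compiles without an MPI installation.

```c
#include <stdio.h>
#include <string.h>

/* Same shape as MPI_Error_string(int errorcode, char *string, int *resultlen). */
typedef int (*error_string_fn)(int errorcode, char *string, int *resultlen);

/*
 * Hypothetical helper: writes a description of err into buf (buflen > 0).
 * Assumption: negative codes are never valid, so they are not passed to
 * lookup. Returns 1 if lookup was called, 0 if the code was rejected.
 * When lookup is MPI_Error_string, buf must hold at least
 * MPI_MAX_ERROR_STRING characters.
 */
static int describe_error(int err, error_string_fn lookup,
                          char *buf, size_t buflen)
{
    int len = 0;

    if (err < 0) {
        snprintf(buf, buflen, "unexpected negative error code %d", err);
        return 0;
    }
    if (lookup(err, buf, &len) != 0) { /* 0 corresponds to MPI_SUCCESS */
        snprintf(buf, buflen, "no description for error code %d", err);
        return 1;
    }
    /* Terminate the string explicitly, as CHECKERR does. */
    if (len < 0) {
        len = 0;
    }
    if ((size_t)len >= buflen) {
        len = (int)buflen - 1;
    }
    buf[len] = '\0';
    return 1;
}

/* Stand-in for MPI_Error_string so the sketch can be tried without MPI. */
static int stub_error_string(int errorcode, char *string, int *resultlen)
{
    *resultlen = sprintf(string, "stub description for code %d", errorcode);
    return 0;
}
```

With a real MPI library, the call would look like describe_error(err, MPI_Error_string, errMsg, MPI_MAX_ERROR_STRING), given a buffer of at least that size. This only keeps the error report from aborting the job; it does not explain why MPI_T_pvar_get_info returns -18 in the first place.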
I tried running the varlist tool using MPICH 3.2.1, which runs fine (although it reports that there are no performance variables in MPICH).
I tested the v4.0.x and master branches of Open MPI; both show the same error.