You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
libnbc: int overflow "offset = i * count * extent"
This bug was hit inside an MPI_Iallgather() test using 4 ranks
each contributing 800,000,000 MPI_LONG_LONG where the overflow
is near the top of nbc_allgather_init().
The code was
```rbuf = (char *) recvbuf + rank * recvcount * rcvext;```
where rank and recvcount are ints, and rcvext is an MPI_Aint. But
the left part of the arithmetic happens first and overflows before
it gets to the Aint promotion.
So I changed this to
```ruf = (char *) recvbuf + (MPI_Aint) rcvext * rank * recvcount;```
The philosopy for styling the change this particular way is
1. readability: putting an explicit cast at the start of a
multiplication makes it obvious at a glance that adequate
int-sizing is in place to avoid overflow
2. performance: reordering it such that the first typecast is
extraneous (the extents are already MPI_Aint) avoids any
potential performance penalty vs for example if we put the
typecast on "rank". It would also work to just use
"rcvext*rank*recvcount" since rcvext is already MPI_Aint, but
I like the explicit typecast as it guarantees no overflow
without having to double check each spot to make sure the
extent variable really is an MPI_Aint as one would expect
it to be.
I'm not 100% convinced that performance penalty is real, but it
was a concern raised by bosilca and I can't say for sure it's not
real.
I made the same change a handful of places, although I only
have a testcase specifically hitting the overflow that was at the
top of nbc_allgather_init().
I also took the liberty of changing "int ext" in red_sched_chain()
to "MPI_Aint ext". I can't see any reason for it to just be int.
Signed-off-by: Mark Allen <[email protected]>
0 commit comments