[mpich romio bbb5210] make ADIOI_GEN_WriteStrided not step on itself
> Pulled in from mpich romio, branch "main".
> Their commit message is below.
>
> This is part of a batch of commits from the
> following set of PRs:
> * pmodels/mpich#4943
> -- darray fix which contains a flatten fix
> 73a3eba
> c4b5762
> * pmodels/mpich#4995
> -- write strided fix
> bbb5210
> * pmodels/mpich#5100
> -- build fix for -Wpedantic
> ad0e435
> * pmodels/mpich#5099
> -- build fix, they had let file-system=...gpfs bit rot
> e1d42af
> 313289a
> 83bbb82
> * pmodels/mpich#5150
> -- build fix, configure-related _GNU_SOURCE
> a712b56
> 5a036e7
> * pmodels/mpich#5184
> -- build fix, continuation of _GNU_SOURCE fix
> d97c4ee
The ADIOI_GEN_WriteStrided function uses data sieving on non-contiguous
types. That is, if it wants to write data at locations
[W...X...Y...Z...]
it reads the whole buffer
[dddddddddddddddd]
changes the locations it wants to write to
[WdddXdddYdddZddd]
then writes the whole thing back. It uses locks to make this safe, but
those locks only protect against other writers that also take locks.
Without this PR, a peer simultaneously making a simple contiguous write
wouldn't have locked, so the write-back could overwrite the peer's new
data with the stale bytes read earlier.
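The read-modify-write cycle and the lost update it can cause can be modelled in a few lines of C. This is a minimal in-memory sketch, not ROMIO's code: `struct piece`, `sieve_read`, `sieve_modify_write_back` and `FILE_LEN` are hypothetical names, and a `char` array stands in for the file. The two phases are split into separate functions so that a peer's write can be placed between them.

```c
#include <stddef.h>
#include <string.h>

#define FILE_LEN 16  /* hypothetical extent size, matching the diagram */

/* One piece of a non-contiguous write: `len` bytes of `data` at `off`. */
struct piece {
    size_t off;
    size_t len;
    const char *data;
};

/* Step 1 of data sieving: read the whole extent into a private buffer. */
void sieve_read(const char *file, char *buf)
{
    memcpy(buf, file, FILE_LEN);
}

/* Steps 2 and 3: patch each piece into the buffer, then write the whole
 * extent back. Every byte of the extent is rewritten, including the gaps
 * between pieces, using whatever `buf` held when it was read. */
void sieve_modify_write_back(char *file, char *buf,
                             const struct piece *pieces, size_t n)
{
    for (size_t i = 0; i < n; i++)
        memcpy(buf + pieces[i].off, pieces[i].data, pieces[i].len);
    memcpy(file, buf, FILE_LEN);
}
```

Called back to back with pieces W, X, Y and Z at offsets 0, 4, 8 and 12, the two steps turn `[dddddddddddddddd]` into `[WdddXdddYdddZddd]`. If an unlocked contiguous write from a peer lands in a gap between the two calls (for example `memcpy(file + 1, "BB", 2)`), the write-back restores the stale `d` bytes and the peer's data is silently lost.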
A testcase to demonstrate the original problem is here:
https://gist.github.com/markalle/d7da240c19e57f095c5d1b13240dae24
% mpicc -o x romio_write_timing.c
% mpirun -np 4 ./x
Note: you need to use a filesystem that uses ADIOI_GEN_WriteStrided to
hit the bug. I was using GPFS.
This commit was taken from wkliao's version after discussing where to put
the new lock. It adds locks to contiguous writes in the independent write
functions when data-sieving writes are not disabled.
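The idea of the fix can be sketched with POSIX byte-range locks. This is an illustration only, assuming `fcntl` locking: ROMIO uses its own internal locking inside ADIO, and `range_lock`, `contig_write` and the `ds_write_enabled` flag (standing in for "data-sieving writes not disabled", e.g. via the `romio_ds_write` hint) are hypothetical names. The contiguous writer locks the range it writes, so it serializes with a sieving writer that holds a lock over an overlapping extent.

```c
#define _XOPEN_SOURCE 700  /* expose pwrite/pread declarations */
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Take (F_WRLCK) or release (F_UNLCK) a POSIX byte-range lock on
 * [off, off + len), blocking until it is granted. Returns 0 or -1. */
static int range_lock(int fd, short type, off_t off, off_t len)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Contiguous write in the spirit of the fix: when data-sieving writes
 * may be in use, lock the written range so that a concurrent sieving
 * writer cannot read-modify-write over it in the middle of this write. */
ssize_t contig_write(int fd, const void *buf, size_t len, off_t off,
                     bool ds_write_enabled)
{
    if (ds_write_enabled && range_lock(fd, F_WRLCK, off, (off_t)len) != 0)
        return -1;

    ssize_t n = pwrite(fd, buf, len, off);

    if (ds_write_enabled)
        range_lock(fd, F_UNLCK, off, (off_t)len);
    return n;
}
```

Skipping the lock when sieving is disabled mirrors the commit's condition: without read-modify-write cycles there is nothing for a plain contiguous write to race against, so it avoids paying for locking.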
Signed-off-by: Mark Allen <[email protected]>