@wkliao
Collaborator

@wkliao wkliao commented Aug 19, 2025

After testing several versions of OpenMPI, it appears that 5.0.6 fixed
the data sieving problem we suspected. This PR demonstrates that by
building Darshan with both 5.0.5 and 5.0.6; only the 5.0.5 build failed.

The small MPI test program mpi_file_write.c makes a single call to
MPI_File_write() and runs on 4 MPI processes. With OpenMPI 5.0.5, the test
program ran fine, but the contents of the generated Darshan log file were corrupted.

According to the OpenMPI 5.0.6 release notes:

  • Detailed Locking Protocol: Modified default file-locking protocols in
    UFS component to ensure data consistency, especially when using
    data-sieving operations, which require broader locking.
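
To illustrate why data sieving needs broader locking, here is a minimal single-threaded sketch in plain C. It is not OpenMPI's UFS code. The in-memory "file", the `sieve_*` helpers, and the `run_*` functions are all hypothetical names invented for this illustration. A data-sieving write reads a whole contiguous extent (including the holes it does not own), patches its own bytes, and writes the whole extent back. If another writer updates a hole between the read and the write-back, that update is silently overwritten unless the entire extent is locked.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a shared file. */
enum { FILE_SIZE = 8 };

typedef struct {
    size_t offset;          /* start of the contiguous extent that was read */
    size_t length;          /* extent length in bytes */
    char   data[FILE_SIZE]; /* staging copy of the extent */
} sieve_buf;

/* Step 1: read the whole extent, holes included. */
static void sieve_read(const char *file, size_t offset, size_t length,
                       sieve_buf *sb)
{
    sb->offset = offset;
    sb->length = length;
    memcpy(sb->data, file + offset, length);
}

/* Step 2: patch only the bytes this writer owns (every stride-th byte). */
static void sieve_patch(sieve_buf *sb, size_t stride, char value)
{
    for (size_t i = 0; i < sb->length; i += stride)
        sb->data[i] = value;
}

/* Step 3: write the whole extent back, holes included. */
static void sieve_write_back(char *file, const sieve_buf *sb)
{
    memcpy(file + sb->offset, sb->data, sb->length);
}

/* Writer B updates byte 1 while writer A is between its read and its
 * write-back. Returns the final value of byte 1. */
static char run_unlocked(void)
{
    char file[FILE_SIZE];
    sieve_buf a;

    memset(file, '.', FILE_SIZE);
    sieve_read(file, 0, FILE_SIZE, &a);
    file[1] = 'B';              /* B writes into a hole of A's extent */
    sieve_patch(&a, 2, 'A');
    sieve_write_back(file, &a); /* stale hole bytes overwrite B's update */
    return file[1];
}

/* Same two writers, but A's read-modify-write is treated as one critical
 * section over the whole extent, so B can only write after A finishes. */
static char run_locked(void)
{
    char file[FILE_SIZE];
    sieve_buf a;

    memset(file, '.', FILE_SIZE);
    sieve_read(file, 0, FILE_SIZE, &a);
    sieve_patch(&a, 2, 'A');
    sieve_write_back(file, &a);
    file[1] = 'B';              /* B proceeds after A releases the extent */
    return file[1];
}
```

In this sketch `run_unlocked()` loses writer B's byte, while `run_locked()` keeps it. The real fix concerns file locks taken by the MPI-IO layer, which this toy model only approximates by ordering the steps.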

The most likely relevant fix in OpenMPI 5.0.6 is open-mpi/ompi#12759.

@github-actions github-actions bot added the CI continuous integration label Aug 19, 2025
@wkliao wkliao added DO NOT MERGE Tests only. CI continuous integration and removed CI continuous integration labels Aug 19, 2025
@wkliao
Collaborator Author

wkliao commented Oct 1, 2025

Closing this for now. It can be reopened later to test a future release of OpenMPI.

@wkliao wkliao closed this Oct 1, 2025
Labels

CI continuous integration DO NOT MERGE Tests only.
