@edgargabriel
Member

@edgargabriel edgargabriel commented Oct 29, 2024

Add an implementation of the read_all operation that uses the two-phase I/O algorithm with even partitioning, i.e. the same basic idea that is used by the write_all operation of the vulcan component. Until now, all components have used the same 'generic' read_all code, which was based on the fcoll/dynamic module idea.
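
To illustrate what "even partitioning" means here, the following is a minimal sketch and not the code from this PR: the accessed file range is split into contiguous, (nearly) equally sized domains, one per aggregator. The names `file_domain_t` and `even_partition` are hypothetical, and the actual vulcan code may choose or align its domain boundaries differently.

```c
#include <stdint.h>

/* Hypothetical type for illustration only; not part of the Open MPI sources. */
typedef struct {
    int64_t start; /* first byte owned by this aggregator */
    int64_t end;   /* one past the last byte owned by this aggregator */
} file_domain_t;

/*
 * Split the byte range [range_start, range_end) into num_aggr contiguous
 * domains of nearly equal size and return the domain of aggregator aggr_idx.
 * If the range does not divide evenly, the first (total % num_aggr)
 * aggregators each receive one extra byte.
 * Assumes num_aggr > 0 and 0 <= aggr_idx < num_aggr.
 */
static file_domain_t even_partition(int64_t range_start, int64_t range_end,
                                    int num_aggr, int aggr_idx)
{
    int64_t total = range_end - range_start;
    int64_t base  = total / num_aggr;
    int64_t rem   = total % num_aggr;

    /* Number of one-byte extras handed out to aggregators before this one. */
    int64_t extra_before = (aggr_idx < rem) ? aggr_idx : rem;

    file_domain_t d;
    d.start = range_start + (int64_t)aggr_idx * base + extra_before;
    d.end   = d.start + base + ((aggr_idx < rem) ? 1 : 0);
    return d;
}
```

For example, a range of 10 bytes split across 4 aggregators yields the domains [0,3), [3,6), [6,8) and [8,10). In contrast to the fcoll/dynamic idea, the boundaries here depend only on the overall range and the number of aggregators.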

In addition to using the 'correct' data partitioning approach for the component, the vulcan read_all implementation also adds some other features that were there for the write_all operations, but not for the (generic) read_all algorithm used by all components so far. Specifically, it can overlap the execution of the I/O phase and the communication phase. The algorithm can also use GPU buffers for aggregation.
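
The overlap of the I/O phase and the communication phase can be pictured as a double-buffered pipeline on each aggregator. The sketch below shows only the control flow, under the assumption that two aggregation buffers are used. The four helper functions are hypothetical placeholders that only record a trace. In real code they would post and complete nonblocking file reads and nonblocking sends. None of the names come from the Open MPI sources.

```c
#include <stdio.h>
#include <string.h>

/* Trace of the order in which operations are posted and completed. */
typedef struct { char log[256]; } trace_t;

static void trace_event(trace_t *t, const char *what, int cycle)
{
    size_t used = strlen(t->log);
    snprintf(t->log + used, sizeof(t->log) - used, "%s(%d) ", what, cycle);
}

/* Hypothetical placeholders: each one only appends an event to the trace. */
static void start_read(trace_t *t, int cycle, int buf)    { (void)buf; trace_event(t, "iread", cycle); }
static void wait_read(trace_t *t, int cycle, int buf)     { (void)buf; trace_event(t, "wait_read", cycle); }
static void start_scatter(trace_t *t, int cycle, int buf) { (void)buf; trace_event(t, "isend", cycle); }
static void wait_scatter(trace_t *t, int cycle, int buf)  { (void)buf; trace_event(t, "wait_send", cycle); }

/*
 * Aggregator-side loop: while the data of cycle c is being sent out of one
 * buffer, the read for cycle c+1 is already in flight into the other buffer.
 */
static void read_all_pipeline(trace_t *t, int num_cycles)
{
    if (num_cycles <= 0) {
        return;
    }
    start_read(t, 0, 0);                    /* prime the pipeline */
    for (int c = 0; c < num_cycles; c++) {
        int cur = c % 2;
        int nxt = (c + 1) % 2;

        wait_read(t, c, cur);               /* data of cycle c is now in buffer cur */
        start_scatter(t, c, cur);           /* communication phase of cycle c */
        if (c + 1 < num_cycles) {
            /* Buffer nxt is free: the sends of cycle c-1 completed at the
             * end of the previous iteration. */
            start_read(t, c + 1, nxt);      /* I/O phase of cycle c+1 overlaps */
        }
        wait_scatter(t, c, cur);            /* buffer cur may be reused afterwards */
    }
}
```

With two cycles the trace contains `isend(0) iread(1) wait_send(0)`, i.e. the read for the second cycle is posted before the sends of the first cycle are completed.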

The code has been tested with:

  • the ompio testsuite
  • the hdf5 testsuite for 4 and 8 processes (and some runs with 12 processes)
  • the internal hip-mpi testsuite for both device and host buffers.

The PR looks complicated, but it's actually not that bad. The first two commits perform some code cleanup and reorganization of the fcoll_vulcan_file_write_all function, with the specific goal of simplifying the code and allowing big chunks of it to be reused for the read_all operation. The third commit then adds the actual new algorithm code for read_all.

As a side note, I noticed one more issue in the code base regarding the registration/deregistration of the ompio progress function, but I will fix that after this PR is merged.

@edgargabriel edgargabriel changed the title Topic/vulcan two phase read all topic/vulcan: add two_phase read_all Oct 29, 2024
@edgargabriel edgargabriel force-pushed the topic/vulcan-two-phase-read-all branch from dbed343 to 69801e8 Compare October 29, 2024 21:15
@edgargabriel edgargabriel changed the title topic/vulcan: add two_phase read_all fcoll/vulcan: add two_phase read_all Oct 29, 2024
Member

@bosilca bosilca left a comment

You have not updated/added your copyright blurb. Other than that everything looks good, but I'm not well versed in the MPIO code.

@edgargabriel
Member Author

@bosilca thank you, good point, I will do that before merging!

This is a cleanup of the write_all operation, in preparation for adding a read_all implementation to the vulcan component. Specifically:

- remove the multiple group option: this was envisioned for the vulcan component, but never fully implemented. Keeping a few stubs in some locations (and an mca parameter) therefore doesn't make sense, so they are being removed to simplify the code.

- remove the write_chunksize option: this was an artifact of the code base having evolved from dynamic_gen2 (where the option makes sense). In vulcan, however, it doesn't really make sense and was not implemented correctly either, so if somebody had used that option, it would probably have failed.

The changes have been validated with the ompio testsuite as well as the hdf5 testphdf5, t_shapesame, and t_filters_parallel tests.

Signed-off-by: Edgar Gabriel <[email protected]>
extract some code into standalone routines in preparation for adding a vulcan read_all implementation.
Specifically, this PR adds new routines for:
 - mca_fcoll_vulcan_calc_blocklen_disps
 - mca_fcoll_vulcan_calc_file_offsets
 - mca_fcoll_vulcan_calc_io_array

which will be reused in the read_all as well.
Also, some white-space cleanup of the code.

Signed-off-by: Edgar Gabriel <[email protected]>
add an implementation of the read_all operation that uses the two-phase I/O algorithm using even partitioning, i.e. the same base idea that is used by the write_all operation of this component.

In addition to using the 'correct' data partitioning approach for the component, the vulcan read_all implementation also adds some other features that were there for the write_all operations, but not for the (generic) read_all algorithm used by all components so far. Specifically, it can overlap the execution of the I/O phase and the communication phase. The algorithm can also use GPU buffers for aggregation.

Signed-off-by: Edgar Gabriel <[email protected]>
@edgargabriel edgargabriel force-pushed the topic/vulcan-two-phase-read-all branch from 69801e8 to 030ead1 Compare October 30, 2024 15:00
@edgargabriel edgargabriel merged commit a9f84cc into open-mpi:main Oct 30, 2024
15 checks passed

2 participants