Distributed file read #2010
Hi PIO! I'm developing a parallel read interface using GDAL for GIS files. I've mimicked the way that pio_read_darray_nc() behaves for NETCDF4P (let me know if this is not the right approach). What controls the number of procs on which the read occurs? It looks like maxregions; is this correct? Regardless of the file size or other file configs, the read always happens on one proc (even if num_iotasks > 1). I simply want to test that the parallel read works and the array is formed correctly. Thank you.
Replies: 1 comment
The number of tasks that participate in the read is controlled by the num_iotasks argument in the call to pio_init:
https://github.com/NCAR/ParallelIO/blob/main/src/clib/pioc.c#L1272
If the read happens on a single task regardless of the value of num_iotasks, you are probably using the box rearranger with a rather small decomposition. maxregions is an internal variable that has nothing to do with the number of I/O tasks; it relates to how fragmented the data in memory is with respect to the file order. I would also recommend looking into the pnetcdf interface, which is generally faster than netcdf4/hdf5.
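
To illustrate why a small decomposition can leave a single task doing the I/O, here is a toy model in C. It is not PIO's code: the function active_io_tasks and the min_block threshold are hypothetical, and the model simply assumes that an I/O task is only used if it can be given at least min_block elements.

```c
#include <assert.h>

/* Toy model, NOT part of PIO: estimate how many I/O tasks receive data
 * when each active I/O task must hold at least min_block elements.
 * The name, the signature and the threshold rule are assumptions made
 * for illustration only. */
int active_io_tasks(long long total_elems, long long min_block, int num_iotasks)
{
    assert(total_elems >= 0);
    assert(min_block > 0);
    assert(num_iotasks >= 1);

    /* Number of full blocks the array can be split into. */
    long long wanted = total_elems / min_block;

    /* At least one task always does the I/O. */
    if (wanted < 1)
        wanted = 1;

    /* Never use more tasks than were requested at init time. */
    if (wanted > num_iotasks)
        wanted = num_iotasks;

    return (int)wanted;
}
```

Under this model, with min_block = 256 and num_iotasks = 4, a 100-element array ends up on 1 I/O task, a 600-element array on 2, and a 100000-element array on all 4. If PIO behaves along these lines, a test meant to exercise several I/O tasks would need a larger array.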