Review H5Easy "extend/part" API. #1018
Description
In H5Easy there's an API for reading and writing one element at a time:
HighFive/include/highfive/h5easy_bits/H5Easy_scalar.hpp, lines 66 to 70 in 5f3ded6:

```cpp
inline static DataSet dump_extend(File& file,
                                  const std::string& path,
                                  const T& data,
                                  const std::vector<size_t>& idx,
                                  const DumpOptions& options) {
```
HighFive/include/highfive/h5easy_bits/H5Easy_scalar.hpp, lines 120 to 122 in 5f3ded6:

```cpp
inline static T load_part(const File& file,
                          const std::string& path,
                          const std::vector<size_t>& idx) {
```
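These back the index-based overloads of `H5Easy::dump` and `H5Easy::load`. A minimal usage sketch (file name and values are illustrative):

```cpp
#include <highfive/H5Easy.hpp>

int main() {
    H5Easy::File file("example.h5", H5Easy::File::Overwrite);

    // Write one scalar at index 3: "/foo" is created as an extendible,
    // chunked dataset and automatically grown to 4 elements.
    H5Easy::dump(file, "/foo", 1.23, {3});

    // Read back the single element at index 3.
    double value = H5Easy::load<double>(file, "/foo", {3});
    (void) value;
}
```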
It does this by creating a dataset that can be extended in all directions, which grows automatically whenever the index of the written element requires it. (This negates our ability to spot off-by-one programming errors.)
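A sketch of how the auto-growing masks an off-by-one error (assuming a dataset `/foo` that already holds `n` elements):

```cpp
// Valid indices are 0 .. n-1. Writing at index n is a bug, but instead
// of failing like a fixed-size dataset would, the extendible dataset
// silently grows to n + 1 elements and the mistake goes unnoticed.
H5Easy::dump(file, "/foo", 3.14, {n});
```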
The API for reading/writing one element at a time feels like it would tempt users into writing files that way in a loop, which is a rather serious issue on common HPC hardware (and not great on consumer hardware either).
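The tempting anti-pattern, roughly (with `compute` standing in for any per-element producer), followed by the preferable buffered version:

```cpp
// Anti-pattern: n separate one-element writes, each paying the full
// round-trip to the file system.
for (size_t i = 0; i < n; ++i) {
    H5Easy::dump(file, "/slow", compute(i), {i});
}

// Preferable: accumulate in memory, then write once.
std::vector<double> buffer(n);
for (size_t i = 0; i < n; ++i) {
    buffer[i] = compute(i);
}
H5Easy::dump(file, "/fast", buffer);
```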
To enable this API it must make a default choice for the chunk size, currently 10 per dimension (10^n elements for an n-dimensional dataset). That seems very small and risks creating files that can't be read efficiently; picking it reasonably large might instead inflate the size of the file by a factor of 100 or more.
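Back-of-envelope with illustrative numbers: HDF5 allocates storage in whole chunks, so for a rank-3 dataset of doubles the current default of 10 per dimension gives ~8 KiB chunks, while 100 per dimension gives ~8 MiB chunks; a dataset holding only a few written elements still occupies at least one full chunk, hence the inflation factor. Where `DumpOptions::setChunkSize` is available (present in recent HighFive releases; treat its exact form as an assumption), the user can at least override the default:

```cpp
#include <highfive/H5Easy.hpp>

int main() {
    H5Easy::File file("example.h5", H5Easy::File::Overwrite);

    // Override the default chunk size instead of relying on the
    // 10-per-dimension heuristic (here: 1024 elements per chunk in 1-D).
    H5Easy::DumpOptions options;
    options.setChunkSize({1024});

    H5Easy::dump(file, "/bar", 1.0, {0}, options);
}
```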
I think it might be fine to allow users to read and write single elements of an existing dataset, i.e. without the automatically growing aspect, together with a warning in the documentation not to use it in a loop. In the core library we support various selection APIs that are reasonably compact: lists of arbitrary points, regular hyperslabs (and general ones too), and there's a proposal to allow Cartesian products of simple selections along each axis.
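For comparison, a sketch of those core selection APIs applied to an existing dataset (no growing involved); `ElementSet` and the offset/count `select` overload are the existing HighFive selection mechanisms:

```cpp
#include <highfive/H5File.hpp>

int main() {
    HighFive::File file("example.h5", HighFive::File::ReadOnly);
    HighFive::DataSet ds = file.getDataSet("/data");

    // One element via a 1x1 hyperslab: offset {5}, count {1}.
    double one;
    ds.select({5}, {1}).read(one);

    // Several scattered elements in one compact point selection.
    std::vector<double> some;
    ds.select(HighFive::ElementSet({2, 7, 11})).read(some);
}
```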