File I/O Abstraction Design Thoughts
Thoughts on getting to a better model of file I/O abstraction in Arkouda. NOTE: this is in the very early stages of investigation and design.
The current design is focused mainly on HDF5, since it is the only format currently supported for data I/O from disk. (Technically you can read data on the client side and push it to the server to construct pdarray objects, but this is not really a scalable approach.)
As the Arkouda community looks to add new formats such as Apache Parquet, we would like to generalize the API for reading and writing files where possible, both in the internal code and in the interactive client API.
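One way to picture a generalized client-side API is a small format interface plus a registry keyed by file extension. The following is only a minimal sketch under stated assumptions: `FileFormat`, `register_format`, `format_for`, `read_all`, and the `.stub` backend are all hypothetical names invented for illustration and are not part of Arkouda.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Type


class FileFormat(ABC):
    """Hypothetical interface that each on-disk format would implement."""

    @abstractmethod
    def list_columns(self, path: str) -> List[str]:
        """Return the names of the datasets/columns stored in the file."""

    @abstractmethod
    def read_column(self, path: str, name: str) -> List[float]:
        """Return one dataset/column as a flat sequence of values."""


# Hypothetical registry mapping a file extension to a format implementation.
_FORMATS: Dict[str, Type[FileFormat]] = {}


def register_format(extension: str):
    """Class decorator that registers a format under a file extension."""
    def decorator(cls: Type[FileFormat]) -> Type[FileFormat]:
        _FORMATS[extension.lower()] = cls
        return cls
    return decorator


def format_for(path: str) -> FileFormat:
    """Pick a format implementation based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext not in _FORMATS:
        raise ValueError(f"unsupported file format: {ext!r}")
    return _FORMATS[ext]()


@register_format(".stub")
class StubFormat(FileFormat):
    """Stand-in backend holding in-memory data, only so the sketch runs."""

    _DATA = {"a": [1.0, 2.0], "b": [3.0, 4.0]}

    def list_columns(self, path: str) -> List[str]:
        return sorted(self._DATA)

    def read_column(self, path: str, name: str) -> List[float]:
        return list(self._DATA[name])


def read_all(path: str) -> Dict[str, List[float]]:
    """Format-agnostic entry point: read every column in the file."""
    fmt = format_for(path)
    return {name: fmt.read_column(path, name) for name in fmt.list_columns(path)}
```

With this shape, callers only ever use `read_all`, and an HDF5 or Parquet backend would be added by registering another `FileFormat` subclass rather than by changing the calling code.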
Currently pdarray only supports single-dimensional arrays. While HDF5 supports multi-dimensional datasets, we limit our support to groups of single-dimensional arrays. This actually maps pretty nicely to the concept of columns, which makes it a bit easier to add support for columnar formats such as Parquet.
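The "group of single-dimensional arrays as columns" idea can be illustrated with plain Python containers. This is a hedged sketch, not Arkouda code: `rank` and `group_to_columns` are hypothetical helpers, and nested lists stand in for datasets read from a file.

```python
from typing import Dict, List, Sequence


def rank(values) -> int:
    """Return the nesting depth of a list-of-lists, treated as array rank."""
    depth = 0
    while isinstance(values, (list, tuple)):
        depth += 1
        if not values:
            break
        values = values[0]
    return depth


def group_to_columns(group: Dict[str, Sequence]) -> Dict[str, List]:
    """Treat a group of named datasets as columns, rejecting rank > 1."""
    columns: Dict[str, List] = {}
    for name, values in group.items():
        r = rank(values)
        if r != 1:
            raise ValueError(
                f"dataset {name!r} has rank {r}; only rank 1 is supported"
            )
        columns[name] = list(values)
    return columns


# A group of 1-D datasets maps directly onto named columns.
table = group_to_columns({"id": [1, 2, 3], "score": [0.5, 0.25, 1.0]})
```

A multi-dimensional dataset such as `{"m": [[1, 2], [3, 4]]}` would be rejected with a `ValueError` in this sketch, mirroring the restriction described above.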
With Chapel we really only have compatibility with C APIs for the various formats. I believe it is possible to link in C++ libraries, but (anecdotally) this gets tricky, so in effect we are currently limited to C wrappers and straight C implementations.