File I/O Abstraction Design Thoughts

glitch edited this page Jul 15, 2021 · 2 revisions

Thoughts on getting to a better model of file I/O abstraction in Arkouda. NOTE: this is in the very early stages of investigating and creating a design.

Issue

The current design focuses mainly on HDF5, since it is the only format currently supported for data I/O from disk. (Technically you can read data on the client side and push it to the server to construct pdarray objects, but this is not a scalable approach.)

As the Arkouda community looks to add new formats such as Apache Parquet, we would like to generalize the API for reading and writing files, if possible, in both the internal code and the client-facing interactive API.
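One way to picture such a generalization is a small format interface plus a registry keyed on file extension. The sketch below is illustrative only: `FileFormat`, `JsonColumns`, `register`, `format_for`, `read`, and `write` are hypothetical names invented here, not part of Arkouda, and the JSON backend is a toy stand-in for a real HDF5 or Parquet implementation.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class FileFormat(ABC):
    """Hypothetical interface: one implementation per on-disk format."""

    extensions = ()  # file suffixes this format claims, e.g. (".json",)

    @abstractmethod
    def read_columns(self, path, names=None):
        """Return {column name: list of values} for the requested columns."""

    @abstractmethod
    def write_columns(self, path, columns):
        """Write {column name: list of values} to path."""


class JsonColumns(FileFormat):
    """Toy stand-in for a real backend such as HDF5 or Parquet."""

    extensions = (".json",)

    def read_columns(self, path, names=None):
        with open(path) as f:
            data = json.load(f)
        if names is None:
            return data
        return {name: data[name] for name in names}

    def write_columns(self, path, columns):
        with open(path, "w") as f:
            json.dump(columns, f)


_FORMATS = {}  # maps a file suffix to the format object that handles it


def register(fmt):
    """Make a format available for every suffix it claims."""
    for ext in fmt.extensions:
        _FORMATS[ext] = fmt


def format_for(path):
    """Pick a format from the file suffix, or raise if none is registered."""
    ext = Path(path).suffix.lower()
    if ext not in _FORMATS:
        raise ValueError(f"no registered format for {ext!r}")
    return _FORMATS[ext]


def read(path, names=None):
    """Format-agnostic entry point a client API could expose."""
    return format_for(path).read_columns(path, names)


def write(path, columns):
    """Format-agnostic counterpart to read()."""
    format_for(path).write_columns(path, columns)


register(JsonColumns())
```

With this shape, adding a new format would mean writing one more `FileFormat` subclass and registering it, while callers keep using `read` and `write` unchanged. Dispatching on the file suffix is just one possible choice; an explicit format argument would work equally well.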

Initial constraints

1D arrays

Currently pdarray supports only one-dimensional arrays. While HDF5 supports multi-dimensional datasets, we limit our support to groups of one-dimensional arrays. This maps nicely onto the concept of columns, which makes it easier to add support for columnar formats such as Parquet.
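As a rough illustration of the "group of 1D arrays as columns" idea, the sketch below treats a group as a plain mapping from dataset name to values and rejects anything nested. The helper names `is_one_dimensional` and `as_columns` are hypothetical and use plain Python lists rather than pdarray objects.

```python
from collections.abc import Sequence


def is_one_dimensional(values):
    """True if no element is itself a sequence (strings count as scalars)."""
    return not any(
        isinstance(v, Sequence) and not isinstance(v, (str, bytes))
        for v in values
    )


def as_columns(group):
    """Turn {dataset name: values} into {column name: list}, rejecting nesting."""
    columns = {}
    for name, values in group.items():
        values = list(values)
        if not is_one_dimensional(values):
            raise ValueError(f"dataset {name!r} is not one-dimensional")
        columns[name] = values
    return columns
```

A format backend could apply a check like this at its boundary, so that every format hands the rest of the system the same flat, column-shaped data.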

C API compatibility

With Chapel we really only have compatibility with the C APIs of the various formats. I believe it is possible to link in C++ libraries, but (anecdotally) this gets tricky, so in effect we are currently limited to C wrappers and straight C implementations.
