Description
Is your feature request related to a problem? Please describe.
Right now, systems like Sirius, RapidsMPF, and other libcudf users that need to move data across some interface (the network, disk, or host memory) end up using the pack/unpack APIs to access the underlying data and make it available somewhere else.
This can lead to redundant copies and allocations. For example:
Say I want to move a cudf::table through a NIC to a GPU on another node. One option is to pack the data into a single buffer with cudf::pack and then send that large buffer along with its metadata. If we are using a library like UCX, which can copy data directly from GPU memory, packing forces us to duplicate the data in GPU memory before sending it. This is expensive in both memory and time (an extra allocation plus an extra copy).
The receiving side has the same problem. To get a cudf::table from a packed buffer, the user must either incur another allocation and copy to move the data out of the non-owning view created by unpack into an owning representation like cudf::table, or add an abstraction that treats cudf::table and the packed representation the same way across their codebase, which adds a lot of code complexity.
Describe the solution you'd like
Another API for serializing/deserializing that provides:
std::pair<std::vector<rmm::device_buffer>, metadata> get_buffers_and_metadata()
cudf::table make_table_from_buffers_and_metadata(std::vector<rmm::device_buffer>, metadata)
This gives us zero-copy serialization and deserialization without forcing users to dig deeply into the libcudf code to figure out how to extract all of the relevant buffers for each data type.
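The shape of the requested pair of functions can be illustrated without libcudf. The following is a host-only sketch under stated assumptions, not libcudf's API: `buffer` is a hypothetical stand-in for rmm::device_buffer, and `column`, `column_metadata`, and `table` are simplified hypothetical types (a column owns an optional data buffer, an optional null mask, and child columns). Serialization walks the column tree in pre-order, moving each buffer into a flat list and recording just enough metadata to rebuild the tree; deserialization moves the buffers back in the same order, so no bytes are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for rmm::device_buffer: an owned, movable byte buffer.
using buffer = std::vector<std::byte>;

// Hypothetical simplified column: optional data and null-mask buffers plus
// child columns (e.g. offsets for strings, children for lists and structs).
struct column {
  int type_id = 0;                   // stand-in for cudf::data_type
  std::size_t size = 0;
  std::optional<buffer> data;        // absent for e.g. struct parents
  std::optional<buffer> null_mask;   // absent when there are no nulls
  std::vector<column> children;
};

using table = std::vector<column>;

// Per-column metadata, stored in the same pre-order as the buffers.
struct column_metadata {
  int type_id;
  std::size_t size;
  bool has_data;
  bool has_null_mask;
  std::size_t num_children;
};
using metadata = std::vector<column_metadata>;

// Move every buffer of `col` (and its children) into `bufs`, pre-order.
void flatten(column&& col, std::vector<buffer>& bufs, metadata& meta) {
  meta.push_back({col.type_id, col.size, col.data.has_value(),
                  col.null_mask.has_value(), col.children.size()});
  if (col.data) bufs.push_back(std::move(*col.data));
  if (col.null_mask) bufs.push_back(std::move(*col.null_mask));
  for (auto& child : col.children) flatten(std::move(child), bufs, meta);
}

// Hypothetical analogue of the requested get_buffers_and_metadata():
// consumes the table and hands ownership of its buffers to the caller.
std::pair<std::vector<buffer>, metadata> get_buffers_and_metadata(table&& tbl) {
  std::vector<buffer> bufs;
  metadata meta;
  for (auto& col : tbl) flatten(std::move(col), bufs, meta);
  return {std::move(bufs), std::move(meta)};
}

// Rebuild one column (and its subtree), consuming buffers and metadata in order.
// Assumes well-formed metadata produced by flatten().
column rebuild(std::vector<buffer>& bufs, std::size_t& bi,
               const metadata& meta, std::size_t& mi) {
  const column_metadata& m = meta[mi++];
  column col;
  col.type_id = m.type_id;
  col.size = m.size;
  if (m.has_data) col.data = std::move(bufs[bi++]);
  if (m.has_null_mask) col.null_mask = std::move(bufs[bi++]);
  col.children.reserve(m.num_children);
  for (std::size_t i = 0; i < m.num_children; ++i)
    col.children.push_back(rebuild(bufs, bi, meta, mi));
  return col;
}

// Hypothetical analogue of make_table_from_buffers_and_metadata():
// takes ownership of the buffers and moves them back into an owning table.
table make_table_from_buffers_and_metadata(std::vector<buffer> bufs,
                                           const metadata& meta) {
  table tbl;
  std::size_t bi = 0;
  std::size_t mi = 0;
  while (mi < meta.size()) tbl.push_back(rebuild(bufs, bi, meta, mi));
  assert(bi == bufs.size());  // every buffer was consumed exactly once
  return tbl;
}
```

Because every step moves buffers rather than copying them, the address of a child's data buffer before serialization is the same after reconstruction. In libcudf the hard part would be the per-type knowledge of which buffers exist and what they mean, which is the knowledge this request asks libcudf to own instead of each downstream project.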
Describe alternatives you've considered
In the past I have worked around this by learning how every column type stores its data: knowing when child buffers exist and what the different pieces represent, so that I can extract the device_buffers and recreate the cudf::table when needed. This is fragile and can break with changes to libcudf. It also requires every downstream project to make this investment separately, rather than it being done once in libcudf.