Advice on streaming to transformParquetStream #825
I'm looking to stream a data set from disk (better-sqlite3) to a Parquet file written over an sFTP tunnel. I don't see how to do this without first loading the entire data set into an Arrow table, which is not an option for me. Obviously I want to minimize conversions and memory copies. I'm currently using parquet-js for this, but its conversion speed from JS arrays to Parquet is too slow; everything else (reading, FTP, etc.) is fine. I just need to figure out how to stream data chunks into shared memory while `transformParquetStream` creates a single file, releasing that memory as it's consumed. I can read the data in an order that supports creating row groups, if that helps. Can someone nudge me in the right direction? I'm happy to chase down the details myself. Thanks!
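In case it helps frame the question: the usual way to bound memory in this kind of pipeline is to batch rows from the database cursor into row-group-sized chunks with a generator, so only one group is ever resident. This is just a sketch of that batching step, not hyparquet-writer's API; `rowGroups` and `groupSize` are hypothetical names, and the hand-off to `transformParquetStream` itself is omitted since I haven't confirmed its signature.

```javascript
// Hypothetical helper: batch rows (e.g. from better-sqlite3's stmt.iterate())
// into row-group-sized chunks. Each yielded group can be encoded and flushed,
// then garbage-collected, so the full data set is never in memory at once.
function* rowGroups(rows, groupSize) {
  let group = [];
  for (const row of rows) {
    group.push(row);
    if (group.length === groupSize) {
      yield group; // one row group's worth of rows
      group = [];  // drop the reference so the chunk can be collected
    }
  }
  if (group.length > 0) yield group; // final partial group
}

// Demo with fake rows standing in for a SQLite cursor:
const fakeRows = Array.from({ length: 10 }, (_, i) => ({ id: i }));
const groups = [...rowGroups(fakeRows, 4)];
console.log(groups.map(g => g.length)); // [ 4, 4, 2 ]
```

Because `rowGroups` is a generator, nothing is materialized until the consumer pulls a group, which matches the "release as it's consumed" requirement.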
Sorry to say I haven't used `transformParquetStream` myself; it was added by @H-Plus-Time in #305. (I've actually never used the stream API in Node/JS.) The best I can do is point you to the test for `transformParquetStream`. Is that helpful for you?