Tips for improving encoding time for large vecs/hashmaps? #463
Replies: 4 comments 1 reply
-
If you have a single buffer that contains all strings, you can return that one and the positions to then extract the substrings using |
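To make the single-buffer idea concrete, here is a minimal sketch in plain Rust, using only the standard library. The helper names `pack_strings` and `unpack` are hypothetical and are not part of Rustler or arrow2. It assumes the strings are packed back to back into one byte buffer, with a `(start, length)` pair recorded per string, so that only one large binary and a list of integer pairs would need to cross the NIF boundary.

```rust
/// Pack many strings into one contiguous byte buffer plus a
/// (start, length) pair for each string.
/// Hypothetical helper, not a Rustler or arrow2 API.
fn pack_strings(values: &[&str]) -> (Vec<u8>, Vec<(usize, usize)>) {
    // Reserve the exact total size up front so the buffer is allocated once.
    let total: usize = values.iter().map(|s| s.len()).sum();
    let mut buffer = Vec::with_capacity(total);
    let mut positions = Vec::with_capacity(values.len());
    for s in values {
        // The string starts at the current end of the buffer.
        positions.push((buffer.len(), s.len()));
        buffer.extend_from_slice(s.as_bytes());
    }
    (buffer, positions)
}

/// Recover the i-th string by slicing the shared buffer.
/// Returns None for an out-of-range index or invalid UTF-8.
fn unpack<'a>(
    buffer: &'a [u8],
    positions: &[(usize, usize)],
    i: usize,
) -> Option<&'a str> {
    let &(start, len) = positions.get(i)?;
    let bytes = buffer.get(start..start + len)?;
    std::str::from_utf8(bytes).ok()
}
```

In this sketch `unpack` stands in for what the Elixir side would do with the returned binary and positions: slicing by position rather than receiving one separately encoded term per string.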
-
Unfortunately it will be a mix of strings, integers, structs, etc. You can see an example here: https://github.com/joshuataylor/serde_examples/blob/main/native/serde_examples/src/lib.rs#L57
-
Well, in an Arrow frame it won't. The whole point of Arrow is to have per-column homogeneous data. So you can check in the Arrow schema whether a column is a string column and return it in the way that I suggested. The relevant methods are (I'll take the liberty of moving this into a discussion, it's not really an issue with Rustler).
-
Awesome! Thanks so much for setting up discussions, I wasn't sure where to place this (as it's not an issue as you mentioned). wrt/ The initial thread:
To this:
This is a specific integration with Snowflake, I'm sure we'll also have a generic NIF at some point (or people can just use polars/nx). I also really appreciate your comments as well, the community here and over in Rust land is fantastic 🙌 edit: I'm going to do an experiment and return all columns as is, then do List.zip across the columns in Elixir |
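For readers unfamiliar with the column-to-row step, here is a rough sketch of the transposition that zipping across columns performs. It is written in Rust purely for illustration; the experiment described above does this step in Elixir. The `Cell` enum and `columns_to_rows` function are hypothetical names, and the sketch assumes that zipping stops at the shortest column.

```rust
/// A hypothetical cell type standing in for mixed-type column values.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Int(i64),
    Text(String),
    Null,
}

/// Transpose column-major data into rows.
/// Stops at the shortest column, so ragged input never panics.
fn columns_to_rows(columns: &[Vec<Cell>]) -> Vec<Vec<Cell>> {
    // With no columns at all there are no rows to build.
    let row_count = columns.iter().map(|c| c.len()).min().unwrap_or(0);
    (0..row_count)
        .map(|r| columns.iter().map(|col| col[r].clone()).collect())
        .collect()
}
```

Each column can stay homogeneous while it is decoded, and the rows are only assembled at the end.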
-
Hi!
I'm writing a library for Elixir which deserialises Apache Arrow data, specifically IPC streaming files, using arrow2, and returns it to Elixir as rows. Using Rustler for this has been an amazing experience, and has taught me a lot about Rust (as a Rust beginner).
This is for a Snowflake adapter for Elixir. Snowflake can return results as either JSON or Arrow, and in my initial benchmarks the Arrow responses come back 2-3x faster than JSON.
I seem to have hit a problem when returning a large number of strings to Elixir, apparently because each one needs to be encoded individually. Maybe there is a more efficient way to return the data?
Here is an example repo I have: https://github.com/joshuataylor/serde_examples
My results across three different systems:
1/ My desktop, a 32-core Threadripper 2990WX, designed for multicore rather than single-threaded work :)
2/ A desktop 6-core 5600X, with pretty decent single-core performance.
3/ My laptop, a 2020 M1 MacBook Air: