
use arrow::read_parquet instead of nanoparquet #462

@BenoitLondon

Description

In my benchmarks I've found nanoparquet to be much less efficient than arrow in terms of speed and RAM usage (a sketch of how such a comparison can be reproduced follows the table):

        expression median mem_alloc   name   size
            <char>  <num>     <num> <char> <char>
 1:     df_parquet  1.153     5.578  write  small
 2: df_nanoparquet  0.674   183.986  write  small
 3:     dt_parquet  5.172     0.018  write  small
 4: dt_nanoparquet  0.656   183.876  write  small
 5:     df_parquet 10.878     0.015  write    big
 6: df_nanoparquet 10.182  2068.884  write    big
 7:     dt_parquet 11.461     0.015  write    big
 8: dt_nanoparquet 10.038  2068.947  write    big
 9:     df_parquet  0.088    34.901   read  small
10: df_nanoparquet  0.414   183.187   read  small
11:     df_parquet  1.187     0.009   read    big
12: df_nanoparquet  5.180  1324.072   read    big
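
For context, here is a minimal sketch of how a write comparison like this could be reproduced with the bench package. The data frame, its size, and the file paths are made up for illustration; this is not the exact benchmark behind the table above.

```r
library(bench)

# Hypothetical test data; not the exact data behind the table above.
df <- data.frame(x = rnorm(1e6), y = sample(letters, 1e6, replace = TRUE))
f_arrow <- tempfile(fileext = ".parquet")
f_nano  <- tempfile(fileext = ".parquet")

# Compare write speed and allocated memory for the two engines.
bench::mark(
  arrow       = arrow::write_parquet(df, f_arrow),
  nanoparquet = nanoparquet::write_parquet(df, f_nano),
  check = FALSE  # return values differ between engines, so skip comparison
)
```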

Speed and RAM usage when reading big files are not very good.

The nanoparquet repo itself says:

Being single-threaded and not fully optimized, 
nanoparquet is probably not suited well for large data sets. 
It should be fine for a couple of gigabytes. 
Reading or writing a ~250MB file that has 32 million rows 
and 14 columns takes about 10-15 seconds on an M2 MacBook Pro.
 For larger files, use Apache Arrow or DuckDB.

rio already uses arrow for feather, so I'm not sure why we rely on nanoparquet for parquet.

If you keep nanoparquet as the default, maybe we could have an option to use arrow instead? A hypothetical sketch is below.
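
Something along these lines is what I have in mind. Note that `rio.parquet_engine` and `read_parquet_file` are invented names for illustration, not an existing rio option or function:

```r
# Hypothetical engine switch; the option name "rio.parquet_engine" is
# invented for illustration and does not exist in rio today.
read_parquet_file <- function(file) {
  engine <- getOption("rio.parquet_engine", default = "nanoparquet")
  switch(engine,
    arrow       = arrow::read_parquet(file),
    nanoparquet = nanoparquet::read_parquet(file),
    stop("unknown parquet engine: ", engine)
  )
}

# Usage: opt in to arrow for big files.
# options(rio.parquet_engine = "arrow")
# d <- read_parquet_file("big.parquet")
```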
