polars in R
polars is one of the fastest new data manipulation libraries. It is written in Rust and uses Apache Arrow storage. For larger data pipelines, for example, polars brings to R:
- Lazy file scanners (parquet, csv, idf, ...)
- Lazy interaction with SQL databases
- Query optimization across mixed data sources
- Seamless multi-threading
- Easy and powerful scalability to hundreds of CPUs without cluster computing or much configuration
- A type-rich environment
- Immutable (copy-on-write) data structures that are very true to the spirit of R's functional paradigms
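To make the first few bullets more concrete, here is a toy sketch in plain Python of what "lazy" scanning means: the query is recorded as a plan, the file is only read when the result is requested, and the filter is applied while rows stream in so unwanted rows are never kept in memory. This is not the polars API. `LazyCSV`, `open_source`, and the sample data are hypothetical names made up for illustration, and a real query optimizer does far more than this.

```python
import csv
import io


class LazyCSV:
    """Toy lazy query over CSV text (hypothetical, not the polars API).

    Steps are only recorded; the source is read when collect() is called.
    Simplification: filters always see the full row, even after select().
    """

    def __init__(self, open_source, predicates=(), columns=None):
        self._open_source = open_source      # callable returning a fresh text stream
        self._predicates = tuple(predicates)
        self._columns = columns

    def filter(self, predicate):
        # Return a new plan; the existing plan is never mutated.
        return LazyCSV(self._open_source, self._predicates + (predicate,), self._columns)

    def select(self, *columns):
        return LazyCSV(self._open_source, self._predicates, tuple(columns))

    def collect(self):
        rows = []
        with self._open_source() as stream:
            for row in csv.DictReader(stream):
                # Filter while scanning, so rejected rows are dropped immediately.
                if all(pred(row) for pred in self._predicates):
                    if self._columns is not None:
                        row = {name: row[name] for name in self._columns}
                    rows.append(row)
        return rows


DATA = "name,city,amount\nann,Oslo,10\nbob,Rome,25\ncy,Oslo,40\n"
reads = []  # records every time the source is actually opened


def open_source():
    reads.append(1)
    return io.StringIO(DATA)


plan = LazyCSV(open_source).filter(lambda r: r["city"] == "Oslo").select("name", "amount")
reads_before_collect = len(reads)  # still 0: building the plan reads nothing
result = plan.collect()
print(result)
```

Building `plan` does not touch the data; only `collect()` opens the source, and it does so once. Each `filter` or `select` call returns a new plan object instead of modifying the old one, which loosely mirrors the immutable style mentioned in the last bullet.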
- data.table package: written in C instead of Rust, no Arrow storage, no query optimization, no lazy syntax. Still pretty awesome :)
- arrow package: Arrow storage + dplyr. No query optimization, no extensive multithreading. A very popular syntax :)
- sparkR: polars copied the syntax. Great for Big Data. Cumbersome to set up, especially in a CI/CD machine-learning environment. Not very efficient (computation/resources). Long boot-up times. Only reasonably fast when using large clusters for long periods.
- [sorhawell minipolars R package](https://github.com/sorhawell/minipolars/README.md).
- extendr: invaluable groundwork for fusing R and Rust. Template.
- py-polars: how polars was implemented in Python.
- nodejs-polars: how polars was implemented in Node.js.
- The book for starting Rust: it's an amazing journey. Afterwards, you will see programming differently.
If R is to stay relevant as a production language, polars is a great stepping stone. Polars should be considered for any problem where computational resources are a limiting factor.
Contributors, please contact mentors below after completing at least one of the tests below.
- Soren H. Welling, author of minipolars. New to R-GSOC. Independent consultant tackling data science problems with R, C++ and Python. On a deep dive into Rust since last year. PhD involving some ML and computational chemistry.
- Toby Hocking has 10+ years of experience in R-GSOC and can co-mentor.
Contributors, please do one or more of the following tests before contacting the mentors above.
MENTORS: write several tests that potential contributors can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the contributors write code to solve problems. You'll see that the harder the questions that you ask, the easier it will be for you to choose between the contributors that apply for your project! Please modify the suggestions below to make them specific for your project.
- Easy: something that any useR should be able to do, e.g. download some existing package listed in the Related Work, and run it on some example data.
- Medium: something a bit more complicated. You can encourage contributors to write a script or some functions that show their R coding abilities.
- Hard: Can the contributor write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the contributor write in that other language?
Contributors, please post a link to your test results here.
- EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.