
Conversation

@jangorecki
Owner

Adding polars to rollbench, a mini rolling functions benchmark.

FYI @etiennebacher @sorhawell @eitsupi in case there is something to optimize. AFAIK the next release will have $rolling(), which could be applied to the quadruple rolling computation that is currently made by calling the rolling function twice.

polars is not yet very competitive in this field, but it is still often faster than pandas. I understand the rolling statistics functions are still marked as experimental and yet to be optimized for performance. Looking forward to future improvements.

@jangorecki jangorecki merged commit 2a41d70 into master Nov 21, 2023
@jangorecki jangorecki deleted the polars branch November 21, 2023 19:00
@sorhawell

Thanks. I see rust-polars has a naive rolling_mean derived from rolling_apply_, so it computes each window from scratch at every step and does not scale well with window size.

I know the randomForest loss function uses a trick to compute rolling sums, where the value entering the window is added to a running sum and the value exiting is subtracted. Then the entire window does not have to be recalculated. The mean is just the running sum divided by the window size.
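The running-sum trick above can be sketched like this (a hypothetical standalone sketch, not the polars or data.table implementation; note a plain running sum can accumulate floating-point error over long series):

```rust
/// Rolling mean via a running sum: each step adds the entering value and
/// subtracts the exiting one, so the cost per step is O(1) instead of
/// O(window). Positions before the first full window are NaN.
fn roll_mean(x: &[f64], width: usize) -> Vec<f64> {
    let mut out = vec![f64::NAN; x.len()];
    let mut sum = 0.0;
    for i in 0..x.len() {
        sum += x[i]; // value entering the window
        if i >= width {
            sum -= x[i - width]; // value exiting the window
        }
        if i + 1 >= width {
            out[i] = sum / width as f64; // mean = running sum / window size
        }
    }
    out
}
```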

For rolling_median the slow part is sorting every window before finding the middle value, and that does not scale well with window size. Maybe some variation of a max-heap could be efficient.
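As a simpler alternative to a heap, one incremental variant keeps the window sorted and updates it with binary search instead of re-sorting from scratch. This is a hypothetical sketch on integer data (to sidestep f64 not being Ord), not any library's actual algorithm; Vec insert/remove still costs O(w) per step, where a heap-based structure would reach O(log w):

```rust
/// Rolling median keeping the current window in a sorted Vec.
/// Each step: binary-search insert of the new value, binary-search
/// removal of the expired one, then read the middle element(s).
fn roll_median_sorted(x: &[i64], width: usize) -> Vec<f64> {
    let mut sorted: Vec<i64> = Vec::with_capacity(width + 1);
    let mut out = Vec::with_capacity(x.len());
    for i in 0..x.len() {
        let pos = sorted.binary_search(&x[i]).unwrap_or_else(|p| p);
        sorted.insert(pos, x[i]);
        if sorted.len() > width {
            // the expired value is guaranteed to be present in the window
            let old = x[i - width];
            let p = sorted.binary_search(&old).unwrap();
            sorted.remove(p);
        }
        if sorted.len() == width {
            let mid = width / 2;
            out.push(if width % 2 == 1 {
                sorted[mid] as f64
            } else {
                (sorted[mid - 1] + sorted[mid]) as f64 / 2.0
            });
        } else {
            out.push(f64::NAN); // window not yet full
        }
    }
    out
}
```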

@etiennebacher

Thanks for the heads-up @jangorecki, the performance gap of rolling_median() between data.table and polars is surprising. I've reported it here: pola-rs/polars#12609

@jangorecki
Owner Author

jangorecki commented Nov 21, 2023

Wow, if polars is recomputing the window for each observation, then those numbers are actually very low.

A max-heap is one of the two proper ways.

A low-hanging fruit, short of reimplementing everything, is to ensure you are doing a partial ordering rather than a full sort, since you only need the middle value. Implementing quickselect (partial ordering) instead of shell sort (full ordering) in data.table reduced timings tremendously, by 2 to 10 times! It is used with algo="exact" to handle NAs.
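The partial-ordering idea for a single window can be sketched with Rust's standard quickselect, `select_nth_unstable_by`, which places only the middle element in its sorted position (O(n) on average versus O(n log n) for a full sort). A hypothetical sketch, assuming a non-empty, NaN-free window:

```rust
/// Median of one window via quickselect (partial ordering) rather than
/// a full sort. The comparator unwrap assumes no NaN in the window.
fn window_median(window: &[f64]) -> f64 {
    let mut buf = window.to_vec();
    let n = buf.len();
    let mid = n / 2;
    // Partition so buf[mid] holds the (mid+1)-th smallest value;
    // elements before it are <= it, elements after are >= it.
    let (_, m, _) =
        buf.select_nth_unstable_by(mid, |a, b| a.partial_cmp(b).unwrap());
    let hi = *m;
    if n % 2 == 1 {
        hi
    } else {
        // Even window: the lower middle is the max of the left partition.
        let lo = buf[..mid].iter().cloned().fold(f64::MIN, f64::max);
        (lo + hi) / 2.0
    }
}
```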

Another "proper way" is what is now used in data.table (the default algo="fast"). You can read more about it in ?frollmedian (on the rollmedian branch) or in Rdatatable/data.table#5692.

@sorhawell

I wrote a crude rolling mean function in rust for r-polars. The implementation does not handle missing values correctly yet. It is quite a bit faster: it takes ~500us to roll over 1E8 values with widths 1E2 and 1E4, versus about 1s for data.table. Surprisingly, width 1E6 takes 5000us; not sure why that is, but still fast enough.

> x = rnorm(1E8)
> s <- pl$Series(x)
> bench::mark(
+   width_e2 = fast_roll_mean_f64(s,  width = 1E2),
+   width_e4 = fast_roll_mean_f64(s, width = 1E4),
+   width_e6 = fast_roll_mean_f64(s, width = 1E6),
+   dt_w_e4 = data.table::frollmean(x, n = 1E4),
+   check = FALSE
+ )
# A tibble: 4 × 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory     time           gc      
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>     <list>         <list>  
1 width_e2   154.91µs 186.66µs  3256.         280B        0  1628     0   499.94ms <NULL> <Rprofmem> <bench_tm>     <tibble>
2 width_e4   672.42µs 766.95µs  1234.         280B        0   618     0   500.74ms <NULL> <Rprofmem> <bench_tm>     <tibble>
3 width_e6     5.52ms   5.98ms   160.         280B        0    80     0   500.02ms <NULL> <Rprofmem> <bench_tm>     <tibble>
4 dt_w_e4       1.52s    1.52s     0.658     763MB        0     1     0      1.52s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
> 

@jangorecki
Owner Author

jangorecki commented Nov 26, 2023

Nice,

  • :: adds some overhead, which can be significant in very fast computations, so loading the library should be preferred. Especially on the first call into a namespace, much more has to happen than just this single function call.
  • It seems the data.table expression ran only once while the remaining ones ran 80-1600 times. It would be useful to include the max statistic rather than only min and median, because the first call possibly carries the overhead mentioned above. Then we could at least compare max-to-max rather than max-to-median, as it is now.

@sorhawell

sorhawell commented Nov 26, 2023

oh no that was too good to be true 🤣 this is a fairer comparison. The two rolls are similar in speed; dt is slightly faster.

> bench::mark(
+   width_e2 = fast_roll_mean_f64(s,  width = 1E2),
+   width_e4 = fast_roll_mean_f64(s, width = 1E4),
+   width_e6 = fast_roll_mean_f64(s, width = 1E6),
+   dt_w_e2 = data.table::frollmean(x, n = 1E2),
+   dt_w_e4 = data.table::frollmean(x, n = 1E4),
+   dt_w_e6 = data.table::frollmean(x, n = 1E6),
+   check = FALSE
+ )
# A tibble: 6 × 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory     time           gc      
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>     <list>         <list>  
1 width_e2       1.1s     1.1s     0.911      280B    0         1     0       1.1s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
2 width_e4      1.14s    1.14s     0.878      280B    0         1     0      1.14s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
3 width_e6      1.15s    1.15s     0.872      280B    0         1     0      1.15s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
4 dt_w_e2       1.08s    1.08s     0.921     763MB    0.921     1     1      1.08s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
5 dt_w_e4    959.47ms 959.47ms     1.04      763MB    0         1     0   959.47ms <NULL> <Rprofmem> <bench_tm [1]> <tibble>
6 dt_w_e6       1.05s    1.05s     0.950     763MB    0.950     1     1      1.05s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled. 

@sorhawell

sry for not addressing all your benchmarking suggestions. I think we might include a roll function in r-polars, and we can do a fairer comparison later.

The complete way to do it is to add it directly into rust-polars with support for any datatype. Writing that PR would take me a lot of time, and I only work a little on r-polars these days.
