Skip to content

Rewrite reduce() and family in C#1246

Open
ErdaradunGaztea wants to merge 30 commits intotidyverse:mainfrom
ErdaradunGaztea:optimize-reduce
Open

Rewrite reduce() and family in C#1246
ErdaradunGaztea wants to merge 30 commits intotidyverse:mainfrom
ErdaradunGaztea:optimize-reduce

Conversation

@ErdaradunGaztea
Copy link
Contributor

@ErdaradunGaztea ErdaradunGaztea commented Jan 28, 2026

Closes #1168 and closes #1243. Your acceptation of #1169 prompted me to tackle a bigger task 🦔

This PR reimplements reduce() (and reduce2(), accumulate(), and accumulate2() as well) in C. The idea came from Hadley's comment on #1168 stating that it'd be hard to performantly add progress bars to reduce() as they are written in C, so I rewrote the function in C instead. It is more performant too, getting reduce() to be on par with base Reduce():

x <- as.list(1:20000)
# Without .init or .progress
bench::mark(
  purrr::reduce_old(x, sum),
  purrr::reduce(x, sum),
  Reduce(sum, x),
  min_time = 1
)
#> # A tibble: 3 × 6
#>   expression                     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 purrr::reduce_old(x, sum)   15.1ms  15.37ms      61.9     762KB     81.8
#> 2 purrr::reduce(x, sum)       8.08ms   8.35ms     118.      170KB     73.6
#> 3 Reduce(sum, x)              7.72ms   8.05ms     124.      349KB     51.6

Created on 2026-01-18 with reprex v2.1.1

accumulate() isn't quite as fast as in base R, but there's improvement nonetheless:

x <- as.list(1:20000)
bench::mark(
  purrr::accumulate_old(x, sum, .init = 0),
  purrr::accumulate(x, sum, .init = 0),
  Reduce(sum, x, init = 0, accumulate = TRUE),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 purrr::accumulate_old(x, sum, .i… 26.2ms  27.3ms       36.8    1.45MB    178. 
#> 2 purrr::accumulate(x, sum, .init … 12.92ms 14.14ms      69.7    1.47MB    103. 
#> 3 Reduce(sum, x, init = 0, accumul…  9.33ms  9.86ms     100.   505.53KB     73.1

Created on 2026-01-28 with reprex v2.1.1

I tried not to change any behavior (except for #1243) and hopefully succeeded, though that meant adding a lot of tests in the end. That was also the reason I didn't tackle #1245, as it'd change non-erroring behavior.

As for the code, well... It's not pretty, but I couldn't think of any approach that would look cleaner with interacting .init, .dir, and done(), and trying to make one shared implementation for all four functions with only some differences in certain points of the algorithm. If you prefer, I can separate C loops for reduce() and accumulate() family, that would probably be the most influential one. If I were to say, it's probably 15-20% more complex than the current R implementation (numbers completely made-up).

Finally, the C implementation should make it very easy to add .dir parameter to reduce2() and accumulate2(), as well as make accumulate2() return a named vector if input is named, like accumulate() does. Then again, I didn't want to add more changes than necessary and I can always add it later if you so wish.

No worries, I'm quickly running out of purrr functions to optimize :)

Currently only 1 arg, requires .init, doesn't support directions or accumulation
This would make it more difficult to reason about `SEXP bar` in particular, as we'd not know it is protected already, and that's a very bad pattern.
@ErdaradunGaztea
Copy link
Contributor Author

Note that my last commit was necessitated by the latest vctrs (I had v0.6.5 installed) release adding extra information to the errors. If you'd rather snatch these updated snaps out of this PR because they do not fit the theme here - you're welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

accumulate() fails when input has names and using early return Progress bar to reduce

1 participant