Calculations with many groups are much slower by default than with setDTthreads(1)

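With the default number of threads on my machine (18), grouped calculations in data.table are much slower than with parallelism turned off via `setDTthreads(1)`. In the minimal reproducible example below, grouping by `grp1` takes about 18.5s elapsed with 18 threads versus 0.06s with 1 thread, and grouping by `grp1` and `grp2` takes about 70s versus 3.2s. According to the verbose output, nearly all of the time in the slow runs is spent in the "collecting discontiguous groups" step.

This is related to, but not the same as, the following issues: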
- https://github.com/Rdatatable/data.table/issues/3175
- https://github.com/Rdatatable/data.table/issues/4284

`#` [`Minimal reproducible example`](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)

```r
> library(data.table)
> NN = 1e5
> set.seed(1)
> DT = data.table(grp1 = as.character(rep(1:(NN/4),each = 4)),
+                 grp2 = sample(5000L,NN,TRUE),
+                 V = rpois(NN, 10))
> 
> setDTthreads(18)
> system.time(DT[ , log(sum(V)), by = grp1,verbose=TRUE])
Detected that j uses these columns: V 
Finding groups using forderv ... forder.c received 100000 rows and 1 columns
0.710s elapsed (0.210s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 25000
0.000s elapsed (0.000s cpu) 
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  collecting discontiguous groups took 17.674s for 25000 groups
  eval(j) took 0.072s for 25000 calls
17.8s elapsed (8.590s cpu) 
   user  system elapsed 
   8.80   27.58   18.49 
> system.time(DT[ , log(sum(V)), by = .(grp1,grp2),verbose=TRUE])
Detected that j uses these columns: V 
Finding groups using forderv ... forder.c received 100000 rows and 2 columns
0.700s elapsed (0.110s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 99974
0.000s elapsed (0.000s cpu) 
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  collecting discontiguous groups took 69.031s for 99974 groups
  eval(j) took 0.176s for 99974 calls
00:01:09 elapsed (31.5s cpu) 
   user  system elapsed 
  31.64  105.61   70.11 
> 
> setDTthreads(1)
> system.time(DT[ , log(sum(V)), by = grp1,verbose=TRUE])
Detected that j uses these columns: V 
Finding groups using forderv ... forder.c received 100000 rows and 1 columns
0.040s elapsed (0.010s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  memcpy contiguous groups took 0.004s for 25000 groups
  eval(j) took 0.013s for 25000 calls
0.020s elapsed (0.020s cpu) 
   user  system elapsed 
   0.03    0.03    0.06 
> system.time(DT[ , log(sum(V)), by = .(grp1,grp2),verbose=TRUE])
Detected that j uses these columns: V 
Finding groups using forderv ... forder.c received 100000 rows and 2 columns
0.050s elapsed (0.020s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 99974
0.000s elapsed (0.000s cpu) 
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  collecting discontiguous groups took 3.048s for 99974 groups
  eval(j) took 0.031s for 99974 calls
3.140s elapsed (1.360s cpu) 
   user  system elapsed 
   1.38    1.81    3.19 
```
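
For convenience, here is a prompt-free sketch of the same comparison that can be pasted into a fresh session. The helper `time_grouping()` and the `res` table are names made up for this illustration, not part of data.table, and the second thread count is whatever `getDTthreads()` reports on the machine running it (18 in my case), so timings will differ elsewhere.

```r
library(data.table)

# Same data as in the transcript above.
NN = 1e5
set.seed(1)
DT = data.table(grp1 = as.character(rep(1:(NN/4), each = 4)),
                grp2 = sample(5000L, NN, TRUE),
                V = rpois(NN, 10))

# Hypothetical helper: time the grouped calculation under a given thread
# count and restore the previous thread setting afterwards.
time_grouping = function(threads, by_cols) {
  old = setDTthreads(threads)      # returns the previous setting invisibly
  on.exit(setDTthreads(old))
  system.time(DT[, log(sum(V)), by = by_cols])[["elapsed"]]
}

# Compare a single thread against the current default on this machine.
threads_to_try = unique(c(1L, getDTthreads()))
res = rbindlist(lapply(threads_to_try, function(n) {
  data.table(threads      = n,
             by_grp1      = time_grouping(n, "grp1"),
             by_grp1_grp2 = time_grouping(n, c("grp1", "grp2")))
}))
print(res)
```

Until the slowdown is understood, wrapping the affected call the same way (set one thread, run, restore) is the workaround I am using.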

`#` `Output of sessionInfo()`
```r
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] extrafont_0.17    stargazer_5.2.2   lfe_2.8-5         Matrix_1.2-18     lmtest_0.9-37     sandwich_2.5-1    texreg_1.36.23    knitr_1.28       
 [9] kableExtra_1.1.0  gridExtra_2.3     ggridges_0.5.2    lubridate_1.7.4   scales_1.1.0      psych_1.9.12.31   zoo_1.8-7         forcats_0.4.0    
[17] stringr_1.4.0     dplyr_0.8.4       purrr_0.3.3       readr_1.3.1       tidyr_1.0.2       tibble_2.1.3      ggplot2_3.2.1     tidyverse_1.3.0  
[25] data.table_1.12.9

loaded via a namespace (and not attached):
 [1] httr_1.4.1        jsonlite_1.6.1    viridisLite_0.3.0 modelr_0.1.5      Formula_1.2-3     assertthat_0.2.1  highr_0.8         cellranger_1.1.0 
 [9] Rttf2pt1_1.3.8    pillar_1.4.3      backports_1.1.5   lattice_0.20-38   glue_1.3.1        extrafontdb_1.0   digest_0.6.23     rvest_0.3.5      
[17] colorspace_1.4-1  htmltools_0.4.0   plyr_1.8.5        pkgconfig_2.0.3   broom_0.5.4       haven_2.2.0       xtable_1.8-4      webshot_0.5.2    
[25] generics_0.0.2    withr_2.1.2       lazyeval_0.2.2    cli_2.0.1         mnormt_1.5-6      magrittr_1.5      crayon_1.3.4      readxl_1.3.1     
[33] evaluate_0.14     fs_1.3.1          fansi_0.4.1       nlme_3.1-143      xml2_1.2.2        tools_3.6.2       hms_0.5.3         lifecycle_0.1.0  
[41] munsell_0.5.0     reprex_0.3.0      compiler_3.6.2    rlang_0.4.4       grid_3.6.2        rstudioapi_0.10   rmarkdown_2.1     gtable_0.3.0     
[49] DBI_1.1.0         R6_2.4.1          stringi_1.4.4     parallel_3.6.2    Rcpp_1.0.3        vctrs_0.2.2       dbplyr_1.4.2      tidyselect_1.0.0 
[57] xfun_0.12 
```

Calculations with many groups are much slower by default than with setDTthreads(1) #4294
