Calculations with many groups are much slower by default than with setDTthreads(1) #4294

@nhirschey

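The minimal reproducible example below shows that with the default number of threads on my machine (18), grouped calculations in data.table are much slower than with setDTthreads(1). Grouping by grp1 takes 18.49s elapsed with 18 threads versus 0.06s with 1 thread; grouping by grp1 and grp2 takes 70.11s versus 3.19s. In the slow runs, nearly all of the time is reported under "collecting discontiguous groups".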

This is related to but not exactly like the following issues:

# Minimal reproducible example

> library(data.table)
> NN = 1e5
> set.seed(1)
> DT = data.table(grp1 = as.character(rep(1:(NN/4),each = 4)),
+                 grp2 = sample(5000L,NN,TRUE),
+                 V = rpois(NN, 10))
> 
> setDTthreads(18)
> system.time(DT[ , log(sum(V)), by = grp1,verbose=TRUE])
Detected that j uses these columns: V 
Finding groups using forderv ... forder.c received 100000 rows and 1 columns
0.710s elapsed (0.210s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 25000
0.000s elapsed (0.000s cpu) 
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  collecting discontiguous groups took 17.674s for 25000 groups
  eval(j) took 0.072s for 25000 calls
17.8s elapsed (8.590s cpu) 
   user  system elapsed 
   8.80   27.58   18.49 
> system.time(DT[ , log(sum(V)), by = .(grp1,grp2),verbose=TRUE])
Detected that j uses these columns: V 
Finding groups using forderv ... forder.c received 100000 rows and 2 columns
0.700s elapsed (0.110s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 99974
0.000s elapsed (0.000s cpu) 
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  collecting discontiguous groups took 69.031s for 99974 groups
  eval(j) took 0.176s for 99974 calls
00:01:09 elapsed (31.5s cpu) 
   user  system elapsed 
  31.64  105.61   70.11 
> 
> setDTthreads(1)
> system.time(DT[ , log(sum(V)), by = grp1,verbose=TRUE])
Detected that j uses these columns: V 
Finding groups using forderv ... forder.c received 100000 rows and 1 columns
0.040s elapsed (0.010s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  memcpy contiguous groups took 0.004s for 25000 groups
  eval(j) took 0.013s for 25000 calls
0.020s elapsed (0.020s cpu) 
   user  system elapsed 
   0.03    0.03    0.06 
> system.time(DT[ , log(sum(V)), by = .(grp1,grp2),verbose=TRUE])
Detected that j uses these columns: V 
Finding groups using forderv ... forder.c received 100000 rows and 2 columns
0.050s elapsed (0.020s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 99974
0.000s elapsed (0.000s cpu) 
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  collecting discontiguous groups took 3.048s for 99974 groups
  eval(j) took 0.031s for 99974 calls
3.140s elapsed (1.360s cpu) 
   user  system elapsed 
   1.38    1.81    3.19 
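
For anyone who wants to check this on their own machine, here is a minimal sketch of the same comparison as a script. The helper `time_with_threads` is hypothetical (it is not part of data.table), and NN is reduced to 1e4 as an assumption so that the script finishes quickly. Timings will differ by machine and data.table version.

```r
library(data.table)

# Hypothetical helper (not part of data.table): evaluate `expr` under a given
# thread count and return the elapsed time in seconds.
time_with_threads <- function(threads, expr) {
  old <- setDTthreads(threads)            # setDTthreads() returns the previous value
  on.exit(setDTthreads(old), add = TRUE)  # restore the setting even on error
  system.time(expr)[["elapsed"]]          # `expr` is lazy, so it is evaluated here
}

set.seed(1)
NN <- 1e4  # assumption: smaller than the 1e5 in the report, to keep this quick
DT <- data.table(grp1 = as.character(rep(1:(NN / 4), each = 4)),
                 grp2 = sample(5000L, NN, TRUE),
                 V    = rpois(NN, 10))

before <- getDTthreads()  # thread count in effect before the comparison

# Time the same grouped aggregation with 1 thread and with the current setting.
timings <- sapply(c(1L, before), function(n) {
  time_with_threads(n, DT[, log(sum(V)), by = .(grp1, grp2)])
})
names(timings) <- c("1 thread", paste(before, "threads"))
print(timings)
```

The helper restores the thread count with on.exit, so the session is left as it was found. Adding verbose=TRUE to the DT[...] call gives the same per-step breakdown as in the transcript above.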

# Output of sessionInfo()

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] extrafont_0.17    stargazer_5.2.2   lfe_2.8-5         Matrix_1.2-18     lmtest_0.9-37     sandwich_2.5-1    texreg_1.36.23    knitr_1.28       
 [9] kableExtra_1.1.0  gridExtra_2.3     ggridges_0.5.2    lubridate_1.7.4   scales_1.1.0      psych_1.9.12.31   zoo_1.8-7         forcats_0.4.0    
[17] stringr_1.4.0     dplyr_0.8.4       purrr_0.3.3       readr_1.3.1       tidyr_1.0.2       tibble_2.1.3      ggplot2_3.2.1     tidyverse_1.3.0  
[25] data.table_1.12.9

loaded via a namespace (and not attached):
 [1] httr_1.4.1        jsonlite_1.6.1    viridisLite_0.3.0 modelr_0.1.5      Formula_1.2-3     assertthat_0.2.1  highr_0.8         cellranger_1.1.0 
 [9] Rttf2pt1_1.3.8    pillar_1.4.3      backports_1.1.5   lattice_0.20-38   glue_1.3.1        extrafontdb_1.0   digest_0.6.23     rvest_0.3.5      
[17] colorspace_1.4-1  htmltools_0.4.0   plyr_1.8.5        pkgconfig_2.0.3   broom_0.5.4       haven_2.2.0       xtable_1.8-4      webshot_0.5.2    
[25] generics_0.0.2    withr_2.1.2       lazyeval_0.2.2    cli_2.0.1         mnormt_1.5-6      magrittr_1.5      crayon_1.3.4      readxl_1.3.1     
[33] evaluate_0.14     fs_1.3.1          fansi_0.4.1       nlme_3.1-143      xml2_1.2.2        tools_3.6.2       hms_0.5.3         lifecycle_0.1.0  
[41] munsell_0.5.0     reprex_0.3.0      compiler_3.6.2    rlang_0.4.4       grid_3.6.2        rstudioapi_0.10   rmarkdown_2.1     gtable_0.3.0     
[49] DBI_1.1.0         R6_2.4.1          stringi_1.4.4     parallel_3.6.2    Rcpp_1.0.3        vctrs_0.2.2       dbplyr_1.4.2      tidyselect_1.0.0 
[57] xfun_0.12 
