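With the default number of threads on my machine (18), grouped operations in data.table are much slower than with a single thread: about 18 s versus 0.06 s elapsed when grouping by grp1, and about 70 s versus 3.2 s when grouping by grp1 and grp2. In the verbose output, almost all of the time is reported under "collecting discontiguous groups". A minimal reproducible example follows.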
This is related to, but not the same as, the following issues:
- test and confirm new parallel subset performance #3175
- Proposal for a function to handle many small groups in some situations #4284
# Minimal reproducible example
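For anyone who wants to rerun the comparison without copying from the console transcript, a paste-able version might look like the sketch below. It builds the same `DT` as the transcript and times the same two grouped aggregations. The helper `time_grouping` is hypothetical (it is not part of data.table), and the sketch compares one thread against whatever `getDTthreads()` reports on the current machine rather than hard-coding 18.

```r
library(data.table)

# Same data as in the transcript
NN <- 1e5
set.seed(1)
DT <- data.table(
  grp1 = as.character(rep(1:(NN / 4), each = 4)),
  grp2 = sample(5000L, NN, TRUE),
  V    = rpois(NN, 10)
)

# Hypothetical helper: time one grouped aggregation at a given thread count,
# then restore the previous thread setting (setDTthreads returns the old value).
time_grouping <- function(threads, by_cols) {
  old <- setDTthreads(threads)
  on.exit(setDTthreads(old), add = TRUE)
  system.time(DT[, log(sum(V)), by = by_cols])[["elapsed"]]
}

# Assumption: compare a single thread with this machine's current setting
thread_counts <- unique(c(1L, getDTthreads()))
groupings <- list("grp1", c("grp1", "grp2"))

results <- rbindlist(lapply(thread_counts, function(th) {
  rbindlist(lapply(groupings, function(g) {
    data.table(
      threads = th,
      by      = paste(g, collapse = ","),
      elapsed = time_grouping(th, g)
    )
  }))
}))

print(results)
```

Timings will differ by machine and data.table version, so the numbers in the transcript should not be expected to reproduce exactly.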
> NN = 1e5
> set.seed(1)
> DT = data.table(grp1 = as.character(rep(1:(NN/4),each = 4)),
+ grp2 = sample(5000L,NN,TRUE),
+ V = rpois(NN, 10))
>
> setDTthreads(18)
> system.time(DT[ , log(sum(V)), by = grp1,verbose=TRUE])
Detected that j uses these columns: V
Finding groups using forderv ... forder.c received 100000 rows and 1 columns
0.710s elapsed (0.210s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 25000
0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ...
collecting discontiguous groups took 17.674s for 25000 groups
eval(j) took 0.072s for 25000 calls
17.8s elapsed (8.590s cpu)
user system elapsed
8.80 27.58 18.49
> system.time(DT[ , log(sum(V)), by = .(grp1,grp2),verbose=TRUE])
Detected that j uses these columns: V
Finding groups using forderv ... forder.c received 100000 rows and 2 columns
0.700s elapsed (0.110s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 99974
0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ...
collecting discontiguous groups took 69.031s for 99974 groups
eval(j) took 0.176s for 99974 calls
00:01:09 elapsed (31.5s cpu)
user system elapsed
31.64 105.61 70.11
>
> setDTthreads(1)
> system.time(DT[ , log(sum(V)), by = grp1,verbose=TRUE])
Detected that j uses these columns: V
Finding groups using forderv ... forder.c received 100000 rows and 1 columns
0.040s elapsed (0.010s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ...
memcpy contiguous groups took 0.004s for 25000 groups
eval(j) took 0.013s for 25000 calls
0.020s elapsed (0.020s cpu)
user system elapsed
0.03 0.03 0.06
> system.time(DT[ , log(sum(V)), by = .(grp1,grp2),verbose=TRUE])
Detected that j uses these columns: V
Finding groups using forderv ... forder.c received 100000 rows and 2 columns
0.050s elapsed (0.020s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 99974
0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'log(sum(V))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ...
collecting discontiguous groups took 3.048s for 99974 groups
eval(j) took 0.031s for 99974 calls
3.140s elapsed (1.360s cpu)
user system elapsed
1.38 1.81 3.19
# Output of sessionInfo()
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] extrafont_0.17 stargazer_5.2.2 lfe_2.8-5 Matrix_1.2-18 lmtest_0.9-37 sandwich_2.5-1 texreg_1.36.23 knitr_1.28
[9] kableExtra_1.1.0 gridExtra_2.3 ggridges_0.5.2 lubridate_1.7.4 scales_1.1.0 psych_1.9.12.31 zoo_1.8-7 forcats_0.4.0
[17] stringr_1.4.0 dplyr_0.8.4 purrr_0.3.3 readr_1.3.1 tidyr_1.0.2 tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.3.0
[25] data.table_1.12.9
loaded via a namespace (and not attached):
[1] httr_1.4.1 jsonlite_1.6.1 viridisLite_0.3.0 modelr_0.1.5 Formula_1.2-3 assertthat_0.2.1 highr_0.8 cellranger_1.1.0
[9] Rttf2pt1_1.3.8 pillar_1.4.3 backports_1.1.5 lattice_0.20-38 glue_1.3.1 extrafontdb_1.0 digest_0.6.23 rvest_0.3.5
[17] colorspace_1.4-1 htmltools_0.4.0 plyr_1.8.5 pkgconfig_2.0.3 broom_0.5.4 haven_2.2.0 xtable_1.8-4 webshot_0.5.2
[25] generics_0.0.2 withr_2.1.2 lazyeval_0.2.2 cli_2.0.1 mnormt_1.5-6 magrittr_1.5 crayon_1.3.4 readxl_1.3.1
[33] evaluate_0.14 fs_1.3.1 fansi_0.4.1 nlme_3.1-143 xml2_1.2.2 tools_3.6.2 hms_0.5.3 lifecycle_0.1.0
[41] munsell_0.5.0 reprex_0.3.0 compiler_3.6.2 rlang_0.4.4 grid_3.6.2 rstudioapi_0.10 rmarkdown_2.1 gtable_0.3.0
[49] DBI_1.1.0 R6_2.4.1 stringi_1.4.4 parallel_3.6.2 Rcpp_1.0.3 vctrs_0.2.2 dbplyr_1.4.2 tidyselect_1.0.0
[57] xfun_0.12