I have a number of tasks that look like `lapply(long_list, fast_function)`, and I'd like to move away from `mclapply()` (for reasons you've talked about before).
However, in my benchmarks `future_lapply()` from future.apply has noticeably more overhead than `parLapply()`/`mclapply()`: in the example below its median time is about 1.06 s, versus roughly 0.64 s for `parLapply()` and 0.46 s for `mclapply()`.
Are there parameters I can tune to improve performance on this kind of task?
An example:
```r
library(dplyr)           # for %>%
library(parallel)
library(future.apply)
library(microbenchmark)

plan(multisession, workers = 4)   # 4 background R sessions for future_lapply()
cl <- parallel::makeCluster(4)    # separate 4-worker PSOCK cluster for parLapply()

# 10,000 strings such as "gene42*3"
v <- paste0(paste0("gene", 1:100), "*", 1:3)
v <- sample(v, 10000, replace = TRUE)

# parLapply(): %>% has to be exported to the PSOCK workers (done on every call)
parL <- function(v) {
  parallel::clusterExport(cl, varlist = "%>%")
  v <- parallel::parLapply(cl, v, function(.x) {
    gsub("\\*$", "", .x) %>% gsub("\\*.+$", "", .) %>% unique %>%
      paste0(collapse = ",")
  })
}

# Sequential baseline
serial <- function(v) {
  v <- lapply(v, function(.x) {
    gsub("\\*$", "", .x) %>% gsub("\\*.+$", "", .) %>% unique %>%
      paste0(collapse = ",")
  })
}

# mclapply(): forked workers (mc.cores > 1 is not supported on Windows)
mcl <- function(v) {
  v <- mclapply(v, function(.x) {
    gsub("\\*$", "", .x) %>% gsub("\\*.+$", "", .) %>% unique %>%
      paste0(collapse = ",")
  }, mc.cores = 4)
}

# future_lapply(): runs on the multisession plan set above
fut <- function(v) {
  v <- future_lapply(v, function(.x) {
    gsub("\\*$", "", .x) %>% gsub("\\*.+$", "", .) %>% unique %>%
      paste0(collapse = ",")
  })
}

microbenchmark(parL = parL(v), mcl = mcl(v), serial = serial(v), fut = fut(v),
               times = 5, setup = gc())
```
```
Unit: milliseconds
   expr       min        lq      mean    median        uq       max neval cld
   parL  529.5245  534.1097  677.4822  640.9563  746.9266  935.8941     5   a
    mcl  445.8535  451.9500  464.9154  459.4391  474.9048  492.4295     5   a
 serial 1339.9738 1451.7585 1467.4781 1461.9080 1517.0687 1566.6813     5   b
    fut 1059.6930 1060.1854 1342.6222 1064.8015 1456.4210 2072.0099     5   b
```
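For reference, `future_lapply()` does expose chunking arguments, `future.scheduling` (average number of futures per worker, default 1) and `future.chunk.size` (average number of elements per future; takes precedence when set). The sketch below only shows how they are passed; it makes no claim that any setting closes the gap to `mclapply()`. Assumptions for the sketch: a pipe-free helper `collapse_gene()` (a hypothetical name) stands in for the anonymous function above, and it uses 2 workers and 2,000 elements to keep it small.

```r
library(future.apply)

# Hypothetical pipe-free equivalent of the per-element function in the
# benchmark, so nothing from dplyr/magrittr has to reach the workers.
collapse_gene <- function(.x) {
  paste0(unique(gsub("\\*.+$", "", gsub("\\*$", "", .x))), collapse = ",")
}

set.seed(1)
v <- sample(paste0(paste0("gene", 1:100), "*", 1:3), 2000, replace = TRUE)

plan(multisession, workers = 2)

# Default chunking: future.scheduling = 1, i.e. about one future per worker
res_default <- future_lapply(v, collapse_gene)

# Fixed chunk size: each future processes about 500 elements
res_chunk <- future_lapply(v, collapse_gene, future.chunk.size = 500)

# About 4 futures per worker: finer load balancing, more per-future overhead
res_sched <- future_lapply(v, collapse_gene, future.scheduling = 4)

# Switch back to sequential processing, releasing the background workers
plan(sequential)
```

Chunking only changes how elements are grouped into futures, so all three calls should return the same list as a plain `lapply(v, collapse_gene)`; only the timing can differ.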