-
Notifications
You must be signed in to change notification settings - Fork 19
Description
I recently implemented a new feature to my logger that requires the use of .traceback to fetch the function name and line number that generated that logger function call. A while later, I realised this was slowing things down drastically when run sequentially or in parallel with future_Map function. I am not sure if this has the same effect on other future_* functions (e.g., lapply).
Took me a while to figure out and come up with a small example, but here it is:
require(future)
require(future.apply)
myDF <- function(...) {
t <- system.time(x <- .traceback(x=1))[["elapsed"]]
cat("time: ", t, ", length: ", length(x), ", size_mb: ", object.size(x)/1024/1024, "\n", sep="")
# print(x) # prints the entire call stack where the data is completely populated when called from future_Map
base::data.frame(...)
}
ll <- replicate(3, sample(10, 1e6, TRUE), simplify=FALSE)
system.time(Map(function(x, y) myDF(x, y), ll, ll))
# time: 0, length: 6, size_mb: 0.002937317
# time: 0, length: 6, size_mb: 0.002937317
# time: 0, length: 6, size_mb: 0.002937317
# user system elapsed
# 0.15 0.00 0.16
system.time(future_Map(function(x, y) myDF(x, y), ll, ll))
# time: 4.24, length: 27, size_mb: 53.27061
# time: 4.15, length: 27, size_mb: 53.27061
# time: 4.18, length: 27, size_mb: 53.27061
# user system elapsed
# 13.01 0.03 13.05If you uncomment the print(x) statement and run them, you'll see the difference in the way the call is populated. This is probably because of the way you use do.call() in generating the mapply function call: do.call(mapply, args=args). Here args, every element of args gets evaluated and are therefore not just calls/expressions, but materialised objects, I suspect.