Skip to content

Proposed bugfix for batchtools reveals bug in future.batchtools? #74

@stuvet

Description

@stuvet

I have submitted a simple pull request for a bugfix in batchtools which fuelled some of its behaviour mentioned in #73.

I'm writing here because this proposed bugfix reveals an error in future.batchtools, though based on the future.debug output I don't believe the two are related - just that the batchtools bug previously threw the error first.

Describe the bug
When a slurm worker is availble for the job, (& when the batchtools::waitForFile is not required by batchtools::getLog -> batchtools::readLog), everything functions correctly (note the status):

[01:10:21.286] Job Registry
[01:10:21.286]   Backend  : Slurm
[01:10:21.286]   File dir : /home/stuart/pipelines/production_reports/.future/20210605_011015-MndN2h/todo_chains_5cd13a69_900365739
[01:10:21.286]   Work dir : /home/stuart/pipelines/production_reports
[01:10:21.286]   Jobs     : 0
[01:10:21.286]   Seed     : 26422
[01:10:21.286]   Writeable: TRUE
[01:10:21.287] Attaching 1 packages (‘targets’) ...
[01:10:21.294] Attaching 1 packages (‘targets’) ... DONE
[01:10:22.894] batchtools::batchMap()
Adding 1 jobs ...
[01:10:22.919] Created BatchtoolsFuture future #1
Submitting 1 jobs in 1 chunks using cluster functions 'Slurm' ...
[01:10:24.107] Launched future #1
[01:10:24.109] plan(): Setting new future strategy stack:
[01:10:24.109] List of future strategies:
[01:10:24.109] 1. FutureStrategy:
[01:10:24.109]    - args: function (..., envir = parent.frame(), template = "batchtools.slurm.tmpl", workers = 8, resources = list(bucket = "targets-production-reports-209813", walltime = 3600, memory = 2048, ncpus = 1, ntasks = 1, partition = "ph4"))
[01:10:24.109]    - tweaked: TRUE
[01:10:24.109]    - call: future::plan(slurm)
[01:10:24.111] plan(): nbrOfWorkers() = 8
[01:10:24.112] status() for BatchtoolsFuture ...
[01:10:24.377] - status: ‘defined’, ‘running’, ‘started’, ‘submitted’, ‘system’
[01:10:24.377] status() for BatchtoolsFuture ... done

But when a slurm worker needs to be provisioned to run the job (& so the batchtools::waitForFile will also be called), the initial result of the call to future.batchtools::status(future) in future.batchtools::await is incorrect:

[01:43:55.613] Job Registry
[01:43:55.613]   Backend  : Slurm
[01:43:55.613]   File dir : /home/stuart/pipelines/production_reports/.future/20210605_014355-hJknNd/todo_chains_dced720f_1866986666
[01:43:55.613]   Work dir : /home/stuart/pipelines/production_reports
[01:43:55.613]   Jobs     : 0
[01:43:55.613]   Seed     : 23275
[01:43:55.613]   Writeable: TRUE
[01:43:55.614] Attaching 1 packages (‘targets’) ...
[01:43:55.622] Attaching 1 packages (‘targets’) ... DONE
[01:43:57.447] batchtools::batchMap()
Adding 1 jobs ...
[01:43:57.482] Created BatchtoolsFuture future #1
Submitting 1 jobs in 1 chunks using cluster functions 'Slurm' ...
[01:43:58.682] Launched future #1
[01:43:58.683] plan(): Setting new future strategy stack:
[01:43:58.683] List of future strategies:
[01:43:58.683] 1. FutureStrategy:
[01:43:58.683]    - args: function (..., envir = parent.frame(), template = "batchtools.slurm.tmpl", workers = 8, resources = list(bucket = "targets-production-reports-209813", walltime = 3600, memory = 2048, ncpus = 1, ntasks = 1, partition = "ph4"))
[01:43:58.683]    - tweaked: TRUE
[01:43:58.683]    - call: future::plan(slurm)
[01:43:58.684] plan(): nbrOfWorkers() = 8
[01:43:58.685] status() for BatchtoolsFuture ...
[01:43:58.778] - status: ‘defined’, ‘expired’, ‘submitted’
[01:43:58.779] status() for BatchtoolsFuture ... done
[01:43:58.779] status() for BatchtoolsFuture ...
[01:43:58.870] - status: ‘defined’, ‘expired’, ‘submitted’
[01:43:58.871] status() for BatchtoolsFuture ... done
[01:43:58.872] batchtools::waitForJobs() ...
[01:44:03.310] - batchtools::waitForJobs(): FALSE
[01:44:03.310] status() for BatchtoolsFuture ...
[01:44:03.414] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.414] status() for BatchtoolsFuture ... done
[01:44:03.414] - status(): ‘defined’, ‘expired’, ‘submitted’
[01:44:03.414] batchtools::waitForJobs() ... done
[01:44:03.415] Results:
[01:44:03.415] status() for BatchtoolsFuture ...
[01:44:03.508] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.509] status() for BatchtoolsFuture ... done
[01:44:03.509] status() for BatchtoolsFuture ...
[01:44:03.601] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.601] status() for BatchtoolsFuture ... done

At this point, logging inserted into batchtools:::waitForFile begins to appear. No more future.debug messages appear until the log file has been detected (now, after the proposed bugfix) & batchtools::waitForFile exits.

[01:44:47.684] plan(): Setting new future strategy stack:
[01:44:47.684] List of future strategies:
[01:44:47.684] 1. sequential:
[01:44:47.684]    - args: function (..., envir = parent.frame())
[01:44:47.684]    - tweaked: FALSE
[01:44:47.684]    - call: future::plan(future::sequential)
[01:44:47.685] plan(): nbrOfWorkers() = 1
Error : BatchtoolsExpiration: Future ('todo_chains_dced720f') expired (registry path /home/stuart/pipelines/production_reports/.future/20210605_014355-hJknNd/todo_chains_dced720f_1866986666).. No logged output exist.
In addition: There were 17 warnings (use warnings() to see them)
[01:44:47.724] status() for BatchtoolsFuture ...
[01:44:47.829] - status: ‘defined’, ‘running’, ‘started’, ‘submitted’, ‘system’
[01:44:47.829] status() for BatchtoolsFuture ... done

To be clear, the logged output does exist, and continues to be written by the running job after future.batchtools::await flags the job as expired & exits.

Please let me know if there's anything else I can do to help resolve this issue.

Expected behavior
future.batchtools::await waits for the running job to exit, even when workers need to be provisioned or batchtools::waitForFile is triggered.

Session information

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /apps/R/4.1.0/lib64/R/lib/libRblas.so
LAPACK: /apps/R/4.1.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] defineDataAccess_0.0.0.9002 forcats_0.5.1               stringr_1.4.0              
 [4] purrr_0.3.4                 readr_1.4.0                 tidyr_1.1.3                
 [7] tibble_3.1.2                ggplot2_3.3.3               tidyverse_1.3.1            
[10] brms_2.15.0                 Rcpp_1.0.6                  dplyr_1.0.6                
[13] future.batchtools_0.10.0    batchtools_0.9.15           stringi_1.6.2              
[16] future_1.21.0               stantargets_0.0.2.9000      tarchetypes_0.2.0          
[19] targets_0.4.2               aws.s3_0.3.21               googleCloudStorageR_0.6.0  
[22] data.table_1.14.0          

loaded via a namespace (and not attached):
  [1] utf8_1.2.1            R.utils_2.10.1        tidyselect_1.1.1     
  [4] lme4_1.1-27           htmlwidgets_1.5.3     grid_4.1.0           
  [7] aws.signature_0.6.0   munsell_0.5.0         base64url_1.4        
 [10] codetools_0.2-18      nleqslv_3.3.2         DT_0.18              
 [13] miniUI_0.1.1.1        withr_2.4.2           Brobdingnag_1.2-6    
 [16] colorspace_2.0-1      fst_0.9.4             knitr_1.33           
 [19] rstudioapi_0.13       stats4_4.1.0          bayesplot_1.8.0      
 [22] listenv_0.8.0         emmeans_1.6.0         rstan_2.26.1         
 [25] bit64_4.0.5           farver_2.1.0          bridgesampling_1.1-2 
 [28] coda_0.19-4           parallelly_1.25.0     vctrs_0.3.8          
 [31] generics_0.1.0        TH.data_1.0-10        xfun_0.23            
 [34] R6_2.5.0              markdown_1.1          gamm4_0.2-6          
 [37] projpred_2.0.2        arrow_4.0.0.1         cachem_1.0.5         
 [40] assertthat_0.2.1      promises_1.2.0.1      scales_1.1.1         
 [43] multcomp_1.4-17       debugme_1.1.0         gtable_0.3.0         
 [46] globals_0.14.0        processx_3.5.2        sandwich_3.0-1       
 [49] rlang_0.4.11          miniCRAN_0.2.14       splines_4.1.0        
 [52] gargle_1.1.0          broom_0.7.6           brew_1.0-6           
 [55] checkmate_2.0.0       inline_0.3.18         modelr_0.1.8         
 [58] yaml_2.2.1            reshape2_1.4.4        abind_1.4-5          
 [61] threejs_0.3.3         crosstalk_1.1.1       backports_1.2.1      
 [64] httpuv_1.6.1          rsconnect_0.8.17      tensorA_0.36.2       
 [67] tools_4.1.0           feather_0.3.5         ellipsis_0.3.2       
 [70] posterior_0.1.6       ggridges_0.5.3        plyr_1.8.6           
 [73] base64enc_0.1-3       progress_1.2.2        ps_1.6.0             
 [76] prettyunits_1.1.1     openssl_1.4.4         zoo_1.8-9            
 [79] haven_2.4.1           fs_1.5.0              furrr_0.2.2          
 [82] magrittr_2.0.1        reprex_2.0.0          colourpicker_1.1.0   
 [85] mvtnorm_1.1-1         matrixStats_0.58.0    stringfish_0.15.1    
 [88] evaluate_0.14         hms_1.1.0             qs_0.24.1            
 [91] shinyjs_2.0.0         mime_0.10             xtable_1.8-4         
 [94] shinystan_2.5.0       readxl_1.3.1          gridExtra_2.3        
 [97] rstantools_2.1.1      compiler_4.1.0        credentials_1.3.0    
[100] V8_3.4.2              crayon_1.4.1          minqa_1.2.4          
[103] R.oo_1.24.0           StanHeaders_2.26.1    htmltools_0.5.1.1    
[106] mgcv_1.8-35           later_1.2.0           RcppParallel_5.1.4   
[109] lubridate_1.7.10      DBI_1.1.1             RApiSerialize_0.1.0  
[112] dbplyr_2.1.1          MASS_7.3-54           connectapi_0.1.0.9026
[115] rappdirs_0.3.3        boot_1.3-28           sys_3.4              
[118] Matrix_1.3-3          cli_2.5.0             R.methodsS3_1.8.1    
[121] parallel_4.1.0        igraph_1.2.6          pkgconfig_2.0.3      
[124] xml2_1.3.2            dygraphs_1.1.1.6      pins_0.4.5           
[127] estimability_1.3      rvest_1.0.0           distributional_0.2.2 
[130] callr_3.7.0           digest_0.6.27         rmarkdown_2.8        
[133] cellranger_1.1.0      curl_4.3.1            shiny_1.6.0          
[136] gtools_3.8.2          nloptr_1.2.2.2        lifecycle_1.0.0      
[139] nlme_3.1-152          jsonlite_1.7.2        cmdstanr_0.4.0.9000  
[142] askpass_1.1           fansi_0.4.2           pillar_1.6.1         
[145] lattice_0.20-44       loo_2.4.1             fastmap_1.1.0        
[148] httr_1.4.2            pkgbuild_1.2.0        survival_3.2-11      
[151] googleAuthR_1.4.0     glue_1.4.2            xts_0.12.1           
[154] zip_2.1.1             gert_1.3.0            shinythemes_1.2.0    
[157] bit_4.0.4             memoise_2.0.0        

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions