# simChef Template Project

This project aims to provide a template for running simulation studies comparing different estimators across multiple data generating processes (DGPs).

The examples shown are for comparing the performance of several estimators of the average treatment effect.

Specifically, this template project is optimized for running quickly on a SLURM cluster. Compared with previous projects in the nshlab, it demonstrates how to further stratify simulations into a greater number of jobs in a SLURM job-array. Each job is designed to process only a small number of replications of a specific experiment, and its results are stored for later processing. Once all of the jobs have run on the cluster, the results can be processed using the standard {simChef} tools, and we additionally show how to generate a static HTML report of the simulation results.
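The stratification is driven by a SLURM job-array directive in the submission script. As a rough sketch of the relevant pieces (the array size and resource requests here are illustrative, not the template's actual values):

```shell
#!/bin/bash
#SBATCH --array=1-240                # one task per row of sim_array_config (illustrative size)
#SBATCH --cpus-per-task=1
#SBATCH --time=0-01:00

# each array task runs the experiment script, passing its own task id as the job id
Rscript R/01_run_on_cluster/05_run_experiment.R "$SLURM_ARRAY_TASK_ID"
```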

## Project Structure

It is instructive to first take a look at the folder structure of this project. The project is designed around two stages of analysis:

1. running the experiment and saving the results from a job-array on a SLURM cluster, and
2. processing the saved results.

```
├── R
│   ├── 01_run_on_cluster
│   │   ├── 00_dependencies.R
│   │   ├── 01_job_array_config.R
│   │   ├── 02_dgp.R
│   │   ├── 03_methods.R
│   │   ├── 04_experiment.R
│   │   └── 05_run_experiment.R
│   ├── 02_evaluators_and_visualizers
│   │   ├── 05_evaluators.R
│   │   └── 06_visualizers.R
│   └── 03_meal.R
├── README.md
├── results
│   ├── ATE methods/
│   └── sim_results/
├── simChef_job_arrays.Rproj
└── slurm
    └── slurm_run_experiment.sh

7 directories, 5 files
```

### `01_job_array_config.R`

This is a very important script: it constructs a `data.frame` specifying what every job in the SLURM job-array will do.

For reference, the `sim_array_config` data frame produced looks like this:

```
> head(sim_array_config)
  job_id     dgp sample_size n_reps_job
       1 complex          30         10
       2 complex         300         10
       3 complex        3000         10
       4 complex       10000         10
       5 complex          30         10
       6 complex         300         10
```

In this template project, the configuration goes on to specify additional simulation scenarios, including the simple DGP, at varying sample sizes.
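One way to build such a configuration in base R is with `expand.grid()`. This is only a sketch: the total replication count and the `block` helper column are assumptions for illustration, not the template's actual code.

```r
n_reps_total <- 30                      # assumed total replications per scenario
n_reps_job   <- 10                      # replications handled by a single job
n_blocks     <- n_reps_total / n_reps_job

# expand.grid() varies its first argument fastest, matching the ordering
# of the printed sim_array_config above
sim_array_config <- expand.grid(
  sample_size = c(30, 300, 3000, 10000),
  block       = seq_len(n_blocks),
  dgp         = c("complex", "simple"),
  stringsAsFactors = FALSE
)
sim_array_config$job_id     <- seq_len(nrow(sim_array_config))
sim_array_config$n_reps_job <- n_reps_job

head(sim_array_config[, c("job_id", "dgp", "sample_size", "n_reps_job")])
```

Each row then maps one-to-one onto a SLURM array task id.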

## How to Run This Project

1. Clone the project onto the cluster.
2. Run it by navigating to the `slurm/` directory and running `sbatch slurm_run_experiment.sh`.
   - While the jobs are running, you may notice that earlier jobs in your array are running while later jobs are still listed as pending. This is normal and a useful feature of the SLURM scheduler: if the scheduler waited for `n_jobs` cores to be available all at once, you might wait much longer for your jobs to run at all. Because some of the first jobs in your array run while others are still pending, you find out sooner whether there are errors in the code that you will need to debug.
3. Once the jobs have finished running on the cluster, copy the produced files to the computer you wish to process them on (often your local machine). For example, you can use `rsync -avh username@login.rc.fas.harvard.edu:~/simChef_job_arrays/results/ ./results/` or file transfer software like FileZilla.
4. On the computer you wish to use to process your results, source `R/03_meal.R`.

The rendered HTML report will open in your default browser.

## How to Adapt This Project

There are several elements you need to replace for this to be useful to you:

1. Replace the DGPs with ones suited to your simulation study.
2. Replace the methods.
3. Rewrite `R/01_run_on_cluster/01_job_array_config.R` to define the jobs as you would like them to be run. You do not have to stick to the exact approach or columns of `sim_array_config`; adapt it however best splits up the jobs that need to be run.
   - I would recommend keeping the array size in the 100s or low thousands.
   - See https://docs.rc.fas.harvard.edu/kb/running-jobs/#Job_arrays
4. Replace the evaluators.
5. Replace the visualizers.
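As a sketch of what a replacement DGP function might look like, here is a minimal example that returns the same `list(df, truth)` shape the template's methods appear to expect. All names and the data-generating model here are illustrative, not the template's actual code.

```r
my_dgp_fun <- function(n = 100, ate = 5) {
  x     <- rnorm(n)                         # a single confounder
  treat <- rbinom(n, 1, plogis(x))          # treatment depends on the confounder
  y     <- 2 * x + ate * treat + rnorm(n)   # outcome with a known treatment effect
  list(
    df    = data.frame(x = x, treat = treat, y = y),
    truth = list(ate = ate)                 # ground truth for the evaluators
  )
}

# wrap it for simChef (e.g. with create_dgp()) before adding it to the experiment
out <- my_dgp_fun(n = 50)
```

Keeping the returned structure consistent across DGPs means the methods, evaluators, and job-array machinery do not need to change when you swap DGPs in and out.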

## Debugging

### Debugging a Method

Example of how to check whether a particular method works:

```r
data  <- complex_dgp$dgp_fun(n = 1000)
df    <- data$df
truth <- data$truth

confounders <- c("age", "educ", "income", "diabetes", "earnings74")
outcome  <- "earnings78"
exposure <- "treat"

m_np$method_fun(
  df = df,
  truth = truth,
  confounders = confounders,
  outcome = outcome,
  exposure = exposure
)
```

### Debugging `R/01_run_on_cluster/05_run_experiment.R`

Since this script is designed to be run on the cluster via an `Rscript` call from `slurm/slurm_run_experiment.sh` with a command-line argument for the job id, it is important to know how to run it on your local computer for debugging purposes.

Open `R/01_run_on_cluster/05_run_experiment.R` and run `job_id <- 1`, or choose another `job_id` that suits your debugging needs based on what is in the `sim_array_config` data frame. Then run `library(here)` and continue running the rest of the file, starting after line 7. I would recommend not running the `job_id <- commandArgs(trailingOnly = TRUE)` line on your local computer; if you are just trying to debug the simulation, it is easier to simply fix `job_id` to one of the jobs you need to debug.

When you get to the line that calls `fit_experiment()`, you may find it useful to reduce the `n_reps = ...` argument to a small number like 1 or 2.
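Concretely, the manual-debugging pattern looks something like this. The `sim_array_config` below is a toy stand-in; on your machine the real data frame comes from sourcing `01_job_array_config.R`.

```r
# toy stand-in for the real sim_array_config
sim_array_config <- data.frame(
  job_id      = 1:4,
  dgp         = "complex",
  sample_size = c(30, 300, 3000, 10000),
  n_reps_job  = 10
)

job_id <- 1                     # fixed by hand instead of commandArgs(trailingOnly = TRUE)
job <- sim_array_config[sim_array_config$job_id == job_id, ]
job                             # inspect what this job is supposed to run
```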

Make sure to take a look at the output `fit_results` object, especially its `err` column.

```
> head(fit_results)
# A tibble: 6 × 11
  .rep  .dgp_name   .method_name method      truth_ate ate_hat se_hat ci_low ci_high ok    err  
  <chr> <chr>       <chr>        <chr>           <dbl>   <dbl>  <dbl>  <dbl>   <dbl> <lgl> <chr>
1 1     Complex DGP AIPW         AIPW_pkg           10    13.3   2.25   8.91   17.7  TRUE  NA   
2 1     Complex DGP Manual AIPW  manual_aipw        10    13.3   2.22   8.91   17.6  TRUE  NA   
3 1     Complex DGP drtmle       drtmle             10    13.2   2.22  17.5     8.81 TRUE  NA   
4 1     Complex DGP npcausal     npcausal           10    13.2   2.22   8.88   17.6  TRUE  NA   
5 1     Complex DGP tmle         tmle               10    13.3   2.23   8.90   17.6  TRUE  NA   
6 1     Complex DGP tmle3        tmle3              10    13.3   2.22   8.91   17.6  TRUE  NA   
```
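A quick way to surface failed replications in base R, sketched here with a toy data frame standing in for `fit_results` (assuming, as in the printout above, that `err` is `NA` on success):

```r
# toy stand-in for fit_results
fit_results <- data.frame(
  .rep         = c("1", "1", "2"),
  .method_name = c("AIPW", "tmle", "drtmle"),
  err          = c(NA, NA, "convergence failure"),
  stringsAsFactors = FALSE
)

failed <- fit_results[!is.na(fit_results$err), c(".rep", ".method_name", "err")]
failed   # one row per replication that errored
```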

### Debugging the Cluster Job

Check the log files produced in `slurm/logs/`.

It is also useful to know how to monitor the job array with `sacct`:

```shell
watch sacct -j XXXXXXXX

# or
watch "sacct -j XXXXXXXX | tail -n 25"
```

It is not a bad idea to learn tmux or screen so that you can check the log files on the cluster while also keeping an eye on the job status.

## Advice

To take this further, you should definitely read: