This project aims to provide a template for running simulation studies comparing different estimators across multiple data generating processes (DGPs).
The examples shown are for comparing the performance of several estimators of the average treatment effect.
Specifically, this template project is optimized for running quickly on a
SLURM cluster. Compared with previous projects in the nshlab, this
project demonstrates how to further stratify simulations into a greater
number of jobs in a SLURM job array. Each job is designed to process
only a small number of replications of a specific experiment, and the
results are stored for later processing. Once all of the jobs have been
run on the cluster, the results can be processed using the standard tools
of {simChef} and we additionally show how to generate a static HTML
report of the simulation results.
It is instructive to first take a look at the folder structure of this project. The project is designed around two stages of analysis:
1) running the experiment and saving the results from a job array on a SLURM cluster, and 2) processing the saved results.
```
├── R
│   ├── 01_run_on_cluster
│   │   ├── 00_dependencies.R
│   │   ├── 01_job_array_config.R
│   │   ├── 02_dgp.R
│   │   ├── 03_methods.R
│   │   ├── 04_experiment.R
│   │   └── 05_run_experiment.R
│   ├── 02_evaluators_and_visualizers
│   │   ├── 05_evaluators.R
│   │   └── 06_visualizers.R
│   └── 03_meal.R
├── README.md
├── results
│   ├── ATE methods/
│   └── sim_results/
├── simChef_job_arrays.Rproj
└── slurm
    └── slurm_run_experiment.sh

7 directories, 5 files
```
`R/01_run_on_cluster/01_job_array_config.R` is a very important script, as it constructs a data frame specifying what every job in the SLURM job array will do.
For reference, the `sim_array_config` data frame it produces looks
like this:
```
> head(sim_array_config)
job_id     dgp sample_size n_reps_job
     1 complex          30         10
     2 complex         300         10
     3 complex        3000         10
     4 complex       10000         10
     5 complex          30         10
     6 complex         300         10
```

In this template project, the full data frame goes on to specify simulation scenarios including the `simple` DGP as well, each at varying sample sizes.
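A minimal sketch of how such a job-array configuration could be built with base R (the column names follow `sim_array_config` above, but the DGP names, sample sizes, and replication counts here are illustrative assumptions, not the template's actual values):

```r
# Illustrative sketch: build a job-array configuration data frame where each
# SLURM job handles a small slice of one scenario's replications.
n_reps_total <- 100  # total replications wanted per scenario (assumed)
n_reps_job <- 10     # replications handled by a single job (assumed)

scenarios <- expand.grid(
  dgp = c("complex", "simple"),
  sample_size = c(30, 300, 3000, 10000),
  stringsAsFactors = FALSE
)

# Repeat each scenario once per job needed to reach n_reps_total replications
n_jobs_per_scenario <- n_reps_total / n_reps_job
sim_array_config <- scenarios[
  rep(seq_len(nrow(scenarios)), each = n_jobs_per_scenario),
]
sim_array_config$n_reps_job <- n_reps_job
sim_array_config$job_id <- seq_len(nrow(sim_array_config))
rownames(sim_array_config) <- NULL
```

Each SLURM array task then reads its own `job_id` and processes only the matching row.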
- Clone the project onto the cluster.
- Run it by navigating to the `slurm/` directory and running `sbatch slurm_run_experiment.sh`.
- While the jobs are running, you may notice that earlier jobs in
your job array are running while later jobs are still listed
with status pending. This is normal and a good feature of the
SLURM scheduler: if the scheduler waited for `n_jobs`-many cores to be
available all at once, you might have to wait much longer for your jobs to run at all. By running some of the first jobs in your array while others are still pending, you find out sooner whether there are any errors in the code that you will need to debug on the cluster.
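For orientation, a job-array submission script typically looks something like the sketch below. This is an illustrative example, not the actual contents of `slurm/slurm_run_experiment.sh`; the array size, resource flags, and paths are assumptions:

```shell
#!/bin/bash
#SBATCH --job-name=simchef_array
#SBATCH --array=1-80            # one task per row of sim_array_config (assumed size)
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00
#SBATCH --output=logs/%A_%a.out # %A = array job id, %a = task id

# SLURM sets SLURM_ARRAY_TASK_ID for each task; pass it to R as the job id
Rscript ../R/01_run_on_cluster/05_run_experiment.R "$SLURM_ARRAY_TASK_ID"
```

Each task in the array runs the same R script, differing only in the job id it receives.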
- Once the jobs have finished running on the cluster,
copy the produced files to the computer you wish to process them on (often your
local machine).
For example, you may do so with

  ```shell
  rsync -avh username@login.rc.fas.harvard.edu:~/simChef_job_arrays/results/ ./results/
  ```

  or with file transfer software like FileZilla.
- On the computer you wish to use to process your results, source
`R/03_meal.R`.
The rendered HTML report will open in your default browser.
There are several elements you need to replace for this to be useful to you:
- Replace the DGPs with ones that are suited to your simulation study.
- Replace the methods.
- Rewrite `R/01_run_on_cluster/01_job_array_config.R` to define the jobs as you would like them to be run. You do not have to stick to the exact approach or columns of `sim_array_config`, but can adapt it however best fits how you want to split up the jobs that need to be run.
  - I would recommend keeping the array size in the 100s or low thousands.
  - See https://docs.rc.fas.harvard.edu/kb/running-jobs/#Job_arrays
- Replace the evaluators.
- Replace the visualizers.
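For example, swapping in your own DGP might look like the following sketch, which assumes {simChef}'s `create_dgp()` constructor. The data-generating function and its return format (a list holding a data frame and the true effect) are illustrative, mirroring this template's convention; your DGP can return whatever your methods expect:

```r
# Illustrative sketch of a replacement DGP for this template.
library(simChef)

my_dgp_fun <- function(n) {
  x <- rnorm(n)
  treat <- rbinom(n, 1, plogis(x))
  y <- 2 * treat + x + rnorm(n)  # the true ATE is 2 in this toy DGP
  list(df = data.frame(x = x, treat = treat, y = y), truth = 2)
}

my_dgp <- create_dgp(.dgp_fun = my_dgp_fun, .name = "My DGP", n = 1000)
```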
Example of how to check whether a particular method works:

```r
data <- complex_dgp$dgp_fun(n = 1000)
df <- data$df
truth <- data$truth
confounders <- c("age", "educ", "income", "diabetes", "earnings74")
outcome <- "earnings78"
exposure <- "treat"
m_np$method_fun(
  df = df,
  truth = truth,
  confounders = confounders,
  outcome = outcome,
  exposure = exposure
)
```

Since `R/01_run_on_cluster/05_run_experiment.R` is designed to be run on the cluster and
run by an `Rscript` call from `slurm/slurm_run_experiment.sh` with
a command-line argument for the job id, it is important to know how
to run it on your local computer for debugging purposes.
Open the `R/01_run_on_cluster/05_run_experiment.R` file and
run `job_id <- 1`, or choose another `job_id` that
suits your debugging needs based on what's in the `sim_array_config`
data frame. Then you can `library(here)` and start running
the rest of the file from after line 7. I would recommend not
running the `job_id <- commandArgs(trailingOnly = TRUE)` line
on your local computer if you're just trying to debug
the simulation, as it's easier to simply fix `job_id` to
one of the jobs that you need to debug.
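Concretely, the local-debugging setup amounts to something like the sketch below, run interactively at the top of your R session. The `job_id` value and the idea of sourcing the config script to inspect the chosen job are illustrative; adapt them to how `05_run_experiment.R` is actually structured:

```r
# Local debugging sketch: pin the job id instead of reading it from the
# command line, then check which scenario that job corresponds to.
job_id <- 1  # instead of: job_id <- commandArgs(trailingOnly = TRUE)
library(here)
source(here("R", "01_run_on_cluster", "01_job_array_config.R"))
sim_array_config[sim_array_config$job_id == job_id, ]
```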
When you get to the line that calls `fit_experiment()`,
you may find it useful to reduce the `n_reps = ...` argument
to a small number like 1 or 2 when you call it.
Make sure to take a look at the output `fit_results` object,
especially at the `err` column.
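A quick way to surface only the failed method runs is a filter like the sketch below, which assumes {dplyr} is available and that, as in this template's output, `err` is `NA` for successful replications:

```r
# Sketch: list only the replications whose method call errored.
library(dplyr)

fit_results |>
  filter(!is.na(err)) |>
  select(.rep, .dgp_name, .method_name, err)
```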
```
> head(fit_results)
# A tibble: 6 × 11
  .rep  .dgp_name   .method_name method      truth_ate ate_hat se_hat ci_low ci_high ok    err
  <chr> <chr>       <chr>        <chr>           <dbl>   <dbl>  <dbl>  <dbl>   <dbl> <lgl> <chr>
1 1     Complex DGP AIPW         AIPW_pkg           10    13.3   2.25   8.91    17.7 TRUE  NA
2 1     Complex DGP Manual AIPW  manual_aipw        10    13.3   2.22   8.91    17.6 TRUE  NA
3 1     Complex DGP drtmle       drtmle             10    13.2   2.22   8.81    17.5 TRUE  NA
4 1     Complex DGP npcausal     npcausal           10    13.2   2.22   8.88    17.6 TRUE  NA
5 1     Complex DGP tmle         tmle               10    13.3   2.23   8.90    17.6 TRUE  NA
6 1     Complex DGP tmle3        tmle3              10    13.3   2.22   8.91    17.6 TRUE  NA
```

Check the log files produced in `slurm/logs/`.
Useful commands for monitoring your jobs:

```shell
watch sacct -j XXXXXXXX
# or
watch " sacct -j XXXXXXXX | tail -n 25 "
```

It is also not a bad idea to learn `tmux` or `screen` so you can check on the log files on the cluster while keeping an eye on the jobs' status.
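For example, a typical pattern for following logs in a detachable `tmux` session might look like this sketch (the session name and log-file glob are illustrative assumptions):

```shell
tmux new -s sims          # start a named, detachable session
tail -f slurm/logs/*.out  # follow the log files as jobs write to them
# detach with Ctrl-b d; reattach later with:
tmux attach -t sims
```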
To take this further, you should definitely read:
- The {simChef} README: https://github.com/Yu-Group/simChef
- The {simChef} documentation: https://yu-group.github.io/simChef/
- The code of the example repositories listed in the {simChef} README.