@@ -441,16 +442,15 @@ scancel -u <username> # cancel all queued and running jobs for a specific user
This issue is important whenever the HPC cluster has mandatory time/RAM limits for the job submissions, where the array job may not complete within the assigned resources --- hence, if not properly managed, will discard any valid replication information when abruptly terminated. Unfortunately, this is a very likely occurrence, and is largely a function of being unsure about how long each simulation condition/replication will take to complete when distributed across the arrays (some conditions/replications will take longer than others, and it is difficult to be perfectly knowledgeable about this information beforehand) or how large the final objects will grow as the simulation progresses.
-To avoid this time/resource waste it is **strongly recommended** to add a `max_time` and/or `max_RAM` argument to the `control` list (see `help(runArraySimulation)` for supported specifications), which are less than the Slurm specifications. These control flags will halt the `runArraySimulation()` executions early and return only the complete simulation results up to this point. However, this will only work if these arguments are *non-trivially less than the allocated Slurm resources*; otherwise, you'll run the risk that the job terminates before the `SimDesign` functions have the chance to store the successfully completed replications. Setting these to around 90-95% of the respective `#SBATCH --time=` and `#SBATCH --mem-per-cpu=` inputs should, however, be sufficient in most cases.
+To avoid this time/resource waste it is **strongly recommended** to add a `max_time` argument to the `control` list (see `help(runArraySimulation)` for supported specifications) which is less than the Slurm specifications. This control flag will halt the `runArraySimulation()` executions early and return only the complete simulation results up to this point. However, this will only work if the argument is *non-trivially less than the allocated Slurm resources*; otherwise, you'll run the risk that the job terminates before the `SimDesign` functions have the chance to store the successfully completed replications. Setting this to around 90-95% of the respective `#SBATCH --time=` input should, however, be sufficient in most cases.
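As a rough illustration of the 90-95% guideline, the sketch below derives a reduced time limit from a Slurm-style `HH:MM:SS` wall-time string using base R only. The helper `scaled_time_limit()` and the 12-hour allocation are hypothetical and not part of `SimDesign`; whether the resulting string can be passed directly as `max_time` is an assumption that should be checked against `help(runArraySimulation)`.

```r
# Hypothetical helper (not part of SimDesign): scale a Slurm-style
# "HH:MM:SS" wall-time limit by a safety fraction below 1.
scaled_time_limit <- function(slurm_time, fraction = 0.92) {
  stopifnot(fraction > 0, fraction < 1)
  parts <- as.numeric(strsplit(slurm_time, ":", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  # Total seconds allocated, reduced by the safety fraction
  total <- round(sum(parts * c(3600, 60, 1)) * fraction)
  sprintf("%02d:%02d:%02d",
          as.integer(total %/% 3600),
          as.integer((total %% 3600) %/% 60),
          as.integer(total %% 60))
}

# Assumed allocation for illustration: #SBATCH --time=12:00:00
max_time <- scaled_time_limit("12:00:00", fraction = 0.92)

# The value would then go into the control list, e.g.
#   control = list(max_time = max_time)
# assuming this string format is among the supported specifications.
```

Rounding to whole seconds keeps the output in the same format as the Slurm input, so the two limits can be compared at a glance.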
```{r eval=FALSE}
-# Return successful results up to the 11 hour mark, and terminate early
-# if more than 3.5 GB of RAM are required to store the internal results
+# Return successful results up to the 11 hour mark