Test with a large run #1
base: master
Changes from all commits
e9b7912
d0e2ad4
687d015
628318a
f420f16
916b701
e6fab87
c043b33
bc9fd34
b51ed18
add653f
3b2f8cf
71c3dfa
2089f5b
c6d5e1d
c596674
4dbea4c
cbb07e5
6e1eab7
1bea41d
71d0114
34243b7
3aa836b
3e359c4
1cab186
c553e68
76b7b66
26c4436
b8ce297
3a09763
b4d1e74
f662d22
f08341b
eafc679
2919ef4
490bf14
f36f295
ada21e6
a2c974a
e3a72c4
49336c8
13def2e
f75197d
62b5da4
@@ -0,0 +1,59 @@
/*
 * -------------------------------------------------
 * Nextflow nf-core config file for ICR alma HPC
 * -------------------------------------------------
 * Defines slurm process executor and singularity
 * settings.
 *
 */
params {
    config_profile_description = "Nextflow nf-core profile for ICR alma HPC"
    config_profile_contact = "Rachel Alcraft (@rachelicr), Mira Sarkis (@msarkis-icr)"
    // max_memory = 256.GB
    // max_cpus = 30
    // max_time = 5.d
}

process {
    queue = "compute"
    executor = "slurm"
    maxRetries = 3
    maxErrors = '-1'

    errorStrategy = { task.exitStatus in [137,255] ? 'retry' : 'terminate' }
|
Before merging to nf-core, we had ... To be honest, I have run into the 255 error code, and retrying was a waste of resources.
Author
I see. The issue here is that the current config ties the allocated memory to the number of CPUs, which is not what some processes expect. In my case this resulted in an error 255, and I could increase the mem/cpu by using a retry error strategy and ...
Author
In addition, retry does request more resources if the process resource allocation is done with ...
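As a hedged illustration of the retry-and-scale pattern described above (the 4.GB and 2-CPU base values are assumptions for illustration, not taken from this PR), a process scope might grow its requests with `task.attempt`:

```groovy
// Hypothetical sketch: retry on exit codes 137/255 and grow the request each attempt.
// The starting values (4.GB, 2 cpus) are illustrative, not from this PR.
process {
    errorStrategy = { task.exitStatus in [137, 255] ? 'retry' : 'terminate' }
    maxRetries    = 3

    // Each retry multiplies the request, so a task killed for exceeding
    // memory (137) or hitting the 255 error gets more headroom next time.
    memory = { 4.GB * task.attempt }
    cpus   = { 2 * task.attempt }
}
```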
    withName: ".*" { time = 5.d }

Setting a fixed time limit of 5 days for all processes could lead to inefficient resource usage. Some tasks might finish much quicker...
Author
Agreed. There may be a better way to increase the time limit without hard-coding it to 5.d. For example, use it with ...
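One possible alternative in that direction (a sketch only; the 24-hour base value is an assumption, not a value from this PR) is to scale the walltime with the retry attempt instead of pinning every process to five days:

```groovy
// Hypothetical sketch: modest walltime on the first attempt, growing on retries.
// The 24.h base value is illustrative, not taken from this PR.
process {
    withName: ".*" {
        time = { 24.h * task.attempt }
    }
}
```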
    clusterOptions = '--mem-per-cpu=8000'
    resourceLimits = [
        memory: 256.GB,
        cpus: 30,
        time: 5.d
    ]
}
// Perform work directory cleanup after a successful run?
cleanup = false
executor {
    // This is set because of an issue with too many
    // singularity containers launching at once; they
    // cause a singularity error with exit code 255.
    // submitRateLimit = "2 sec"
|
As the above comment states, Alma gets overwhelmed when many processes are fired simultaneously. By unsetting it, we allow an "unlimited" number of jobs to be launched simultaneously, which is not good! I'm curious why you would want to remove this, and how beneficial it could be for long runs?
Author
I see the need to limit simultaneous launches. I just think "2 sec" penalizes smaller, quick processes. queueSize may be a suitable alternative.
    queueSize = 50

OK for queueSize, but again I don't see how it could be beneficial for long runs?
Author
Above ...
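For reference, a hedged sketch of the two throttling options being weighed here (the '50/2min' rate is an assumed example, not a value from this PR): `queueSize` caps how many jobs are queued or running at once, while `submitRateLimit` spaces out submissions.

```groovy
// Hypothetical sketch contrasting the two executor throttles discussed above.
executor {
    // At most 50 jobs queued or running at any one time.
    queueSize = 50

    // Optional rate limit: at most 50 submissions every 2 minutes.
    // A burst-style limit like this is gentler on short, quick tasks
    // than a flat "2 sec" gap between every submission.
    // submitRateLimit = '50/2min'
}
```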
    perCpuMemAllocation = true
}
singularity {
    enabled = true
    // runOptions = "--bind /mnt:/mnt --bind /data:/data"
    autoMounts = true
    // pullTimeout = 2.h
    // cacheDir = '/data/scratch/shared/SINGULARITY-DOWNLOAD/nextflow/.singularity'
|
I'm confused why you would avoid using the cacheDir? Again, this is a generic config that is meant to work for most cases.
Author
True. The existence of these run options gave the impression that I could just execute the run without pre-downloading the containers, which still fails. I needed to pre-download the containers to a separate cache and override the cacheDir param in a custom config.
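A minimal sketch of the kind of user-side override described here, assuming a custom config passed with `-c`; the config file name and cache path are illustrative placeholders, not the author's actual setup:

```groovy
// Hypothetical my_overrides.config, used as: nextflow run <pipeline> -c my_overrides.config
// The cacheDir path is a placeholder for a directory of pre-downloaded images.
singularity {
    enabled    = true
    autoMounts = true
    cacheDir   = '/path/to/prefetched/singularity-images'
}
```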
}
@@ -1,30 +1,33 @@
//Create profiles to easily switch between the different process executors and platforms.
def assoc = System.getenv("ASSOC") // Association belonging to a lab or project

//global parameters
params {
    config_profile_description = 'The SCRI (seattle childrens research institute) cluster profile'
    config_profile_contact = 'Research Scientific Computing (@RSC-RP)'
    config_profile_url = 'https://github.com/RSC-RP'
    config_profile_url = 'https://github.com/RSC-RP/nextflow_scri_config'
}

//workDir = "/data/hps/assoc/private/${assoc}/user/$USER/temp"

// SCRI HPC project params
queue = "paidq"
// freeq
project = "${params.project}"
process {
    executor = 'slurm'
    queue = 'cpu-core-sponsored'
    memory = 7500.MB
    time = '72h'
    clusterOptions = "--account cpu-${assoc}-sponsored"
}

docker {
    enabled = false
}

singularity {
    enabled = true
    autoMounts = true
    cacheDir = "/data/hps/assoc/private/${assoc}/container"
    runOptions = '--containall --no-home'
}

profiles {
    //For running on an interactive session on cybertron with singularity module loaded
    local_singularity {
        process.executor = 'local'
        singularity.enabled = true
    }
    //For executing the jobs on the HPC cluster with singularity containers
    PBS_singularity {
        process.executor = 'pbspro'
        process.queue = "${params.queue}"
        process.clusterOptions = "-P ${params.project}"
        process.beforeScript = 'module load singularity'
        singularity.enabled = true
    }
executor {
    queueSize = 2000
}
@@ -1,25 +1,31 @@
params {
    config_profile_description = "University of Bern, Interfaculty Bioinformatics Unit cluster profile"
    config_profile_contact = "irene.keller@dbmr.unibe.ch; [email protected]"
    config_profile_contact = "alexander.nater@unibe.ch; [email protected]"
    config_profile_url = "https://www.bioinformatics.unibe.ch/"
    max_memory = 500.GB
    max_cpus = 128
    max_time = 240.h
    schema_ignore_params = "project,clusterOptions"
    project = null
    clusterOptions = null
}

validation {
    ignoreParams = ["schema_ignore_params", "project", "clusterOptions"]
}

process {
    resourceLimits = [
        memory: 500.GB,
        cpus: 128,
        time: 240.h
        time: 672.h
    ]
    executor = "slurm"
    maxRetries = 2
    beforeScript = 'mkdir -p ./tmp/ && export TMPDIR=./tmp/'
    executor = 'slurm'
    queue = 'pibu_el8'
    maxRetries = 2
    scratch = '$SCRATCH'
    clusterOptions = (params.project ? "-A ${params.project} " : '') + "${params.clusterOptions ?: ''}"
}

executor {
    queueSize = 30
    queueSize = 50
}

singularity {
These values specify resource limits for tasks running on the compute nodes. This is useful for setting upper bounds on resource usage, especially when using dynamic resource allocation. Any process asking for excessive resources will fail! This also allows for dynamic resource allocation within these limits.
Author
My understanding is that using the max_* in params applies globally to the whole workflow. I could be wrong. Do we want to limit a run to these resources?
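As a hedged illustration of the distinction being discussed (the label name and 200.GB base request below are assumptions for illustration): `resourceLimits` is applied per process and caps whatever a task requests, whereas the older `max_*` params were only honoured when the pipeline itself checked them (e.g. via nf-core's `check_max` helper).

```groovy
// Hypothetical sketch: resourceLimits caps per-task requests at the process level.
// The label name and 200.GB base request are illustrative only.
process {
    resourceLimits = [
        memory: 500.GB,
        cpus: 128,
        time: 240.h
    ]

    // A retrying task may ask for 200 GB, then 400 GB, then 800 GB...
    // Requests above the limits are reduced to the limit instead of
    // submitting a job the cluster could never satisfy.
    withLabel: 'process_high_memory' {
        memory = { 200.GB * task.attempt }
    }
}
```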