
Commit 151accd

Merge pull request #251 from nf-core/config-handle
deepmodeloptim now properly splits and uses YAML configs.
2 parents 8645a0e + 16e6fa3 commit 151accd

18 files changed (+182, -243 lines)

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -19,3 +19,5 @@ bin/.vscode/
 .nf-test/
 prototype/
 *.ipynb
+CLAUDE.md
+.claude

README.md

Lines changed: 2 additions & 17 deletions
@@ -21,24 +21,9 @@
 
 ## Introduction
 
-**nf-core/deepmodeloptim** is a bioinformatics end-to-end pipeline designed to facilitate the testing and development of deep learning models for genomics.
+**nf-core/deepmodeloptim** augments your bio data towards an optimal task-specific training set.
 
-Deep learning model development in natural science is an empirical and costly process. Despite the existence of generic tools for the tuning of hyperparameters and the training of the models, the connection between these procedures and the impact coming from the data is often underlooked, or at least not easily automatized. Indeed, researchers must define a pre-processing pipeline, an architecture, find the best parameters for said architecture and iterate over this process, often manually.
-
-Leveraging the power of Nextflow (polyglotism, container integration, scalable on the cloud), this pipeline will help users to 1) automatize the testing of the model, 2) gain useful insights with respect to the learning behaviour of the model, and hence 3) accelerate the development.
-
-## Pipeline summary
-
-It takes as input:
-
-- A dataset
-- A configuration file to describe the data pre-processing steps to be performed
-- An user defined PyTorch model
-- A configuration file describing the range of parameters for the PyTorch model
-
-It then transforms the data according to all possible pre-processing steps, finds the best architecture parameters for each of the transformed datasets, performs sanity checks on the models and train a minimal deep learning version for each dataset/architecture.
-
-Those experiments are then compiled into an intuitive report, making it easier for scientists to pick the best design choice to be sent to large scale training.
+Methods in deep learning are vastly equivalent (see neural scaling laws paper), most of the performance is driven by the training data.
 
 <picture>
 <source media="(prefers-color-scheme: dark)" srcset="assets/metromap.png">

conf/modules.config

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ process {
     // main config
     // ==============================================================================
 
-    withName: "STIMULUS_SPLIT_TRANSFORM" {
+    withName: "STIMULUS_SPLIT_YAML" {
         publishDir = [
             path: { "${params.outdir}/configs/${meta.id}" },
             mode: params.publish_dir_mode,

modules/local/custom/modify_model_config/main.nf

Lines changed: 5 additions & 1 deletion
@@ -20,7 +20,11 @@ process CUSTOM_MODIFY_MODEL_CONFIG {
     meta_updated = meta + ["n_trials": "${n_trials}"]
     """
     # substitte the line containing n_trials in the config file with n_trials: \${n_trials}
-    awk -v n_trials=${n_trials} '/n_trials: [0-9]+/ {gsub(/n_trials: [0-9]+/, "n_trials: " n_trials)}1' ${config} > ${prefix}.yaml
+    if [ "${n_trials}" = "[]" ]; then
+        cp "${config}" "${prefix}.yaml"
+    else
+        awk -v n_trials="${n_trials}" '/n_trials: [0-9]+/ {gsub(/n_trials: [0-9]+/, "n_trials: " n_trials)}1' "${config}" > "${prefix}.yaml"
+    fi
 
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":
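The new branch guards the substitution: when `n_trials` arrives as the empty-list placeholder `[]` the config is copied through unchanged, otherwise the `n_trials:` line is rewritten in place. A standalone sketch of the same awk logic, outside Nextflow, with a made-up `config.yaml` and value (the real process interpolates `${n_trials}` and `${config}` from the channel):

```shell
# toy config file (made-up contents, for illustration only)
cat > config.yaml <<'EOF'
model_name: demo
n_trials: 10
epochs: 5
EOF

n_trials="25"

if [ "$n_trials" = "[]" ]; then
    # empty-list placeholder: keep the config as-is
    cp config.yaml modified.yaml
else
    # rewrite any "n_trials: <int>" line with the new value;
    # the trailing 1 is an always-true pattern that prints every line
    awk -v n_trials="$n_trials" \
        '/n_trials: [0-9]+/ {gsub(/n_trials: [0-9]+/, "n_trials: " n_trials)}1' \
        config.yaml > modified.yaml
fi

grep 'n_trials' modified.yaml    # n_trials: 25
```

Passing the value with `-v` rather than splicing it into the awk program keeps the quoting simple and avoids code injection through the variable.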

modules/local/stimulus/check_model/main.nf

Lines changed: 4 additions & 5 deletions
@@ -6,11 +6,10 @@ process CHECK_MODEL {
     container "docker.io/mathysgrapotte/stimulus-py:dev"
 
     input:
-    tuple val(meta), path(data_config)
-    tuple val(meta2), path(data)
-    tuple val(meta3), path(model)
-    tuple val(meta4), path(model_config)
-    tuple val(meta5), path(initial_weights)
+    tuple val(meta1), path(data)
+    tuple val(meta2), path(model)
+    tuple val(meta3), path(model_config)
+    tuple val(meta4), path(initial_weights)
 
     output:
     stdout emit: standardout

modules/local/stimulus/predict/main.nf

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ process STIMULUS_PREDICT {
 
     input:
     tuple val(meta) , path(model), path(model_config), path(weigths)
-    tuple val(meta2), path(data), path(config)
+    tuple val(meta2), path(data)
 
     output:
     tuple val(meta), path("${prefix}-pred.safetensors"), emit: predictions

modules/local/stimulus/split_split/main.nf

Lines changed: 0 additions & 37 deletions
This file was deleted.

modules/local/stimulus/split_transform/main.nf

Lines changed: 0 additions & 37 deletions
This file was deleted.
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+process STIMULUS_SPLIT_YAML {
+
+    tag "$meta.id"
+    label 'process_low'
+    // TODO: push image to nf-core quay.io
+    container "docker.io/mathysgrapotte/stimulus-py:dev"
+
+    input:
+    tuple val(meta), path(data_config)
+
+    output:
+    tuple val(meta), path("*_encode.yaml")    , emit: encode_config
+    tuple val(meta), path("*_split.yaml")     , emit: split_config
+    tuple val(meta), path("*_transform.yaml") , emit: transform_config
+    path "versions.yml"                       , emit: versions
+
+    script:
+    """
+    stimulus split-yaml -y ${data_config} --out-dir ./
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        stimulus: \$(stimulus -v | cut -d ' ' -f 3)
+    END_VERSIONS
+    """
+
+    stub:
+    def prefix = data_config.baseName
+    """
+    touch ${prefix}_encode.yaml
+    touch ${prefix}_RandomSplit_70-30_split.yaml
+    touch ${prefix}_noise_std0.1_transform.yaml
+    touch ${prefix}_noise_std0.2_transform.yaml
+    touch ${prefix}_noise_std0.3_transform.yaml
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        stimulus: \$(stimulus -v | cut -d ' ' -f 3)
+    END_VERSIONS
+    """
+}
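As the stub filenames suggest, `stimulus split-yaml` fans one combined data config out into separate encode, split, and transform configs that downstream processes can consume independently. A hypothetical shell sketch of that fan-out on a toy config (the section names and layout are assumptions for illustration, not stimulus-py's actual schema):

```shell
# toy combined data config (made-up layout; stimulus-py's real schema differs)
cat > data_config.yaml <<'EOF'
encode:
  columns: seq
split:
  method: RandomSplit
  ratio: 70-30
transform:
  name: noise
  std: 0.1
EOF

prefix=$(basename data_config.yaml .yaml)

# extract each top-level section into its own file, mirroring the module's
# *_encode.yaml / *_split.yaml / *_transform.yaml outputs
for section in encode split transform; do
    awk -v s="$section" '
        BEGIN { hdr = s ":" }
        $0 == hdr   { keep = 1; print; next }  # start of the wanted section
        /^[a-z]+:$/ { keep = 0; next }         # any other top-level key ends it
        keep                                   # print lines inside the section
    ' data_config.yaml > "${prefix}_${section}.yaml"
done

ls *_encode.yaml *_split.yaml *_transform.yaml
```

The per-section file names match the glob patterns the module's `output:` block emits on, which is why the split keeps the shared `${prefix}` in every filename.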

modules/local/stimulus/tune/main.nf

Lines changed: 1 addition & 2 deletions
@@ -4,7 +4,7 @@ process STIMULUS_TUNE {
     container "docker.io/mathysgrapotte/stimulus-py:dev"
 
     input:
-    tuple val(meta), path(transformed_data), path(data_sub_config)
+    tuple val(meta), path(transformed_data)
     tuple val(meta2), path(model), path(model_config), path(initial_weights)
 
     output:
@@ -15,7 +15,6 @@ process STIMULUS_TUNE {
     path "versions.yml" , emit: versions
     // now we need to output these in this format for the predict module - thiw will have to be changed!
     tuple val(meta), path(model), path("best_config.json"), path("${prefix}-best-model.safetensors"), emit: model_tmp
-    tuple val(meta), path(data_sub_config) , emit: data_config_tmp
 
     script:
     prefix = task.ext.prefix ?: meta.id

0 commit comments