Commit 9a2424e

New section
Signed-off-by: Christopher Hakkaart <[email protected]>
1 parent f568c0e commit 9a2424e

4 files changed (+171, -75 lines)


docs/index.md

Lines changed: 1 addition & 0 deletions
````diff
@@ -78,6 +78,7 @@ module
 notifications
 secrets
 sharing
+structure
 vscode
 dsl1
 ```
````

docs/process.md

Lines changed: 0 additions & 38 deletions
````diff
@@ -162,48 +162,10 @@ In the above example, the process will execute one of several scripts depending
 
 Process scripts can be externalized to **template** files and reused across multiple processes. Templates can be accessed using the `template` function in the script section. For example:
 
-```nextflow
-process templateExample {
-    input:
-    val STR
-
-    script:
-    template 'my_script.sh'
-}
-
-workflow {
-    Channel.of('this', 'that') | templateExample
-}
-```
-
 By default, Nextflow looks for template scripts in the `templates` directory, located alongside the Nextflow script that defines the process. An absolute path can be used to specify a different template location. However, this practice is discouraged because it hinders pipeline portability.
 
 Templates can be tested independently of pipeline execution. However, variables prefixed with the dollar character (`$`) are interpreted as Nextflow variables when the template script is executed by Nextflow and Bash variables when executed directly. Consider the following template script:
 
-```bash
-#!/bin/bash
-echo "process started at `date`"
-echo $STR
-echo "process completed"
-```
-
-The above script can be executed from the command line by providing each input as an environment variable:
-
-```bash
-STR='foo' bash templates/my_script.sh
-```
-
-Several caveats should be considered when using templates:
-
-- Template scripts are only recommended for Bash scripts.
-- Languages that do not prefix variables with `$` (e.g. Python and R) can't be executed directly as a template script from the command line.
-- Template variables escaped with `\$` will be interpreted as Bash variables when executed by Nextflow but not the command line.
-- Template variables are evaluated even if they are commented out in the template script.
-- The pipeline to fail if a template variable is missing, regardless of where it occurs in the template.
-
-:::{tip}
-The best practice for using a custom script is to first embed it in the process definition and transfer it to a separate file with its own command line interface once the code matures.
-:::
 
 (process-shell)=
 
````

docs/sharing.md

Lines changed: 0 additions & 37 deletions
````diff
@@ -93,43 +93,6 @@ Read the {ref}`container-page` page to learn more about how to use containers wi
 For maximal reproducibility, make sure to define a specific version for each tool. Otherwise, your pipeline might use different versions across subsequent runs, which can introduce subtle differences to your results.
 :::
 
-(bundling-executables)=
-
-#### The `bin` directory
-
-Executable scripts can be included in the pipeline `bin` directory located at the root of your pipeline directory. This allows you to create and organize custom scripts that can be invoked like regular commands from any process in your pipeline without modifying the `PATH` environment variable or using an absolute path. For example:
-
-```
-├── bin
-│   └── custom_script.py
-└── main.nf
-```
-
-Each script should include a shebang line to specify the interpreter for the script.
-
-:::{tip}
-Use `env` to resolve the interpreter's location instead of hard-coding the interpreter path. For example:
-
-```
-#!/usr/bin/env python
-```
-
-:::
-
-Scripts placed in the `bin` directory must have executable permissions. Use `chmod` to grant the required permissions. For example:
-
-```
-chmod a+x bin/custom_script.py
-```
-
-After setting the executable permission, the script can be run directly within your pipeline processes.
-
-Executable scripts can also be stored as scripts that are locally scoped to the processes defined by the tasks. See {ref}`module-binaries` for more information.
-
-#### The `lib` directory
-
-Any Groovy scripts or JAR files in the `lib` directory will be automatically loaded and made available to your pipeline scripts. The `lib` directory is a useful way to provide utility code or external libraries without cluttering the pipeline scripts.
-
 ### Data
 
 In general, input data should be provided by external sources using parameters which can be controlled by the user. This way, a pipeline can be easily reused to process different datasets which are appropriate for the pipeline.
````

docs/structure.md

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
(structure-page)=

# Structure

## The `templates` directory

The `templates` directory in the Nextflow project root can be used to store template scripts.

```
├── templates
│   └── sayhello.py
└── main.nf
```

Template scripts can be used as the script of any process in your pipeline with the `template` function:

```nextflow
process sayHello {

    input:
    val x

    output:
    stdout

    script:
    template 'sayhello.py'
}

workflow {
    Channel.of("Foo") | sayHello | view
}
```

Variables prefixed with the dollar character (`$`) are interpreted as Nextflow variables when the template script is executed by Nextflow. For example, `${x}` in `templates/sayhello.py` is replaced with the value of the `x` input:

```python
#!/usr/bin/env python

print("Hello ${x}!")
```

The pipeline will fail if a template variable is missing, regardless of where it occurs in the template.

Templates can be tested independently of pipeline execution by providing each input as an environment variable. For example, a Bash template `templates/my_script.sh` that references `$STR` can be run from the command line with:

```bash
STR='foo' bash templates/my_script.sh
```

Template scripts are only recommended for Bash scripts. Templates written in languages that do not prefix variables with `$` (e.g., Python and R) can't be executed directly from the command line, because this approach relies on Bash to substitute `$`-prefixed variables from the environment. Similarly, template variables escaped with `\$` are interpreted as Bash variables when the template is executed by Nextflow, but not when it is executed from the command line.
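The command-line check can also be scripted. The following Python sketch is an illustration only: the `run_template` helper is hypothetical, and it assumes `bash` and the `date` command are available. It runs the `my_script.sh` template removed from `docs/process.md` in this commit with `STR` supplied through the environment, then runs the `sayhello.py` template the same way to show that Python prints `${x}` verbatim instead of substituting it.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Bash template from the removed docs/process.md example (my_script.sh).
BASH_TEMPLATE = """#!/bin/bash
echo "process started at `date`"
echo $STR
echo "process completed"
"""

# Python template from the sayhello.py example.
PYTHON_TEMPLATE = """#!/usr/bin/env python

print("Hello ${x}!")
"""


def run_template(interpreter, template_text, inputs):
    """Hypothetical helper: run a template directly, passing inputs as env vars."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "template"
        path.write_text(template_text)
        env = {**os.environ, **inputs}
        result = subprocess.run(
            [interpreter, str(path)],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


if __name__ == "__main__":
    # Equivalent to: STR='foo' bash templates/my_script.sh
    print(run_template("bash", BASH_TEMPLATE, {"STR": "foo"}), end="")
    # Python does not expand ${x}, so the placeholder is printed unchanged.
    print(run_template(sys.executable, PYTHON_TEMPLATE, {"x": "Foo"}), end="")
```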

:::{warning}
Template variables are evaluated even if they are commented out in the template script.
:::
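To make these substitution rules concrete, here is a simplified, regex-based model of template rendering in Python. It is not Nextflow's template engine; it only assumes the behavior described on this page: `\$` produces a literal `$` for the shell, `${name}` and `$name` are replaced with input values, and a missing variable is an error wherever it appears, including in comments. The function name `render_template` is hypothetical.

```python
import re

# An escaped dollar (\$), a braced variable (${name}), or a bare variable ($name).
_TOKEN = re.compile(r"\\\$|\$\{(\w+)\}|\$(\w+)")


def render_template(text, bindings):
    """Simplified model of template rendering; not Nextflow's actual engine."""

    def substitute(match):
        if match.group(0) == "\\$":
            # Escaped: leave a literal $ so Bash expands it at run time.
            return "$"
        name = match.group(1) or match.group(2)
        if name not in bindings:
            # Missing inputs fail even inside comments: this model ignores comment syntax.
            raise KeyError(f"missing template variable: {name}")
        return str(bindings[name])

    return _TOKEN.sub(substitute, text)


if __name__ == "__main__":
    print(render_template('print("Hello ${x}!")', {"x": "Foo"}))  # print("Hello Foo!")
    print(render_template("echo \\$HOME", {}))  # echo $HOME
    try:
        render_template("# echo $UNUSED\necho done", {})
    except KeyError as err:
        print(err)
```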

:::{tip}
The best practice for using a custom script is to first embed it in the process definition and transfer it to a separate file with its own command line interface once the code matures.
:::

(bundling-executables)=

## The `bin` directory

The `bin` directory in the Nextflow project root can be used to store executable scripts.

```
├── bin
│   └── sayhello.py
└── main.nf
```

Scripts in this directory can be invoked like regular commands from any process in your pipeline, without modifying the `PATH` environment variable or using an absolute path. Each script should include a shebang line to specify its interpreter, and inputs should be supplied as command line arguments. For example, `bin/sayhello.py` reads the name to greet from its first argument:

```python
#!/usr/bin/env python

import argparse

def main():
    parser = argparse.ArgumentParser(description="A simple argparse example.")
    parser.add_argument("name", type=str, help="Person to greet.")

    args = parser.parse_args()
    print(f"Hello {args.name}!")

if __name__ == "__main__":
    main()
```

:::{tip}
Use `env` to resolve the interpreter's location instead of hard-coding the interpreter path.
:::
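The tip above works because `env` searches the `PATH` for the named program at run time. The hypothetical `resolve_interpreter` function below sketches that lookup in Python with `shutil.which`. It is a simplification (it ignores `env` options and extra interpreter arguments) meant only to show why an `env` shebang is more portable than a hard-coded path.

```python
import shutil


def resolve_interpreter(shebang):
    """Return the interpreter a shebang line would run (simplified sketch)."""
    if not shebang.startswith("#!"):
        raise ValueError("not a shebang line")
    parts = shebang[2:].split()
    if not parts:
        raise ValueError("empty shebang line")
    if parts[0].endswith("/env") and len(parts) > 1:
        # "#!/usr/bin/env NAME": look NAME up on PATH, as env does.
        # Returns None when NAME is not installed anywhere on PATH.
        return shutil.which(parts[1])
    # Hard-coded interpreter path: used as-is, whether or not it exists here.
    return parts[0]


if __name__ == "__main__":
    print(resolve_interpreter("#!/usr/bin/env python"))
    print(resolve_interpreter("#!/usr/bin/python3"))
```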

Scripts placed in the `bin` directory must have executable permissions. Use `chmod` to grant the required permissions. For example:

```bash
chmod a+x bin/sayhello.py
```
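Both requirements above (a shebang line and executable permissions) are easy to forget when adding a new script. The following Python sketch shows one way to check them before running the pipeline; `check_bin_scripts` is a hypothetical helper, not part of Nextflow. With `fix_permissions=True`, it adds the execute bit for user, group, and others, as `chmod a+x` does.

```python
import stat
from pathlib import Path

# Execute bits for user, group and others: what `chmod a+x` adds.
EXEC_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def check_bin_scripts(bin_dir, fix_permissions=False):
    """Return a list of problems found in bin_dir (hypothetical helper)."""
    problems = []
    for path in sorted(Path(bin_dir).iterdir()):
        if not path.is_file():
            continue
        # A shebang line starts with the two bytes "#!".
        with path.open("rb") as handle:
            if handle.read(2) != b"#!":
                problems.append(f"{path.name}: missing shebang line")
        mode = path.stat().st_mode
        if (mode & EXEC_ALL) != EXEC_ALL:
            if fix_permissions:
                # Keep the existing permission bits and add a+x.
                path.chmod(stat.S_IMODE(mode) | EXEC_ALL)
            else:
                problems.append(f"{path.name}: not executable")
    return problems
```

Run from the project root, `check_bin_scripts("bin")` returns an empty list when every script has a shebang and is executable.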

As with changes to a process script, changing an executable script causes the tasks that use it to be re-executed on a resumed run.

:::{warning}
When using containers and the Wave service, Nextflow sends the project-level `bin` directory to the Wave service for inclusion as a layer in the container. Any change to a script in the `bin` directory changes the layer md5sum and therefore the hash of the final container. Because the container identity is a component of the task hash calculation, this forces every task in the workflow to be re-executed.

When using the Wave service, use module-specific `bin` directories instead. See {ref}`module-binaries` for more information.
:::
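The cascade described in the warning can be illustrated with a toy model. In the Python sketch below, the digest of `bin/` feeds a container identity, which in turn feeds every task hash, so editing one script changes the hash of a task that never calls it. This is a simplification for illustration only: the function names, task scripts, and the `ubuntu:22.04` base image are hypothetical, and the real Wave layer digest and Nextflow task hash are computed differently and include more inputs.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_digest(bin_dir):
    """MD5 over every file's relative path and contents, in a stable order."""
    root = Path(bin_dir)
    digest = hashlib.md5()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(str(path.relative_to(root)).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def container_identity(base_image, bin_digest):
    """Toy container identity: the base image plus a layer built from bin/."""
    return hashlib.md5(f"{base_image}\0{bin_digest}".encode()).hexdigest()


def task_hash(task_script, container_id):
    """Toy task hash: the task script combined with the container identity."""
    return hashlib.md5(f"{task_script}\0{container_id}".encode()).hexdigest()


if __name__ == "__main__":
    scripts = ["sayhello.py Foo", "echo unrelated"]  # hypothetical task scripts
    with tempfile.TemporaryDirectory() as bin_dir:
        tool = Path(bin_dir) / "sayhello.py"
        tool.write_text("print('Hello')\n")
        before = container_identity("ubuntu:22.04", directory_digest(bin_dir))
        tool.write_text("print('Hi')\n")  # edit one script
        after = container_identity("ubuntu:22.04", directory_digest(bin_dir))
    for script in scripts:
        # Both hashes change, including for the task that never calls sayhello.py.
        print(script, task_hash(script, before) != task_hash(script, after))
```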

## The `lib` directory

The `lib` directory can be used to add utility code or external libraries without cluttering the pipeline scripts. The `lib` directory in the Nextflow project root is added to the classpath by default.

```
├── lib
│   └── DNASequence.groovy
└── main.nf
```

Classes and packages defined in the `lib` directory are available in the execution context; scripts or functions defined outside of a class are not.

For example, `lib/DNASequence.groovy` defines the `DNASequence` class:

```groovy
// lib/DNASequence.groovy
class DNASequence {
    String sequence

    // Constructor
    DNASequence(String sequence) {
        this.sequence = sequence.toUpperCase() // Ensure sequence is in uppercase for consistency
    }

    // Method to calculate melting temperature using the Wallace rule
    double getMeltingTemperature() {
        int g_count = sequence.count('G')
        int c_count = sequence.count('C')
        int a_count = sequence.count('A')
        int t_count = sequence.count('T')

        // Wallace rule calculation
        double tm = 4 * (g_count + c_count) + 2 * (a_count + t_count)
        return tm
    }

    String toString() {
        return "DNA[$sequence]"
    }
}
```

The `DNASequence` class can then be used directly in `main.nf`:

```nextflow
// main.nf
workflow {
    Channel.of('ACGTTGCAATGCCGTA', 'GCGTACGGTACGTTAC')
        .map { seq -> new DNASequence(seq) }
        .view { dna ->
            def meltTemp = dna.getMeltingTemperature()
            "Found sequence '$dna' with melting temperature ${meltTemp}°C"
        }
}
```

The workflow prints:

```
Found sequence 'DNA[ACGTTGCAATGCCGTA]' with melting temperature 48.0°C
Found sequence 'DNA[GCGTACGGTACGTTAC]' with melting temperature 50.0°C
```
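The expected output can be checked by hand with the Wallace rule used in `getMeltingTemperature`, `Tm = 4 × (G + C) + 2 × (A + T)`. The first sequence has 4 each of G, C, A, and T, giving 4 × 8 + 2 × 8 = 48; the second has 5 G, 4 C, 3 A, and 4 T, giving 4 × 9 + 2 × 7 = 50. The Python sketch below repeats the calculation outside Nextflow as a cross-check; `wallace_tm` is a hypothetical port of the Groovy method, not a way to load `lib` classes.

```python
def wallace_tm(sequence):
    """Melting temperature in °C by the Wallace rule: 4*(G+C) + 2*(A+T)."""
    seq = sequence.upper()  # mirrors the uppercase normalization in DNASequence
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return float(4 * gc + 2 * at)


if __name__ == "__main__":
    for seq in ("ACGTTGCAATGCCGTA", "GCGTACGGTACGTTAC"):
        # Same message format as the workflow's view closure (48.0 and 50.0).
        print(f"Found sequence 'DNA[{seq}]' with melting temperature {wallace_tm(seq)}°C")
```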
