Commit 6627bbc

Update examples and language for custom scripts
Signed-off-by: Christopher Hakkaart <[email protected]>
1 parent 9248c04 commit 6627bbc

2 files changed

+51
-49
lines changed

docs/module.md

Lines changed: 14 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -186,7 +186,7 @@ Ciao world!
186186

187187
Process script {ref}`templates <process-template>` can be included alongside a module in the `templates` directory.
188188

189-
For example, suppose we have a project L with a module that defines two processes, P1 and P2, both of which use templates. The template files can be made available in the local `templates` directory:
189+
For example, Project L contains a module (`myModules.nf`) that defines two processes, P1 and P2. Both processes use templates that are available in the local `templates` directory:
190190

191191
```
192192
Project L
@@ -196,29 +196,29 @@ Project L
196196
└── P2-template.sh
197197
```
198198

199-
Then, we have a second project A with a workflow that includes P1 and P2:
199+
Project A contains a workflow that includes processes P1 and P2:
200200

201201
```
202-
Pipeline A
202+
Project A
203203
└── main.nf
204204
```
205205

206-
Finally, we have a third project B with a workflow that also includes P1 and P2:
206+
Project B contains a workflow that also includes processes P1 and P2:
207207

208208
```
209-
Pipeline B
209+
Project B
210210
└── main.nf
211211
```
212212

213-
With the possibility to keep the template files inside the project L, A and B can use the modules defined in L without any changes. A future project C would do the same, just cloning L (if not available on the system) and including its module.
213+
Because the template files are stored with the module inside Project L, Projects A and B can include its processes without changing any code. Future projects can also include the module by cloning Project L (if it is not already available on the system).
214214

215-
Beside promoting the sharing of modules across pipelines, there are several advantages to keeping the module template under the script path:
215+
Keeping the module template within the script path has several advantages beyond facilitating module sharing across pipelines:
216216

217217
1. Modules are self-contained
218218
2. Modules can be tested independently from the pipeline(s) that import them
219219
3. Modules can be made into libraries
220220

221-
Having multiple template locations enables a structured project organization. If a project has several modules, and they all use templates, the project could group module scripts and their templates as needed. For example:
221+
Organizing template locations allows for a well-structured project. In projects with multiple modules that rely on templates, you can organize module scripts and their corresponding templates into logical groups. For example:
222222

223223
```
224224
baseDir
@@ -240,10 +240,11 @@ baseDir
240240
|── mymodules6.nf
241241
└── templates
242242
|── P5-template.sh
243-
|── P6-template.sh
244-
└── P7-template.sh
243+
└── P6-template.sh
245244
```
246245

246+
See {ref}`process-template` for more information about how to externalize process scripts to template files.
247+
247248
(module-binaries)=
248249

249250
## Module binaries
@@ -253,13 +254,13 @@ baseDir
253254

254255
Modules can define binary scripts that are locally scoped to the processes defined by the tasks.
255256

256-
To enable this feature, set the following flag in your pipeline script or configuration file:
257+
To use this feature, the module binaries must be enabled in your pipeline script or configuration file:
257258

258259
```nextflow
259260
nextflow.enable.moduleBinaries = true
260261
```
261262

262-
The binary scripts must be placed in the module directory names `<module-dir>/resources/usr/bin`:
263+
Binary scripts must be placed in the module directory named `<module-dir>/resources/usr/bin` and granted execution permissions:
263264

264265
```
265266
<module-dir>
@@ -271,10 +272,8 @@ The binary scripts must be placed in the module directory names `<module-dir>/re
271272
└── another-module-script2.py
272273
```
273274

274-
Those scripts will be made accessible like any other command in the task environment, provided they have been granted the Linux execute permissions.
275-
276275
:::{note}
277-
This feature requires the use of a local or shared file system for the pipeline work directory, or {ref}`wave-page` when using cloud-based executors.
276+
Module binary scripts require a local or shared file system for the pipeline work directory, or {ref}`wave-page` when using cloud-based executors.
278277
:::
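
The directory layout and permission requirement described above can be sketched in plain Bash. This is a minimal illustration, not part of the documented change: the module directory (`my-module`) and the script name (`hello-module.sh`) are hypothetical, and a temporary directory is used so nothing in a real project is touched.

```shell
# Hypothetical module directory, created under a temporary location.
module_dir="$(mktemp -d)/my-module"

# Create the directory that module binary scripts are placed in.
mkdir -p "$module_dir/resources/usr/bin"

# Write a small, hypothetical script into that directory.
cat > "$module_dir/resources/usr/bin/hello-module.sh" <<'EOF'
#!/usr/bin/env bash
echo "hello from a module binary"
EOF

# Grant execute permission; without it the script cannot be run as a command.
chmod +x "$module_dir/resources/usr/bin/hello-module.sh"
```

Running `ls -l "$module_dir/resources/usr/bin"` afterwards should show the execute bit set on the script.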
279278

280279
## Sharing modules

docs/process.md

Lines changed: 37 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -24,11 +24,11 @@ See {ref}`syntax-process` for a full description of the process syntax.
2424

2525
## Script
2626

27-
The `script` block defines, as a string expression, the script that is executed by the process.
27+
The `script` block defines the string expression that is executed by the process.
2828

29-
A process may contain only one script, and if the `script` guard is not explicitly declared, the script must be the final statement in the process block.
29+
A process can contain only one script block. If the `script` guard is not explicitly declared, the script must be the final statement in the process block.
3030

31-
The script string is executed as a [Bash](<http://en.wikipedia.org/wiki/Bash_(Unix_shell)>) script in the host environment. It can be any command or script that you would normally execute on the command line or in a Bash script. Naturally, the script may only use commands that are available in the host environment.
31+
The script string is executed as a [Bash](<http://en.wikipedia.org/wiki/Bash_(Unix_shell)>) script in the host environment. It can be any command or script that you would execute on the command line or in a Bash script and can only use commands that are available in the host environment.
3232

3333
The script block can be a simple string or a multi-line string. The latter approach makes it easier to write scripts with multiple commands spanning multiple lines. For example:
3434

@@ -42,19 +42,17 @@ process doMoreThings {
4242
}
4343
```
4444

45-
As explained in the script tutorial section, strings can be defined using single-quotes or double-quotes, and multi-line strings are defined by three single-quote or three double-quote characters.
45+
Strings can be defined using single-quotes or double-quotes. Multi-line strings are defined by three single-quote or three double-quote characters.
4646

47-
There is a subtle but important difference between them. Like in Bash, strings delimited by a `"` character support variable substitutions, while strings delimited by `'` do not.
47+
There is a subtle but important difference between single-quote (`'`) and double-quote (`"`) characters. Like in Bash, strings delimited by the `"` character support variable substitutions, while strings delimited by `'` do not.
4848

49-
In the above code fragment, the `$db` variable is replaced by the actual value defined elsewhere in the pipeline script.
49+
For example, in the above code fragment, the `$db` variable is replaced by the actual value defined elsewhere in the pipeline script.
5050

5151
:::{warning}
52-
Since Nextflow uses the same Bash syntax for variable substitutions in strings, you must manage them carefully depending on whether you want to evaluate a *Nextflow* variable or a *Bash* variable.
52+
Nextflow uses the same Bash syntax for variable substitutions in strings. You must manage them carefully depending on whether you want to evaluate a *Nextflow* variable or a *Bash* variable.
5353
:::
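
Because the quoting rule is borrowed from Bash, it can be checked in a shell without running a pipeline. The sketch below is plain Bash, not Nextflow, and the value assigned to `db` is a made-up path used only for illustration.

```shell
# Hypothetical value; in a pipeline this would be defined elsewhere.
db="/data/blast"

# Double quotes: the variable is substituted.
double_quoted="database: $db"

# Single quotes: the text is kept literally, including the dollar sign.
single_quoted='database: $db'

echo "$double_quoted"
echo "$single_quoted"
```

The first `echo` prints the substituted path, while the second prints the characters `$db` unchanged. Nextflow process scripts follow the same distinction.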
5454

55-
When you need to access a system environment variable in your script, you have two options.
56-
57-
If you don't need to access any Nextflow variables, you can define your script block with single-quotes:
55+
System environment variables and Nextflow variables can be accessed by your script. If you don't need to access any Nextflow variables, you can define your script block with single-quotes and use the dollar character (`$`) to access system environment variables. For example:
5856

5957
```nextflow
6058
process printPath {
@@ -64,7 +62,7 @@ process printPath {
6462
}
6563
```
6664

67-
Otherwise, you can define your script with double-quotes and escape the system environment variables by prefixing them with a back-slash `\` character, as shown in the following example:
65+
Otherwise, you can define your script with double-quotes and escape the system environment variables by prefixing them with a back-slash `\` character. For example:
6866

6967
```nextflow
7068
process doOtherThings {
@@ -76,21 +74,17 @@ process doOtherThings {
7674
}
7775
```
7876

79-
In this example, `$MAX` is a Nextflow variable that must be defined elsewhere in the pipeline script. Nextflow replaces it with the actual value before executing the script. Meanwhile, `$DB` is a Bash variable that must exist in the execution environment, and Bash will replace it with the actual value during execution.
80-
81-
:::{tip}
82-
Alternatively, you can use the {ref}`process-shell` block definition, which allows a script to contain both Bash and Nextflow variables without having to escape the first.
83-
:::
77+
In this example, `$MAX` is a Nextflow variable that is defined elsewhere in the pipeline script. Nextflow replaces it with the actual value before executing the script. In contrast, `$DB` is a Bash variable that must exist in the execution environment. Bash will replace it with the actual value during execution.
8478

8579
### Scripts *à la carte*
8680

87-
The process script is interpreted by Nextflow as a Bash script by default, but you are not limited to Bash.
81+
The process script is interpreted as Bash by default.
8882

89-
You can use your favourite scripting language (Perl, Python, R, etc), or even mix them in the same pipeline.
83+
However, you can use your favorite scripting language (Perl, Python, R, etc.) for each process. You can also mix languages in the same pipeline.
9084

91-
A pipeline may be composed of processes that execute very different tasks. With Nextflow, you can choose the scripting language that best fits the task performed by a given process. For example, for some processes R might be more useful than Perl, whereas for others you may need to use Python because it provides better access to a library or an API, etc.
85+
A pipeline may be composed of processes that execute very different tasks. You can choose the scripting language that best fits the task performed by a given process. For example, R might be more useful than Perl for some processes, whereas for others you may need to use Python because it provides better access to a library or an API.
9286

93-
To use a language other than Bash, simply start your process script with the corresponding [shebang](<http://en.wikipedia.org/wiki/Shebang_(Unix)>). For example:
87+
To use a language other than Bash, start your process script with the corresponding [shebang](<http://en.wikipedia.org/wiki/Shebang_(Unix)>). For example:
9488

9589
```nextflow
9690
process perlTask {
@@ -118,12 +112,17 @@ workflow {
118112
```
119113

120114
:::{tip}
121-
Since the actual location of the interpreter binary file can differ across platforms, it is wise to use the `env` command followed by the interpreter name, e.g. `#!/usr/bin/env perl`, instead of the absolute path, in order to make your script more portable.
115+
The location of the interpreter binary file can differ across platforms. Use the `env` command followed by the interpreter name, instead of an absolute path, to make your script more portable. For example:
116+
117+
```nextflow
118+
#!/usr/bin/env perl
119+
```
120+
122121
:::
123122

124123
### Conditional scripts
125124

126-
The `script` block is like a function that returns a string. This means that you can write arbitrary code to determine the script, as long as the final statement is a string.
125+
The `script` block is like a function that returns a string. You can write arbitrary code to determine the script as long as the final statement is a string.
127126

128127
If-else statements based on task inputs can be used to produce a different script. For example:
129128

@@ -155,15 +154,13 @@ process align {
155154
}
156155
```
157156

158-
In the above example, the process will execute one of several scripts depending on the value of the `mode` parameter. By default it will execute the `tcoffee` command.
157+
In the above example, the process will execute one of several scripts depending on the value of the `mode` parameter. By default, the process will execute the `tcoffee` command.
159158

160159
(process-template)=
161160

162161
### Template
163162

164-
Process scripts can be externalized to **template** files, which allows them to be reused across different processes and tested independently from the pipeline execution.
165-
166-
A template can be used in place of an embedded script using the `template` function in the script section:
163+
Process scripts can be externalized to **template** files and accessed using the `template` function in the script section. For example:
167164

168165
```nextflow
169166
process templateExample {
@@ -179,9 +176,9 @@ workflow {
179176
}
180177
```
181178

182-
By default, Nextflow looks for the template script in the `templates` directory located alongside the Nextflow script in which the process is defined. An absolute path can be used to specify a different location. However, this practice is discouraged because it hinders pipeline portability.
179+
By default, Nextflow looks for template scripts in the `templates` directory, located alongside the Nextflow script that defines the process. A template can be reused across multiple processes. An absolute path can be used to specify a different template location. However, this practice is discouraged because it hinders pipeline portability.
183180

184-
An example template script is provided below:
181+
Templates can be tested independently of pipeline execution. Consider the following template script:
185182

186183
```bash
187184
#!/bin/bash
@@ -190,22 +187,28 @@ echo $STR
190187
echo "process completed"
191188
```
192189

193-
Variables prefixed with the dollar character (`$`) are interpreted as Nextflow variables when the template script is executed by Nextflow and Bash variables when executed directly. For example, the above script can be executed from the command line by providing each input as an environment variable:
190+
The above script can be executed from the command line by providing each input as an environment variable:
194191

195192
```bash
196193
STR='foo' bash templates/my_script.sh
197194
```
198195

199-
The following caveats should be considered:
196+
Variables prefixed with the dollar character (`$`) are interpreted as Nextflow variables when the template script is executed by Nextflow, and as Bash variables when the script is executed directly.
197+
198+
The following caveats should be considered when using templates:
199+
200+
- Template scripts are only recommended for Bash scripts.
201+
202+
- Languages that do not prefix variables with `$` (e.g. Python and R) can't be executed directly as a template script from the command line.
200203

201-
- Template scripts are recommended only for Bash scripts. Languages that do not prefix variables with `$` (e.g. Python and R) can't be executed directly as a template script.
204+
- Template variables escaped with `\$` will be interpreted as Bash variables when executed by Nextflow, but will not be interpreted as variables when the script is run from the command line.
202205

203-
- Variables escaped with `\$` will be interpreted as Bash variables when executed by Nextflow, but will not be interpreted as variables when executed from the command line. This practice should be avoided to ensure that the template script behaves consistently.
206+
- Template variables are evaluated even if they are commented out in the template script.
204207

205-
- Template variables are evaluated even if they are commented out in the template script. If a template variable is missing, it will cause the pipeline to fail regardless of where it occurs in the template.
208+
- The pipeline will fail if a template variable is missing, regardless of where it occurs in the template.
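
The caveat about `\$` can be observed by running a template directly with Bash. The following is a minimal sketch under stated assumptions: the template is written to a temporary file rather than a real `templates` directory, and its contents are hypothetical. It shows only the command-line half of the behavior, not how Nextflow renders the template.

```shell
# Hypothetical template written to a temporary file for illustration.
template_file="$(mktemp)"
cat > "$template_file" <<'EOF'
#!/bin/bash
echo "plain: $STR"
echo "escaped: \$STR"
EOF

# Run the template directly, providing the input as an environment variable.
output="$(STR='foo' bash "$template_file")"
echo "$output"
```

When run this way, the unescaped `$STR` is expanded by Bash, while the escaped `\$STR` is printed as literal text rather than being treated as a variable.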
206209

207210
:::{tip}
208-
Template scripts are generally discouraged due to the caveats described above. The best practice for using a custom script is to embed it in the process definition at first and move it to a separate file with its own command line interface once the code matures.
211+
The best practice for using a custom script is to first embed it in the process definition and transfer it to a separate file with its own command line interface once the code matures.
209212
:::
210213

211214
(process-shell)=
