New section

christopher-hakkaart · christopher-hakkaart · commit 9a2424ef1077 · 2024-12-03T16:27:27.000+01:00
Signed-off-by: Christopher Hakkaart <chris.hakkaart@seqera.io>
diff --git a/docs/index.md b/docs/index.md
@@ -78,6 +78,7 @@ module
 notifications
 secrets
 sharing
+structure
 vscode
 dsl1
 ```
diff --git a/docs/process.md b/docs/process.md
@@ -162,48 +162,10 @@ In the above example, the process will execute one of several scripts depending
 
 Process scripts can be externalized to **template** files and reused across multiple processes. Templates can be accessed using the `template` function in the script section. For example:
 
-```nextflow
-process templateExample {
-    input:
-    val STR
-
-    script:
-    template 'my_script.sh'
-}
-
-workflow {
-    Channel.of('this', 'that') | templateExample
-}
-```
-
 By default, Nextflow looks for template scripts in the `templates` directory, located alongside the Nextflow script that defines the process. An absolute path can be used to specify a different template location. However, this practice is discouraged because it hinders pipeline portability.
 
 Templates can be tested independently of pipeline execution. However, variables prefixed with the dollar character (`$`) are interpreted as Nextflow variables when the template script is executed by Nextflow and Bash variables when executed directly. Consider the following template script:
 
-```bash
-#!/bin/bash
-echo "process started at `date`"
-echo $STR
-echo "process completed"
-```
-
-The above script can be executed from the command line by providing each input as an environment variable:
-
-```bash
-STR='foo' bash templates/my_script.sh
-```
-
-Several caveats should be considered when using templates:
-
-- Template scripts are only recommended for Bash scripts.
-- Languages that do not prefix variables with `$` (e.g. Python and R) can't be executed directly as a template script from the command line.
-- Template variables escaped with `\$` will be interpreted as Bash variables when executed by Nextflow but not the command line.
-- Template variables are evaluated even if they are commented out in the template script.
-- The pipeline to fail if a template variable is missing, regardless of where it occurs in the template.
-
-:::{tip}
-The best practice for using a custom script is to first embed it in the process definition and transfer it to a separate file with its own command line interface once the code matures.
-:::
 
 (process-shell)=
 
diff --git a/docs/sharing.md b/docs/sharing.md
@@ -93,43 +93,6 @@ Read the {ref}`container-page` page to learn more about how to use containers wi
 For maximal reproducibility, make sure to define a specific version for each tool. Otherwise, your pipeline might use different versions across subsequent runs, which can introduce subtle differences to your results.
 :::
 
-(bundling-executables)=
-
-#### The `bin` directory
-
-Executable scripts can be included in the pipeline `bin` directory located at the root of your pipeline directory. This allows you to create and organize custom scripts that can be invoked like regular commands from any process in your pipeline without modifying the `PATH` environment variable or using an absolute path. For example:
-
-```
-├── bin
-│   └── custom_script.py
-└── main.nf
-```
-
-Each script should include a shebang line to specify the interpreter for the script.
-
-:::{tip}
-Use `env` to resolve the interpreter's location instead of hard-coding the interpreter path. For example:
-
-```
-#!/usr/bin/env python
-```
-
-:::
-
-Scripts placed in the `bin` directory must have executable permissions. Use `chmod` to grant the required permissions. For example:
-
-```
-chmod a+x bin/custom_script.py
-```
-
-After setting the executable permission, the script can be run directly within your pipeline processes.
-
-Executable scripts can also be stored as scripts that are locally scoped to the processes defined by the tasks. See {ref}`module-binaries` for more information.
-
-#### The `lib` directory
-
-Any Groovy scripts or JAR files in the `lib` directory will be automatically loaded and made available to your pipeline scripts. The `lib` directory is a useful way to provide utility code or external libraries without cluttering the pipeline scripts.
-
 ### Data
 
 In general, input data should be provided by external sources using parameters which can be controlled by the user. This way, a pipeline can be easily reused to process different datasets which are appropriate for the pipeline.
diff --git a/docs/structure.md b/docs/structure.md
@@ -0,0 +1,170 @@
+(structure-page)=
+
+# Structure
+
+## The `templates` directory
+
+The `templates` directory in the Nextflow project root can be used to store scripts.
+
+```
+├── templates
+│   └── sayhello.py
+└── main.nf
+```
+
+Scripts stored in this directory can be used as the script body of any process in your pipeline by calling the `template` function:
+
+```nextflow
+process sayHello {
+    
+    input:
+    val x
+
+    output:
+    stdout
+
+    script:
+    template 'sayhello.py'
+}
+
+workflow {
+    Channel.of("Foo") | sayHello | view
+}
+```
+
+Variables prefixed with the dollar character (`$`) are interpreted as Nextflow variables when the template script is executed by Nextflow:
+
+```
+#!/usr/bin/env python
+
+print("Hello ${x}!")
+```
+
+The pipeline will fail if a template variable is missing, regardless of where it occurs in the template.
+
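+Templates can be tested independently of pipeline execution by providing each input as an environment variable. For example, a Bash template named `my_script.sh` that reads the `$STR` variable can be run directly as follows: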
+
+```bash
+STR='foo' bash templates/my_script.sh
+```
+
+Template scripts are only recommended for Bash scripts. Templates written in languages that do not prefix variables with `$` (e.g. Python and R) can't be run directly from the command line, because `$`-prefixed template variables are substituted only when Nextflow executes the template. Similarly, template variables escaped with `\$` are interpreted as Bash variables when the template is executed by Nextflow, but not when it is executed from the command line.
+
+:::{warning}
+Template variables are evaluated even if they are commented out in the template script.
+:::
+
+:::{tip}
+The best practice for using a custom script is to first embed it in the process definition and transfer it to a separate file with its own command line interface once the code matures.
+:::
+
+(bundling-executables)=
+
+## The `bin` directory
+
+The `bin` directory in the Nextflow project root can be used to store executable scripts.
+
+```
+├── bin
+│   └── sayhello.py
+└── main.nf
+```
+
+It allows custom scripts to be invoked like regular commands from any process in your pipeline without modifying the `PATH` environment variable or using an absolute path. Each script should include a shebang to specify the interpreter. Inputs should be supplied as arguments.
+
+```python
+#!/usr/bin/env python
+
+import argparse
+
+def main():
+    parser = argparse.ArgumentParser(description="A simple argparse example.")
+    parser.add_argument("name", type=str, help="Person to greet.")
+    
+    args = parser.parse_args()
+    print(f"Hello {args.name}!")
+
+if __name__ == "__main__":
+    main()
+```
+
+:::{tip}
+Use `env` to resolve the interpreter's location instead of hard-coding the interpreter path.
+:::
+
+Scripts placed in the `bin` directory must have executable permissions. Use `chmod` to grant the required permissions. For example:
+
+```
+chmod a+x bin/sayhello.py
+```
+
+Like modifying a process script, changing the executable script will cause the task to be re-executed on a resumed run.
+
+:::{warning}
+When using containers and the Wave service, Nextflow will send the project-level `bin` directory to the Wave service for inclusion as a layer in the container. Any changes to scripts in the `bin` directory will change the layer md5sum and the hash for the final container. Because the container identity is a component of the task hash calculation, such a change alters the hash of every task and forces all tasks in the workflow to be re-executed on a resumed run.
+
+When using the Wave service, use module-specific bin directories instead. See {ref}`module-binaries` for more information.
+:::
+
+## The `lib` directory
+
+The `lib` directory can be used to add utility code or external libraries without cluttering the pipeline scripts. The `lib` directory in the Nextflow project root is added to the classpath by default.
+
+```
+├── lib
+│   └── DNASequence.groovy
+└── main.nf
+```
+
+Classes or packages defined in the `lib` directory will be available in the execution context. Scripts or functions defined outside of classes will not be available in the execution context.
+
+For example, `lib/DNASequence.groovy` defines the `DNASequence` class:
+
+```groovy
+// lib/DNASequence.groovy
+class DNASequence {
+    String sequence
+
+    // Constructor
+    DNASequence(String sequence) {
+        this.sequence = sequence.toUpperCase() // Ensure sequence is in uppercase for consistency
+    }
+
+    // Method to calculate melting temperature using the Wallace rule
+    double getMeltingTemperature() {
+        int g_count = sequence.count('G')
+        int c_count = sequence.count('C')
+        int a_count = sequence.count('A')
+        int t_count = sequence.count('T')
+
+        // Wallace rule calculation
+        double tm = 4 * (g_count + c_count) + 2 * (a_count + t_count)
+        return tm
+    }
+
+    String toString() {
+        return "DNA[$sequence]"
+    }
+}
+```
+
+The `DNASequence` class is available in the execution context:
+
+```nextflow
+// main.nf
+workflow {
+    Channel.of('ACGTTGCAATGCCGTA', 'GCGTACGGTACGTTAC')
+        .map { seq -> new DNASequence(seq) }
+        .view { dna ->
+            def meltTemp = dna.getMeltingTemperature()
+            "Found sequence '$dna' with melting temperature ${meltTemp}°C"
+        }
+}
+```
+
+It returns:
+
+```
+Found sequence 'DNA[ACGTTGCAATGCCGTA]' with melting temperature 48.0°C
+Found sequence 'DNA[GCGTACGGTACGTTAC]' with melting temperature 50.0°C
+```
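
The two melting temperatures in the expected output of the `lib` example follow from the Wallace rule (4 degrees per G or C base, 2 degrees per A or T base) and can be checked outside Nextflow. The following is a minimal Python sketch of that arithmetic, not part of this change; `wallace_tm` is a hypothetical helper name chosen for illustration, and it assumes the same uppercase normalization as the `DNASequence` constructor.

```python
def wallace_tm(sequence: str) -> float:
    """Estimate melting temperature with the Wallace rule: 4*(G+C) + 2*(A+T)."""
    seq = sequence.upper()  # mirror the uppercase normalization in DNASequence
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return float(4 * gc + 2 * at)


# The same two sequences used in the main.nf example
for seq in ("ACGTTGCAATGCCGTA", "GCGTACGGTACGTTAC"):
    print(f"DNA[{seq}] -> {wallace_tm(seq)}")
```

Both sequences are 16 bases long; the first has 8 G/C bases and the second has 9, which accounts for the 2-degree difference between the reported values.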