Commit 6c72493

Merge branch 'master' into fix-azure-starttask-concatenation
2 parents e9a19b4 + a468f8e commit 6c72493

43 files changed (+342 / -221 lines changed)

build.gradle

Lines changed: 2 additions & 2 deletions
@@ -102,8 +102,8 @@ allprojects {
 
 // Documentation required libraries
 groovyDoc 'org.fusesource.jansi:jansi:2.4.0'
-groovyDoc "org.apache.groovy:groovy-groovydoc:4.0.27"
-groovyDoc "org.apache.groovy:groovy-ant:4.0.27"
+groovyDoc "org.apache.groovy:groovy-groovydoc:4.0.28"
+groovyDoc "org.apache.groovy:groovy-ant:4.0.28"
 }
 
 test {

docs/aws.md

Lines changed: 1 addition & 1 deletion
@@ -281,7 +281,7 @@ There are several reasons why you might need to create your own [AMI (Amazon Mac
 ### Create your custom AMI
 
 From the EC2 Dashboard, select **Launch Instance**, then select **Browse more AMIs**. In the new page, select
-**AWS Marketplace AMIs**, and then search for **Amazon ECS-Optimized Amazon Linux 2 (AL2) x86_64 AMI**. Select the AMI and continue as usual to configure and launch the instance.
+**AWS Marketplace AMIs**, and then search for `Amazon ECS-Optimized Amazon Linux 2 (AL2) x86_64 AMI`. Select the AMI and continue as usual to configure and launch the instance.
 
 :::{note}
 The selected instance has a root volume of 30GB. Make sure to increase its size or add a second EBS volume with enough storage for real genomic workloads.

docs/cache-and-resume.md

Lines changed: 49 additions & 38 deletions
@@ -72,75 +72,86 @@ For this reason, it is important to preserve both the task cache (`.nextflow/cac
 
 ## Troubleshooting
 
-Cache failures happen when either (1) a task that was supposed to be cached was re-executed, or (2) a task that was supposed to be re-executed was cached.
+Cache failures occur when a task that was supposed to be cached was re-executed or a task that was supposed to be re-executed was cached.
 
-When this happens, consider the following questions:
+Common causes of cache failures include:
 
-- Is resume enabled via `-resume`?
-- Is the {ref}`process-cache` directive set to a non-default value?
-- Is the task still present in the task cache and work directory?
-- Were any of the task inputs changed?
+- [Resume not enabled](#resume-not-enabled)
+- [Cache directive disabled](#cache-directive-disabled)
+- [Modified inputs](#modified-inputs)
+- [Inconsistent file attributes](#inconsistent-file-attributes)
+- [Race condition on a global variable](#race-condition-on-a-global-variable)
+- [Non-deterministic process inputs](#non-deterministic-process-inputs)
 
-Changing any of the inputs included in the [task hash](#task-hash) will invalidate the cache, for example:
+### Resume not enabled
 
-- Resuming from a different session ID
-- Changing the process name
-- Changing the task container image or Conda environment
-- Changing the task script
-- Changing an input file or bundled script used by the task
+The `-resume` option is required to resume a pipeline. Ensure you enable `-resume` in your run command or your Nextflow configuration file.
+
+### Cache directive disabled
 
-While the following examples would not invalidate the cache:
+The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example:
 
-- Changing the value of a directive (other than {ref}`process-ext`), even if that directive is used in the task script
+```nextflow
+process FOO {
+cache false
+// ...
+}
+```
 
-In many cases, cache failures happen because of a change to the pipeline script or configuration, or because the pipeline itself has some non-deterministic behavior.
+Ensure that the `cache` directive has not been disabled. See {ref}`process-cache` for more information.
 
-Here are some common reasons for cache failures:
+### Modified inputs
 
-### Modified input files
+Modifying inputs that are used in the task hash invalidates the cache. Common causes of modified inputs include:
 
-Make sure that your input files have not been changed. Keep in mind that the default caching mode uses the complete file path, the last modified timestamp, and the file size. If any of these attributes change, the task will be re-executed, even if the file content is unchanged.
+- Changing input files
+- Resuming from a different session ID
+- Changing the process name
+- Changing the calling workflow name
+- Changing the task container image or Conda environment
+- Changing the task script
+- Changing a bundled script used by the task
 
-### Process that modifies its inputs
+Nextflow calculates a hash for an input file using its full path, last modified timestamp, and file size. If any of these attributes change, Nextflow re-executes the task.
 
-If a process modifies its own input files, it cannot be resumed for the reasons described in the previous point. As a result, processes that modify their own input files are considered an anti-pattern and should be avoided.
+:::{warning}
+If a process modifies its input files, it cannot be resumed. Avoid processes that modify their own input files as this is considered an anti-pattern.
+:::
 
 ### Inconsistent file attributes
 
-Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache. If you encounter this problem, you can avoid it by using the `'lenient'` {ref}`caching mode <process-cache>`, which ignores the last modified timestamp and uses only the file path and size.
+Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache when using the standard caching mode.
+
+To resolve this issue, use the `'lenient'` {ref}`caching mode <process-cache>` to ignore the last modified timestamp and use only the file path and size.
 
 (cache-global-var-race-condition)=
 
 ### Race condition on a global variable
 
-While Nextflow tries to make it easy to write safe concurrent code, it is still possible to create race conditions, which can in turn impact the caching behavior of your pipeline.
-
-Consider the following example:
+Race conditions can disrupt the caching behavior of your pipeline. For example:
 
 ```nextflow
-channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" }
-channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" }
+channel.of(1,2,3).map { v -> X=v; X+=2 }.view { v -> "ch1 = $v" }
+channel.of(1,2,3).map { v -> X=v; X*=2 }.view { v -> "ch2 = $v" }
 ```
 
-The problem here is that `X` is declared in each `map` closure without the `def` keyword (or other type qualifier). Using the `def` keyword makes the variable local to the enclosing scope; omitting the `def` keyword makes the variable global to the entire script.
+In the above example, `X` is declared in each `map` closure. Without the `def` keyword, the variable `X` is global to the entire script. Because operators are executed concurrently and `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs, the process would execute different tasks during each run due to the race condition.
 
-Because `X` is global, and operators are executed concurrently, there is a *race condition* on `X`, which means that the emitted values will vary depending on the particular order of the concurrent operations. If the values were passed as inputs into a process, the process would execute different tasks on each run due to the race condition.
-
-The solution is to not use a global variable where a local variable is enough (or in this simple example, avoid the variable altogether):
+To resolve this issue, avoid declaring global variables in closures:
 
 ```nextflow
-// local variable
-channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" }
-
-// no variable
-channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" }
+channel.of(1,2,3).map { v -> def X=v; X+=2 }.view { v -> "ch1 = $v" }
 ```
 
+:::{versionadded} 25.04.0
+The {ref}`strict syntax <strict-syntax-page>` does not allow global variables to be declared in closures.
+:::
+
 (cache-nondeterministic-inputs)=
 
 ### Non-deterministic process inputs
 
-Sometimes a process needs to merge inputs from different sources. Consider the following example:
+A process that merges inputs from different sources non-deterministically may invalidate the cache. For example:
 
 ```nextflow
 workflow {
@@ -161,9 +172,9 @@ process check_bam_bai {
 }
 ```
 
-It is tempting to assume that the process inputs will be matched by `id` like the {ref}`operator-join` operator. But in reality, they are simply merged like the {ref}`operator-merge` operator. As a result, not only will the process inputs be incorrect, they will also be non-deterministic, thus invalidating the cache.
+In the above example, the inputs will be merged without matching on `id`, in a similar manner as the {ref}`operator-merge` operator. As a result, the inputs are incorrect and non-deterministic.
 
-The solution is to explicitly join the two channels before the process invocation:
+To resolve this issue, use the `join` operator to join the channels into a single input channel before invoking the process:
 
 ```nextflow
 workflow {
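
The `-resume` behavior and the `'lenient'` caching mode covered by the updated troubleshooting page can also be set in a Nextflow config file. A minimal sketch, assuming the settings should apply to every process in the pipeline:

```groovy
// nextflow.config (sketch)

// Config equivalent of passing -resume on the command line
resume = true

// Hash input files by path and size only, ignoring the last modified timestamp,
// which avoids spurious cache misses on file systems such as NFS
process.cache = 'lenient'
```

The trade-off of the lenient mode is that a file rewritten in place with the same path and size is treated as unchanged. To limit that risk, the mode can be scoped to selected processes with a `withName` selector instead of being applied globally.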

docs/conf.py

Lines changed: 5 additions & 1 deletion
@@ -51,7 +51,11 @@
 'operator.md': 'reference/operator.md',
 'dsl1.md': 'migrations/dsl1.md',
 'updating-syntax.md': 'strict-syntax.md',
-'updating-spot-retries.md': 'guides/updating-spot-retries.md'
+'updating-spot-retries.md': 'guides/updating-spot-retries.md',
+'metrics.md': 'tutorials/metrics.md',
+'data-lineage.md' : 'tutorials/data-lineage.md',
+'workflow-outputs.md': 'tutorials/workflow-outputs.md',
+'flux.md': 'tutorials/flux.md'
 }
 
 # Add any paths that contain templates here, relative to this directory.

docs/developer-env.md

Lines changed: 2 additions & 2 deletions
@@ -96,9 +96,9 @@ See {ref}`vscode-page` for more information about the Nextflow extension feature
 
 **nf-core**
 
-The [nf-core extension pack](https://marketplace.visualstudio.com/items?itemName=nf-core.nf-core-extensionpack) adds a selection of tools that help develop with nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.
+[nf-core](https://nf-co.re/) is a community effort to collect a curated set of analysis pipelines built using Nextflow. The [nf-core extension pack](https://marketplace.visualstudio.com/items?itemName=nf-core.nf-core-extensionpack) adds a selection of tools that support development. For example, it includes [Code Spell Checker](https://marketplace.visualstudio.com/items?itemName=streetsidesoftware.code-spell-checker), [Prettier](https://marketplace.visualstudio.com/items?itemName=esbenp.prettier-vscode), [Todo Tree](https://marketplace.visualstudio.com/items?itemName=Gruntfuggly.todo-tree), and [Markdown Extended](https://marketplace.visualstudio.com/items?itemName=jebbs.markdown-extended).
 
-The nf-core extension pack includes several useful extensions. For example, [Code Spell Checker](https://marketplace.visualstudio.com/items?itemName=streetsidesoftware.code-spell-checker), [Prettier](https://marketplace.visualstudio.com/items?itemName=esbenp.prettier-vscode), [Todo Tree](https://marketplace.visualstudio.com/items?itemName=Gruntfuggly.todo-tree), and [Markdown Extended](https://marketplace.visualstudio.com/items?itemName=jebbs.markdown-extended). See [nf-core extension pack](https://marketplace.visualstudio.com/items?itemName=nf-core.nf-core-extensionpack) for more information about the tools included in the nf-core extension pack.
+See the [nf-core extension pack](https://marketplace.visualstudio.com/items?itemName=nf-core.nf-core-extensionpack) for more information about the included tools.
 
 (devenv-remote)=
 

docs/guides/aws-java-sdk-v2.md

Lines changed: 8 additions & 8 deletions
@@ -2,13 +2,13 @@
 
 # AWS Java SDK v2
 
-AWS Java SDK v1 is reaching end of life at the end of 2025. Starting in version `25.06.0-edge`, Nextflow uses AWS Java SDK v2 in the `nf-amazon` plugin.
+AWS Java SDK v1 will reach end of life at the end of 2025. Starting with version `25.06.0-edge`, Nextflow uses AWS Java SDK v2 in the `nf-amazon` plugin.
 
-This migration introduced several breaking changes to the `aws.client` config scope, including new options and removed options. This page describes these changes and how they affect your Nextflow configuraiton.
+This migration introduces several breaking changes to the `aws.client` config scope, including new and removed options. This page describes these changes and how they affect your Nextflow configuration.
 
 ## New HTTP client
 
-The HTTP client used by SDK v2 does not support overriding certain advanced HTTP options. As a result, the following config options are no longer supported:
+The HTTP client in SDK v2 does not support overriding certain advanced HTTP options. As a result, the following config options are no longer supported:
 
 - `aws.client.protocol`
 - `aws.client.signerOverride`
@@ -18,15 +18,15 @@ The HTTP client used by SDK v2 does not support overriding certain advanced HTTP
 
 ## S3 transfer manager
 
-The *S3 transfer manager* is a subsystem of SDK v2 which handles S3 transfers, including S3 uploads and downloads.
+The *S3 transfer manager* is a subsystem of SDK v2 that handles S3 uploads and downloads.
 
-The concurrency and throughput of the S3 transfer manager can be configured manually using the `aws.client.maxConcurrency` and `aws.client.maxNativeMemory` config options. Alternatively, the `aws.client.targetThroughputInGbps` config option can be used to set the previous two options automatically based on a target throughput.
+You can configure the concurrency and throughput of the S3 transfer manager manually using the `aws.client.maxConcurrency` and `aws.client.maxNativeMemory` configuration options. Alternatively, you can use the `aws.client.targetThroughputInGbps` option to set both values automatically based on a target throughput.
 
-## Multi-part uplaods
+## Multi-part uploads
 
-Multi-part uploads are handled by the S3 transfer manager. The `aws.client.minimumPartSize` and `aws.client.multipartThreshold` config options can be used to control when and how multi-part uploads are performed.
+Multi-part uploads are handled by the S3 transfer manager. You can use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` config options to control when and how multi-part uploads are performed.
 
-The following multi-part upload options are no longer supported:
+The following multi-part upload config options are no longer supported:
 
 - `aws.client.uploadChunkSize`
 - `aws.client.uploadMaxAttempts`
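
The S3 transfer manager and multi-part upload options named in the migration guide all live under the `aws.client` config scope. A minimal sketch; the values are illustrative assumptions rather than recommended defaults, and the memory-sized options are assumed to accept Nextflow memory-unit strings such as `'8 MB'`:

```groovy
aws {
    client {
        // Let the S3 transfer manager derive concurrency and native memory from a target throughput
        targetThroughputInGbps = 10

        // Alternatively, set both values manually (illustrative values)
        // maxConcurrency = 100
        // maxNativeMemory = '2 GB'

        // Control when and how multi-part uploads are performed (illustrative values)
        minimumPartSize = '8 MB'
        multipartThreshold = '16 MB'
    }
}
```

Because `aws.client.uploadChunkSize` and `aws.client.uploadMaxAttempts` are no longer supported, existing configs that set them need to be updated when moving to a release that uses SDK v2.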

docs/migrations/24-04.md

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ The `nf-ga4gh` plugin has since been moved into its own repository, [nextflow-io
 ```groovy
 conda.channels = ['seqera', 'conda-forge', 'bioconda', 'defaults']
 
-## Miscellanous
+## Miscellaneous
 
 - New config option: `azure.batch.pools.<name>.lowPriority`
 - New config option: `azure.batch.pools.<name>.startTask.script`
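
The two Azure Batch options listed above are set per named pool. A minimal sketch, assuming a hypothetical pool named `mypool`; other pool settings (VM type, scaling, and so on) are omitted:

```groovy
azure {
    batch {
        pools {
            // 'mypool' is a hypothetical pool name
            mypool {
                // Provision the pool with low-priority VMs
                lowPriority = true
                // Shell command that Azure Batch runs on each node when it is added to the pool
                startTask {
                    script = 'echo "node ready"'
                }
            }
        }
    }
}
```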

docs/migrations/24-10.md

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ Nextflow now supports managed identities for the Azure Batch executor. See {ref}
 
 <h3>Task previous execution trace</h3>
 
-The `task` variable in the process definition has two new proprties, `task.previousTrace` and `task.previousException`, which allows a task to access the runtime metadata of the previous attempt. See {ref}`task-previous-execution-trace` for details.
+The `task` variable in the process definition has two new properties, `task.previousTrace` and `task.previousException`, which allows a task to access the runtime metadata of the previous attempt. See {ref}`task-previous-execution-trace` for details.
 
 ## Breaking changes
 
@@ -53,7 +53,7 @@ The `task` variable in the process definition has two new proprties, `task.previ
 
 - The use of `addParams` and `params` clauses in include declarations is deprecated. See {ref}`module-params` for details.
 
-## Miscellanous
+## Miscellaneous
 
 - New config option: `aws.client.requesterPays`
 - New config option: `google.batch.autoRetryExitCodes`
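
The two config options listed above are set like any other scoped option. A minimal sketch; the exit code in the list is an illustrative assumption:

```groovy
// Access S3 buckets configured as requester-pays
aws.client.requesterPays = true

// Exit codes that trigger an automatic task retry on Google Batch (illustrative value)
google.batch.autoRetryExitCodes = [50001]
```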

docs/migrations/25-04.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ The third preview of workflow outputs introduces the following breaking changes
 
 - The syntax for dynamic publish paths has changed. Instead of defining a closure that returns a closure with the `path` directive, the outer closure should use the `>>` operator to publish individual files. See {ref}`workflow-publishing-files` for details.
 
-- The `mapper` index directive has been removed. Use a `map` operator in the workflwo body instead.
+- The `mapper` index directive has been removed. Use a `map` operator in the workflow body instead.
 
 See {ref}`migrating-workflow-outputs` to get started.
 

docs/reports.md

Lines changed: 1 addition & 1 deletion
@@ -302,7 +302,7 @@ The following table shows the fields that can be included in the execution repor
 : The value of the process `scratch` directive.
 
 `error_action`
-: The action applied on errof task failure.
+: The action applied on error for task failure.
 
 `hostname`
 : :::{versionadded} 22.05.0-edge
