Commit 23e60ab

Revise the revisions to cache/resume docs [ci skip] (#6299)
---------
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Chris Hakkaart <[email protected]>
1 parent f6eb2f7 commit 23e60ab

1 file changed

+25
-23
lines changed


docs/cache-and-resume.md

Lines changed: 25 additions & 23 deletions

````diff
@@ -76,9 +76,9 @@ Cache failures occur when a task that was supposed to be cached was re-executed
 
 Common causes of cache failures include:
 
-- [Resume not being enabled](#resume-not-enabled)
-- [Non-default cache directives](#non-default-cache-directives)
-- [Modified inputs](#modified-input-files)
+- [Resume not enabled](#resume-not-enabled)
+- [Cache directive disabled](#cache-directive-disabled)
+- [Modified inputs](#modified-inputs)
 - [Inconsistent file attributes](#inconsistent-file-attributes)
 - [Race condition on a global variable](#race-condition-on-a-global-variable)
 - [Non-deterministic process inputs](#non-deterministic-process-inputs)
````

````diff
@@ -87,7 +87,7 @@ Common causes of cache failures include:
 
 The `-resume` option is required to resume a pipeline. Ensure you enable `-resume` in your run command or your Nextflow configuration file.
 
-### Non-default cache directives
+### Cache directive disabled
 
 The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example:
 
````
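
Both settings touched by this hunk can also be managed from the Nextflow configuration file. The fragment below is a minimal sketch, not part of the commit: it assumes a `nextflow.config` in the launch directory and reuses the process name `FOO` from the docs example. The unscoped `resume` option is the config counterpart of passing `-resume` on the command line, and the `withName` selector scopes a setting to a single process.

```nextflow
// nextflow.config (sketch; FOO is the example process name from the docs)

// Reuse cached task results on every run, as if `-resume` were passed
resume = true

process {
    // Settings that apply only to the process named FOO
    withName: 'FOO' {
        // Keep caching enabled for FOO (true is already the default)
        cache = true
    }
}
```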

````diff
@@ -98,52 +98,54 @@ process FOO {
 }
 ```
 
-Ensure that the cache has not been set to a non-default value. See {ref}`process-cache` for more information about the `cache` directive.
+Ensure that the `cache` directive has not been disabled. See {ref}`process-cache` for more information.
 
-### Modified input files
+### Modified inputs
 
 Modifying inputs that are used in the task hash invalidates the cache. Common causes of modified inputs include:
 
 - Changing input files
 - Resuming from a different session ID
 - Changing the process name
+- Changing the calling workflow name
 - Changing the task container image or Conda environment
 - Changing the task script
 - Changing a bundled script used by the task
 
-Nextflow calculates a hash for an input file using its full path, last modified timestamp, and file size. If any of these attributes change, Nextflow re-executes the task. If a process modifies its input files, you cannot resume it. Avoid processes that modify their own input files as this is considered an anti-pattern.
+Nextflow calculates a hash for an input file using its full path, last modified timestamp, and file size. If any of these attributes change, Nextflow re-executes the task.
+
+:::{warning}
+If a process modifies its input files, it cannot be resumed. Avoid processes that modify their own input files as this is considered an anti-pattern.
+:::
 
 ### Inconsistent file attributes
 
-Some shared file systems, such as NFS, may report inconsistent file timestamps.
+Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache when using the standard caching mode.
 
-To resolve this issue, use the `'lenient'` {ref}`caching mode <process-cache>` to ignore the last modified timestamp and use only the file path.
+To resolve this issue, use the `'lenient'` {ref}`caching mode <process-cache>` to ignore the last modified timestamp and use only the file path and size.
 
 (cache-global-var-race-condition)=
 
 ### Race condition on a global variable
 
-Race conditions can in disrupt caching behavior of your pipeline. For example:
+Race conditions can disrupt the caching behavior of your pipeline. For example:
 
 ```nextflow
-channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" }
-channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" }
+channel.of(1,2,3).map { v -> X=v; X+=2 }.view { v -> "ch1 = $v" }
+channel.of(1,2,3).map { v -> X=v; X*=2 }.view { v -> "ch2 = $v" }
 ```
 
-In the above example, `X` is declared in each `map` closure. Without the `def` keyword, or other type qualifier, the variable `X` is global to the entire script. Operators and executed concurrently and, as `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs the process would execute different tasks during each run due to the race condition.
+In the above example, `X` is declared in each `map` closure. Without the `def` keyword, the variable `X` is global to the entire script. Because operators are executed concurrently and `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs, the process would execute different tasks during each run due to the race condition.
 
-To resolve this failure type, use local variables:
+To resolve this issue, avoid declaring global variables in closures:
 
 ```nextflow
-// local variable
-channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" }
+channel.of(1,2,3).map { v -> def X=v; X+=2 }.view { v -> "ch1 = $v" }
 ```
 
-Alternatively, remove the variable:
-
-```nextflow
-channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" }
-```
+:::{versionadded} 25.04.0
+The {ref}`strict syntax <strict-syntax-page>` does not allow global variables to be declared in closures.
+:::
 
 (cache-nondeterministic-inputs)=
 
````
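
The `'lenient'` caching mode referenced in this hunk is a value of the `cache` directive, so it can be applied to every process from the configuration file. The fragment below is a minimal sketch, not part of the commit, assuming lenient hashing is wanted pipeline-wide; to restrict it to one process, nest the same setting inside a `withName` selector or write `cache 'lenient'` in the process definition.

```nextflow
// nextflow.config (sketch)
process {
    // Hash input files by path and size only, ignoring the last modified
    // timestamps that shared file systems such as NFS may report inconsistently
    cache = 'lenient'
}
```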

````diff
@@ -170,9 +172,9 @@ process check_bam_bai {
 }
 ```
 
-In the above example, the inputs will merge without matching. This is the same way method used by the {ref}`operator-merge` operator. When merged, the inputs are incorrect, non-deterministic, and invalidate the cache.
+In the above example, the inputs will be merged without matching on `id`, in a similar manner as the {ref}`operator-merge` operator. As a result, the inputs are incorrect and non-deterministic.
 
-To resolve this failure type, join channels before invoking the process:
+To resolve this issue, use the `join` operator to join the channels into a single input channel before invoking the process:
 
 ```nextflow
 workflow {
````
