-[Race condition on a global variable](#race-condition-on-a-global-variable)
84
84
-[Non-deterministic process inputs](#non-deterministic-process-inputs)
@@ -87,7 +87,7 @@ Common causes of cache failures include:
87
87
88
88
The `-resume` option is required to resume a pipeline. Ensure you enable `-resume` in your run command or your Nextflow configuration file.
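As a minimal sketch of the configuration-file route, assuming a `nextflow.config` file that Nextflow picks up for the run, the `resume` setting can be enabled so that it does not have to be passed on every command line:

```groovy
// nextflow.config (sketch; assumes this file is loaded for the run)
// Reuse cached task results by default, equivalent to passing -resume
resume = true
```

The command-line form is the `-resume` option added to the `nextflow run` command.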
89
89
90
-
### Non-default cache directives
90
+
### Cache directive disabled
91
91
92
92
The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example:
93
93
@@ -98,52 +98,54 @@ process FOO {
98
98
}
99
99
```
100
100
101
-
Ensure that the cache has not been set to a non-default value. See {ref}`process-cache` for more information about the `cache` directive.
101
+
Ensure that the `cache` directive has not been disabled. See {ref}`process-cache` for more information.
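If the process definition cannot be edited directly, one possible approach is to re-enable caching from configuration. The fragment below is a sketch only: it assumes the process is named `FOO`, as in the example above, and that a `withName` selector in the configuration file takes precedence over the directive in the process definition.

```groovy
// nextflow.config (sketch; the process name FOO is taken from the example above)
process {
    withName: 'FOO' {
        // Restore the default caching behavior for this process
        cache = true
    }
}
```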
102
102
103
-
### Modified input files
103
+
### Modified inputs
104
104
105
105
Modifying inputs that are used in the task hash invalidates the cache. Common causes of modified inputs include:
106
106
107
107
- Changing input files
108
108
- Resuming from a different session ID
109
109
- Changing the process name
110
+
- Changing the calling workflow name
110
111
- Changing the task container image or Conda environment
111
112
- Changing the task script
112
113
- Changing a bundled script used by the task
113
114
114
-
Nextflow calculates a hash for an input file using its full path, last modified timestamp, and file size. If any of these attributes change, Nextflow re-executes the task. If a process modifies its input files, you cannot resume it. Avoid processes that modify their own input files as this is considered an anti-pattern.
115
+
Nextflow calculates a hash for an input file using its full path, last modified timestamp, and file size. If any of these attributes change, Nextflow re-executes the task.
116
+
117
+
:::{warning}
118
+
If a process modifies its input files, it cannot be resumed. Avoid processes that modify their own input files as this is considered an anti-pattern.
119
+
:::
115
120
116
121
### Inconsistent file attributes
117
122
118
-
Some shared file systems, such as NFS, may report inconsistent file timestamps.
123
+
Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache when using the standard caching mode.
119
124
120
-
To resolve this issue, use the `'lenient'` {ref}`caching mode <process-cache>` to ignore the last modified timestamp and use only the file path.
125
+
To resolve this issue, use the `'lenient'` {ref}`caching mode <process-cache>` to ignore the last modified timestamp and use only the file path and size.
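A minimal sketch of applying this mode to every process from a configuration file, assuming you want it pipeline-wide rather than per process:

```groovy
// nextflow.config (sketch; applies lenient caching to all processes)
// Hash input files by path and size only, ignoring the last modified timestamp
process.cache = 'lenient'
```

To limit the change to a single process, the same value can instead be set with the `cache` directive in that process definition.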
121
126
122
127
(cache-global-var-race-condition)=
123
128
124
129
### Race condition on a global variable
125
130
126
-
Race conditions can in disrupt caching behavior of your pipeline. For example:
131
+
Race conditions can disrupt the caching behavior of your pipeline. For example:
127
132
128
133
```nextflow
129
-
channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" }
130
-
channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" }
134
+
channel.of(1,2,3).map { v -> X=v; X+=2 }.view { v -> "ch1 = $v" }
135
+
channel.of(1,2,3).map { v -> X=v; X*=2 }.view { v -> "ch2 = $v" }
131
136
```
132
137
133
-
In the above example, `X` is declared in each `map` closure. Without the `def` keyword, or other type qualifier, the variable `X` is global to the entire script. Operators and executed concurrently and, as `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs the process would execute different tasks during each run due to the race condition.
138
+
In the above example, `X` is declared in each `map` closure. Without the `def` keyword, the variable `X` is global to the entire script. Because operators are executed concurrently and `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs, the process would execute different tasks during each run due to the race condition.
134
139
135
-
To resolve this failure type, use local variables:
140
+
To resolve this issue, avoid declaring global variables in closures:
136
141
137
142
```nextflow
138
-
// local variable
139
-
channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" }
143
+
channel.of(1,2,3).map { v -> def X=v; X+=2 }.view { v -> "ch1 = $v" }
140
144
```
141
145
142
-
Alternatively, remove the variable:
143
-
144
-
```nextflow
145
-
channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" }
146
-
```
146
+
:::{versionadded} 25.04.0
147
+
The {ref}`strict syntax <strict-syntax-page>` does not allow global variables to be declared in closures.
148
+
:::
147
149
148
150
(cache-nondeterministic-inputs)=
149
151
@@ -170,9 +172,9 @@ process check_bam_bai {
170
172
}
171
173
```
172
174
173
-
In the above example, the inputs will merge without matching. This is the same way method used by the {ref}`operator-merge` operator. When merged, the inputs are incorrect, non-deterministic, and invalidate the cache.
175
+
In the above example, the inputs will be merged without matching on `id`, in a manner similar to the {ref}`operator-merge` operator. As a result, the inputs are incorrect and non-deterministic.
174
176
175
-
To resolve this failure type, join channels before invoking the process:
177
+
To resolve this issue, use the `join` operator to join the channels into a single input channel before invoking the process: