Commit a5f8d9b

Revise troubleshooting section (#6292) [ci skip]
Signed-off-by: Christopher Hakkaart <[email protected]>
1 parent 18c278b commit a5f8d9b

1 file changed

+40
-31
lines changed

docs/cache-and-resume.md

Lines changed: 40 additions & 31 deletions
@@ -72,75 +72,84 @@ For this reason, it is important to preserve both the task cache (`.nextflow/cac
7272

7373
## Troubleshooting
7474

75-
Cache failures happen when either (1) a task that was supposed to be cached was re-executed, or (2) a task that was supposed to be re-executed was cached.
75+
Cache failures occur when a task that was supposed to be cached was re-executed or a task that was supposed to be re-executed was cached.
7676

77-
When this happens, consider the following questions:
77+
Common causes of cache failures include:
7878

79-
- Is resume enabled via `-resume`?
80-
- Is the {ref}`process-cache` directive set to a non-default value?
81-
- Is the task still present in the task cache and work directory?
82-
- Were any of the task inputs changed?
79+
- [Resume not being enabled](#resume-not-enabled)
80+
- [Non-default cache directives](#non-default-cache-directives)
81+
- [Modified inputs](#modified-input-files)
82+
- [Inconsistent file attributes](#inconsistent-file-attributes)
83+
- [Race condition on a global variable](#race-condition-on-a-global-variable)
84+
- [Non-deterministic process inputs](#non-deterministic-process-inputs)
8385

84-
Changing any of the inputs included in the [task hash](#task-hash) will invalidate the cache, for example:
86+
### Resume not enabled
8587

86-
- Resuming from a different session ID
87-
- Changing the process name
88-
- Changing the task container image or Conda environment
89-
- Changing the task script
90-
- Changing an input file or bundled script used by the task
88+
The `-resume` option is required to resume a pipeline. Ensure you enable `-resume` in your run command or your Nextflow configuration file.
9189

92-
While the following examples would not invalidate the cache:
90+
### Non-default cache directives
9391

94-
- Changing the value of a directive (other than {ref}`process-ext`), even if that directive is used in the task script
92+
The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example:
9593

96-
In many cases, cache failures happen because of a change to the pipeline script or configuration, or because the pipeline itself has some non-deterministic behavior.
94+
```nextflow
95+
process FOO {
96+
cache false
97+
// ...
98+
}
99+
```
97100

98-
Here are some common reasons for cache failures:
101+
Ensure that the cache has not been set to a non-default value. See {ref}`process-cache` for more information about the `cache` directive.
99102

100103
### Modified input files
101104

102-
Make sure that your input files have not been changed. Keep in mind that the default caching mode uses the complete file path, the last modified timestamp, and the file size. If any of these attributes change, the task will be re-executed, even if the file content is unchanged.
105+
Modifying inputs that are used in the task hash invalidates the cache. Common causes of modified inputs include:
103106

104-
### Process that modifies its inputs
107+
- Changing input files
108+
- Resuming from a different session ID
109+
- Changing the process name
110+
- Changing the task container image or Conda environment
111+
- Changing the task script
112+
- Changing a bundled script used by the task
105113

106-
If a process modifies its own input files, it cannot be resumed for the reasons described in the previous point. As a result, processes that modify their own input files are considered an anti-pattern and should be avoided.
114+
Nextflow calculates a hash for an input file using its full path, last modified timestamp, and file size. If any of these attributes change, Nextflow re-executes the task. If a process modifies its input files, you cannot resume it. Avoid processes that modify their own input files as this is considered an anti-pattern.
107115

108116
### Inconsistent file attributes
109117

110-
Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache. If you encounter this problem, you can avoid it by using the `'lenient'` {ref}`caching mode <process-cache>`, which ignores the last modified timestamp and uses only the file path and size.
118+
Some shared file systems, such as NFS, may report inconsistent file timestamps.
119+
120+
To resolve this issue, use the `'lenient'` {ref}`caching mode <process-cache>` to ignore the last modified timestamp and use only the file path and size.
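
As a minimal sketch of the setting described above, the `'lenient'` mode can be applied with the `cache` directive. The process name `FOO` is a placeholder reused from the earlier example and the process body is elided:

```nextflow
process FOO {
    // Hash input files without the last modified timestamp,
    // which shared file systems such as NFS may report inconsistently
    cache 'lenient'
    // ...
}
```

This applies the mode to a single process only. Other processes keep the default caching behavior.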
111121

112122
(cache-global-var-race-condition)=
113123

114124
### Race condition on a global variable
115125

116-
While Nextflow tries to make it easy to write safe concurrent code, it is still possible to create race conditions, which can in turn impact the caching behavior of your pipeline.
117-
118-
Consider the following example:
126+
Race conditions can disrupt the caching behavior of your pipeline. For example:
119127

120128
```nextflow
121129
channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" }
122130
channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" }
123131
```
124132

125-
The problem here is that `X` is declared in each `map` closure without the `def` keyword (or other type qualifier). Using the `def` keyword makes the variable local to the enclosing scope; omitting the `def` keyword makes the variable global to the entire script.
133+
In the above example, `X` is assigned in each `map` closure without the `def` keyword or another type qualifier, so the variable `X` is global to the entire script. Operators are executed concurrently and, because `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs, the process would execute different tasks during each run due to the race condition.
126134

127-
Because `X` is global, and operators are executed concurrently, there is a *race condition* on `X`, which means that the emitted values will vary depending on the particular order of the concurrent operations. If the values were passed as inputs into a process, the process would execute different tasks on each run due to the race condition.
128-
129-
The solution is to not use a global variable where a local variable is enough (or in this simple example, avoid the variable altogether):
135+
To resolve this failure type, use local variables:
130136

131137
```nextflow
132138
// local variable
133139
channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" }
140+
```
134141

135-
// no variable
142+
Alternatively, remove the variable:
143+
144+
```nextflow
136145
channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" }
137146
```
138147

139148
(cache-nondeterministic-inputs)=
140149

141150
### Non-deterministic process inputs
142151

143-
Sometimes a process needs to merge inputs from different sources. Consider the following example:
152+
A process that merges inputs from different sources non-deterministically may invalidate the cache. For example:
144153

145154
```nextflow
146155
workflow {
@@ -161,9 +170,9 @@ process check_bam_bai {
161170
}
162171
```
163172

164-
It is tempting to assume that the process inputs will be matched by `id` like the {ref}`operator-join` operator. But in reality, they are simply merged like the {ref}`operator-merge` operator. As a result, not only will the process inputs be incorrect, they will also be non-deterministic, thus invalidating the cache.
173+
In the above example, the inputs are merged without being matched by `id`. This is the same method used by the {ref}`operator-merge` operator. As a result, the process inputs are both incorrect and non-deterministic, which invalidates the cache.
165174

166-
The solution is to explicitly join the two channels before the process invocation:
175+
To resolve this failure type, join channels before invoking the process:
167176

168177
```nextflow
169178
workflow {
