Draft

Changes from all commits (44 commits)
ca4570b
Migrate first five pages
christopher-hakkaart Jun 22, 2025
3af1dfc
Push changes
christopher-hakkaart Jun 23, 2025
028fb64
Migrating reports testing
christopher-hakkaart Jun 24, 2025
d6f20f4
Migrate pages
christopher-hakkaart Jun 25, 2025
bef5afd
Migrate more pages
christopher-hakkaart Jun 26, 2025
27712ef
Migrate VS Code
christopher-hakkaart Jun 26, 2025
8c3953a
Update admonitions
christopher-hakkaart Jun 27, 2025
f0467b6
Migrate conda, container, and git pages
christopher-hakkaart Jun 27, 2025
49015d9
Migrated compute and storage sections
christopher-hakkaart Jun 30, 2025
81355b1
Migrate stdlib pages and others
christopher-hakkaart Jun 30, 2025
7ad2d7a
Remove leftover text
christopher-hakkaart Jun 30, 2025
91bbe01
Add more reference pages
christopher-hakkaart Jul 3, 2025
d9aef0a
Convert all pages to mdx and fix links
christopher-hakkaart Jul 7, 2025
44b7dc8
Fix broken links
christopher-hakkaart Jul 7, 2025
5299304
Fix more links
christopher-hakkaart Jul 7, 2025
6c5ba3e
Fix missing code blocks
christopher-hakkaart Jul 7, 2025
cac462e
Fix styling
christopher-hakkaart Jul 7, 2025
934c863
Fix merge conflicts
christopher-hakkaart Jul 9, 2025
87d54e4
Fix image paths
christopher-hakkaart Jul 9, 2025
7e7a9a5
Remove missed header
christopher-hakkaart Jul 9, 2025
dcdb7f7
Remove missed heading
christopher-hakkaart Jul 9, 2025
6ee0525
Fix missing references on config page
christopher-hakkaart Jul 9, 2025
e1643ed
Adding missing links
christopher-hakkaart Jul 9, 2025
2f4b14c
Fix admonition
christopher-hakkaart Jul 9, 2025
9c06067
Fix errors
christopher-hakkaart Jul 9, 2025
190ca99
Flex wrap badges
christopher-hakkaart Jul 10, 2025
d167b41
Fix indenting
christopher-hakkaart Jul 11, 2025
4097598
Fix link
christopher-hakkaart Jul 14, 2025
3fa5a40
Rename files
christopher-hakkaart Jul 16, 2025
dfaa4fd
Merge remote-tracking branch 'origin/master' into docs-migration
christopher-hakkaart Jul 18, 2025
18666bf
Fix conflicts
christopher-hakkaart Jul 18, 2025
ff35c70
Fix headings
christopher-hakkaart Jul 18, 2025
a35099c
Fix merge conflicts
christopher-hakkaart Aug 13, 2025
671641b
Fix version tags
christopher-hakkaart Aug 13, 2025
9bb37e1
Fix link
christopher-hakkaart Aug 13, 2025
8df23b7
Fix links
christopher-hakkaart Aug 13, 2025
c677067
Fix conflicts
christopher-hakkaart Sep 18, 2025
e61ae97
Fix breaks
christopher-hakkaart Sep 18, 2025
0118ac1
Fix links
christopher-hakkaart Sep 19, 2025
deec647
Fix conflicts
christopher-hakkaart Oct 29, 2025
64cc3a2
install docusaurus into "new-docs" folder
dana-seqera Nov 6, 2025
3fd9492
migrate existing doc files into "new-docs" folder
dana-seqera Nov 6, 2025
1134282
update docusaurus.config
dana-seqera Nov 6, 2025
b1fbd95
remove import of VersionedAdmonitions in new-docs files
dana-seqera Nov 6, 2025
Binary file added .DS_Store
Binary file not shown.
11 changes: 7 additions & 4 deletions docs/amazons3.md → docs/amazons3.mdx
@@ -1,5 +1,3 @@
-(amazons3-page)=

# Amazon S3

Nextflow includes support for AWS S3 storage. Files stored in an S3 bucket can be accessed transparently in your pipeline script like any other file in the local file system.
@@ -20,7 +18,7 @@ The usual file operations can be applied to a path handle with the above notatio
println file('s3://my-bucket/data/sequences.fa').text
```

-See {ref}`working-with-files` and the {ref}`stdlib-types-path` reference to learn more about available file operations.
+See [Working with files][working-with-files] and the [Path][stdlib-types-path] reference to learn more about available file operations.

## Security credentials

@@ -96,4 +94,9 @@ aws {

## Advanced configuration

-Read {ref}`AWS configuration<config-aws>` section to learn more about advanced S3 client configuration options.
+See [AWS configuration][config-aws] for more information about advanced S3 client configuration options.


+[config-aws]: /nextflow_docs/nextflow_repo/docs/reference/config.mdx#aws
+[stdlib-types-path]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types.mdx#path
+[working-with-files]: /nextflow_docs/nextflow_repo/docs/working-with-files.mdx
545 changes: 545 additions & 0 deletions docs/aws.mdx

Large diffs are not rendered by default.

714 changes: 714 additions & 0 deletions docs/azure.mdx

Large diffs are not rendered by default.

258 changes: 258 additions & 0 deletions docs/cache-and-resume.mdx
@@ -0,0 +1,258 @@
import { AddedInVersion, ChangedInVersion, DeprecatedInVersion } from '@site/src/components/VersionedAdmonitions';

# Caching and resuming

One of the core features of Nextflow is the ability to cache task executions and re-use them in subsequent runs to minimize duplicate work. Resumability is useful both for recovering from errors and for iteratively developing a pipeline. It is similar to [checkpointing](https://en.wikipedia.org/wiki/Application_checkpointing), a common practice used by HPC applications.

You can enable resumability in Nextflow with the `-resume` flag when launching a pipeline with `nextflow run`. In most cases, that is all you need to do and resumability will "just work". This page describes Nextflow's caching behavior in more detail to help advanced users understand how the cache works and troubleshoot it when it doesn't.

## Task cache

All task executions are automatically saved to the task cache, regardless of the `-resume` option (so that you always have the option to resume later). The task cache is a key-value store, where each key-value pair corresponds to a previously executed task.

The task cache is used in conjunction with the [work directory](#work-directory) to recover cached tasks in a resumed run. It is also used by the [`log`][cli-log] sub-command to query task metadata.

### Task hash

The task hash is computed from the following metadata:

- Session ID (see `workflow.sessionId` in the [workflow][stdlib-namespaces-workflow] namespace)
- Task name (see `name` in [Trace file][trace-report])
- Task container image (if applicable)
- Task [environment modules][process-module] (if applicable)
- Task [Conda environment][process-conda] (if applicable)
- Task [Spack environment][process-spack] and [CPU architecture][process-arch] (if applicable)
- Task [inputs][process-input]
- Task [script][process-script]
- Any global variables referenced in the task script
- Any task [`ext`][process-ext] properties referenced in the task script
- Any [bundled scripts][bundling-executables] used in the task script
- Whether the task is a [stub run][process-stub]

:::note
Nextflow also includes an incrementing component in the hash generation process, which allows it to iterate through multiple hash values until it finds one that does not match an existing execution directory. This mechanism typically aligns with task retries (i.e., task attempts); however, this is not guaranteed.
:::

<ChangedInVersion version="23.09.2-edge">
The [`ext`][process-ext] directive was added to the task hash.
</ChangedInVersion>

Nextflow computes this hash for every task when it is created but before it is executed. If resumability is enabled and there is an entry in the task cache with the same hash, Nextflow tries to recover the previous task execution. A cache hit does not guarantee that the task will be resumed, because Nextflow must also recover the task outputs from the [work directory](#work-directory).

Files are hashed differently depending on the caching mode. See the [`cache`][process-cache] directive for more details.

### Task entry

The task entry is a serialized blob of the task metadata required to resume a task, including the fields used by the [Trace file][trace-report] and the task input variables.

### Cache stores

The default cache store keeps the task cache in the `.nextflow/cache` directory, relative to the launch directory (i.e. `workflow.launchDir`), with a separate subdirectory for each session ID, backed by [LevelDB](https://github.com/dain/leveldb).

Due to the limitations of LevelDB, the database for a given session ID can only be accessed by one reader/writer at a time. This means, for example, that you cannot use `nextflow log` to query the task metadata for a pipeline run while it is still running.

<AddedInVersion version="23.07.0-edge" />

The cloud cache is an alternative cache store that uses cloud storage instead of the local cache directory. You can use it by setting the `NXF_CLOUDCACHE_PATH` environment variable to the desired cache path (e.g. `s3://my-bucket/cache`) and providing the necessary credentials.

The cloud cache is particularly useful when launching Nextflow from within the cloud, where the default cache would be lost once the pipeline completes and the VM instance is terminated. Furthermore, because it is backed by cloud storage, it can support multiple readers and writers.
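
For example, here is a minimal sketch of launching a run with the cloud cache, assuming an existing S3 bucket named `my-bucket` and AWS credentials already available in the environment:

```bash
# Store the task cache in S3 instead of the local .nextflow/cache directory
export NXF_CLOUDCACHE_PATH=s3://my-bucket/cache

# Resumed runs must use the same cache path to find previous task executions
nextflow run <pipeline> -resume
```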

## Work directory

While the [task cache](#task-cache) stores the task metadata for subsequent runs, the work directory stores various files used during a pipeline run.

Each task uses a unique directory based on its hash. When a task is created, Nextflow stages the task input files, script, and other helper files into the task directory. The task writes any output files to this directory during its execution, and Nextflow uses these output files for downstream tasks and/or publishing.

When a previous task is retrieved from the task cache on a resumed run, Nextflow checks the corresponding task directory in the work directory. If all the required outputs are present and the exit code is valid, the task is successfully cached; otherwise, the task is re-executed.

For this reason, it is important to preserve both the task cache (`.nextflow/cache`) and work directories in order to resume runs successfully. You can use the [`clean`][cli-clean] command to delete specific runs from the cache.
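
For example, here is a sketch of cleaning up the runs that preceded a given run, assuming a hypothetical run named `gigantic_keller`:

```bash
# Preview which work directories and cache entries would be deleted (dry run)
nextflow clean -before gigantic_keller -n

# Delete them for real
nextflow clean -before gigantic_keller -f
```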

## Troubleshooting

Cache failures occur when a task that was supposed to be cached was re-executed or a task that was supposed to be re-executed was cached.

Common causes of cache failures include:

- [Resume not enabled](#resume-not-enabled)
- [Cache directive disabled](#cache-directive-disabled)
- [Modified inputs](#modified-inputs)
- [Inconsistent file attributes](#inconsistent-file-attributes)
- [Race condition on a global variable](#race-condition-on-a-global-variable)
- [Non-deterministic process inputs](#non-deterministic-process-inputs)

### Resume not enabled

The `-resume` option is required to resume a pipeline. Ensure you enable `-resume` in your run command or your Nextflow configuration file.
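
For example:

```bash
nextflow run <pipeline> -resume
```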

### Cache directive disabled

The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example:

```nextflow
process FOO {
    cache false
    // ...
}
```

Ensure that the `cache` directive has not been disabled. See [cache][process-cache] for more information.

### Modified inputs

Modifying inputs that are used in the task hash invalidates the cache. Common causes of modified inputs include:

- Changing input files
- Resuming from a different session ID
- Changing the process name
- Changing the calling workflow name
- Changing the task container image or Conda environment
- Changing the task script
- Changing a bundled script used by the task

Nextflow calculates a hash for an input file using its full path, last modified timestamp, and file size. If any of these attributes change, Nextflow re-executes the task.
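
For example, with the default caching mode, updating a file's last modified timestamp is enough to invalidate the cache, even if its content is unchanged. The following hypothetical sketch assumes an input file `data/sequences.fa`:

```bash
# Update only the last modified timestamp of an input file
touch data/sequences.fa

# Tasks that consume this file will re-execute instead of being resumed
nextflow run <pipeline> -resume
```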

:::warning
If a process modifies its input files, it cannot be resumed. Avoid processes that modify their own input files as this is considered an anti-pattern.
:::

### Inconsistent file attributes

Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache when using the standard caching mode.

To resolve this issue, use the `'lenient'` [caching mode][process-cache] to ignore the last modified timestamp and use only the file path and size.
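
For example, you can enable the lenient caching mode for all processes in your configuration file:

```nextflow
// nextflow.config: hash input files using only their path and size,
// ignoring the last modified timestamp
process.cache = 'lenient'
```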

### Race condition on a global variable

Race conditions can disrupt the caching behavior of your pipeline. For example:

```nextflow
channel.of(1,2,3).map { v -> X=v; X+=2 }.view { v -> "ch1 = $v" }
channel.of(1,2,3).map { v -> X=v; X*=2 }.view { v -> "ch2 = $v" }
```

In the above example, `X` is assigned in each `map` closure. Without the `def` keyword, the variable `X` is global to the entire script. Because the operators are executed concurrently and `X` is shared between them, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs, the process would execute different tasks on each run due to the race condition.

To resolve this issue, avoid declaring global variables in closures:

```nextflow
channel.of(1,2,3).map { v -> def X=v; X+=2 }.view { v -> "ch1 = $v" }
```

<AddedInVersion version="25.04.0">
The [strict syntax][strict-syntax-page] does not allow global variables to be declared in closures.
</AddedInVersion>

### Non-deterministic process inputs

A process that merges inputs from different sources non-deterministically may invalidate the cache. For example:

```nextflow
workflow {
    ch_bam = channel.of( ['1', '1.bam'], ['2', '2.bam'] )
    ch_bai = channel.of( ['2', '2.bai'], ['1', '1.bai'] )
    check_bam_bai(ch_bam, ch_bai)
}

process check_bam_bai {
    input:
    tuple val(id), file(bam)
    tuple val(id), file(bai)

    script:
    """
    check_bam_bai $bam $bai
    """
}
```

In the above example, the inputs will be merged without matching on `id`, in a similar manner to the [`merge`][operator-merge] operator. As a result, the inputs are paired incorrectly and non-deterministically.

To resolve this issue, use the [`join`][operator-join] operator to join the channels into a single input channel before invoking the process:

```nextflow
workflow {
    ch_bam = channel.of( ['1', '1.bam'], ['2', '2.bam'] )
    ch_bai = channel.of( ['2', '2.bai'], ['1', '1.bai'] )
    check_bam_bai(ch_bam.join(ch_bai))
}

process check_bam_bai {
    input:
    tuple val(id), file(bam), file(bai)

    script:
    """
    check_bam_bai $bam $bai
    """
}
```

## Tips

### Resuming from a specific run

Nextflow resumes from the previous run by default. If you want to resume from an earlier run, specify the session ID for that run with the `-resume` option:

```bash
nextflow run rnaseq-nf -resume 4dc656d2-c410-44c8-bc32-7dd0ea87bebf
```

You can use the [`log`][cli-log] command to view all previous runs as well as the task executions for each run.
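
For example, here is a sketch of inspecting previous runs with `log`, assuming the field names used in the [Trace file][trace-report]:

```bash
# List all previous runs, including their session IDs and run names
nextflow log

# Show selected fields for each task of the most recent run
nextflow log last -f hash,name,status
```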

### Comparing the hashes of two runs

One way to debug a resumed run is to compare the task hashes of each run using the `-dump-hashes` option.

1. Perform an initial run: `nextflow -log run_initial.log run <pipeline> -dump-hashes`
2. Perform a resumed run: `nextflow -log run_resumed.log run <pipeline> -dump-hashes -resume`
3. Extract the task hash lines from each log (search for `cache hash:`)
4. Compare the runs with a diff viewer

While some manual effort is required, the final diff can often reveal the exact change that caused a task to be re-executed.

<AddedInVersion version="23.10.0" />

When using `-dump-hashes json`, the task hashes can be more easily extracted into a diff. Here is an example Bash script to perform two runs and produce a diff:

```bash
nextflow -log run_1.log run "$pipeline" -dump-hashes json
nextflow -log run_2.log run "$pipeline" -dump-hashes json -resume

get_hashes() {
    grep 'cache hash:' "$1" \
        | cut -d ' ' -f 10- \
        | sort \
        | awk '{ print; print ""; }'
}

get_hashes run_1.log > run_1.tasks.log
get_hashes run_2.log > run_2.tasks.log

diff run_1.tasks.log run_2.tasks.log
```

You can then view the `diff` output or use a graphical diff viewer to compare `run_1.tasks.log` and `run_2.tasks.log`.

<AddedInVersion version="25.04.0">
Nextflow now has a built-in way to compare two task runs. See the [Data lineage][data-lineage-page] guide for details.
</AddedInVersion>

[bundling-executables]: /nextflow_docs/nextflow_repo/docs/sharing.mdx#the-bin-directory
[data-lineage-page]: /nextflow_docs/nextflow_repo/docs/tutorials/data-lineage.mdx
[cli-clean]: /nextflow_docs/nextflow_repo/docs/reference/cli.mdx#clean
[cli-log]: /nextflow_docs/nextflow_repo/docs/reference/cli.mdx#log
[operator-join]: /nextflow_docs/nextflow_repo/docs/reference/operator.mdx#join
[process-arch]: /nextflow_docs/nextflow_repo/docs/reference/process.mdx#arch
[process-cache]: /nextflow_docs/nextflow_repo/docs/reference/process.mdx#cache
[process-conda]: /nextflow_docs/nextflow_repo/docs/reference/process.mdx#conda
[process-ext]: /nextflow_docs/nextflow_repo/docs/reference/process.mdx#ext
[process-input]: /nextflow_docs/nextflow_repo/docs/process.mdx#inputs
[operator-merge]: /nextflow_docs/nextflow_repo/docs/reference/operator.mdx#merge
[process-module]: /nextflow_docs/nextflow_repo/docs/reference/process.mdx#module
[process-script]: /nextflow_docs/nextflow_repo/docs/process.mdx#script
[process-spack]: /nextflow_docs/nextflow_repo/docs/reference/process.mdx#spack
[process-stub]: /nextflow_docs/nextflow_repo/docs/process.mdx#stub
[stdlib-namespaces-workflow]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-namespaces.mdx#workflow
[strict-syntax-page]: /nextflow_docs/nextflow_repo/docs/strict-syntax.mdx
[trace-report]: /nextflow_docs/nextflow_repo/docs/reports.mdx#trace-file
79 changes: 0 additions & 79 deletions docs/channel.md

This file was deleted.
