From ca4570bfae58feca680fda715ac83b02a9c6aa54 Mon Sep 17 00:00:00 2001
From: Christopher Hakkaart
Date: Mon, 23 Jun 2025 10:50:54 +1200
Subject: [PATCH 01/38] Migrate first five pages

Signed-off-by: Christopher Hakkaart
---
 docs/cli.md               | 158 +++++++++++++++++++++-----------------
 docs/developer-env.md     | 156 +++++++++++++++++--------------------
 docs/install.md           |  65 ++++++----------
 docs/overview.md          |  16 ++--
 docs/your-first-script.md |  38 ++++-----
 5 files changed, 206 insertions(+), 227 deletions(-)

diff --git a/docs/cli.md b/docs/cli.md
index 979a1b1507..4b73036468 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -1,13 +1,11 @@
-(cli-page)=
-
# Command line

Nextflow provides a robust command line interface (CLI) for the management and execution of pipelines.

-Simply run `nextflow` with no options or `nextflow -h` to see the list of available top-level options and commands. See {ref}`cli-reference` for the full list of subcommands with examples.
+Simply run `nextflow` with no options or `nextflow -h` to see the list of available top-level options and commands. See [CLI reference][cli-reference] for the full list of subcommands with examples.

:::{note}
-Nextflow options use a single dash prefix, e.g. `-resume`. Do not confuse with double dash notation, e.g. `--resume`, which is instead used for {ref}`Pipeline parameters <cli-params>`.
+Nextflow options use a single dash prefix, e.g. `-resume`. Do not confuse with double dash notation, e.g. `--resume`, which is instead used for [Pipeline parameters][cli-params].
:::

## Basic usage

@@ -16,74 +14,74 @@ Nextflow options use a single dash prefix, e.g. `-resume`. Do not confuse with d

### Hard configuration override

Use the specified configuration file(s) overriding any defaults.

-```console
-$ nextflow -C my.config COMMAND [arg...]
+```bash
+nextflow -C my.config COMMAND [arg...]
```

The `-C` option is used to override *all* settings specified in the default config file. For soft override, please refer to the `-c` option.

- Override **any** default configuration with a custom configuration file:

-  ```console
-  $ nextflow -C my.config run nextflow-io/hello
+  ```bash
+  nextflow -C my.config run nextflow-io/hello
   ```

### JVM properties

Set JVM properties.

-```console
-$ nextflow -Dkey=value COMMAND [arg...]
+```bash
+nextflow -Dkey=value COMMAND [arg...]
```

This option allows you to define custom Java system properties that can be used to configure or fine-tune the JVM instance used by the Nextflow runtime.

-For specifying other JVM level options, please refer to the {ref}`config-env-vars` section.
+For specifying other JVM level options, see the [Environment variables][config-env-vars] section.

- Add JVM properties to the invoked pipeline:

-  ```console
-  $ nextflow -Dfile.encoding=UTF-8 run nextflow-io/hello
+  ```bash
+  nextflow -Dfile.encoding=UTF-8 run nextflow-io/hello
   ```

### Execution as a background job

Execute `nextflow` in the background.

-```console
-$ nextflow -bg COMMAND [arg...]
+```bash
+nextflow -bg COMMAND [arg...]
```

The `-bg` option is used to invoke the nextflow execution in the background and allows the user to continue interacting with the terminal. This option is similar to `nohup` in behavior.

- Invoke any execution as a background job:

-  ```console
-  $ nextflow -bg run nextflow-io/hello
+  ```bash
+  nextflow -bg run nextflow-io/hello
   ```

### Soft configuration override

Add the specified file to the configuration set.

-```console
-$ nextflow -c nxf.config COMMAND [arg...]
+```bash
+nextflow -c nxf.config COMMAND [arg...]
```

The `-c` option is used to append a new configuration to the default configuration. The `-c` option allows us to update the config in an additive manner. For **hard override**, refer to the `-C` option.

- Update *some* fields of the default config for any pipeline:

-  ```console
-  $ nextflow -c nxf.config run nextflow-io/hello
+  ```bash
+  nextflow -c nxf.config run nextflow-io/hello
   ```

### Help

Print the help message.

-```console
-$ nextflow -h
+```bash
+nextflow -h
```

The `-h` option prints out the overview of the CLI interface and enumerates the top-level *options* and *commands*.

@@ -92,71 +90,78 @@ The `-h` option prints out the overview of the CLI interface and enumerates the

Sets the path of the Nextflow log file.

-```console
-$ nextflow -log custom.log COMMAND [arg...]
+```bash
+nextflow -log custom.log COMMAND [arg...]
```

The `-log` option takes the path of a log file to be used instead of the default `.nextflow.log`, or to save log files to another directory.

- Save all execution logs to the custom `/var/log/nextflow.log` file:

-  ```console
-  $ nextflow -log /var/log/nextflow.log run nextflow-io/hello
+  ```bash
+  nextflow -log /var/log/nextflow.log run nextflow-io/hello
   ```

### Quiet execution

Disable the printing of information to the terminal.

-```console
-$ nextflow -q COMMAND [arg...]
+```bash
+nextflow -q COMMAND [arg...]
```

The `-q` option suppresses the banner and process-related info, and exits once the execution is completed. Please note that it does not affect any explicit print statement within a pipeline.

- Invoke the pipeline execution without the banner and pipeline information:

-  ```console
-  $ nextflow -q run nextflow-io/hello
+  ```bash
+  nextflow -q run nextflow-io/hello
   ```

### Logging to a syslog server

Send logs to [Syslog](https://en.wikipedia.org/wiki/Syslog) server endpoint.

-```console
-$ nextflow -syslog localhost:1234 COMMAND [arg...]
+```bash
+nextflow -syslog localhost:1234 COMMAND [arg...]
```

The `-syslog` option is used to send logs to a Syslog logging server at the specified endpoint.

- Send the logs to a Syslog server at a specific endpoint:

-  ```console
-  $ nextflow -syslog localhost:1234 run nextflow-io/hello
+  ```bash
+  nextflow -syslog localhost:1234 run nextflow-io/hello
   ```

### Version

Print the Nextflow version information.

-```console
-$ nextflow -v
+```bash
+nextflow -v
```

The `-v` option prints out information about Nextflow, such as the version and build. The `-version` option in addition prints out the citation reference and official website.

- The short version:

+  ```bash
+  nextflow -v
+  ```
+
+  ```console
-  $ nextflow -v
   nextflow version 20.07.1.5412
   ```

- The full version info with citation and website link:

+  ```bash
+  nextflow -version
+  ```
+
+  ```console
-  $ nextflow -version
   N E X T F L O W
   version 20.07.1 build 5412
   created 24-07-2020 15:18 UTC (20:48 IDT)
   cite doi:10.1038/nbt.3820
   http://nextflow.io
   ```

## Running pipelines

The main purpose of the Nextflow CLI is to run Nextflow pipelines with the `run` command. Nextflow can execute a local script (e.g. `./main.nf`) or a remote project (e.g. `github.com/nextflow-io/hello`).

### Launching a remote project

To launch the execution of a pipeline project, hosted in a remote code repository, you simply need to specify its qualified name or the repository URL after the `run` command. The qualified name is formed by two parts: the `owner` name and the `repository` name separated by a `/` character.

In other words if a Nextflow project is hosted, for example, in a GitHub repository at the address `http://github.com/foo/bar`, it can be executed by entering the following command in your shell terminal:

```bash
nextflow run foo/bar
```

or using the project URL:

```bash
nextflow run http://github.com/nextflow-io/hello
```

@@ -186,34 +191,40 @@ nextflow run http://github.com/nextflow-io/hello

If the project is found, it will be automatically downloaded to the Nextflow home directory (`$HOME/.nextflow` by default) and cached for subsequent runs.

-:::{note}
+:::note
You must use the `-hub` option to specify the hosting service if your project is hosted on a service other than GitHub, e.g. `-hub bitbucket`. However, the `-hub` option is not required if you use the project URL.
:::

+Try this feature by running the following command:
+
+```bash
+nextflow run nextflow-io/hello
+```
+
+It will download a trivial example from the repository published at [http://github.com/nextflow-io/hello](http://github.com/nextflow-io/hello) and execute it on your computer.
+
If the `owner` is omitted, Nextflow will search your cached pipelines for a pipeline that matches the name specified. If no pipeline is found, Nextflow will try to download it using the `organization` name defined by the `NXF_ORG` environment variable (`nextflow-io` by default).

-:::{tip}
-To access a private repository, specify the access credentials using the `-user` command line option. Then follow the interactive prompts to enter your password. Alternatively, define your private repository access credentials using Git. See {ref}`Git configuration <git-page>` for more information.
+:::note
+To access a private repository, specify the access credentials using the `-user` command line option. Then follow the interactive prompts to enter your password. Alternatively, define your private repository access credentials using Git. See [Git configuration][git-page] for more information.
:::

### Using a specific revision

Any Git branch, tag, or commit of a project repository can be used when launching a pipeline by specifying the `-r` option:

-```console
-$ nextflow run nextflow-io/hello -r mybranch
+```bash
+nextflow run nextflow-io/hello -r mybranch
```

or

-```console
-$ nextflow run nextflow-io/hello -r v1.1
+```bash
+nextflow run nextflow-io/hello -r v1.1
```

These commands will execute two different project revisions based on the given Git branch/tag/commit.

-(cli-params)=
-
### Pipeline parameters

Pipeline scripts can define *parameters* that can be overridden on the command line.

@@ -238,36 +249,36 @@ params {

The above parameter can be specified on the command line as `--alpha`:

-```console
-$ nextflow run main.nf --alpha Hello
+```bash
+nextflow run main.nf --alpha Hello
```

-:::{note}
+:::note
Parameters that are specified on the command line without a value are set to `true`.
:::

-:::{note}
+:::note
Parameters that are specified on the command line in kebab case (e.g., `--alpha-beta`) are automatically converted to camel case (e.g., `--alphaBeta`). Because of this, a parameter defined as `alphaBeta` in the pipeline script can be specified on the command line as `--alphaBeta` or `--alpha-beta`.
:::

-:::{warning}
+:::note
When a command line parameter includes one or more glob characters, i.e. wildcards like `*` or `?`, the parameter value must be enclosed in quotes to prevent Bash expansion and preserve the glob characters. For example:

-```console
-$ nextflow run --files "*.fasta"
+```bash
+nextflow run main.nf --files "*.fasta"
```
:::

Parameters specified on the command line can be also specified in a params file using the `-params-file` option.

-```console
-$ nextflow run main.nf -params-file pipeline_params.yml
+```bash
+nextflow run main.nf -params-file pipeline_params.yml
```

The `-params-file` option loads parameters for your Nextflow pipeline from a JSON or YAML file. Parameters defined in the file are equivalent to specifying them directly on the command line.

For example, instead of specifying parameters on the command line:

-```console
-$ nextflow run main.nf --alpha 1 --beta two
+```bash
+nextflow run main.nf --alpha 1 --beta two
```

Parameters can be represented in YAML format:

```yaml
alpha: 1
beta: 'two'
```

Or in JSON format:

```json
{
  "alpha": 1,
  "beta": "two"
}
```

Parameters are applied in the following order (from lowest to highest priority):

1. Parameters defined in pipeline scripts (e.g. `main.nf`)
-2. Parameters defined in {ref}`config files <config-params>`
+2. Parameters defined in [config files][config-params]
3. Parameters specified in a params file (`-params-file`)
4. Parameters specified on the command line (`--something value`)

### Listing available projects

The `list` command allows you to list all the projects you have downloaded to your computer. For example:

```bash
nextflow list
```

@@ -309,7 +320,7 @@ nextflow list

This prints a list similar to the following:

-```
+```console
cbcrg/ampa-nf
cbcrg/piper-nf
nextflow-io/hello
nextflow-io/examples
...
```

@@ -320,8 +331,11 @@

By using the `info` command you can show information from a downloaded project. For example:

+```bash
+nextflow info hello
+```
+
```console
-$ nextflow info hello
 project name: nextflow-io/hello
 repository  : http://github.com/nextflow-io/hello
 local path  : $HOME/.nextflow/assets/nextflow-io/hello
 main script : main.nf
 revisions   :
 * master (default)
```

@@ -339,14 +353,14 @@ Starting from the top it shows: the project name; the Git repository URL; the lo

### Pulling or updating a project

The `pull` command allows you to download a project from a GitHub repository or to update it if that repository has already been downloaded. For example:

-```console
-$ nextflow pull nextflow-io/hello
+```bash
+nextflow pull nextflow-io/hello
```

Alternatively, you can use the repository URL as the name of the project to pull:

-```console
-$ nextflow pull https://github.com/nextflow-io/hello
+```bash
+nextflow pull https://github.com/nextflow-io/hello
```

Downloaded pipeline projects are stored in the `$HOME/.nextflow/assets` directory.

@@ -355,8 +369,8 @@ Downloaded pipeline projects are stored in your directory `$HOME/.nextflow/asset

### Viewing the project code

The `view` command shows the content of the pipeline script you have pulled. For example:

-```console
-$ nextflow view nextflow-io/hello
+```bash
+nextflow view nextflow-io/hello
```

@@ -365,8 +379,8 @@ By adding the `-l` option to the example above it will list the content of the r

### Cloning a project into a folder

The `clone` command allows you to copy a Nextflow pipeline project to a directory of your choice. For example:

-```console
-$ nextflow clone nextflow-io/hello target-dir
+```bash
+nextflow clone nextflow-io/hello target-dir
```

If the destination directory is omitted the specified project is cloned to a directory with the same name as the pipeline base name (e.g. `hello`) in the current directory.

The `clone` command can be used to inspect or modify the source code of a pipeline project. You can eventually commit and push back your changes by using the usual Git/GitHub workflow.

### Deleting a downloaded project

Downloaded pipelines can be deleted by using the `drop` command, as shown below:

@@ -380,3 +394,9 @@

```bash
nextflow drop nextflow-io/hello
```
+
+[cli-params]: /nextflow_docs/nextflow_repo/docs/cli#pipeline-parameters
+[cli-reference]: /nextflow_docs/nextflow_repo/docs/reference/cli
+[config-env-vars]: /nextflow_docs/nextflow_repo/docs/reference/env-vars#environment-variables
+[config-params]: /nextflow_docs/nextflow_repo/docs/config#parameters
+[git-page]: /nextflow_docs/nextflow_repo/docs/cli
\ No newline at end of file

diff --git a/docs/developer-env.md b/docs/developer-env.md
index 9a4c0e72cc..114031844c 100644
--- a/docs/developer-env.md
+++ b/docs/developer-env.md
@@ -1,4 +1,5 @@
-(devenv-page)=
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';

# Environment setup

@@ -6,32 +7,29 @@ Setting up a Nextflow development environment is a prerequisite for creating, te

Recommended tools

-- {ref}`devenv-vscode`: A versatile code editor that enhances your Nextflow development with features like syntax highlighting and debugging.
-- {ref}`devenv-extensions`: The VS Code marketplace offers a variety of extensions to enhance development. The {ref}`Nextflow extension <devenv-nextflow>` is specifically designed to enhance Nextflow development with diagnostics, hover hints, code navigation, code completion, and more.
-- {ref}`devenv-docker`: A containerization platform that ensures your Nextflow workflows run consistently across different environments by packaging dependencies into isolated containers.
-- {ref}`devenv-git`: A version control system that helps manage and track changes in your Nextflow projects, making collaboration and code management more efficient.
+- [VS Code][devenv-vscode]: A versatile code editor that enhances your Nextflow development with features like syntax highlighting and debugging.
+- [Extensions][devenv-extensions]: The VS Code marketplace offers a variety of extensions to enhance development. The [Nextflow extension][devenv-nextflow] is specifically designed to enhance Nextflow development with diagnostics, hover hints, code navigation, code completion, and more.
+- [Docker][devenv-docker]: A containerization platform that ensures your Nextflow workflows run consistently across different environments by packaging dependencies into isolated containers.
+- [Git][devenv-git]: A version control system that helps manage and track changes in your Nextflow projects, making collaboration and code management more efficient.

The sections below outline the steps for setting up these tools.

-:::{note}
-Nextflow must be installed separately. See {ref}`install-page` for Nextflow installation instructions.
+:::note
+Nextflow must be installed separately. See [Installation][install-page] for Nextflow installation instructions.
:::

-:::{note}
-If you are using a Windows computer, first install and configure the Windows Subsystem for Linux (WSL). See {ref}`devenv-wsl` for installation instructions.
+:::note
+If you are using a Windows computer, first install and configure the Windows Subsystem for Linux (WSL). See [Windows Subsystem for Linux][devenv-wsl] for installation instructions.
:::

-(devenv-vscode)=
-
## VS Code

An Integrated Development Environment (IDE) provides a user-friendly interface for writing, editing, and managing code. Installing one is an essential step for setting up your environment.

Visual Studio Code (VS Code) is a popular lightweight IDE known for its versatility and extensibility. It offers features like syntax highlighting, intelligent code completion, and integrated debugging tools for various programming languages. VS Code supports Windows, macOS, and Linux, and is a good choice for both new and experienced Nextflow developers.

-````{tabs}
-
-```{group-tab} Windows
+<Tabs>
+  <TabItem value="Windows" label="Windows" default>

To install VS Code on Windows:

1. Visit the [Download Visual Studio Code](https://code.visualstudio.com/download) page.
1. Download VS Code for Windows.
1. Double-click the installer executable (`.exe`) file and follow the setup steps.

-```
-
-```{group-tab} macOS
+  </TabItem>
+  <TabItem value="macOS" label="macOS">

To install VS Code on macOS:

1. Visit the [Download Visual Studio Code](https://code.visualstudio.com/download) page.
1. Download VS Code for macOS.
1. Drag the `Visual Studio Code.app` application to the Applications folder to add it to the macOS Launchpad.

-```
-
-```{group-tab} Linux
+  </TabItem>
+  <TabItem value="Linux" label="Linux">

To install VS Code on Linux Debian/Ubuntu distributions:

1. Visit the [Download Visual Studio Code](https://code.visualstudio.com/download) page.
1. Download the Debian/Ubuntu (`.deb`) distribution.
1. Navigate to the folder where you downloaded VS Code.
1. Run `sudo apt install ./<file>.deb`, replacing `<file>` with the full file name.

   :::note
   If you're using an older Linux distribution, run `sudo dpkg -i <file>.deb` to install VS Code and `sudo apt-get install -f` to install dependencies.
   :::

See [Linux installation](https://code.visualstudio.com/docs/setup/linux#_installation) for information about installing VS Code on other distributions.

-```
-
-````
-
-(devenv-extensions)=
+  </TabItem>
+</Tabs>

## Extensions

Extensions are a key feature of IDEs and allow you to customize your development environment by adding support for various programming languages, tools, and features. The [VS Code Marketplace](https://marketplace.visualstudio.com/vscode) offers thousands of extensions that can enhance your productivity and tailor the editor to your specific needs. Popular VS Code extensions for Nextflow developers are listed below:

-(devenv-nextflow)=
-
-**Nextflow**
+### Nextflow

The VS Code [Nextflow extension](https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow) adds Nextflow language support to the editor. The Nextflow extension enhances development with:

- Diagnostics
- Hover hints
- Code navigation
- Code completion
- Formatting
- Symbol renaming
- Parameter schemas
- DAG previews

-See {ref}`vscode-page` for more information about the Nextflow extension features and how it enforces the Nextflow syntax.
+See [VS Code][vscode-page] for more information about the Nextflow extension features and how it enforces the Nextflow syntax.

-**nf-core**
+### nf-core

The [nf-core extension pack](https://marketplace.visualstudio.com/items?itemName=nf-core.nf-core-extensionpack) adds a selection of tools that help develop with nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow. The nf-core extension pack includes several useful extensions. For example, [Code Spell Checker](https://marketplace.visualstudio.com/items?itemName=streetsidesoftware.code-spell-checker), [Prettier](https://marketplace.visualstudio.com/items?itemName=esbenp.prettier-vscode), [Todo Tree](https://marketplace.visualstudio.com/items?itemName=Gruntfuggly.todo-tree), and [Markdown Extended](https://marketplace.visualstudio.com/items?itemName=jebbs.markdown-extended).

See [nf-core extension pack](https://marketplace.visualstudio.com/items?itemName=nf-core.nf-core-extensionpack) for more information about the tools included in the nf-core extension pack.

-(devenv-remote)=
-
-**Remote development**
+### Remote development

The [Remote Development extension pack](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack) enables you to run WSL, SSH, or a development container for editing and debugging with the full set of VS Code features. The pack includes the [Remote - SSH](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-ssh), [Remote - Tunnels](https://marketplace.visualstudio.com/items?itemName=ms-vscode.remote-server), [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers), and [WSL](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-wsl) extensions.

See [Remote Development](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack) for more information about the tools included in the remote development extension pack.
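+If you prefer working in a terminal, extensions can also be installed with the `code` command line tool that ships with VS Code. The command below is a sketch that assumes `code` is available on your `PATH`; the extension identifier comes from each extension's Marketplace page:
+
+```bash
+# Install the Nextflow extension from the command line
+code --install-extension nextflow.nextflow
+```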
-:::{note}
+:::note
The Remote Development extension pack is required if you are developing using remote servers, Windows Subsystem for Linux, or Development Containers.
:::

Installing VS Code extensions requires just a few clicks in the Extensions Marketplace.

-````{tabs}
-
-```{group-tab} Windows
+<Tabs>
+  <TabItem value="Windows" label="Windows" default>

To install a VS Code extension on Windows:

1. Open VS Code.
1. Select the **Extensions** icon in the **Activity Bar**.
1. Search for the extension.
1. Select **Install**.

-```
-
-```{group-tab} macOS
+  </TabItem>
+  <TabItem value="macOS" label="macOS">

To install a VS Code extension on macOS:

1. Open VS Code.
1. Select the **Extensions** icon in the **Activity Bar**.
1. Search for the extension.
1. Select **Install**.

-```
-
-```{group-tab} Linux
+  </TabItem>
+  <TabItem value="Linux" label="Linux">

To install a VS Code extension on Linux Debian/Ubuntu distributions:

1. Open VS Code.
1. Select the **Extensions** icon in the **Activity Bar**.
1. Search for the extension.
1. Select **Install**.

-```
-
-````
-
-(devenv-docker)=
+  </TabItem>
+</Tabs>

## Docker

Docker is an open-source platform that simplifies application development, deplo

Docker Desktop provides a Graphical User Interface (GUI) for managing Docker containers. Installing Docker Desktop is a straightforward process that allows you to create, deploy, and manage applications within containers.

-````{tabs}
-
-```{group-tab} Windows
+<Tabs>
+  <TabItem value="Windows" label="Windows" default>

To install Docker Desktop on Windows:

1. Visit the [Download Docker Desktop](https://www.docker.com/products/docker-desktop/) page.
1. Download the installer for Windows.
1. Double-click Docker Desktop `Installer.exe` to run the installer. By default, Docker Desktop is installed at `C:\Program Files\Docker\Docker`.
1. Depending on your choice of backend, select the **Use WSL 2 instead of Hyper-V** option on the Configuration page.

   :::note
   You won't be able to select which backend to use if your system only supports one of the two options.
   :::

1. Follow the instructions on the installation wizard to authorize the installer and proceed with the install.
1. When the installation is successful, select **Close** to complete the installation process.
1. Start Docker Desktop.
1. Review the Docker Subscription Service Agreement and select **Accept** to continue.

   :::note
   Docker Desktop won't run if you do not agree to the terms. You can choose to accept the terms at a later date by opening Docker Desktop.
   :::

1. Docker Desktop starts after you accept the terms.

-```
-
-```{group-tab} macOS
+  </TabItem>
+  <TabItem value="macOS" label="macOS">

To install Docker Desktop on macOS:

1. Visit the [Download Docker Desktop](https://www.docker.com/products/docker-desktop/) page.
1. Download the installer for your chip type.
1. Double-click the downloaded `.dmg` file and drag the Docker icon to the **Applications** folder.
1. Double-click **Docker.app** in the **Applications** folder to start Docker.
1. Review the Docker Subscription Service Agreement and select **Accept** to continue.

   :::note
   Docker Desktop won't run if you do not agree to the terms. You can choose to accept the terms at a later date by opening Docker Desktop.
   :::

1. From the installation window, select **Use recommended settings (Requires password)**.

   :::note
   The **recommended settings** allow Docker Desktop to automatically set the necessary configuration settings. Advanced settings allow you to set the location of the Docker CLI tools either in the system or user directory, enable the default Docker socket, and enable privileged port mapping. See [Settings](https://docs.docker.com/desktop/settings/#advanced) for more information about how to set the location of the Docker CLI tools.
   :::

1. Select **Finish**. If you have applied any of the previous configurations that require a password, enter your password to confirm your choice.

-```
-
-```{group-tab} Linux
+  </TabItem>
+  <TabItem value="Linux" label="Linux">

To install Docker Desktop on Linux Debian/Ubuntu distributions:

1. Visit the [Download Docker Desktop](https://www.docker.com/products/docker-desktop/) page.
1. Download the latest Debian/Ubuntu (`.deb`) distribution.
1. In your terminal, run `sudo apt-get install ./docker-desktop-amd64.deb`

   :::note
   By default, Docker Desktop is installed at `/opt/docker-desktop`.
   :::

1. Start Docker Desktop.
1. Review the Docker Subscription Service Agreement and select **Accept** to continue.
1. From the installation window, select **Use recommended settings (Requires password)**. Docker Desktop starts after you accept the terms.

   :::note
   Docker Desktop won't run if you do not agree to the terms. You can choose to accept the terms at a later date by opening Docker Desktop.
   :::

-```
-
-````
+  </TabItem>
+</Tabs>

-Nextflow supports multiple container technologies (e.g., Singularity and Podman) so you can choose the one that best fits your needs. See {ref}`container-page` for more information about other supported container engines.
+Nextflow supports multiple container technologies (e.g., Singularity and Podman) so you can choose the one that best fits your needs. See [Containers][containers-page] for more information about other supported container engines.

-(devenv-git)=
-
## Git

Git provides powerful version control that helps track code changes. Git operates locally, meaning you don't need an internet connection to track changes, but it can also be used collaboratively when connected to a remote repository.

Nextflow seamlessly integrates with Git and source code management providers for managing pipelines as version-controlled Git repositories.

-````{tabs}
-
-```{group-tab} Windows
+<Tabs>
+  <TabItem value="Windows" label="Windows" default>

Git is already installed on most WSL distributions. You can check if it is already installed by running `git version`.

To install the latest stable Git version on Linux Debian/Ubuntu distributions:

1. Open a terminal window and run `sudo apt-get install git-all`.
1. Once complete, run `git version` to verify Git was installed.

See [git-scm documentation](https://git-scm.com/downloads/linux) for more information about installing Git on other Linux distributions.

-```
-
-```{group-tab} macOS
+  </TabItem>
+  <TabItem value="macOS" label="macOS">

To install Git on macOS with [Homebrew](https://docs.brew.sh/):

1. Open a terminal window and run `brew install git`.

   :::note
   You must have Homebrew installed. See [Homebrew installation](https://docs.brew.sh/Installation) for instructions.
   :::

1. Once complete, open a new terminal window and run `git version` to verify Git was installed.

To install Git on macOS with [Xcode](https://developer.apple.com/xcode/):

1. Open the **App Store** and search for **Xcode**.
1. Select **Install**.
1. Once complete, open a new terminal window and run `git version` to verify Git was installed.

-```
-
-```{group-tab} Linux
+  </TabItem>
+  <TabItem value="Linux" label="Linux">

Git is already installed on most Linux Debian/Ubuntu distributions.

To install the latest stable Git version on Linux Debian/Ubuntu distributions:

1. Open a terminal window and run `sudo apt-get install git-all`.
1. Once complete, run `git version` to verify Git was installed.

See [git-scm documentation](https://git-scm.com/downloads/linux) for more information about installing Git on other Linux distributions.

-```
-
-````
-
-(devenv-wsl)=
+  </TabItem>
+</Tabs>

## Windows Subsystem for Linux

To enable WSL on Windows using Powershell or Windows Command Prompt:

1. Right-click and select **Run as administrator** to use PowerShell or Windows Command Prompt in administrator mode.
1. Run `wsl --install`.

   :::note
   This command will enable the features necessary to run WSL and install the Ubuntu distribution.
   :::

1. When prompted, restart Windows.
1. After restarting Windows, open the Ubuntu distribution and create a new Linux **User Name** and **Password** when prompted.

   :::note
   The **User Name** and **Password** are specific to each Linux distribution that you install and have no bearing on your Windows user name.
   :::

See [Set up a WSL development environment](https://learn.microsoft.com/en-us/windows/wsl/setup/environment) for more about installing WSL.
+
+[containers-page]: /nextflow_docs/nextflow_repo/docs/containers
+[devenv-docker]: /nextflow_docs/nextflow_repo/docs/developer-env#docker
+[devenv-extensions]: /nextflow_docs/nextflow_repo/docs/developer-env#extensions
+[devenv-git]: /nextflow_docs/nextflow_repo/docs/developer-env#git
+[devenv-nextflow]: /nextflow_docs/nextflow_repo/docs/developer-env#nextflow
+[devenv-vscode]: /nextflow_docs/nextflow_repo/docs/developer-env#vs-code
+[devenv-wsl]: /nextflow_docs/nextflow_repo/docs/developer-env#windows-subsystem-for-linux
+[install-page]: /nextflow_docs/nextflow_repo/docs/install
+[vscode-page]: /nextflow_docs/nextflow_repo/docs/vscode
\ No newline at end of file

diff --git a/docs/install.md b/docs/install.md
index fa503cf8af..5b77541bea 100644
--- a/docs/install.md
+++ b/docs/install.md
@@ -1,25 +1,20 @@
-(install-page)=
-
# Installation

Nextflow can be used on any POSIX-compatible system (Linux, macOS, etc), and on Windows through [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux). This page describes how to install Nextflow.

-:::{note}
-New versions of Nextflow are released regularly. See {ref}`updating-nextflow-page` for more information about Nextflow release cadence, how to update Nextflow, and how to select your version of Nextflow.
+:::note
+New versions of Nextflow are released regularly. See [Updating Nextflow][updating-nextflow] for more information about Nextflow release cadence, how to update Nextflow, and how to select your version of Nextflow.
:::

-(install-requirements)=
-
## Requirements

Nextflow requires Bash 3.2 (or later) and [Java 17 (or later, up to 24)](http://www.oracle.com/technetwork/java/javase/downloads/index.html) to be installed. To see which version of Java you have, run the following command:

-```{code-block} bash
-:class: copyable
+```bash
java -version
```

-:::{versionchanged} 24.11.0-edge
+:::warning{title="24.11.0-edge"}
Support for Java versions prior to 17 was dropped.
:::

To install Java with SDKMAN:

1. [Install SDKMAN](https://sdkman.io/install):

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   curl -s https://get.sdkman.io | bash
   ```

2. Open a new terminal.

3. Install Java:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   sdk install java 17.0.10-tem
   ```

4. Confirm that Java is installed correctly:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   java -version
   ```

-(install-nextflow)=
-
## Install Nextflow

Nextflow is distributed as an easy-to-use self-installing package. It is also distributed via Conda and as a standalone distribution.

### Self-installing package

To install Nextflow with the self-installing package:

1. Download Nextflow:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   curl -s https://get.nextflow.io | bash
   ```

   :::tip
   Set `export CAPSULE_LOG=none` to make the installation logs less verbose.
   :::

2. Make Nextflow executable:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   chmod +x nextflow
   ```

3. Move Nextflow into a directory in your `$PATH`. For example:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   mkdir -p $HOME/.local/bin/
   mv nextflow $HOME/.local/bin/
   ```

   :::tip
   Ensure the directory `$HOME/.local/bin/` is included in your `PATH` variable. Temporarily add this directory to `PATH` by setting `export PATH="$PATH:$HOME/.local/bin"`. Add the directory to `PATH` permanently by adding the export command to your shell configuration file, such as `~/.bashrc` or `~/.zshrc`. Alternatively, move the `nextflow` executable to a directory already in your `PATH`.
   :::

   :::warning
   Nextflow updates its executable during the self-install process, therefore the update can fail if the executable is placed in a directory with restricted permissions.
   :::

4. Confirm Nextflow is installed correctly:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   nextflow info
   ```

### Conda

To install Nextflow with Conda:

1. Create an environment with Nextflow:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   conda create --name nf-env bioconda::nextflow
   ```

2. Activate the environment:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   source activate nf-env
   ```

3. Confirm Nextflow is installed correctly:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   nextflow info
   ```

-:::{warning}
+:::warning
Installing Nextflow via Conda may lead to outdated versions, dependency conflicts, and Java compatibility issues. Using the self-installing package is recommended for a more reliable and up-to-date installation.
:::

-(install-standalone)=
-
### Standalone distribution

The Nextflow standalone distribution (i.e., the `dist` release) is a self-contained `nextflow` executable that can run without needing to download core dependencies at runtime. This distribution is useful for offline environments as well as building and testing Nextflow locally.

To use the standalone distribution:

1. Download it from the [GitHub releases page](https://github.com/nextflow-io/nextflow/releases).

2. Grant execution permissions to the downloaded file. For example:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   chmod +x nextflow-24.10.1-dist
   ```

3. Use it as a drop-in replacement for the `nextflow` command. For example:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   ./nextflow-24.10.1-dist run info
   ```

-:::{note}
+:::note
The standalone distribution will still download core and third-party plugins as needed at runtime.
:::

## Seqera Platform

Launching from Seqera Platform provides you with:

- User-friendly launch interfaces.
- Automated cloud infrastructure creation.
- Organizational user management.
- Advanced analytics with resource optimization.

Seqera Cloud Basic is free for small teams. Researchers at qualifying academic institutions can apply for free access to Seqera Cloud Pro. See the [Seqera Platform documentation](https://docs.seqera.io/platform) for tutorials to get started.
+
+[updating-nextflow]: /nextflow_docs/nextflow_repo/docs/updating-nextflow.md
\ No newline at end of file

diff --git a/docs/overview.md b/docs/overview.md
index 30181d04b6..859028b80e 100644
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -1,5 +1,3 @@
-(overview-page)=
-
# Overview

## Why Nextflow?

@@ -65,7 +63,7 @@ The above example defines two processes. Their execution order is not determined

When the workflow is started, it will create two processes and one channel (`query_ch`) and it will link all of them. Both processes will be started at the same time and they will listen to their respective input channels. Whenever `blast_search` emits a value, `extract_top_hits` will receive it (i.e. `extract_top_hits` consumes the channel in a *reactive* way).

-Read the {ref}`Channel <channel-page>` and {ref}`Process <process-page>` sections to learn more about these features.
+Read the [Channels][channels-page] and [Processes][process-page] sections to learn more about these features.

## Execution abstraction

@@ -93,13 +91,13 @@

The following cloud platforms are supported:

- [Amazon Web Services (AWS)](https://aws.amazon.com/)
- [Google Cloud Platform (GCP)](https://cloud.google.com/)
- [Kubernetes](https://kubernetes.io/)

-Read the {ref}`executor-page` to learn more about the Nextflow executors.
+See [Executors][executors-page] to learn more.

## Scripting language

Nextflow is a workflow language based on [Java](https://en.wikipedia.org/wiki/Java_(programming_language)) and [Groovy](https://groovy-lang.org/). It is designed to simplify writing scalable and reproducible pipelines. In most cases, users can leverage their existing programming skills to develop Nextflow pipelines without the steep learning curve that usually comes with a new programming language.

-See {ref}`script-page` for more information about the Nextflow scripting language.
+See [Scripts][scripts-page] for more information about the Nextflow scripting language.

## Configuration options

@@ -116,4 +114,10 @@ process {
}
```

-Read the {ref}`config-page` section to learn more about the Nextflow configuration file and settings.
+See [Configuration][configuration-page] to learn more about the Nextflow configuration file and settings.
+
+[channels-page]: /nextflow_docs/nextflow_repo/docs/channel.md
+[process-page]: /nextflow_docs/nextflow_repo/docs/process.md
+[executors-page]: /nextflow_docs/nextflow_repo/docs/executor.md
+[scripts-page]: /nextflow_docs/nextflow_repo/docs/script.md
+[configuration-page]: /nextflow_docs/nextflow_repo/docs/config.md
\ No newline at end of file

diff --git a/docs/your-first-script.md b/docs/your-first-script.md
index ffa8c266f1..039c45cf4a 100644
--- a/docs/your-first-script.md
+++ b/docs/your-first-script.md
@@ -1,5 +1,3 @@
-(your-first-script)=
-
# Your first script

This guide details fundamental skills to run a basic Nextflow pipeline. It includes:

## Prerequisites

You will need the following to get started:

-- Nextflow: See {ref}`install-page` for installation instructions.
+- Nextflow: See [Installation][install-page] for installation instructions.

## Run a pipeline

You will run a basic Nextflow pipeline that splits a string of text into two files and then converts lowercase letters to uppercase letters. You can see the pipeline here:

-```{code-block} groovy
-:class: copyable
+```groovy
// Default parameter input
params.str = "Hello world!"

@@ -79,8 +76,7 @@ To run your pipeline:

2. Copy and save the above pipeline to your new file
3. Run your pipeline using the following command:

-   ```{code-block}
-   :class: copyable
+   ```bash
   nextflow run main.nf
   ```

You will see output similar to the following:

```console
executor > local (3)
```

Nextflow creates a `work` directory to store files used during a pipeline run. Each execution of a process is run as a separate task. The `split` process is run as one task and the `convert_to_upper` process is run as two tasks. The hexadecimal string, for example, `82/457482`, is the beginning of a unique hash. It is a prefix used to identify the task directory where the script was executed.
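+For example, you can peek inside the unique directory that Nextflow created for a task by expanding its hash prefix. The hash below is illustrative; use one printed by your own run:
+
+```bash
+# List the staged inputs, outputs, and hidden helper files for one task
+ls -la work/82/457482*
+```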
-:::{tip}
+:::tip
Run your pipeline with `-ansi-log false` to see each task printed on a separate line:

-```{code-block} bash
-:class: copyable
+```bash
nextflow run main.nf -ansi-log false
```

You will see output similar to the following:

```console
Launching `main.nf` [peaceful_watson] DSL2 - revision: 13a41a8946
```

:::

-(getstarted-resume)=
-
## Modify and resume

Nextflow tracks task executions in a task cache, a key-value store of previously executed tasks. The task cache is used in conjunction with the work directory to recover cached tasks. If you modify and resume your pipeline, only the processes that are changed will be re-executed. The cached results will be used for tasks that don't change.

You can enable resumability using the `-resume` flag when running a pipeline. To modify and resume your pipeline:

1. Open `main.nf`
2. Replace the `convert_to_upper` process with the following:

-   ```{code-block} groovy
-   :class: copyable
+   ```groovy
   process convert_to_upper {
       publishDir "results/upper"
       tag "$y"
   ```

3. Save your changes
4. Run your updated pipeline using the following command:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   nextflow run main.nf -resume
   ```

You will see output similar to the following:

```console
executor > local (2)
```

Nextflow skips the execution of the `split` process and retrieves the results from the cache. The `convert_to_upper` process is executed twice.

-See {ref}`cache-resume-page` for more information about Nextflow cache and resume functionality.
+See [Caching and resuming][cache-resume-page] for more information about Nextflow cache and resume functionality.

-(getstarted-params)=
-
## Pipeline parameters

You can configure the `str` parameter in your pipeline. To modify your `str` parameter:

1. Run your pipeline using the following command:

-   ```{code-block} bash
-   :class: copyable
+   ```bash
   nextflow run main.nf --str 'Bonjour le monde'
   ```

You will see output similar to the following:

```console
Launching `main.nf` [distracted_kalam] DSL2 - revision: 082867d4d6
executor > local (4)
-[55/a3a700] process > split (1)                     [100%] 1 of 1 ✔
+[55/a3a700] process > split (1)                     [100%] 1 of 1 ✔
[f4/af5ddd] process > convert_to_upper (chunk_ac)   [100%] 3 of 3 ✔
```

The input string is now longer and the `split` process splits it into three chunks. The `convert_to_upper` process is run three times.

-See {ref}`cli-params` for more information about modifying pipeline parameters.
+See [Pipeline parameters][cli-params] for more information about modifying pipeline parameters.
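+Parameters can also be stored in a params file. For example, a minimal `params.yml` (the file name is illustrative) containing `str: 'Bonjour le monde'` is equivalent to the `--str` flag used above:
+
+```bash
+nextflow run main.nf -params-file params.yml
+```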

## Next steps

Your first script is a brief introduction to running pipelines, modifying and resuming pipelines, and pipeline parameters. See [training.nextflow.io](https://training.nextflow.io/) for further Nextflow training modules.
+
+[cache-resume-page]: /nextflow_docs/nextflow_repo/docs/cache-and-resume.md
+[cli-params]: /nextflow_docs/nextflow_repo/docs/cli.md#pipeline-parameters
+[install-page]: /nextflow_docs/nextflow_repo/docs/install
\ No newline at end of file

From 3af1dfcd77db5cae075d2f10aa4034c87962d8d7 Mon Sep 17 00:00:00 2001
From: Christopher Hakkaart
Date: Mon, 23 Jun 2025 12:38:01 +1200
Subject: [PATCH 02/38] Push changes

Signed-off-by: Christopher Hakkaart
---
 docs/config.md             | 17 +++++++----------
 docs/reference/env-vars.md |  2 --
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/docs/config.md b/docs/config.md
index b10730718b..f6cb7e3a2c 100644
--- a/docs/config.md
+++ b/docs/config.md
@@ -1,17 +1,15 @@
-(config-page)=
-
# Configuration

## Configuration file

When a pipeline script is launched, Nextflow looks for configuration files in multiple locations. Since each configuration file may contain conflicting settings, they are applied in the following order (from lowest to highest priority):

-1. The config file `$HOME/.nextflow/config` (or `$NXF_HOME/config` when {ref}`NXF_HOME <nxf-env-vars>` is set).
+1. The config file `$HOME/.nextflow/config` (or `$NXF_HOME/config` when [NXF_HOME][nxf-env-vars] is set).
2. The config file `nextflow.config` in the project directory
3. The config file `nextflow.config` in the launch directory
4. Config files specified using the `-c <config-files>` option

-:::{tip}
+:::tip
You can alternatively use the `-C <config-files>` option to specify a fixed set of configuration files and ignore all other files.
:::

## Includes

A configuration file can include any number of other configuration files using the `includeConfig` keyword:

```groovy
includeConfig 'path/extra.config'
```

Relative paths are resolved against the location of the including file.

-:::{note}
+:::note
Config includes can also be specified within config blocks. However, config files should only be included at the top level or in a [profile](#config-profiles) so that the included config file is valid on its own and in the context in which it is included.
:::

@@ -135,8 +133,6 @@ params {

See {ref}`cli-params` for information about how to specify pipeline parameters.

-(config-process)=
-
## Process configuration

The `process` scope allows you to specify {ref}`process directives <process-directives>` separately from the pipeline code.

@@ -245,13 +241,12 @@ process {
```

With the above configuration:
+
- All processes will use 4 cpus (unless otherwise specified in their process definition).
- Processes annotated with the `hello` label will use 8 cpus.
- Any process named `bye` (or imported as `bye`) will use 16 cpus.
- Any process named `bye` (or imported as `bye`) invoked by a workflow named `mysub` will use 32 cpus.

-(config-profiles)=
-
## Config profiles

Configuration files can define one or more *profiles*. A profile is a set of configuration settings that can be selected during pipeline execution using the `-profile` command line option.

```groovy
profiles {
    standard {
        process.executor = 'local'
    }

    cluster {
        process.executor = 'sge'
        process.queue = 'long'
        process.memory = '10GB'
    }

    cloud {
        process.executor = 'cirrus'
        process.container = 'cbcrg/imagex'
        docker.enabled = true
    }
}
```

The above configuration defines three profiles: `standard`, `cluster`, and `cloud`.

Configuration profiles can be specified at runtime as a comma-separated list:

```bash
-nextflow run -profile standard,cloud
+nextflow run main.nf -profile standard,cloud
```

Config profiles are applied in the order in which they were defined in the config file, regardless of the order they are specified on the command line.
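+For example, with the `standard` and `cloud` profiles defined above, the following two commands are equivalent, because the definition order in the config file determines how the settings are layered:
+
+```bash
+nextflow run main.nf -profile standard,cloud
+nextflow run main.nf -profile cloud,standard
+```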
@@ -335,3 +330,5 @@ workflow.onError = {
```

See {ref}`workflow-handlers` for more information.
+
+[nxf-env-vars]: /nextflow_docs/nextflow_repo/docs/reference/env-vars#nextflow-settings
\ No newline at end of file

diff --git a/docs/reference/env-vars.md b/docs/reference/env-vars.md
index 999d7787ad..90471902fb 100644
--- a/docs/reference/env-vars.md
+++ b/docs/reference/env-vars.md
@@ -12,8 +12,6 @@ The following environment variables control the configuration of the Nextflow ru

`JAVA_HOME`
: Defines the path location of the Java VM installation used to run Nextflow.

-(nxf-env-vars)=
-
## Nextflow settings

`NXF_ANSI_LOG`

From 028fb644534f8ec492213a3894430b478444644d Mon Sep 17 00:00:00 2001
From: Christopher Hakkaart
Date: Wed, 25 Jun 2025 11:03:11 +1200
Subject: [PATCH 03/38] Migrating reports testing

Signed-off-by: Christopher Hakkaart
---
 docs/cache-and-resume.md                                       |  80 +--
 docs/{cli.md => cli.mdx}                                       |  25 +-
 docs/{config.md => config.mdx}                                 |  87 +--
 docs/{developer-env.md => developer-env.mdx}                   |   6 +-
 docs/executor.md                                               | 318 +++++------
 docs/{index.md => nextflow.md}                                 | 136 +----
 docs/{plugins.md => plugins.mdx}                               |   0
 docs/reference/{env-vars.md => env-vars.mdx}                   |   0
 docs/reference/{feature-flags.md => feature-flags.mdx}         |   0
 docs/reference/{operator.md => operator.mdx}                   |   0
 docs/reference/{process.md => process.mdx}                     |   0
 docs/reference/{stdlib-namespaces.md => stdlib-namespaces.mdx} |   0
 docs/reference/{stdlib-types.md => stdlib-types.mdx}           |   0
 docs/{reports.md => reports.mdx}                               | 528 +++++++++++-------
 docs/your-first-script.md                                      |   2 +-
 15 files changed, 609 insertions(+), 573 deletions(-)

diff --git a/docs/cache-and-resume.md b/docs/cache-and-resume.md
index 2fb73cc395..5e19b56496 100644
--- a/docs/cache-and-resume.md
+++ b/docs/cache-and-resume.md
@@ -1,5 +1,3 @@
-(cache-resume-page)=
-
# Caching and resuming

One of the core features of Nextflow is the ability to cache task executions and re-use them in subsequent runs to minimize duplicate work. Resumability is useful both for recovering from errors and for iteratively developing a pipeline. It is similar to [checkpointing](https://en.wikipedia.org/wiki/Application_checkpointing), a common practice used by HPC applications.

@@ -10,42 +8,40 @@ You can enable resumability in Nextflow with the `-resume` flag when launching a

All task executions are automatically saved to the task cache, regardless of the `-resume` option (so that you always have the option to resume later). The task cache is a key-value store, where each key-value pair corresponds to a previously-executed task.

-The task cache is used in conjunction with the [work directory](#work-directory) to recover cached tasks in a resumed run. It is also used by the {ref}`cli-log` sub-command to query task metadata.
+The task cache is used in conjunction with the [work directory](#work-directory) to recover cached tasks in a resumed run. It is also used by the [`log`][cli-log] sub-command to query task metadata.
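+For example, the `log` command can print selected metadata fields for the tasks of the most recent run (the field list here is illustrative):
+
+```bash
+nextflow log last -f hash,name,status,exit
+```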
### Task hash

The task hash is computed from the following metadata:

-- Session ID (see `workflow.sessionId` in the {ref}`stdlib-namespaces-workflow` namespace)
-- Task name (see `name` in {ref}`trace-report`)
+- Session ID (see `workflow.sessionId` in the [`workflow`][stdlib-namespaces-workflow] namespace)
+- Task name (see `name` in [Trace file][trace-report])
- Task container image (if applicable)
-- Task {ref}`environment modules <process-module>` (if applicable)
-- Task {ref}`Conda environment <process-conda>` (if applicable)
-- Task {ref}`Spack environment <process-spack>` and {ref}`CPU architecture <process-arch>` (if applicable)
-- Task {ref}`inputs <process-input>`
-- Task {ref}`script <process-script>`
+- Task [environment modules][process-module] (if applicable)
+- Task [Conda environment][process-conda] (if applicable)
+- Task [Spack environment][process-spack] and [CPU architecture][process-arch] (if applicable)
+- Task [inputs][process-input]
+- Task [script][process-script]
- Any global variables referenced in the task script
-- Any task {ref}`process-ext` properties referenced in the task script
-- Any {ref}`bundled scripts <bundling-executables>` used in the task script
-- Whether the task is a {ref}`stub run <process-stub>`
+- Any task [`ext`][process-ext] properties referenced in the task script
+- Any [bundled scripts][bundling-executables] used in the task script
+- Whether the task is a [stub run][process-stub]

-:::{note}
+:::note
Nextflow also includes an incrementing component in the hash generation process, which allows it to iterate through multiple hash values until it finds one that does not match an existing execution directory. This mechanism typically aligns with task retries (i.e., task attempts), however this is not guaranteed.
:::

-:::{versionchanged} 23.09.2-edge
-The {ref}`process-ext` directive was added to the task hash.
+:::note{title="Version changed 23.09.2-edge"}
+The [`ext`][process-ext] directive was added to the task hash.
:::

Nextflow computes this hash for every task when it is created but before it is executed. If resumability is enabled and there is an entry in the task cache with the same hash, Nextflow tries to recover the previous task execution. A cache hit does not guarantee that the task will be resumed, because it must also recover the task outputs from the [work directory](#work-directory).

Files are hashed differently depending on the caching mode. See the [`cache`][process-cache] directive for more details.

### Task entry

The task entry is a serialized blob of the task metadata required to resume a task, including the fields used by the [Trace file][trace-report] and the task input variables.

### Cache stores

The default cache store uses the `.nextflow/cache` directory, relative to the launch directory.

Due to the limitations of LevelDB, the database for a given session ID can only be accessed by one reader/writer at a time. This means, for example, that you cannot use `nextflow log` to query the task metadata for a pipeline run while it is still running.

-:::{versionadded} 23.07.0-edge
+:::note{title="Version added 23.07.0-edge"}
:::

The cloud cache is an alternative cache store that uses cloud storage instead of the local cache directory. You can use it by setting the `NXF_CLOUDCACHE_PATH` environment variable to the desired cache path (e.g. `s3://my-bucket/cache`) and providing the necessary credentials.
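+For example, with an AWS S3 bucket (bucket name illustrative), the cloud cache can be enabled for a run as follows:
+
+```bash
+export NXF_CLOUDCACHE_PATH=s3://my-bucket/cache
+nextflow run rnaseq-nf -resume
+```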
## Work directory

Each task uses a unique directory based on its hash. When a task is created, Nextflow stages the task input files, script, and other helper files into the task directory.

When a previous task is retrieved from the task cache on a resumed run, Nextflow then checks the corresponding task directory in the work directory. If all the required outputs are present and the exit code is valid, then the task is successfully cached; otherwise, the task is re-executed.

For this reason, it is important to preserve both the task cache (`.nextflow/cache`) and work directories in order to resume runs successfully. You can use the [`clean`][cli-clean] command to delete specific runs from the cache.

## Troubleshooting

Cache failures happen when either (1) a task that was supposed to be cached was re-executed, or (2) a task that was supposed to be re-executed was cached.

When this happens, consider the following questions:

- Is resume enabled via `-resume`?
- Is the [`cache`][process-cache] directive set to a non-default value?
- Is the task still present in the task cache and work directory?
- Were any of the task inputs changed?

Changing any of the inputs included in the [task hash](#task-hash) will invalidate the cache, for example:

While the following examples would not invalidate the cache:

- Changing the value of a directive (other than [`ext`][process-ext]), even if that directive is used in the task script

In many cases, cache failures happen because of a change to the pipeline script or configuration, or because the pipeline itself has some non-deterministic behavior.

### Modified inputs

If a process modifies its own input files, it cannot be resumed for the reasons described above.

### Inconsistent file attributes

Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache. If you encounter this problem, you can avoid it by using the `'lenient'` [caching mode][process-cache], which ignores the last modified timestamp and uses only the file path and size.

-(cache-global-var-race-condition)=
-
### Race condition on a global variable

```groovy
channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" }
channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" }
```

-(cache-nondeterministic-inputs)=
-
### Non-deterministic process inputs

Sometimes a process needs to merge inputs from different sources. Consider the following example:

```nextflow
process check_bam_bai {
    input:
    tuple val(id), path(bam)
    tuple val(id), path(bai)

    script:
    """
    check_bam_bai $bam $bai
    """
}
```

-It is tempting to assume that the process inputs will be matched by `id` like the {ref}`operator-join` operator. But in reality, they are simply merged like the {ref}`operator-merge` operator. As a result, not only will the process inputs be incorrect, they will also be non-deterministic, thus invalidating the cache.
+It is tempting to assume that the process inputs will be matched by `id` like the [`join`][operator-join] operator. But in reality, they are simply merged like the [`merge`][operator-merge] operator. As a result, not only will the process inputs be incorrect, they will also be non-deterministic, thus invalidating the cache.
+It is tempting to assume that the process inputs will be matched by `id` like the [`join`][operator-join] operator. But in reality, they are simply merged like the [`merge`][operator-merge] operator. As a result, not only will the process inputs be incorrect, they will also be non-deterministic, thus invalidating the cache. The solution is to explicitly join the two channels before the process invocation: @@ -193,9 +185,7 @@ Nextflow resumes from the previous run by default. If you want to resume from an nextflow run rnaseq-nf -resume 4dc656d2-c410-44c8-bc32-7dd0ea87bebf ``` -You can use the {ref}`cli-log` command to view all previous runs as well as the task executions for each run. - -(cache-compare-hashes)= +You can use the [`log`][cli-log] command to view all previous runs as well as the task executions for each run. ### Comparing the hashes of two runs @@ -208,7 +198,7 @@ One way to debug a resumed run is to compare the task hashes of each run using t While some manual effort is required, the final diff can often reveal the exact change that caused a task to be re-executed. -:::{versionadded} 23.10.0 +:::note{title="Version added 23.10.0"} ::: When using `-dump-hashes json`, the task hashes can be more easily extracted into a diff. Here is an example Bash script to perform two runs and produce a diff: @@ -233,6 +223,24 @@ diff run_1.tasks.log run_2.tasks.log You can then view the `diff` output or use a graphical diff viewer to compare `run_1.tasks.log` and `run_2.tasks.log`. -:::{versionadded} 25.04.0 -Nextflow now has a built-in way to compare two task runs. See the {ref}`data-lineage-page` guide for details. +:::note{title="Version added 25.04.0"} +Nextflow now has a built-in way to compare two task runs. See the [Data lineage][data-lineage-page] guide for details. ::: + +[bundling-executables]: /nextflow_docs/nextflow_repo/docs/sharing#the-bin-directory +[data-lineage-page]: /nextflow_docs/nextflow_repo/docs/data-lineage +[cli-clean]: /nextflow_docs/nextflow_repo/docs/reference/cli#clean +[cli-log]: /nextflow_docs/nextflow_repo/docs/reference/cli#log +[operator-join]: /nextflow_docs/nextflow_repo/docs/reference/operator#join +[process-arch]: /nextflow_docs/nextflow_repo/docs/reference/process#arch +[process-cache]: /nextflow_docs/nextflow_repo/docs/reference/process#cache +[process-conda]: /nextflow_docs/nextflow_repo/docs/reference/process#conda +[process-ext]: /nextflow_docs/nextflow_repo/docs/reference/process#ext +[process-input]: /nextflow_docs/nextflow_repo/docs/process#inputs +[operator-merge]: /nextflow_docs/nextflow_repo/docs/reference/operator#merge +[process-module]: /nextflow_docs/nextflow_repo/docs/reference/process#module +[process-script]: /nextflow_docs/nextflow_repo/docs/process#script +[process-spack]: /nextflow_docs/nextflow_repo/docs/reference/process#spack +[process-stub]: /nextflow_docs/nextflow_repo/docs/process#stub +[stdlib-namespaces-workflow]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-namespaces#namespaces +[trace-report]: /nextflow_docs/nextflow_repo/docs/reports#trace-file \ No newline at end of file diff --git a/docs/cli.md b/docs/cli.mdx similarity index 92% rename from docs/cli.md rename to docs/cli.mdx index 4b73036468..42ce9b05c0 100644 --- a/docs/cli.md +++ b/docs/cli.mdx @@ -4,8 +4,8 @@ Nextflow provides a robust command line interface (CLI) for the management and e Simply run `nextflow` with no options or `nextflow -h` to see the list of available top-level options and commands. 
See [CLI reference][cli-reference] for the full list of subcommands with examples.
 
-:::{note}
-Nextflow options use a single dash prefix, e.g. `-resume`. Do not confuse with double dash notation, e.g. `--resume`, which is instead used for [Pipeline parameters][cli-params].
+:::note
+Nextflow options use a single dash prefix, for example, `-resume`. Do not confuse with double dash notation, for example, `--resume`, which is instead used for [Pipeline parameters][cli-params].
 :::
 
 ## Basic usage
@@ -144,24 +144,27 @@ nextflow -v
 
 The `-v` option prints out information about Nextflow, such as the version and build. The `-version` option in addition prints out the citation reference and official website.
 
-- The short version:
+- The `-v` option prints the version and build:
 
   ```bash
   nextflow -v
   ```
 
+  This prints an output similar to the following:
+
   ```console
   nextflow version 20.07.1.5412
   ```
 
-- The full version info with citation and website link:
+- The `-version` option additionally prints the citation reference and official website:
 
   ```bash
   nextflow -version
   ```
 
+  This prints an output similar to the following:
+
   ```console
-  nextflow -version
   N E X T F L O W
   version 20.07.1 build 5412
   created 24-07-2020 15:18 UTC (20:48 IDT)
@@ -171,7 +174,7 @@ The `-v` option prints out information about Nextflow, such as the version and b
 
 ## Running pipelines
 
-The main purpose of the Nextflow CLI is to run Nextflow pipelines with the `run` command. Nextflow can execute a local script (e.g. `./main.nf`) or a remote project (e.g. `github.com/nextflow-io/hello`).
+The main purpose of the Nextflow CLI is to run Nextflow pipelines with the `run` command. Nextflow can execute a local script (e.g., `./main.nf`) or a remote project (e.g., `github.com/nextflow-io/hello`).
 
 ### Launching a remote project
 
@@ -192,7 +195,7 @@ nextflow run http://github.com/nextflow-io/hello
 
 If the project is found, it will be automatically downloaded to the Nextflow home directory (`$HOME/.nextflow` by default) and cached for subsequent runs.
 
 :::note
-You must use the `-hub` option to specify the hosting service if your project is hosted on a service other than GitHub, e.g. `-hub bitbucket`. However, the `-hub` option is not required if you use the project URL.
+You must use the `-hub` option to specify the hosting service if your project is hosted on a service other than GitHub, e.g., `-hub bitbucket`. However, the `-hub` option is not required if you use the project URL.
 :::
 
 Try this feature by running the following command:
@@ -299,8 +302,8 @@ Or in JSON format:
 
 Parameters are applied in the following order (from lowest to highest priority):
 
-1. Parameters defined in pipeline scripts (e.g. `main.nf`)
-2. Parameters defined in {ref}`config files <config-params>`
+1. Parameters defined in pipeline scripts (e.g., `main.nf`)
+2. Parameters defined in [config files][config-params]
 3. Parameters specified in a params file (`-params-file`)
 4. Parameters specified on the command line (`--something value`)
 
@@ -335,6 +338,8 @@ By using the `info` command you can show information from a downloaded project.
nextflow info hello ``` +This prints an output similar to the following: + ```console project name: nextflow-io/hello repository : http://github.com/nextflow-io/hello @@ -383,7 +388,7 @@ The `clone` command allows you to copy a Nextflow pipeline project to a director nextflow clone nextflow-io/hello target-dir ``` -If the destination directory is omitted the specified project is cloned to a directory with the same name as the pipeline base name (e.g. `hello`) in the current directory. +If the destination directory is omitted the specified project is cloned to a directory with the same name as the pipeline base name (e.g., `hello`) in the current directory. The `clone` command can be used to inspect or modify the source code of a pipeline project. You can eventually commit and push back your changes by using the usual Git/GitHub workflow. diff --git a/docs/config.md b/docs/config.mdx similarity index 77% rename from docs/config.md rename to docs/config.mdx index f6cb7e3a2c..f789c64b02 100644 --- a/docs/config.md +++ b/docs/config.mdx @@ -1,10 +1,12 @@ +import DefinitionList, { DefinitionTerm, DefinitionDescription } from '@site/src/components/DefinitionList'; + # Configuration ## Configuration file When a pipeline script is launched, Nextflow looks for configuration files in multiple locations. Since each configuration file may contain conflicting settings, they are applied in the following order (from lowest to highest priority): -1. The config file `$HOME/.nextflow/config` (or `$NXF_HOME/config` when [NXF_HOME][nxf-env-vars]{ref}`NXF_HOME ` is set). +1. The config file `$HOME/.nextflow/config` (or `$NXF_HOME/config` when [NXF_HOME][nxf-env-vars] is set). 2. The config file `nextflow.config` in the project directory 3. The config file `nextflow.config` in the launch directory 4. Config files specified using the `-c ` option @@ -13,15 +15,13 @@ When a pipeline script is launched, Nextflow looks for configuration files in mu You can alternatively use the `-C ` option to specify a fixed set of configuration files and ignore all other files. ::: -(config-syntax)= - ## Syntax The Nextflow configuration syntax is based on the Nextflow script syntax. It is designed for setting configuration options in a declarative manner while also allowing for dynamic expressions where appropriate. A Nextflow config file may consist of any number of *assignments*, *blocks*, and *includes*. Config files may also contain comments in the same manner as scripts. -See {ref}`syntax-page` for more information about the Nextflow script syntax. +See [syntax-page] for more information about the Nextflow script syntax. ### Assignments @@ -33,7 +33,7 @@ docker.enabled = true process.maxErrors = 10 ``` -A config option consists of an *option name* prefixed by any number of *scopes* separated by dots. Config scopes are used to group related config options. See {ref}`config-options` for the full set of config options. +A config option consists of an *option name* prefixed by any number of *scopes* separated by dots. Config scopes are used to group related config options. See [Configuration options][config-options] for the full set of config options. The expression is typically a literal value such as a number, boolean, or string. However, any expression can be used: @@ -95,27 +95,44 @@ Config includes can also be specified within config blocks. However, config file The following constants are globally available in a Nextflow configuration file: -`baseDir: Path` -: :::{deprecated} 20.04.0 - ::: -: Alias for `projectDir`. 
+ + + `baseDir: Path` + + :::warning{title="Depreciated since version 20.04.0"} + ::: + Alias for `projectDir`. + -`launchDir: Path` -: The directory where the workflow was launched. + `launchDir: Path` + + The directory where the workflow was launched. + -`projectDir: Path` -: The directory where the main script is located. + `projectDir: Path` + + The directory where the main script is located. + + + `workDir: Path` + + The directory where the workflow work directory is located. + + + ## Functions The following functions are globally available in a Nextflow configuration file: -`env( name: String ) -> String` -: :::{versionadded} 24.11.0-edge - ::: -: Get the value of the environment variable with the specified name in the Nextflow launch environment. + + + `env( name: String ) -> String` + + Get the value of the environment variable with the specified name in the Nextflow launch environment. + -(config-params)= + ## Parameters @@ -131,11 +148,11 @@ params { } ``` -See {ref}`cli-params` for information about how to specify pipeline parameters. +See [Pipeline parameters][cli-params] for information about how to specify pipeline parameters. ## Process configuration -The `process` scope allows you to specify {ref}`process directives ` separately from the pipeline code. +The `process` scope allows you to specify [process directives][process-reference] separately from the pipeline code. For example: @@ -149,11 +166,9 @@ process { By using this configuration, all processes in your pipeline will be executed through the SGE cluster, with the specified settings. -(config-process-selectors)= - ### Process selectors -The `withLabel` selectors allow the configuration of all processes annotated with a {ref}`process-label` directive as shown below: +The `withLabel` selectors allow the configuration of all processes annotated with a [label][process-label] directive as shown below: ```groovy process { @@ -183,12 +198,10 @@ The `withName` selector applies both to processes defined with the same name and Furthermore, selectors for the alias of an included process take priority over selectors for the original name of the process. For example, given a process defined as `hello` and included as `sayHello`, the selectors `withName: hello` and `withName: sayHello` will both be applied to the process, with the second selector taking priority over the first. -:::{tip} +:::tip Label and process names do not need to be enclosed with quotes, provided the name does not include special characters (`-`, `!`, etc) and is not a keyword or a built-in type identifier. When in doubt, you can enclose the label name or process name with single or double quotes. ::: -(config-selector-expressions)= - ### Selector expressions Both label and process name selectors allow the use of a regular expression in order to apply the same configuration to all processes matching the specified pattern condition. For example: @@ -216,8 +229,6 @@ process { The above configuration snippet sets 2 cpus for every process labeled as `hello` and 4 cpus to every process *not* label as `hello`. It also specifies the `long` queue for every process whose name does *not* start with `align`. 
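As a concrete sketch, consider a hypothetical process named `align_reads` that carries no `hello` label: under the configuration above it receives 4 cpus from the negated label selector, while the negated name selector leaves its queue unchanged because its name matches `align.*`:

```groovy
// Hypothetical process used only to illustrate the negated selectors above:
// it has no 'hello' label, so 'withLabel: !hello' applies (cpus = 4), and its
// name matches 'align.*', so 'withName: !align.*' does not assign the 'long' queue.
process align_reads {
    script:
    """
    echo "running with ${task.cpus} cpus"
    """
}
```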
-(config-selector-priority)= - ### Selector priority Process configuration settings are applied to a process in the following order (from lowest to highest priority): @@ -283,11 +294,12 @@ nextflow run main.nf -profile standard,cloud Config profiles are applied in the order in which they were defined in the config file, regardless of the order they are specified on the command line. -:::{versionadded} 25.02.0-edge -When using the {ref}`strict config syntax `, profiles are applied in the order in which they are specified on the command line. + +:::note{title="Version added 25.02.0-edge"} +When using the [strict config syntax][updating-config-syntax], profiles are applied in the order in which they are specified on the command line. ::: -:::{danger} +:::warning When defining a profile in the config file, avoid using both the dot and block syntax for the same scope. For example: ```groovy @@ -310,7 +322,7 @@ process { } ``` -This limitation can be avoided by using the {ref}`strict config syntax `. +This limitation can be avoided by using the [strict config syntax][updating-config-syntax]. ::: ## Workflow handlers @@ -329,6 +341,13 @@ workflow.onError = { } ``` -See {ref}`workflow-handlers` for more information. +See [Workflow handlers][workflow-handlers] for more information. -[nxf-env-vars]: /nextflow_docs/nextflow_repo/docs/reference/env-vars#nextflow-settings \ No newline at end of file +[cli-params]: /nextflow_docs/nextflow_repo/docs/cli#pipeline-parameters +[config-options]: /nextflow_docs/nextflow_repo/docs/reference/config +[nxf-env-vars]: /nextflow_docs/nextflow_repo/docs/reference/env-vars#nextflow-settings +[process-reference]: /nextflow_docs/nextflow_repo/docs/reference/process +[process-label]: /nextflow_docs/nextflow_repo/docs/reference/process#label +[syntax-page]: /nextflow_docs/nextflow_repo/docs/reference/syntax +[updating-config-syntax]: /nextflow_docs/nextflow_repo/docs/strict-syntax#configuration-syntax +[workflow-handlers]: /nextflow_docs/nextflow_repo/docs/notifications#workflow-handlers \ No newline at end of file diff --git a/docs/developer-env.md b/docs/developer-env.mdx similarity index 99% rename from docs/developer-env.md rename to docs/developer-env.mdx index 114031844c..1397f386bb 100644 --- a/docs/developer-env.md +++ b/docs/developer-env.mdx @@ -242,9 +242,9 @@ To install Git on macOS with [Homebrew](https://docs.brew.sh/): 1. Open a terminal window and run `brew install git`. - :::note - You must have Homebrew installed. See [Homebrew installation](https://docs.brew.sh/Installation) for instructions. - ::: + :::note + You must have Homebrew installed. See [Homebrew installation](https://docs.brew.sh/Installation) for instructions. + ::: 1. Once complete, run `git version` to verify Git was installed. diff --git a/docs/executor.md b/docs/executor.md index 959aa220f7..fa314fbc77 100644 --- a/docs/executor.md +++ b/docs/executor.md @@ -1,5 +1,3 @@ -(executor-page)= - # Executors In the Nextflow framework architecture, the *executor* is the component that determines the system where a pipeline process is run and supervises its execution. @@ -8,8 +6,6 @@ The executor provides an abstraction between the pipeline processes and the unde In other words, you can write your pipeline script once and have it running on your computer, a cluster resource manager, or the cloud — simply change the executor definition in the Nextflow configuration file. 
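For example, the following minimal sketch shows the kind of one-line change involved. The executor names are the built-in values described below; the surrounding file is a hypothetical `nextflow.config`:

```groovy
// Minimal hypothetical nextflow.config: moving a pipeline from a laptop
// to a SLURM cluster or AWS Batch is a single-setting change.
process.executor = 'local'        // run tasks on the local computer (default)
// process.executor = 'slurm'     // or: submit each task to a SLURM cluster
// process.executor = 'awsbatch'  // or: submit each task to AWS Batch
```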
-(awsbatch-executor)= - ## AWS Batch Nextflow supports the [AWS Batch](https://aws.amazon.com/batch/) service that allows job submission in the cloud without having to spin out and manage a cluster of virtual machines. AWS Batch uses Docker containers to run tasks, which greatly simplifies pipeline deployment. @@ -22,20 +18,18 @@ The pipeline can be launched either in a local computer, or an EC2 instance. EC2 Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-accelerator` -- {ref}`process-arch` (only when using Fargate platform type for AWS Batch) -- {ref}`process-container` -- {ref}`process-containerOptions` -- {ref}`process-cpus` -- {ref}`process-disk` (only when using Fargate platform type for AWS Batch) -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-resourcelabels` -- {ref}`process-time` +- [accelerator][process-accelerator] +- [arch][process-arch] (only when using Fargate platform type for AWS Batch) +- [container][process-container] +- [containerOptions][process-containeroptions] +- [cpus][process-cpus] +- [disk][process-disk] (only when using Fargate platform type for AWS Batch) +- [memory][process-memory] +- [queue][process-queue] +- [resourcelabels][process-resourcelabels] +- [time][process-time] -See {ref}`aws-batch` for more information. - -(azurebatch-executor)= +See [AWS Batch][aws-batch] for more information. ## Azure Batch @@ -49,23 +43,21 @@ The pipeline can be launched either in a local computer, or a cloud virtual mach Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-container` -- {ref}`process-containerOptions` -- {ref}`process-cpus` -- {ref}`process-disk` -- {ref}`process-machineType` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-resourcelabels` -- {ref}`process-time` - -See {ref}`azure-batch` for more information. +- [container][process-container] +- [containerOptions][process-containeroptions] +- [cpus][process-cpus] +- [disk][process-disk] +- [machineType][process-machinetype] +- [memory][process-memory] +- [queue][process-queue] +- [resourcelabels][process-resourcelabels] +- [time][process-time] -(bridge-executor)= +See [Azure Batch][azure-batch] for more information. ## Bridge -:::{versionadded} 22.09.1-edge +:::note{title="Version added 22.09.1-edge"} ::: [Bridge](https://github.com/cea-hpc/bridge) is an abstraction layer to ease batch system and resource manager usage in heterogeneous HPC environments. @@ -78,17 +70,15 @@ To enable the Bridge executor, set `process.executor = 'bridge'` in the `nextflo Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-time` - -(flux-executor)= +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [queue][process-queue] +- [time][process-time] ## Flux Executor -:::{versionadded} 22.11.0-edge +:::note{title="Version added 22.11.0-edge"} ::: The `flux` executor allows you to run your pipeline script using the [Flux Framework](https://flux-framework.org). 
@@ -99,24 +89,22 @@ To enable the Flux executor, set `process.executor = 'flux'` in the `nextflow.co
 
 Resource requests and other job characteristics can be controlled via the following process directives:
 
-- {ref}`process-clusterOptions`
-- {ref}`process-cpus`
-- {ref}`process-queue`
-- {ref}`process-time`
+- [clusterOptions][process-clusteroptions]
+- [cpus][process-cpus]
+- [queue][process-queue]
+- [time][process-time]
 
-:::{note}
+:::note
 Flux does not support the `memory` directive.
 :::
 
-:::{note}
+:::note
 By default, Flux will send all output to the `.command.log` file. To send this output to stdout and stderr instead, set `flux.terminalOutput = true` in your config file.
 :::
 
-(google-batch-executor)=
-
 ## Google Cloud Batch
 
-:::{versionadded} 22.07.1-edge
+:::note{title="Version added 22.07.1-edge"}
 :::
 
 [Google Cloud Batch](https://cloud.google.com/batch) is a managed computing service that allows the execution of containerized workloads in the Google Cloud Platform infrastructure.
 
@@ -129,32 +117,31 @@ To enable this executor, set `process.executor = 'google-batch'` in the `nextflo
 
 Resource requests and other job characteristics can be controlled via the following process directives:
 
-- {ref}`process-accelerator`
-- {ref}`process-container`
-- {ref}`process-containerOptions`
-- {ref}`process-cpus`
-- {ref}`process-disk`
-- {ref}`process-machineType`
-- {ref}`process-memory`
-- {ref}`process-resourcelabels`
-- {ref}`process-time`
-
-See the {ref}`Google Cloud Batch <google-batch>` page for further configuration details.
+- [accelerator][process-accelerator]
+- [container][process-container]
+- [containerOptions][process-containeroptions]
+- [cpus][process-cpus]
+- [disk][process-disk]
+- [machineType][process-machinetype]
+- [memory][process-memory]
+- [resourcelabels][process-resourcelabels]
+- [time][process-time]
 
-(htcondor-executor)=
+See [Cloud Batch][google-batch] for further configuration details.
 
 ## HTCondor
 
-:::{warning} *Experimental: may change in a future release.*
+:::warning{title="Experimental: may change in a future release"}
 :::
+
 The `condor` executor allows you to run your pipeline script by using the [HTCondor](https://research.cs.wisc.edu/htcondor/) resource manager.
 
 Nextflow manages each process as a separate job that is submitted to the cluster using the `condor_submit` command.
 
 The pipeline must be launched from a node where the `condor_submit` command is available, which is typically the cluster login node.
 
-:::{note}
+:::note
 The HTCondor executor for Nextflow does not currently support HTCondor's ability to transfer input/output data to the corresponding job's compute node. Therefore, the data must be made accessible to the compute nodes through a shared file system directory from where the Nextflow workflow is executed (or specified via the `-w` option).
 :::
 
@@ -162,24 +149,19 @@ To enable the HTCondor executor, set `process.executor = 'condor'` in the `nextf
 
 Resource requests and other job characteristics can be controlled via the following process directives:
 
-- {ref}`process-clusterOptions`
-- {ref}`process-cpus`
-- {ref}`process-disk`
-- {ref}`process-memory`
-- {ref}`process-time`
-
-(hyperqueue-executor)=
+- [clusterOptions][process-clusteroptions]
+- [cpus][process-cpus]
+- [disk][process-disk]
+- [memory][process-memory]
+- [time][process-time]
 
 ## HyperQueue
 
-:::{versionadded} 22.05.0-edge
-:::
-
-:::{versionchanged} 24.06.0-edge
+:::note{title="Version changed 24.06.0-edge"}
 HyperQueue 0.17.0 or later is required.
::: -:::{versionchanged} 25.01.0-edge +:::note{title="Version changed 25.01.0-edge"} HyperQueue 0.20.0 or later is required. ::: @@ -193,13 +175,11 @@ To enable the HyperQueue executor, set `process.executor = 'hq'` in the `nextflo Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-accelerator` -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-time` - -(k8s-executor)= +- [accelerator][process-accelerator] +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [time][process-time] ## Kubernetes @@ -207,42 +187,38 @@ The `k8s` executor allows you to run a pipeline on a [Kubernetes](http://kuberne Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-accelerator` -- {ref}`process-cpus` -- {ref}`process-disk` -- {ref}`process-memory` -- {ref}`process-pod` -- {ref}`process-resourcelabels` -- {ref}`process-time` - -See the {ref}`Kubernetes ` page to learn how to set up a Kubernetes cluster to run Nextflow pipelines. +- [accelerator][process-accelerator] +- [cpus][process-cpus] +- [disk][process-disk] +- [memory][process-memory] +- [pod][process-pod] +- [resourcelabels][process-resourcelabels] +- [time][process-time] -(local-executor)= +See the [Kubernetes][k8s-page] page to learn how to set up a Kubernetes cluster to run Nextflow pipelines. ## Local -The `local` executor is used by default. It runs the pipeline processes on the computer where Nextflow is launched. The processes are parallelised by spawning multiple threads, taking advantage of the multi-core architecture of the CPU. +The `local` executor is used by default. It runs the pipeline processes on the computer where Nextflow is launched. The processes are parallelized by spawning multiple threads, taking advantage of the multi-core architecture of the CPU. The `local` executor is useful for developing and testing a pipeline script on your computer, before switching to a cluster or cloud environment with production data. Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-time` -- {ref}`process-container` -- {ref}`process-containerOptions` +- [cpus][process-cpus] +- [memory][process-memory] +- [time][process-time] +- [container][process-container] +- [containerOptions][process-containeroptions] -:::{note} -While the `local` executor limits the number of concurrent tasks based on requested vs available resources, it does not enforce task resource requests. In other words, it is possible for a local task to use more CPUs and memory than it requested, in which case it may starve other tasks. An exception to this behavior is when using {ref}`container-docker` or {ref}`container-podman` containers, in which case the resource requests are enforced by the container runtime. +:::note +While the `local` executor limits the number of concurrent tasks based on requested vs available resources, it does not enforce task resource requests. In other words, it is possible for a local task to use more CPUs and memory than it requested, in which case it may starve other tasks. An exception to this behavior is when using [Docker][container-docker] or [Podman][container-podman] containers, in which case the resource requests are enforced by the container runtime. 
::: The local executor supports two types of tasks: - Script tasks (processes with a `script` or `shell` block) - executed via a Bash wrapper - Native tasks (processes with an `exec` block) - executed directly in the JVM. -(lsf-executor)= - ## LSF The `lsf` executor allows you to run your pipeline script using a [Platform LSF](http://en.wikipedia.org/wiki/Platform_LSF) cluster. @@ -255,28 +231,23 @@ To enable the LSF executor, set `process.executor = 'lsf'` in the `nextflow.conf Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-time` +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [queue][process-queue] +- [time][process-time] -:::{note} -LSF supports both *per-core* and *per-job* memory limits. Nextflow assumes that LSF works in the *per-core* mode, thus it divides the requested {ref}`process-memory` by the number of requested {ref}`process-cpus`. +:::note +LSF supports both *per-core* and *per-job* memory limits. Nextflow assumes that LSF works in the *per-core* mode, thus it divides the requested [memory][process-memory] by the number of requested [cpus][process-cpus]. -When LSF is configured to work in the *per-job* memory limit mode, you must specify this limit with the `perJobMemLimit` option in the {ref}`config-executor` scope of your Nextflow config file. +When LSF is configured to work in the *per-job* memory limit mode, you must specify this limit with the `perJobMemLimit` option in the [`executor`][config-executor] scope of your Nextflow config file. See also the [Platform LSF documentation](https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_config_ref/lsf.conf.lsb_job_memlimit.5.dita). ::: -(moab-executor)= - ## Moab -:::{versionadded} 19.07.0 -::: - -:::{warning} *Experimental: may change in a future release.* +:::warning{title="Experimental: may change in a future release"} ::: The `moab` executor allows you to run your pipeline script using the [Moab](https://en.wikipedia.org/wiki/Moab_Cluster_Suite) resource manager by [Adaptive Computing](http://www.adaptivecomputing.com/). @@ -289,13 +260,11 @@ To enable the `Moab` executor, set `process.executor = 'moab'` in the `nextflow. Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-time` - -(nqsii-executor)= +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [queue][process-queue] +- [time][process-time] ## NQSII @@ -309,19 +278,14 @@ To enable the NQSII executor, set `process.executor = 'nqsii'` in the `nextflow. Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-time` - -(oar-executor)= +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [queue][process-queue] +- [time][process-time] ## OAR -:::{versionadded} 19.11.0-edge -::: - The `oar` executor allows you to run your pipeline script using the [OAR](https://oar.imag.fr) resource manager. Nextflow manages each process as a separate job that is submitted to the cluster using the `oarsub` command. 
@@ -332,18 +296,18 @@ To enable the OAR executor set `process.executor = 'oar'` in the `nextflow.confi Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-time` +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [queue][process-queue] +- [time][process-time] When specifying `clusterOptions` as a string, multiple options must be separated by semicolons to ensure that the job script is formatted correctly: ```groovy clusterOptions = '-t besteffort;--project myproject' ``` -:::{versionadded} 24.04.0 +:::note{title="Version added 24.04.0"} ::: The same behavior can now be achieved using a string list: @@ -351,9 +315,7 @@ The same behavior can now be achieved using a string list: clusterOptions = [ '-t besteffort', '--project myproject' ] ``` -See {ref}`process-clusteroptions` for details. - -(pbs-executor)= +See [clusterOptions][process-clusteroptions] for details. ## PBS/Torque @@ -367,13 +329,11 @@ To enable the PBS executor, set `process.executor = 'pbs'` in the `nextflow.conf Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-time` - -(pbspro-executor)= +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [queue][process-queue] +- [time][process-time] ## PBS Pro @@ -387,13 +347,11 @@ To enable the PBS Pro executor, set `process.executor = 'pbspro'` in the `nextfl Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-time` - -(sge-executor)= +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [queue][process-queue] +- [time][process-time] ## SGE @@ -407,14 +365,12 @@ To enable the SGE executor, set `process.executor = 'sge'` in the `nextflow.conf Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-penv` -- {ref}`process-queue` -- {ref}`process-time` - -(slurm-executor)= +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [penv][process-penv] +- [queue][process-queue] +- [time][process-time] ## SLURM @@ -428,20 +384,42 @@ To enable the SLURM executor, set `process.executor = 'slurm'` in the `nextflow. Resource requests and other job characteristics can be controlled via the following process directives: -- {ref}`process-clusterOptions` -- {ref}`process-cpus` -- {ref}`process-memory` -- {ref}`process-queue` -- {ref}`process-time` +- [clusterOptions][process-clusteroptions] +- [cpus][process-cpus] +- [memory][process-memory] +- [queue][process-queue] +- [time][process-time] -:::{note} +:::note SLURM partitions can be specified with the `queue` directive. ::: -:::{note} +:::note Nextflow does not provide direct support for SLURM multi-clusters. If you need to submit workflow executions to a cluster other than the current one, specify it with the `SLURM_CLUSTERS` variable in the launch environment. 
::: -:::{versionadded} 23.07.0-edge +:::note{title="Version added 23.07.0-edge"} Some SLURM clusters require memory allocations to be specified with `--mem-per-cpu` instead of `--mem`. You can specify `executor.perCpuMemAllocation = true` in the Nextflow configuration to enable this behavior. Nextflow will automatically compute the memory per CPU for each task (by default 1 CPU is used). ::: + +[aws-batch]: /nextflow_docs/nextflow_repo/docs/aws#aws-batch +[azure-batch]: /nextflow_docs/nextflow_repo/docs/azure#azure-batch +[config-executor]: /nextflow_docs/nextflow_repo/docs/reference/config#executor +[container-docker]: /nextflow_docs/nextflow_repo/docs/container#docker +[container-podman]: /nextflow_docs/nextflow_repo/docs/container#podman +[google-batch]: /nextflow_docs/nextflow_repo/docs/google#cloud-batch +[k8s-page]: /nextflow_docs/nextflow_repo/docs/kubernetes +[process-accelerator]: /nextflow_docs/nextflow_repo/docs/reference/process#accelerator +[process-arch]: /nextflow_docs/nextflow_repo/docs/reference/process#arch +[process-clusteroptions]: /nextflow_docs/nextflow_repo/docs/reference/process#clusteroptions +[process-container]: /nextflow_docs/nextflow_repo/docs/reference/process#container +[process-containeroptions]: /nextflow_docs/nextflow_repo/docs/reference/process#containeroptions +[process-cpus]: /nextflow_docs/nextflow_repo/docs/reference/process#cpus +[process-disk]: /nextflow_docs/nextflow_repo/docs/reference/process#disk +[process-machinetype]: /nextflow_docs/nextflow_repo/docs/reference/process#machinetype +[process-memory]: /nextflow_docs/nextflow_repo/docs/reference/process#memory +[process-penv]: /nextflow_docs/nextflow_repo/docs/reference/process#penv +[process-pod]: /nextflow_docs/nextflow_repo/docs/reference/process#pod +[process-queue]: /nextflow_docs/nextflow_repo/docs/reference/process#queue +[process-resourcelabels]: /nextflow_docs/nextflow_repo/docs/reference/process#resourcelabels +[process-time]: /nextflow_docs/nextflow_repo/docs/reference/process#time \ No newline at end of file diff --git a/docs/index.md b/docs/nextflow.md similarity index 66% rename from docs/index.md rename to docs/nextflow.md index 2014a48e5d..bc937bb7b1 100644 --- a/docs/index.md +++ b/docs/nextflow.md @@ -17,10 +17,10 @@ Nextflow is a workflow system for creating scalable, portable, and reproducible To get started with Nextflow: -1. See the Nextflow {ref}`overview ` to learn key concepts. -2. Download and {ref}`install ` Nextflow. -3. Set up an {ref}`environment ` with the {ref}`Nextflow VS Code extension `. -4. Run {ref}`your first script `. +1. See the Nextflow [overview][overview-page] to learn key concepts. +2. Download and [install][install-page] Nextflow. +3. Set up an [environment][devenv-page] with the [Nextflow VS Code extension][devenv-nextflow]. +4. Run [your first script][your-first-script]. To continue learning about Nextflow, visit the [Nextflow community training portal](https://training.nextflow.io/latest/) and find a training course that is right for you. Seqera, the company that develops Nextflow, also runs a variety of training events. See [Seqera Events](https://seqera.io/events/) for more information. @@ -34,7 +34,7 @@ The [nf-core](https://nf-co.re/) project is a community effort aggregating high- ## Contributing -Contributions to Nextflow are welcome. See {ref}`Contributing ` for more details. +Contributions to Nextflow are welcome. See [Contributing][contributing-page] for more details. 
## License
 
@@ -46,123 +46,9 @@ If you use Nextflow in your work, please cite:
 P. Di Tommaso, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017) doi:[10.1038/nbt.3820](http://www.nature.com/nbt/journal/v35/n4/full/nbt.3820.html)
 
-```{toctree}
-:hidden:
-:caption: Get started
-:maxdepth: 1
-
-overview
-install
-developer-env
-your-first-script
-```
-
-```{toctree}
-:hidden:
-:caption: Running pipelines
-:maxdepth: 1
-
-cli
-config
-executor
-cache-and-resume
-reports
-plugins
-```
-
-```{toctree}
-:hidden:
-:caption: Developing pipelines
-:maxdepth: 1
-
-script
-working-with-files
-process
-channel
-workflow
-module
-notifications
-secrets
-sharing
-vscode
-```
-
-```{toctree}
-:hidden:
-:caption: Software dependencies
-:maxdepth: 1
-
-git
-container
-conda
-spack
-wave
-```
-
-```{toctree}
-:hidden:
-:caption: Compute & storage
-:maxdepth: 1
-
-aws
-amazons3
-azure
-google
-kubernetes
-fusion
-```
-
-```{toctree}
-:hidden:
-:caption: Language Reference
-:maxdepth: 1
-
-reference/feature-flags
-reference/syntax
-reference/stdlib
-reference/process
-reference/channel
-reference/operator
-```
-
-```{toctree}
-:hidden:
-:caption: Runtime Reference
-:maxdepth: 1
-
-reference/cli
-reference/config
-reference/env-vars
-```
-
-```{toctree}
-:hidden:
-:caption: Updates
-:maxdepth: 1
-
-updating-nextflow
-strict-syntax
-migrations/index
-```
-
-```{toctree}
-:hidden:
-:caption: Contributing
-:maxdepth: 1
-
-developer/index
-developer/diagram
-developer/packages
-developer/plugins
-```
-
-```{toctree}
-:hidden:
-:caption: Guides
-:maxdepth: 1
-
-data-lineage
-updating-spot-retries
-metrics
-flux
-```
+[contributing-page]: /nextflow_docs/nextflow_repo/docs/developer/index
+[overview-page]: /nextflow_docs/nextflow_repo/docs/overview
+[install-page]: /nextflow_docs/nextflow_repo/docs/install
+[devenv-page]: /nextflow_docs/nextflow_repo/docs/developer-env
+[devenv-nextflow]: /nextflow_docs/nextflow_repo/docs/developer-env#nextflow
+[your-first-script]: /nextflow_docs/nextflow_repo/docs/your-first-script
\ No newline at end of file
diff --git a/docs/plugins.md b/docs/plugins.mdx
similarity index 100%
rename from docs/plugins.md
rename to docs/plugins.mdx
diff --git a/docs/reference/env-vars.md b/docs/reference/env-vars.mdx
similarity index 100%
rename from docs/reference/env-vars.md
rename to docs/reference/env-vars.mdx
diff --git a/docs/reference/feature-flags.md b/docs/reference/feature-flags.mdx
similarity index 100%
rename from docs/reference/feature-flags.md
rename to docs/reference/feature-flags.mdx
diff --git a/docs/reference/operator.md b/docs/reference/operator.mdx
similarity index 100%
rename from docs/reference/operator.md
rename to docs/reference/operator.mdx
diff --git a/docs/reference/process.md b/docs/reference/process.mdx
similarity index 100%
rename from docs/reference/process.md
rename to docs/reference/process.mdx
diff --git a/docs/reference/stdlib-namespaces.md b/docs/reference/stdlib-namespaces.mdx
similarity index 100%
rename from docs/reference/stdlib-namespaces.md
rename to docs/reference/stdlib-namespaces.mdx
diff --git a/docs/reference/stdlib-types.md b/docs/reference/stdlib-types.mdx
similarity index 100%
rename from docs/reference/stdlib-types.md
rename to docs/reference/stdlib-types.mdx
diff --git a/docs/reports.md b/docs/reports.mdx
similarity index 51%
rename from docs/reports.md
rename to docs/reports.mdx
index 0db212672e..e595a3e124 100644
--- a/docs/reports.md
+++ b/docs/reports.mdx
@@ -1,15 +1,11 @@
-(tracing-page)= - # Reports -(execution-log)= - ## Execution log The `nextflow log` command shows information about executed pipelines in the current folder: ```bash -nextflow log [options] +nextflow log [options] ``` :::{note} @@ -39,7 +35,7 @@ $ nextflow log goofy_kilby ### Customizing fields -By default, only the task execution paths are printed. A custom list of fields to print can be provided via the `-f` (`-fields`) option. For example: +By default, only the task execution paths are printed. A custom list of fields to print can be provided via the `-f` (`-fields`) option: ```console $ nextflow log goofy_kilby -f hash,name,exit,status @@ -50,7 +46,7 @@ ec/3100e7 mapping (ggal_gut) 0 COMPLETED 94/dfdfb6 makeTranscript (ggal_gut) 0 COMPLETED ``` -The fields accepted by the `-f` options are the ones in the {ref}`trace report`, as well as: script, stdout, stderr, env. List available fields using the `-l` (`-list-fields`) option. +The fields accepted by the `-f` options are the ones in the [trace file report][trace-file], as well as: script, stdout, stderr, env. List available fields using the `-l` (`-list-fields`) option. The `script` field is useful for examining script commands run in each task: @@ -91,8 +87,6 @@ The `filter` option makes it possible to select which entries to include in the nextflow log goofy_kilby -filter 'name =~ /hello.*/ && status == "FAILED"' ``` -(execution-report)= - ## Execution report Nextflow can create an HTML execution report: a single document which includes many useful metrics about a workflow execution. The report is organised in the three main sections: `Summary`, `Resources` and `Tasks` (see below for details). @@ -100,7 +94,7 @@ Nextflow can create an HTML execution report: a single document which includes m To enable the creation of this report add the `-with-report` command line option when launching the pipeline execution. For example: ```bash -nextflow run -with-report [file name] +nextflow run main.nf -with-report [file name] ``` The report file name can be specified as an optional parameter following the report option. @@ -109,8 +103,7 @@ The report file name can be specified as an optional parameter following the rep The `Summary` section reports the execution status, the launch command, overall execution time and some other workflow metadata. You can see an example below: -```{image} _static/report-summary-min.png -``` +![Report summary](_static/report-summary-min.png) ### Resource Usage @@ -118,31 +111,25 @@ The `Resources` section plots the distribution of resource usage for each workfl Plots are shown for CPU, memory, job duration and disk I/O. They have two (or three) tabs with the raw values and a percentage representation showing what proportion of the requested resources were used. These plots are very helpful to check that task resources are used efficiently. -```{image} _static/report-resource-cpu.png -``` - -Learn more about how resource usage is computed in {ref}`this tutorial `. +![Usage CPUs](_static/report-resource-cpu.png) -(execution-report-tasks)= +See the [Understanding task resource metrics][metrics-page] guide to learn more about how resource usage is computed in Nextflow. ### Tasks The `Tasks` section lists all executed tasks, reporting for each of them the status, the actual command script, and many other metrics. 
You can see an example below: -```{image} _static/report-tasks-min.png -``` +![Tasks](_static/report-tasks-min.png) -:::{note} +:::note Nextflow collects these metrics through a background process for each job in the target environment. Make sure the following tools are available in the environment where tasks are executed: `awk`, `date`, `grep`, `ps`, `sed`, `tail`, `tee`. Moreover, some of these metrics are not reported when running on Mac OS X. See the corresponding note in the [trace file](#trace-file) section. ::: -:::{warning} +:::warning A common problem when using a third party container image is that it does not include one or more of the above utilities, resulting in an empty execution report. ::: -Please read {ref}`Report scope ` section to learn more about the execution report configuration details. - -(trace-report)= +See [Report scope][config-report] to learn more about the execution report configuration details. ## Trace file @@ -151,10 +138,10 @@ Nextflow creates an execution tracing file that contains some useful information In order to create the execution trace file add the `-with-trace` command line option when launching the pipeline execution. For example: ```bash -nextflow run -with-trace +nextflow run main.nf -with-trace ``` -It will create a file named `trace.txt` in the current directory. The content looks like the above example: +It will create a file named `trace.txt` in the current directory. The content looks similar the above example: | task_id | hash | native_id | name | status | exit | submit | duration | walltime | %cpu | rss | vmem | rchar | wchar | | ------- | --------- | --------- | -------------- | --------- | ---- | ----------------------- | -------- | -------- | ------ | -------- | -------- | -------- | -------- | @@ -178,141 +165,268 @@ It will create a file named `trace.txt` in the current directory. The content lo | 56 | c3/ec5f4a | 2066 | similarity (5) | COMPLETED | 0 | 2014-10-23 17:13:23.948 | 30s | 616ms | 0.0% | 10.4 MB | 34.6 MB | 238 MB | 8.4 MB | | 98 | de/d6c0a6 | 2099 | matrix (1) | COMPLETED | 0 | 2014-10-23 17:14:27.139 | 30s | 1s | 0.0% | 4.8 MB | 42 MB | 240.6 MB | 79 KB | -(trace-fields)= - ### Trace fields The following table shows the fields that can be included in the execution report: -`task_id` -: Task ID. - -`hash` -: Task hash code. - -`native_id` -: Task ID given by the underlying execution system e.g. POSIX process PID when executed locally, job ID when executed by a grid engine, etc. - -`process` -: Nextflow process name. - -`tag` -: User provided identifier associated this task. - -`name` -: Task name. - -`status` -: Task status. Possible values are: `NEW`, `SUBMITTED`, `RUNNING`, `COMPLETED`, `FAILED`, and `ABORTED`. - -`exit` -: POSIX process exit status. - -`module` -: Environment module used to run the task. - -`container` -: Docker image name used to execute the task. - -`cpus` -: The cpus number request for the task execution. - -`time` -: The time request for the task execution - -`disk` -: The disk space request for the task execution. - -`memory` -: The memory request for the task execution. - -`attempt` -: Attempt at which the task completed. - -`submit` -: Timestamp when the task has been submitted. - -`start` -: Timestamp when the task execution has started. - -`complete` -: Timestamp when task execution has completed. - -`duration` -: Time elapsed to complete since the submission. - -`realtime` -: Task execution time i.e. delta between completion and start timestamp. 
- -`queue` -: The queue that the executor attempted to run the process on. - -`%cpu` -: Percentage of CPU used by the process. - -`%mem` -: Percentage of memory used by the process. - -`rss` -: Real memory (resident set) size of the process. Equivalent to `ps -o rss` . - -`vmem` -: Virtual memory size of the process. Equivalent to `ps -o vsize` . - -`peak_rss` -: Peak of real memory. This data is read from field `VmHWM` in `/proc/$pid/status` file. - -`peak_vmem` -: Peak of virtual memory. This data is read from field `VmPeak` in `/proc/$pid/status` file. - -`rchar` -: Number of bytes the process read, using any read-like system call from files, pipes, tty, etc. This data is read from file `/proc/$pid/io`. - -`wchar` -: Number of bytes the process wrote, using any write-like system call. This data is read from file `/proc/$pid/io`. - -`syscr` -: Number of read-like system call invocations that the process performed. This data is read from file `/proc/$pid/io`. - -`syscw` -: Number of write-like system call invocations that the process performed. This data is read from file `/proc/$pid/io`. - -`read_bytes` -: Number of bytes the process directly read from disk. This data is read from file `/proc/$pid/io`. - -`write_bytes` -: Number of bytes the process originally dirtied in the page-cache (assuming they will go to disk later). This data is read from file `/proc/$pid/io`. - -`vol_ctxt` -: Number of voluntary context switches. This data is read from field `voluntary_ctxt_switches` in `/proc/$pid/status` file. - -`inv_ctxt` -: Number of involuntary context switches. This data is read from field `nonvoluntary_ctxt_switches` in `/proc/$pid/status` file. - -`env` -: The variables defined in task execution environment. - -`workdir` -: The directory path where the task was executed. - -`script` -: The task command script. - -`scratch` -: The value of the process `scratch` directive. - -`error_action` -: The action applied on errof task failure. - -`hostname` -: :::{versionadded} 22.05.0-edge - ::: -: The host on which the task was executed. Supported only for the Kubernetes executor yet. Activate with `k8s.fetchNodeName = true` in the Nextflow config file. - -`cpu_model` -: :::{versionadded} 22.07.0-edge - ::: -: The name of the CPU model used to execute the task. This data is read from file `/proc/cpuinfo`. + + + `task_id` + + + Task ID. + + + `hash` + + + Task hash code. + + + `native_id` + + + Task ID given by the underlying execution system e.g. POSIX process PID when executed locally, job ID when executed by a grid engine, etc. + + + `process` + + + Nextflow process name. + + + `tag` + + + User provided identifier associated this task. + + + `name` + + + Task name. + + + `status` + + + Task status. Possible values are: `NEW`, `SUBMITTED`, `RUNNING`, `COMPLETED`, `FAILED`, and `ABORTED`. + + + `exit` + + + POSIX process exit status. + + + `module` + + + Environment module used to run the task. + + + `container` + + + Docker image name used to execute the task. + + + `cpus` + + + The cpus number request for the task execution. + + + `time` + + + The time request for the task execution + + + `disk` + + + The disk space request for the task execution. + + + `memory` + + + The memory request for the task execution. + + + `attempt` + + + Attempt at which the task completed. + + + `submit` + + + Timestamp when the task has been submitted. + + + `start` + + + Timestamp when the task execution has started. + + + `complete` + + + Timestamp when task execution has completed. 
+ + + `duration` + + + Time elapsed to complete since the submission. + + + `realtime` + + + Task execution time i.e. delta between completion and start timestamp. + + + `queue` + + + The queue that the executor attempted to run the process on. + + + `%cpu` + + + Percentage of CPU used by the process. + + + `%mem` + + + Percentage of memory used by the process. + + + `rss` + + + Real memory (resident set) size of the process. Equivalent to `ps -o rss` . + + + `vmem` + + + Virtual memory size of the process. Equivalent to `ps -o vsize` . + + + `peak_rss` + + + Peak of real memory. This data is read from field `VmHWM` in `/proc/$pid/status` file. + + + `peak_vmem` + + + Peak of virtual memory. This data is read from field `VmPeak` in `/proc/$pid/status` file. + + + `rchar` + + + Number of bytes the process read, using any read-like system call from files, pipes, tty, etc. This data is read from file `/proc/$pid/io`. + + + `wchar` + + + Number of bytes the process wrote, using any write-like system call. This data is read from file `/proc/$pid/io`. + + + `syscr` + + + Number of read-like system call invocations that the process performed. This data is read from file `/proc/$pid/io`. + + + `syscw` + + + Number of write-like system call invocations that the process performed. This data is read from file `/proc/$pid/io`. + + + `read_bytes` + + + Number of bytes the process directly read from disk. This data is read from file `/proc/$pid/io`. + + + `write_bytes` + + + Number of bytes the process originally dirtied in the page-cache (assuming they will go to disk later). This data is read from file `/proc/$pid/io`. + + + `vol_ctxt` + + + Number of voluntary context switches. This data is read from field `voluntary_ctxt_switches` in `/proc/$pid/status` file. + + + `inv_ctxt` + + + Number of involuntary context switches. This data is read from field `nonvoluntary_ctxt_switches` in `/proc/$pid/status` file. + + + `env` + + + The variables defined in task execution environment. + + + `workdir` + + + The directory path where the task was executed. + + + `script` + + + The task command script. + + + `scratch` + + + The value of the process `scratch` directive. + + + `error_action` + + + The action applied on errof task failure. + + + `hostname` + + + :::versionadded 22.05.0-edge + ::: + The host on which the task was executed. Supported only for the Kubernetes executor yet. Activate with `k8s.fetchNodeName = true` in the Nextflow config file. + + + `cpu_model` + + + :::versionadded 22.07.0-edge + ::: + The name of the CPU model used to execute the task. This data is read from file `/proc/cpuinfo`. + + :::{note} These metrics provide an estimation of the resources used by running tasks. They are not an alternative to low-level performance analysis tools, and they may not be completely accurate, especially for very short-lived tasks (running for less than a few seconds). @@ -320,16 +434,13 @@ These metrics provide an estimation of the resources used by running tasks. They Trace report layout and other configuration settings can be specified by using the `nextflow.config` configuration file. -Please read {ref}`Trace scope ` section to learn more about it. - -(timeline-report)= +See [Trace scope][config-trace] to learn more. ## Execution timeline Nextflow can render an HTML timeline for all processes executed in your pipeline. 
An example of the execution timeline is shown below: -```{image} _static/timeline-min.png -``` +![Execution timeline](_static/timeline-min.png) Each bar represents a process run in the pipeline execution. The bar length represents the task duration time (wall-time). The colored area in each bar represents the real execution time. The grey area to the *left* of the colored area represents the task scheduling wait time. The grey area to the *right* of the colored area represents the task termination time (clean-up and file un-staging). The numbers on the x-axis represent the time in absolute units e.g. minutes, hours, etc. @@ -340,13 +451,11 @@ As each process can spawn many tasks, colors are used to identify those tasks be To enable the creation of the execution timeline add the `-with-timeline` command line option when launching the pipeline execution. For example: ```bash -nextflow run -with-timeline [file name] +nextflow run main.nf -with-timeline [file name] ``` The report file name can be specified as an optional parameter following the timeline option. -(workflow-diagram)= - ## Workflow diagram A Nextflow pipeline can be represented as a direct acyclic graph (DAG). The vertices in the graph represent the pipeline's processes and operators, while the edges represent the data dependencies (i.e. channels) between them. @@ -356,7 +465,7 @@ To render the workflow DAG, run your pipeline with the `-with-dag` option. By de The workflow DAG can be rendered in a different format by specifying an output file name with a different extension based on the desired format. For example: ```bash -nextflow run -with-dag flowchart.png +nextflow run main.nf -with-dag flowchart.png ``` :::{versionadded} 22.06.0-edge @@ -369,34 +478,60 @@ The default output format was changed from DOT to HTML. The following file formats are supported: -`dot` -: Graphviz [DOT](http://www.graphviz.org/content/dot-language) file - -`gexf` -: Graph Exchange XML file (Gephi) - -`html` -: HTML file with Mermaid diagram -: :::{versionchanged} 23.10.0 - The HTML format was changed to render a Mermaid diagram instead of a Cytoscape diagram. - ::: - -`mmd` -: :::{versionadded} 22.04.0 - ::: -: Mermaid diagram - -`pdf` -: *Requires [Graphviz](http://www.graphviz.org) to be installed* -: Graphviz PDF file - -`png` -: *Requires [Graphviz](http://www.graphviz.org) to be installed* -: Graphviz PNG file - -`svg` -: *Requires [Graphviz](http://www.graphviz.org) to be installed* -: Graphviz SVG file + + + `dot` + + + Graphviz [DOT](http://www.graphviz.org/content/dot-language) file + + + `gexf` + + + Graph Exchange XML file (Gephi) + + + `html` + + + HTML file with Mermaid diagram +
+ :::versionchanged 23.10.0 + The HTML format was changed to render a Mermaid diagram instead of a Cytoscape diagram. + ::: +
+ + `mmd` + + + :::versionadded 22.04.0 + ::: + Mermaid diagram + + + `pdf` + + + *Requires [Graphviz](http://www.graphviz.org) to be installed* +
+ Graphviz PDF file +
+ + `png` + + + *Requires [Graphviz](http://www.graphviz.org) to be installed* + Graphviz PNG file + + + `svg` + + + *Requires [Graphviz](http://www.graphviz.org) to be installed* + Graphviz SVG file + +
Here is the Mermaid diagram produced by Nextflow for the [rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline (using the [Mermaid Live Editor](https://mermaid-js.github.io/mermaid-live-editor/edit) with the `default` theme): @@ -406,3 +541,8 @@ nextflow run rnaseq-nf -preview -with-dag ```{mermaid} _static/dag.mmd ``` + +[metrics-page]: /nextflow_docs/nextflow_repo/docs/metrics +[config-report]: /nextflow_docs/nextflow_repo/docs/reference/config#report +[config-trace]: /nextflow_docs/nextflow_repo/docs/reference/config#trace +[trace-file]: /nextflow_docs/nextflow_repo/docs/reports#trace-file \ No newline at end of file diff --git a/docs/your-first-script.md b/docs/your-first-script.md index 039c45cf4a..34ac23b834 100644 --- a/docs/your-first-script.md +++ b/docs/your-first-script.md @@ -10,7 +10,7 @@ This guide details fundamental skills to run a basic Nextflow pipeline. It inclu You will need the following to get started: -- Nextflow: See [Installation][install-page] for installation instructions. +- Nextflow: See [Installation][install-page] for more information. ## Run a pipeline From d6f20f4910c870d66d9b3f7d17daf911591dc4b7 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Thu, 26 Jun 2025 10:57:26 +1200 Subject: [PATCH 04/38] Migrate pages Signed-off-by: Christopher Hakkaart --- ...che-and-resume.md => cache-and-resume.mdx} | 2 + docs/overview.md | 4 +- docs/plugins.mdx | 27 ++- docs/process.md | 220 +++++++++--------- docs/reports.mdx | 68 ++++-- docs/script.md | 60 +++-- docs/working-with-files.md | 37 ++- 7 files changed, 219 insertions(+), 199 deletions(-) rename docs/{cache-and-resume.md => cache-and-resume.mdx} (99%) diff --git a/docs/cache-and-resume.md b/docs/cache-and-resume.mdx similarity index 99% rename from docs/cache-and-resume.md rename to docs/cache-and-resume.mdx index 5e19b56496..1f68da5845 100644 --- a/docs/cache-and-resume.md +++ b/docs/cache-and-resume.mdx @@ -1,3 +1,5 @@ +import Mermaid from '@theme/Mermaid'; + # Caching and resuming One of the core features of Nextflow is the ability to cache task executions and re-use them in subsequent runs to minimize duplicate work. Resumability is useful both for recovering from errors and for iteratively developing a pipeline. It is similar to [checkpointing](https://en.wikipedia.org/wiki/Application_checkpointing), a common practice used by HPC applications. diff --git a/docs/overview.md b/docs/overview.md index 859028b80e..b3966038ad 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -117,7 +117,7 @@ process { See [Configuration][configuration-page] to learn more about the Nextflow configuration file and settings. [channels-page]: /nextflow_docs/nextflow_repo/docs/channel.md -[process-page]: /nextflow_docs/nextflow_repo/docs/process.md +[configuration-page]: /nextflow_docs/nextflow_repo/docs/config.md [executors-page]: /nextflow_docs/nextflow_repo/docs/executor.md +[process-page]: /nextflow_docs/nextflow_repo/docs/process.md [scripts-page]: /nextflow_docs/nextflow_repo/docs/script.md -[configuration-page]: /nextflow_docs/nextflow_repo/docs/config.md \ No newline at end of file diff --git a/docs/plugins.mdx b/docs/plugins.mdx index 68d33151c1..a6a750ae58 100644 --- a/docs/plugins.mdx +++ b/docs/plugins.mdx @@ -1,23 +1,19 @@ -(plugins-page)= - # Plugins Nextflow has a plugin system that allows the use of extensible components that are downloaded and installed at runtime. 
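For example, a plugin can be declared in the Nextflow configuration so that it is resolved at runtime. The sketch below is illustrative: `nf-hello` is a community example plugin, and the version suffix after `@` is optional:

```groovy
// Hypothetical example: declaring a plugin in nextflow.config causes
// Nextflow to download and install it at runtime if not already cached.
plugins {
    id 'nf-hello@0.5.0'
}
```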
-(plugins-core)=
-
## Core plugins

The following functionalities are provided via plugin components, and they are part of the Nextflow *core* plugins:

-- `nf-amazon`: Support for Amazon Web Services.
-- `nf-azure`: Support for Microsoft Azure.
-- `nf-cloudcache`: Support for the cloud cache (see `NXF_CLOUDCACHE_PATH` under {ref}`config-env-vars`).
-- `nf-console`: Implement Nextflow [REPL console](https://www.nextflow.io/blog/2015/introducing-nextflow-console.html).
-- `nf-k8s`: Support for Kubernetes.
-- `nf-google`: Support for Google Cloud.
-- `nf-tower`: Support for [Seqera Platform](https://seqera.io) (formerly Tower Cloud).
-- `nf-wave`: Support for [Wave containers](https://seqera.io/wave/) service.
+- `nf-amazon`: Support for Amazon Web Services
+- `nf-azure`: Support for Microsoft Azure
+- `nf-cloudcache`: Support for the cloud cache (see `NXF_CLOUDCACHE_PATH` under [Environment variables][config-env-vars])
+- `nf-console`: Implements the Nextflow [REPL console](https://www.nextflow.io/blog/2015/introducing-nextflow-console.html)
+- `nf-k8s`: Support for Kubernetes
+- `nf-google`: Support for Google Cloud
+- `nf-tower`: Support for [Seqera Platform](https://seqera.io) (formerly Tower Cloud)
+- `nf-wave`: Support for [Wave containers](https://seqera.io/wave/) service

## Using plugins

@@ -43,7 +39,7 @@ The plugin version is optional. If it is not specified, Nextflow will download t

The core plugins are documented in this documentation. For all other plugins, please refer to the plugin's code repository for documentation and support.

-:::{versionadded} 25.02.0-edge
+:::note{title="Version added 25.02.0-edge"}
:::

The plugin version can be prefixed with `~` to pin the major and minor version while allowing the latest patch release to be used. For example, `nf-amazon@~2.9.0` will resolve to the latest version matching `2.9.x`, which is `2.9.2`. When working offline, Nextflow will resolve version ranges against the local plugin cache defined by `NXF_PLUGINS_DIR`.

@@ -52,7 +48,7 @@ The plugin version can be prefixed with `~` to pin the major and minor version w

To use Nextflow plugins in an offline environment:

-1. {ref}`Install Nextflow ` on a system with an internet connection.
+1. [Install Nextflow][install-nextflow] on a system with an internet connection.
2. Download any additional plugins by running `nextflow plugin install <pluginId,..>`. Alternatively, simply run your pipeline once and Nextflow will download all of the plugins that it needs.
@@ -61,3 +57,6 @@ To use Nextflow plugins in an offline environment:
4. In your Nextflow configuration file, specify each plugin that you downloaded, both name and version, including default plugins. This will prevent Nextflow from trying to download newer versions of plugins.

Nextflow caches the plugins that it downloads, so as long as you keep using the same Nextflow version and pin your plugin versions in your config file, Nextflow will use the locally installed plugins and won't try to download them from the Internet.
+
+[config-env-vars]: /nextflow_docs/nextflow_repo/docs/reference/env-vars
+[install-nextflow]: /nextflow_docs/nextflow_repo/docs/install
diff --git a/docs/process.md b/docs/process.md
index d680024a90..e8a737fd6d 100644
--- a/docs/process.md
+++ b/docs/process.md
@@ -1,5 +1,3 @@
-(process-page)=
-
# Processes

In Nextflow, a **process** is a specialized function for executing scripts in a scalable and portable manner.
@@ -18,9 +16,7 @@ process hello {
}
```

-See {ref}`syntax-process` for a full description of the process syntax.
-
-(process-script)=
+See [Process][syntax-process] for a full description of the process syntax.

## Script

@@ -48,7 +44,7 @@ There is a subtle but important difference between them. Like in Bash, strings d

In the above code fragment, the `$db` variable is replaced by the actual value defined elsewhere in the pipeline script.

-:::{warning}
+:::warning
Since Nextflow uses the same Bash syntax for variable substitutions in strings, you must manage them carefully depending on whether you want to evaluate a *Nextflow* variable or a *Bash* variable.
:::

@@ -113,7 +109,7 @@ workflow {
}
```

-:::{tip}
+:::tip
Since the actual location of the interpreter binary file can differ across platforms, it is wise to use the `env` command followed by the interpreter name, e.g. `#!/usr/bin/env perl`, instead of the absolute path, in order to make your script more portable.
:::

@@ -153,8 +149,6 @@ process align {

In the above example, the process will execute one of several scripts depending on the value of the `mode` parameter. By default it will execute the `tcoffee` command.

-(process-template)=
-
### Template

Process scripts can be externalized to **template** files, which allows them to be reused across different processes and tested independently from the pipeline execution.

@@ -200,19 +194,17 @@ The following caveats should be considered:

- Template variables are evaluated even if they are commented out in the template script. If a template variable is missing, it will cause the pipeline to fail regardless of where it occurs in the template.

-:::{tip}
+:::tip
Template scripts are generally discouraged due to the caveats described above. The best practice for using a custom script is to embed it in the process definition at first and move it to a separate file with its own command line interface once the code matures.
:::

-(process-shell)=
-
### Shell

-:::{deprecated} 24.11.0-edge
-Use the `script` section instead. Consider using the {ref}`strict syntax `, which provides error checking to help distinguish between Nextflow variables and Bash variables in the process script.
+:::note{title="Version deprecated 24.11.0-edge"}
+Use the `script` section instead. Consider using the [strict syntax][strict-syntax-page], which provides error checking to help distinguish between Nextflow variables and Bash variables in the process script.
:::

-The `shell` section is a string expression that defines the script that is executed by the process. It is an alternative to the {ref}`process-script` definition with one important difference: it uses the exclamation mark `!` character, instead of the usual dollar `$` character, to denote Nextflow variables.
+The `shell` section is a string expression that defines the script that is executed by the process. It is an alternative to the [Script][process-script] definition with one important difference: it uses the exclamation mark `!` character, instead of the usual dollar `$` character, to denote Nextflow variables.

This way, it is possible to use both Nextflow and Bash variables in the same script without having to escape the latter, which makes process scripts easier to read and maintain. For example:

@@ -234,14 +226,12 @@ workflow {
}
```

In the above example, `$USER` is treated as a Bash variable, while `!{str}` is treated as a Nextflow variable.

-:::{note}
-- Shell script definitions require the use of single-quote `'` delimited strings. When using double-quote `"` delimited strings, dollar variables are interpreted as Nextflow variables as usual.
See {ref}`string-interpolation`. +:::note +- Shell script definitions require the use of single-quote `'` delimited strings. When using double-quote `"` delimited strings, dollar variables are interpreted as Nextflow variables as usual. See [String interpolation][string-interpolation] for more information. - Variables prefixed with `!` must always be enclosed in curly brackets, i.e. `!{str}` is a valid variable whereas `!str` is ignored. -- Shell scripts support the use of the {ref}`process-template` mechanism. The same rules are applied to the variables defined in the template script. +- Shell scripts support the use of the [Template][process-template] mechanism. The same rules are applied to the variables defined in the template script. ::: -(process-native)= - ### Native execution The `exec` section executes the given code without launching a job. @@ -270,13 +260,11 @@ Hello Mr. a Hello Mr. c ``` -A native process is very similar to a {ref}`function `. However, it provides additional capabilities such as parallelism, caching, and progress logging. - -(process-stub)= +A native process is very similar to a [function][syntax-function]. However, it provides additional capabilities such as parallelism, caching, and progress logging. ## Stub -:::{versionadded} 20.11.0-edge +:::note{title="Version added 20.11.0-edge"} ::: You can define a command *stub*, which replaces the actual process command when the `-stub-run` or `-stub` command-line option is enabled: @@ -308,8 +296,6 @@ The `stub` section can be defined before or after the `script` section. When the This feature makes it easier to quickly prototype the workflow logic without using the real commands. The developer can use it to provide a dummy script that mimics the execution of the real one in a quicker manner. In other words, it is a way to perform a dry-run. -(process-input)= - ## Inputs The `input` section allows you to define the input channels of a process, similar to function arguments. A process may have at most one input section, which must contain at least one input declaration. @@ -334,7 +320,7 @@ The following input qualifiers are available: - `tuple`: Handle a group of input values having any of the above qualifiers. - `each`: Execute the process for each element in the input collection. -See {ref}`process reference ` for the full list of input methods and options. +See [process reference][process-reference-inputs] for the full list of input methods and options. ### Input variables (`val`) @@ -365,11 +351,11 @@ process job 1 process job 2 ``` -:::{note} +:::note While channels do emit items in the order that they are received, *processes* do not necessarily *process* items in the order that they are received. In the above example, the value `3` was processed before the others. ::: -:::{note} +:::note When the process declares exactly one input, the pipe `|` operator can be used to provide inputs to the process, instead of passing it as a parameter. Both methods have identical semantics: ```nextflow @@ -389,8 +375,6 @@ workflow { ``` ::: -(process-input-path)= - ### Input files (`path`) The `path` qualifier allows you to provide input files to the process execution context. Nextflow will stage the files into the process execution directory, and they can be accessed in the script by using the specified input name. For example: @@ -451,7 +435,7 @@ workflow { In this example, each file received by the process is staged with the name `query.fa` in a different execution context (i.e. the folder where a task is executed). 
-:::{tip} +:::tip This feature allows you to execute the process command multiple times without worrying about the file names changing. In other words, Nextflow helps you write pipeline tasks that are self-contained and decoupled from the execution environment. As a best practice, you should avoid referencing files in your process script other than those defined in your input section. ::: @@ -473,8 +457,8 @@ workflow { } ``` -:::{note} -Process `path` inputs have nearly the same interface as described in {ref}`stdlib-types-path`, with one difference which is relevant when files are staged into a subdirectory. Given the following input: +:::note +Process `path` inputs have nearly the same interface as described in [Path][stdlib-types-path], with one difference which is relevant when files are staged into a subdirectory. Given the following input: ```nextflow path x, name: 'my-dir/file.txt' @@ -548,11 +532,11 @@ workflow { } ``` -:::{note} -Rewriting input file names according to a named pattern is an extra feature and not at all required. The normal file input syntax introduced in the {ref}`process-input-path` section is valid for collections of multiple files as well. To handle multiple input files while preserving the original file names, use a variable identifier or the `*` wildcard. +:::note +Rewriting input file names according to a named pattern is an extra feature and not at all required. The normal file input syntax introduced in the [Input files (`path`)][process-input-path] section is valid for collections of multiple files as well. To handle multiple input files while preserving the original file names, use a variable identifier or the `*` wildcard. ::: -:::{versionadded} 23.09.0-edge +:::note{title="Version added 23.09.0-edge"} ::: The `arity` option can be used to enforce the expected number of files, either as a number or a range. @@ -589,7 +573,7 @@ In the above example, the input file name is determined by the current value of This approach allows input files to be staged in the task directory with a name that is coherent with the current execution context. -:::{tip} +:::tip In most cases, you won't need to use dynamic file names, because each task is executed in its own directory, and input files are automatically staged into this directory by Nextflow. This behavior guarantees that input files with the same name won't overwrite each other. The above example is useful specifically when there are potential file name conflicts within a single task. ::: @@ -651,8 +635,6 @@ ciao hello ``` -(process-input-tuple)= - ### Input tuples (`tuple`) The `tuple` qualifier allows you to group multiple values into a single input definition. It can be useful when a channel emits tuples of values that need to be handled separately. Each element in the tuple is associated with a corresponding element in the `tuple` definition. For example: @@ -730,16 +712,14 @@ workflow { In the above example, each sequence input file emitted by the `sequences` channel triggers six alignment tasks, three with the `regular` method against each library file, and three with the `espresso` method. -:::{note} +:::note When multiple repeaters are defined, the process is executed for each *combination* of them. ::: -:::{note} -Input repeaters currently do not support tuples. However, you can emulate an input repeater on a channel of tuples by using the {ref}`operator-combine` or {ref}`operator-cross` operator with other input channels to produce all of the desired input combinations. 
+:::note +Input repeaters currently do not support tuples. However, you can emulate an input repeater on a channel of tuples by using the [`combine`][operator-combine] or [`cross`][operator-cross] operator with other input channels to produce all of the desired input combinations. ::: -(process-multiple-input-channels)= - ### Multiple input channels A key feature of processes is the ability to handle inputs from multiple channels. @@ -776,7 +756,7 @@ The process `echo` is executed two times because the `x` channel emits only two 2 and b ``` -A different semantic is applied when using a {ref}`value channel `. This kind of channel is created by the {ref}`channel.value ` factory method or implicitly when a process is invoked with an argument that is not a channel. By definition, a value channel is bound to a single value and it can be read an unlimited number of times without consuming its content. Therefore, when mixing a value channel with one or more (queue) channels, it does not affect the process termination because the underlying value is applied repeatedly. +A different semantic is applied when using a [value channel][channel-type-value]. This kind of channel is created by the [channel.value][channel-value] factory method or implicitly when a process is invoked with an argument that is not a channel. By definition, a value channel is bound to a single value and it can be read an unlimited number of times without consuming its content. Therefore, when mixing a value channel with one or more (queue) channels, it does not affect the process termination because the underlying value is applied repeatedly. To better understand this behavior, compare the previous example with the following one: @@ -807,13 +787,11 @@ The above example executes the `echo` process three times because `x` is a value 1 and c ``` -:::{note} -In general, multiple input channels should be used to process *combinations* of different inputs, using the `each` qualifier or value channels. Having multiple queue channels as inputs is equivalent to using the {ref}`operator-merge` operator, which is not recommended as it may lead to {ref}`non-deterministic process inputs `. +:::note +In general, multiple input channels should be used to process *combinations* of different inputs, using the `each` qualifier or value channels. Having multiple queue channels as inputs is equivalent to using the [`merge`][operator-merge] operator, which is not recommended as it may lead to [non-deterministic process inputs][cache-nondeterministic-inputs]. ::: -See also: {ref}`channel-types`. - -(process-output)= +See also: [Channel types][channel-types]. ## Outputs @@ -839,7 +817,7 @@ The following output qualifiers are available: - `tuple`: Emit multiple values. - `eval`: Emit the result of a script or command evaluated in the task execution context. -Refer to the {ref}`process reference ` for the full list of available output methods and options. +Refer to the [process reference][process-reference-outputs] for the full list of available output methods and options. ### Output variables (`val`) @@ -919,7 +897,7 @@ workflow { In the above example, the `random_number` process creates a file named `result.txt` which contains a random number. Since a `path` output with the same name is declared, that file is emitted by the corresponding output channel. A downstream process with a compatible input channel will be able to receive it. -Refer to the {ref}`process reference ` for the list of available options for `path` outputs. 
+Refer to the [process reference][process-reference-outputs] for the list of available options for `path` outputs. ### Multiple output files @@ -952,20 +930,20 @@ File: chunk_ac => l File: chunk_ad => a ``` -By default, all the files matching the specified glob pattern are emitted as a single list. However, as the above example demonstrates, the {ref}`operator-flatten` operator can be used to transform the list of files into a channel that emits each file individually. +By default, all the files matching the specified glob pattern are emitted as a single list. However, as the above example demonstrates, the [`flatten`][operator-flatten] operator can be used to transform the list of files into a channel that emits each file individually. Some caveats on glob pattern behavior: - Input files are not included (unless `includeInputs` is `true`) - Directories are included, unless the `**` pattern is used to recurse through directories -:::{warning} +:::warning Although the input files matching a glob output declaration are not included in the resulting output channel, these files may still be transferred from the task scratch directory to the original task work directory. Therefore, to avoid unnecessary file copies, avoid using loose wildcards when defining output files, e.g. `path '*'`. Instead, use a prefix or a suffix to restrict the set of matching files to only the expected ones, e.g. `path 'prefix_*.sorted.bam'`. ::: Read more about glob syntax at the following link [What is a glob?][glob] -:::{versionadded} 23.09.0-edge +:::note{title="Version added 23.09.0-edge"} ::: The `arity` option can be used to enforce the expected number of files, either as a number or a range. @@ -1003,57 +981,46 @@ process align { In the above example, each process execution produces an alignment file whose name depends on the actual value of the `species` input. -:::{tip} +:::tip The management of output files in Nextflow is often misunderstood. With other tools it is generally necessary to organize the output files into some kind of directory structure or to guarantee a unique file name scheme, so that result files don't overwrite each other and so they can be referenced unequivocally by downstream tasks. -With Nextflow, in most cases, you don't need to manage the naming of output files, because each task is executed in its own unique directory, so files produced by different tasks can't overwrite each other. Also, metadata can be associated with outputs by using the {ref}`tuple output ` qualifier, instead of including them in the output file name. +With Nextflow, in most cases, you don't need to manage the naming of output files, because each task is executed in its own unique directory, so files produced by different tasks can't overwrite each other. Also, metadata can be associated with outputs by using the [tuple output][process-out-tuple] qualifier, instead of including them in the output file name. One example in which you'd need to manage the naming of output files is when you use the `publishDir` directive to have output files also in a specific path of your choice. If two tasks have the same filename for their output and you want them to be in the same path specified by `publishDir`, the last task to finish will overwrite the output of the task that finished before. You can dynamically change that by adding the `saveAs` option to your `publishDir` directive. 
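A minimal sketch of this pattern, assuming illustrative process, directory, and file names:

```nextflow
process report {
    // The saveAs closure receives each output file name and returns the
    // name to publish it under; prefixing with the sample id keeps tasks
    // that share a publish directory from overwriting each other.
    publishDir 'results', mode: 'copy', saveAs: { filename -> "${sample}.${filename}" }

    input:
    val sample

    output:
    path 'summary.txt'

    script:
    """
    echo "Summary for ${sample}" > summary.txt
    """
}
```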
To sum up, the use of output files with static names over dynamic ones is preferable whenever possible, because it will result in simpler and more portable code.
:::

-(process-env)=
-
### Output environment variables (`env`)

The `env` qualifier allows you to output a variable defined in the process execution environment:

-```{literalinclude} snippets/process-out-env.nf
-:language: nextflow
+```nextflow file=./snippets/process-out-env.nf
```

-(process-stdout)=
-
### Standard output (`stdout`)

The `stdout` qualifier allows you to output the `stdout` of the executed process:

-```{literalinclude} snippets/process-stdout.nf
-:language: nextflow
+```nextflow file=./snippets/process-stdout.nf
```

-(process-out-eval)=
-
### Eval output (`eval`)

-:::{versionadded} 24.02.0-edge
+:::note{title="Version added 24.02.0-edge"}
:::

The `eval` qualifier allows you to capture the standard output of an arbitrary command evaluated in the task shell interpreter context:

-```{literalinclude} snippets/process-out-eval.nf
-:language: nextflow
+```nextflow file=./snippets/process-out-eval.nf
```

Only one-line Bash commands are supported. You can use a semi-colon `;` to specify multiple Bash commands on a single line, and many interpreters can execute arbitrary code on the command line, e.g. `python -c 'print("Hello world!")'`.

If the command fails, the task will also fail. In Bash, you can append `|| true` to a command to suppress any command failure.

-(process-out-tuple)=
-
### Output tuples (`tuple`)

The `tuple` qualifier allows you to output multiple values in a single channel. It is useful when you need to associate outputs with metadata, for example:

In the above example, a `blast` task is executed for each pair of `species` and `query` that are received. All outputs are structured as tuples in which the `species` value is associated with the corresponding `result` file.

A `tuple` definition may contain any of the following qualifiers, as previously described: `val`, `path`, `env` and `stdout`. Files specified with the `path` qualifier are treated exactly the same as standalone `path` inputs.

-:::{note}
+:::note
While parentheses for input and output qualifiers are generally optional, they are required when specifying elements in an input/output tuple.

Here's an example with a single path output (parentheses optional):

```nextflow
process hello {
```
:::

-(process-naming-outputs)=
-
### Naming outputs

The `emit` option can be used on a process output to define a name for the corresponding output channel, which can be used to access the channel by name from the process output. For example:

@@ -1143,7 +1108,7 @@ workflow {
}
```

-See {ref}`workflow-process-invocation` for more details.
+See [Calling processes and workflows][workflow-process-invocation] for more details.

### Optional outputs

path("output.txt"), optional: true

In this example, the process is normally expected to produce an `output.txt` file, but in this case, if the file is missing, the task will not fail. The output channel will only contain values for those tasks that produced `output.txt`.

-:::{note}
+:::note
While this option can be used with any process output, it cannot be applied to individual elements of a [tuple](#output-tuples-tuple) output. The entire tuple must be optional or not optional.
:::

-(process-when)=
-
## When

-:::{note}
+:::note
-As a best practice, conditional logic should be implemented in the calling workflow (e.g.
using an `if` statement or [`filter`][operator-filter] operator) instead of the process definition.
:::

The `when` section allows you to define a condition that must be satisfied in order to execute the process. The condition can be any expression that returns a boolean value.

It can be useful to enable/disable the process execution depending on the state of various inputs and parameters. For example:

```nextflow
process blast_search {
    input:
    path proteins
    val type

    when:
    proteins.name =~ /^BB11.*/ && type == 'nr'

    script:
    """
    blastp -query $proteins -db nr
    """
}
```

-(process-directives)=
-
## Directives

Directives are optional settings that affect the execution of the current process. By default, directives are evaluated when the process is defined. However, if the value is a dynamic string or closure, it will be evaluated separately for each task, which allows task-specific variables like `task` and `val` inputs to be used.

-Some directives are only supported by specific executors. Refer to the {ref}`executor-page` page for more information about each executor.
+Some directives are only supported by specific executors. See [Executors][executor-page] for more information about each executor.

-Refer to the {ref}`process reference ` for the full list of process directives. If you are new to Nextflow, here are some commonly-used operators to learn first:
+Refer to the [process reference][process-reference-directives] for the full list of process directives. If you are new to Nextflow, here are some commonly-used directives to learn first:

General:

-- {ref}`process-error-strategy`: strategy for handling task failures
-- {ref}`process-executor`: the {ref}`executor ` with which to execute tasks
-- {ref}`process-tag`: a semantic name used to differentiate between task executions of the same process
+- [errorStrategy][process-error-strategy]: strategy for handling task failures
+- [executor][process-executor]: the [executor][executor-page] with which to execute tasks
+- [tag][process-tag]: a semantic name used to differentiate between task executions of the same process

Resource requirements:

-- {ref}`process-cpus`: the number of CPUs to request for each task
-- {ref}`process-memory`: the amount of memory to request for each task
-- {ref}`process-time`: the amount of walltime to request for each task
+- [cpus][process-cpus]: the number of CPUs to request for each task
+- [memory][process-memory]: the amount of memory to request for each task
+- [time][process-time]: the amount of walltime to request for each task

Software dependencies:

-- {ref}`process-conda`: list of conda packages to provision for tasks
-- {ref}`process-container`: container image to use for tasks
+- [conda][process-conda]: list of conda packages to provision for tasks
+- [container][process-container]: container image to use for tasks

### Using task directive values

@@ -1227,15 +1188,13 @@ process hello {
}
```

-In the above snippet, `task.cpus` and `task.memory` hold the values for the {ref}`cpus directive` and {ref}`memory directive` directives, respectively, which were resolved for this task based on the process configuration.
+In the above snippet, `task.cpus` and `task.memory` hold the values for the [cpus][process-cpus] and [memory][process-memory] directives, respectively, which were resolved for this task based on the process configuration.

-(dynamic-directives)=

### Dynamic directives

A directive can be assigned *dynamically*, during the process execution, so that its actual value can be evaluated based on the process inputs.

-To be defined dynamically, the directive's value needs to be expressed using a {ref}`closure `.
+To be defined dynamically, the directive's value needs to be expressed using a [closure][script-closure].
For example:

```nextflow
process hello {
    executor 'sge'
    queue { entries > 100 ? 'long' : 'short' }

    input:
    val entries

    script:
    """
    < your job here >
    """
}
```

-In the above example, the {ref}`process-queue` directive is evaluated dynamically, depending on the input value `entries`. When it is larger than 100, jobs will be submitted to the `long` queue, otherwise the `short` queue will be used.
+In the above example, the [queue][process-queue] directive is evaluated dynamically, depending on the input value `entries`. When it is larger than 100, jobs will be submitted to the `long` queue, otherwise the `short` queue will be used.

All directives can be assigned a dynamic value except the following:

-- {ref}`process-executor`
-- {ref}`process-label`
-- {ref}`process-maxforks`
+- [executor][process-executor]
+- [label][process-label]
+- [maxForks][process-maxforks]

-:::{tip}
+:::tip
Assigning a string value with one or more variables is always resolved in a dynamic manner, and therefore is equivalent to the above syntax. For example, the above directive can also be written as:

```nextflow
queue "${ entries > 100 ? 'long' : 'short' }"
```

Note, however, that the latter syntax can be used both for a directive's main argument (as in the above example) and for a directive's optional named attributes, whereas the closure syntax is only resolved dynamically for a directive's main argument.
:::

-(dynamic-task-resources)=
-
### Dynamic task resources

It's a very common scenario that different instances of the same process may have very different needs in terms of computing resources. In such situations requesting, for example, an amount of memory too low will cause some tasks to fail. Instead, using a higher limit that fits all the tasks in your execution could significantly decrease the execution priority of your jobs.

The [dynamic directives](#dynamic-directives) evaluation feature can be used to modify the amount of computing resources requested in case of a process failure and try to re-execute it using a higher limit. For example:

```nextflow
process hello {
    memory { 2.GB * task.attempt }
    time { 1.hour * task.attempt }

    errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
    maxRetries 3

    script:
    """
    your_command --here
    """
}
```

-In the above example the {ref}`process-memory` and execution {ref}`process-time` limits are defined dynamically. The first time the process is executed the `task.attempt` is set to `1`, thus it will request 2 GB of memory and 1 hour of walltime.
+In the above example the [memory][process-memory] and execution [time][process-time] limits are defined dynamically. The first time the process is executed the `task.attempt` is set to `1`, thus it will request 2 GB of memory and 1 hour of wall time.

If the task execution fails with an exit status between 137 and 140, the task is re-executed; otherwise, the run is terminated immediately. The re-executed task will have `task.attempt` set to `2`, and will request 4 GB of memory and 2 hours of wall time.

-The {ref}`process-maxretries` directive sets the maximum number of times the same task can be re-executed.
+The [maxRetries][process-maxretries] directive sets the maximum number of times the same task can be re-executed.

-:::{tip}
+:::tip
Directives with named arguments, such as `accelerator` and `disk`, must use a more verbose syntax when they are dynamic. For example:

```nextflow
// static request
disk 375.GB, type: 'local-ssd'

// dynamic request
disk { [request: 375.GB * task.attempt, type: 'local-ssd'] }
```
:::

### Dynamic task resources by input size

Another common scenario is adjusting the resource request based on the size of the input files. For example:

```nextflow
process hello {
```

In this example, each task requests 8 GB of memory, plus the size of the input file rounded up to the next GB. This way, each task requests only as much memory as it needs based on the size of the inputs. The specific function that you use should be tuned for each process.
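It may also be prudent to cap such escalation so that a retried task never requests more resources than the largest available node provides. A hedged sketch, assuming an illustrative 2 GB base request, a 64 GB cap, and a placeholder command:

```nextflow
process hello {
    // Double the memory request on each attempt, but never exceed the cap.
    memory { [2.GB * task.attempt, 64.GB].min() }
    errorStrategy 'retry'
    maxRetries 3

    input:
    path fastq

    script:
    """
    your_command --in ${fastq}
    """
}
```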
-
-(task-previous-execution-trace)=
-
### Dynamic task resources with previous execution trace

-:::{versionadded} 24.10.0
+:::note{title="Version added 24.10.0"}
:::

-Task resource requests can be updated relative to the {ref}`trace record ` metrics of the previous task attempt. The metrics can be accessed through the `task.previousTrace` variable. For example:
+Task resource requests can be updated relative to the [trace file][trace-report] metrics of the previous task attempt. The metrics can be accessed through the `task.previousTrace` variable. For example:

```nextflow
process hello {
    memory { task.previousTrace ? task.previousTrace.memory * 2 : (1.GB) }
    errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
    maxRetries 3

    script:
    """
    your_command --here
    """
}
```

-In the above example, the {ref}`process-memory` is set according to previous trace record metrics. In the first attempt, when no trace metrics are available, it is set to 1 GB. In each subsequent attempt, the requested memory is doubled. See {ref}`trace-report` for more information about trace records.
+In the above example, the [memory][process-memory] is set according to previous trace record metrics. In the first attempt, when no trace metrics are available, it is set to 1 GB. In each subsequent attempt, the requested memory is doubled. See [Trace file][trace-report] for more information about trace records.

### Dynamic retry with backoff

There are cases in which the required execution resources may be temporarily unavailable, e.g. network congestion. In these cases simply re-executing the same task will likely result in the identical error. A retry with an exponential backoff delay can better recover these error conditions.

```nextflow
process hello {
    errorStrategy { sleep(Math.pow(2, task.attempt) * 200 as long); return 'retry' }
    maxRetries 5

    script:
    """
    your_command --here
    """
}
```

+[cache-nondeterministic-inputs]: /nextflow_docs/nextflow_repo/docs/cache-and-resume#non-deterministic-process-inputs
+[channel-types]: /nextflow_docs/nextflow_repo/docs/channel#channel-types
+[channel-type-value]: /nextflow_docs/nextflow_repo/docs/channel#value-channel
+[channel-value]: /nextflow_docs/nextflow_repo/docs/reference/channel#value
+[executor-page]: /nextflow_docs/nextflow_repo/docs/executor
[glob]: http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob
+[operator-combine]: /nextflow_docs/nextflow_repo/docs/reference/operator#combine
+[operator-filter]: /nextflow_docs/nextflow_repo/docs/reference/operator#filter
+[operator-flatten]: /nextflow_docs/nextflow_repo/docs/reference/operator#flatten
+[operator-cross]: /nextflow_docs/nextflow_repo/docs/reference/operator#cross
+[operator-merge]: /nextflow_docs/nextflow_repo/docs/reference/operator#merge
+[process-conda]: /nextflow_docs/nextflow_repo/docs/reference/process#conda
+[process-container]: /nextflow_docs/nextflow_repo/docs/reference/process#container
+[process-cpus]: /nextflow_docs/nextflow_repo/docs/reference/process#cpus
+[process-error-strategy]: /nextflow_docs/nextflow_repo/docs/reference/process#errorStrategy
+[process-executor]: /nextflow_docs/nextflow_repo/docs/reference/process#executor
+[process-input-path]: /nextflow_docs/nextflow_repo/docs/process#input-files-path
+[process-label]: /nextflow_docs/nextflow_repo/docs/reference/process#label
+[process-maxforks]: /nextflow_docs/nextflow_repo/docs/reference/process#maxforks
+[process-maxretries]: /nextflow_docs/nextflow_repo/docs/reference/process#maxretries
+[process-memory]: /nextflow_docs/nextflow_repo/docs/reference/process#memory
+[process-out-tuple]: /nextflow_docs/nextflow_repo/docs/process#output-tuples-tuple
+[process-queue]: /nextflow_docs/nextflow_repo/docs/reference/process#queue
+[process-tag]: /nextflow_docs/nextflow_repo/docs/reference/process#tag
+[process-time]: /nextflow_docs/nextflow_repo/docs/reference/process#time
+[process-script]: /nextflow_docs/nextflow_repo/docs/process#script
+[process-reference-directives]: /nextflow_docs/nextflow_repo/docs/reference/process#directives
+[process-reference-inputs]: 
/nextflow_docs/nextflow_repo/docs/reference/process#inputs
+[process-reference-outputs]: /nextflow_docs/nextflow_repo/docs/reference/process#outputs
+[process-template]: /nextflow_docs/nextflow_repo/docs/process#template
+[script-closure]: /nextflow_docs/nextflow_repo/docs/script#closures
+[stdlib-types-path]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types#path
+[strict-syntax-page]: /nextflow_docs/nextflow_repo/docs/strict-syntax.md
+[string-interpolation]: /nextflow_docs/nextflow_repo/docs/script#string-interpolation
+[syntax-function]: /nextflow_docs/nextflow_repo/docs/reference/syntax#function
+[syntax-process]: /nextflow_docs/nextflow_repo/docs/reference/syntax#process
+[trace-report]: /nextflow_docs/nextflow_repo/docs/reports#trace-file
+[workflow-process-invocation]: /nextflow_docs/nextflow_repo/docs/workflow#calling-processes-and-workflows
diff --git a/docs/reports.mdx b/docs/reports.mdx
index e595a3e124..57660ea51f 100644
--- a/docs/reports.mdx
+++ b/docs/reports.mdx
@@ -1,3 +1,5 @@
+import DefinitionList, { DefinitionTerm, DefinitionDescription } from '@site/src/components/DefinitionList';
+
# Reports

## Execution log

@@ -8,7 +10,7 @@ The `nextflow log` command shows information about executed pipelines in the cur
nextflow log [options]
```

-:::{note}
+:::note
Both the [execution report](#execution-report) and the [trace file](#trace-file) must be specified when the pipeline is first called. By contrast, the `log` option is useful after a pipeline has already run and is available for every executed pipeline.
:::

@@ -414,27 +416,26 @@ The following table shows the fields that can be included in the execution repor


  `hostname`


-  :::versionadded 22.05.0-edge
+  :::note{title="Version added 22.05.0-edge"}
  :::
+
  The host on which the task was executed. Currently supported only by the Kubernetes executor. Activate with `k8s.fetchNodeName = true` in the Nextflow config file.


  `cpu_model`


-  :::versionadded 22.07.0-edge
+  :::note{title="Version added 22.07.0-edge"}
  :::
  The name of the CPU model used to execute the task. This data is read from file `/proc/cpuinfo`.


-:::{note}
+:::note
These metrics provide an estimation of the resources used by running tasks. They are not an alternative to low-level performance analysis tools, and they may not be completely accurate, especially for very short-lived tasks (running for less than a few seconds).
:::

-Trace report layout and other configuration settings can be specified by using the `nextflow.config` configuration file.
-
-See [Trace scope][config-trace] to learn more.
+Trace report layout and other configuration settings can be specified by using the `nextflow.config` configuration file. See [Trace scope][config-trace] to learn more.

## Execution timeline

@@ -468,11 +469,7 @@ The workflow DAG can be rendered in a different format by specifying an output f
nextflow run main.nf -with-dag flowchart.png
```

-:::{versionadded} 22.06.0-edge
-You can use the `-preview` option with `-with-dag` to render the workflow DAG without executing any tasks.
-:::
+:::note{title="Version added 22.06.0-edge"}
+You can use the `-preview` option with `-with-dag` to render the workflow DAG without executing any tasks.
+:::

-:::{versionchanged} 23.10.0
+:::note{title="Version changed 23.10.0"}
The default output format was changed from DOT to HTML.
:::

@@ -496,8 +493,8 @@ The following file formats are supported:


  HTML file with Mermaid diagram

- :::versionchanged 23.10.0 + + :::note{title="Version changed 23.10.0"} The HTML format was changed to render a Mermaid diagram instead of a Cytoscape diagram. :::
@@ -505,8 +502,6 @@ The following file formats are supported: `mmd` - :::versionadded 22.04.0 - ::: Mermaid diagram @@ -539,7 +534,46 @@ Here is the Mermaid diagram produced by Nextflow for the [rnaseq-nf](https://git nextflow run rnaseq-nf -preview -with-dag ``` -```{mermaid} _static/dag.mmd +```mermaid +%%{ + init: { + 'theme': 'base', + 'themeVariables': { + 'primaryColor': '#B6ECE2', + 'primaryTextColor': '#160F26', + 'primaryBorderColor': '#065647', + 'lineColor': '#545555', + 'clusterBkg': '#BABCBD22', + 'clusterBorder': '#DDDEDE', + 'fontFamily': 'degular' + } + } +}%% +flowchart TB + subgraph " " + v0["Channel.fromFilePairs"] + v1["transcriptome"] + v7["config"] + end + subgraph RNASEQ + v2([INDEX]) + v3([FASTQC]) + v4([QUANT]) + end + v8([MULTIQC]) + subgraph " " + v9[" "] + end + v5(( )) + v0 --> v3 + v0 --> v4 + v1 --> v2 + v2 --> v4 + v3 --> v5 + v4 --> v5 + v7 --> v8 + v5 --> v8 + v8 --> v9 ``` [metrics-page]: /nextflow_docs/nextflow_repo/docs/metrics diff --git a/docs/script.md b/docs/script.md index 578f56b50a..f27e805a8c 100644 --- a/docs/script.md +++ b/docs/script.md @@ -1,16 +1,14 @@ -(script-page)= - # Scripts -Nextflow is a workflow language that runs on the Java virtual machine (JVM). Nextflow's syntax is very similar to [Groovy](https://groovy-lang.org/), a scripting language for the JVM. However, Nextflow is specialized for writing computational pipelines in a declarative manner. See {ref}`syntax-page` for a full description of the Nextflow language. +Nextflow is a workflow language that runs on the Java virtual machine (JVM). Nextflow's syntax is very similar to [Groovy](https://groovy-lang.org/), a scripting language for the JVM. However, Nextflow is specialized for writing computational pipelines in a declarative manner. See [Syntax][syntax-page] for a full description of the Nextflow language. -Nextflow scripts can also make full use of the Java and Groovy standard libraries. See {ref}`stdlib-page` for more information. +Nextflow scripts can also make full use of the Java and Groovy standard libraries. See [Standard library][stdlib-page] for more information. -:::{warning} +:::warning Nextflow uses UTF-8 as the default character encoding for source files. Make sure to use UTF-8 encoding when editing Nextflow scripts with your preferred text editor. ::: -:::{warning} +:::warning Nextflow scripts have a maximum size of 64 KiB. To avoid this limit for large pipelines, consider moving pipeline components into separate files and including them as modules. ::: @@ -43,12 +41,10 @@ def str = "Hi" println str ``` -:::{warning} -Variables can also be declared without `def` in some cases. However, this practice is discouraged outside of simple code snippets because it can lead to a {ref}`race condition `. +:::warning +Variables can also be declared without `def` in some cases. However, this practice is discouraged outside of simple code snippets because it can lead to a [race condition][cache-global-var-race-condition]. ::: -(script-list)= - ## Lists Lists are defined using square brackets: @@ -69,9 +65,7 @@ In order to get the length of the list use the `size` method: println myList.size() ``` -See {ref}`stdlib-types-list` for the set of available list operations. - -(script-map)= +See [List\][stdlib-types-list] for the set of available list operations. ## Maps @@ -105,20 +99,18 @@ def new_scores = scores + ["Pete": 3, "Cedric": 120] When adding two maps, the first map is copied and then appended with the keys from the second map. 
Any conflicting keys are overwritten by the second map. -:::{tip} +:::tip Copying a map with the `+` operator is a safer way to modify maps in Nextflow, specifically when passing maps through channels. This way, a new instance of the map will be created, and any references to the original map won't be affected. ::: -See {ref}`stdlib-types-map` for the set of available map operations. - -(script-operators)= +See [Map\][stdlib-types-map] for the set of available map operations. ## Operators Operators are symbols that perform specific functions on one or more values, and generally make code easier to read. This section highlights some of the most commonly used operators. -:::{note} -Operators in this context are different from *channel operators*, which are specialized functions for working with channels. See {ref}`channel-page` for more information. +:::note +Operators in this context are different from *channel operators*, which are specialized functions for working with channels. See [Channels][channel-page] for more information. ::: The `==` and `!=` operators can be used to test whether any two values are equal (or not equal): @@ -129,7 +121,7 @@ assert [2, 2] != [4] assert 'two plus two' != 'four' ``` -:::{tip} +:::tip The `assert` keyword simply tests a condition and raises an error if the condition is false. Every assert that you see on this page will succeed if executed. ::: @@ -203,12 +195,10 @@ def counts = ['A': 1, 'B', 2] assert counts['C'] ?: 0 == 0 // x is "truthy" if !!x == true ``` -:::{tip} +:::tip The `?:` operator is also known as the [elvis operator](https://en.wikipedia.org/wiki/Elvis_operator). ::: -(script-string)= - ## Strings Strings can be defined by enclosing text in single or double quotes (`'` or `"` characters): @@ -225,8 +215,6 @@ def a = "world" print "hello " + a + "\n" ``` -(string-interpolation)= - ### String interpolation There is an important difference between single-quoted and double-quoted strings: Double-quoted strings support variable interpolations, while single-quoted strings do not. @@ -260,7 +248,7 @@ def text = """ """ ``` -:::{note} +:::note Like before, multi-line strings inside double quotes support variable interpolation, while single-quoted multi-line strings do not. ::: @@ -280,7 +268,7 @@ def result = myLongCmdline.execute().text In the preceding example, `blastp` and its `-in`, `-out`, `-db` and `-html` switches and their arguments are effectively a single line. -:::{warning} +:::warning Do not put any spaces after the backslash when using backslashes to continue a multi-line command. Spaces after the backslash will be interpreted as an escaped space and will make your script incorrect. It will also print this warning: ``` @@ -288,8 +276,6 @@ unknown recognition error type: groovyjarjarantlr4.v4.runtime.LexerNoViableAltEx ``` ::: -(script-regexp)= - ## Regular expressions Regular expressions are the Swiss Army knife of text processing. They provide the ability to match and extract patterns from strings. @@ -360,8 +346,6 @@ println patch // 3 println flavor // beta ``` -(script-closure)= - ## Closures A closure is a function that can be used like a regular value. Typically, closures are passed as arguments to *higher-order functions* to express computations in a declarative manner. @@ -447,7 +431,7 @@ def result = counts.values().inject { sum, v -> sum + v } This way, the closure is fully "self-contained" because it doesn't access or mutate any variables outside of its scope. 
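Closures are also commonly passed to other higher-order methods. A short sketch with illustrative list contents:

```nextflow
def names = ['alpha', 'bravo', 'charlie']

// `collect` transforms each element with the given closure.
assert names.collect { name -> name.toUpperCase() } == ['ALPHA', 'BRAVO', 'CHARLIE']

// `findAll` keeps only the elements for which the closure returns true.
assert names.findAll { name -> name.size() == 5 } == ['alpha', 'bravo']
```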
-:::{note} +:::note When a closure takes a single parameter, the parameter can be omitted, in which case the implicit `it` parameter will be used: ```nextflow @@ -488,4 +472,14 @@ workflow { } ``` -See {ref}`workflow-page`, {ref}`process-page`, and {ref}`module-page` for more information about how to use these features in your Nextflow scripts. +See [Workflows][workflow-page], [Processes][process-page], and [Modules][module-page] for more information about how to use these features in your Nextflow scripts. + +[cache-global-var-race-condition]: /nextflow_docs/nextflow_repo/docs/cache-and-resume#race-condition-on-a-global-variable +[channel-page]: /nextflow_docs/nextflow_repo/docs/channel.md +[module-page]: /nextflow_docs/nextflow_repo/docs/module +[process-page]: /nextflow_docs/nextflow_repo/docs/process +[stdlib-page]: /nextflow_docs/nextflow_repo/docs/reference/stdlib +[stdlib-types-list]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types#list-e +[stdlib-types-map]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types#map-k-v +[syntax-page]: /nextflow_docs/nextflow_repo/docs/reference/syntax +[workflow-page]: /nextflow_docs/nextflow_repo/docs/workflow \ No newline at end of file diff --git a/docs/working-with-files.md b/docs/working-with-files.md index 8135226e39..35a9d263f5 100644 --- a/docs/working-with-files.md +++ b/docs/working-with-files.md @@ -1,5 +1,3 @@ -(working-with-files)= - # Working with files ## Opening files @@ -18,11 +16,11 @@ When using the wildcard characters `*`, `?`, `[]` and `{}`, the argument is inte listOfFiles = file('some/path/*.fa') ``` -:::{note} +:::note The `file()` method does not return a list if only one file is matched. Use the `files()` method to always return a list. ::: -:::{note} +:::note A double asterisk (`**`) in a glob pattern works like `*` but also searches through subdirectories. ::: @@ -32,7 +30,7 @@ By default, wildcard characters do not match directories or hidden files. For ex listWithHidden = file('some/path/*.fa', hidden: true) ``` -:::{note} +:::note To compose paths, instead of string interpolation, use the `resolve()` method or the `/` operator: ```nextflow @@ -57,11 +55,11 @@ assert path.name == 'file.txt' assert path.parent == '/some/path' ``` -:::{tip} +:::tip When calling an object method, any method that looks like `get*()` can also be accessed as a field. For example, `path.getName()` is equivalent to `path.name`, `path.getBaseName()` is equivalent to `path.baseName`, and so on. ::: -See the {ref}`stdlib-types-path` reference for the list of available methods. +See the [Path][stdlib-types-path] reference for the list of available methods. ## Reading and writing @@ -91,11 +89,11 @@ Or you can save a byte array to a file: myFile.bytes = binaryContent ``` -:::{note} +:::note The above assignment overwrites any existing file contents, and implicitly creates the file if it doesn't exist. ::: -:::{warning} +:::warning The above methods read and write the **entire** file contents at once, in a single variable or buffer. For this reason, when dealing with large files it is recommended that you use a more memory efficient approach, such as reading/writing a file line by line or using a fixed size buffer. ::: @@ -133,7 +131,7 @@ file('some/my_file.txt') .each { println it } ``` -:::{warning} +:::warning The method `readLines()` reads the **entire** file at once and returns a list containing all the lines. For this reason, do not use it to read big files. 
::: @@ -174,7 +172,7 @@ myFile.withReader { The methods `newInputStream()` and `withInputStream()` work similarly. The main difference is that they create an [InputStream](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html) object useful for writing binary data. -See the {ref}`stdlib-types-path` reference for the list of available methods. +See the [Path][stdlib-types-path] reference for the list of available methods. ### Advanced file writing @@ -193,11 +191,11 @@ sourceFile.withReader { source -> } ``` -See the {ref}`stdlib-types-path` reference for the list of available methods. +See the [Path][stdlib-types-path] reference for the list of available methods. ## Filesystem operations -Methods for performing filesystem operations such as copying, deleting, and directory listing are documented in the {ref}`stdlib-types-path` reference. +Methods for performing filesystem operations such as copying, deleting, and directory listing are documented in the [Path][stdlib-types-path] reference. ### Listing directories @@ -226,8 +224,6 @@ myDir.eachFile { item -> In general, you should not need to manually copy files, because Nextflow will automatically stage files in and out of the task environment based on the definition of process inputs and outputs. Ideally, any operation which transforms files should be encapsulated in a process, in order to leverage Nextflow's staging capabilities as much as possible. -(remote-files)= - ## Remote files Nextflow works with many types of remote files and objects using the same interface as for local files. The following protocols are supported: @@ -249,11 +245,11 @@ It can then be used in the same way as a local file: println pdb.text ``` -:::{note} +:::note Not all operations are supported for all protocols. For example, writing and directory listing is not supported for HTTP(S) and FTP paths. ::: -:::{note} +:::note Additional configuration may be necessary for cloud object storage, such as authenticating with a private bucket. See the documentation for each cloud storage provider for further details. ::: @@ -263,10 +259,13 @@ When a process input file resides on a different file system than the work direc Remote files are staged in a subdirectory of the work directory with the form `stage-//`, where `` is determined by the remote file path. If multiple tasks request the same remote file, the file will be downloaded once and reused by each task. These files can be reused by resumed runs with the same session ID. -:::{note} +:::note Remote file staging can be a bottleneck during large-scale runs, particularly when input files are stored in object storage but need to be staged in a shared filesystem work directory. This bottleneck occurs because Nextflow handles all of these file transfers. To mitigate this, you can implement a custom process to download the required files, allowing you to stage multiple files efficiently through parallel jobs. Files should be given as a `val` input instead of a `path` input to bypass Nextflow's built-in remote file staging. -Alternatively, use {ref}`fusion-page` with the work directory set to object storage. In this case, tasks can access remote files directly without any prior staging, eliminating the bottleneck. +Alternatively, use [Fusion][fusion-page] with the work directory set to object storage. In this case, tasks can access remote files directly without any prior staging, eliminating the bottleneck. 
::: + +[fusion-page]: /nextflow_docs/nextflow_repo/docs/fusion +[stdlib-types-path]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types#path From bef5afd0c2721bc6f4a73534cbe395333c785f22 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Thu, 26 Jun 2025 14:01:30 +1200 Subject: [PATCH 05/38] Migrate more pages Signed-off-by: Christopher Hakkaart --- docs/channel.md | 69 ++++++++------- docs/module.md | 37 ++++---- docs/notifications.md | 195 ++++++++++++++++++++++++++---------------- docs/secrets.md | 56 +++++++----- docs/sharing.md | 28 +++--- docs/workflow.md | 79 +++++++++-------- 6 files changed, 268 insertions(+), 196 deletions(-) diff --git a/docs/channel.md b/docs/channel.md index 28c7c4b031..2ef0d1f54f 100644 --- a/docs/channel.md +++ b/docs/channel.md @@ -1,5 +1,3 @@ -(channel-page)= - # Channels Nextflow is based on the dataflow programming model in which processes communicate through channels. @@ -9,30 +7,24 @@ A channel has two major properties: 1. Sending a message is an *asynchronous* (i.e. non-blocking) operation, which means the sender doesn't have to wait for the receiving process. 2. Receiving a message is a *synchronous* (i.e. blocking) operation, which means the receiving process must wait until a message has arrived. -(channel-types)= - ## Channel types In Nextflow there are two kinds of channels: *queue channels* and *value channels*. -(channel-type-queue)= - ### Queue channel A *queue channel* is a non-blocking unidirectional FIFO queue connecting a *producer* process (i.e. outputting a value) to a consumer process or an operator. -A queue channel can be created by factory methods ({ref}`channel-of`, {ref}`channel-path`, etc), operators ({ref}`operator-map`, {ref}`operator-flatmap`, etc), and processes (see {ref}`Process outputs `). - -(channel-type-value)= +A queue channel can be created by factory methods ([of][channel-of], [fromPath][channel-path], etc), operators ([map][operator-map], [flatMap][operator-flatmap], etc), and processes (see [Outputs][process-output]). ### Value channel A *value channel* can be bound (i.e. assigned) with one and only one value, and can be consumed any number of times by a process or an operator. -A value channel can be created with the {ref}`channel-value` factory method or by any operator that produces a single value -({ref}`operator-first`, {ref}`operator-collect`, {ref}`operator-reduce`, etc). Additionally, a process will emit value +A value channel can be created with the [value][channel-value] factory method or by any operator that produces a single value +([first][operator-first], [collect][operator-collect], [reduce][operator-reduce], etc). Additionally, a process will emit value channels if it is invoked with all value channels, including simple values which are implicitly wrapped in a value channel. For example: @@ -60,7 +52,7 @@ workflow { In the above example, since the `echo` process is invoked with a simple value instead of a channel, the input is implicitly wrapped in a value channel, and the output is also emitted as a value channel. -See also: {ref}`process-multiple-input-channels`. +See also: [Multiple input channels][process-multiple-input-channels]. ## Channel factories @@ -72,7 +64,7 @@ For example, the `channel.of()` factory can be used to create a channel from an channel.of(1, 2, 3).view() ``` -See {ref}`channel-factory` for the full list of channel factories. +See [Channel factories][channel-factory] for the full list of channel factories. 
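As a further sketch, two other commonly used factories (the glob pattern and values are illustrative):

```nextflow
// Queue channel that emits each file matching the glob pattern.
channel.fromPath('data/*.fastq').view()

// Value channel bound to a single value.
channel.value('hello').view()
```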
## Operators

Channel operators, or _operators_ for short, are functions that consume and produce channels. Because channels are asynchronous, operators are necessary to manipulate the values in a channel, aside from using a process.

Commonly used operators include:

-- {ref}`operator-combine`: emit the combinations of two channels
-
-- {ref}`operator-collect`: collect the values from a channel into a list
-
-- {ref}`operator-filter`: select the values in a channel that satisfy a condition
-
-- {ref}`operator-flatMap`: transform each value from a channel into a list and emit each list
-element separately
-
-- {ref}`operator-grouptuple`: group the values from a channel based on a grouping key
-
-- {ref}`operator-join`: join the values from two channels based on a matching key
-
-- {ref}`operator-map`: transform each value from a channel with a mapping function
-
-- {ref}`operator-mix`: emit the values from multiple channels
-
-- {ref}`operator-view`: print each value in a channel to standard output
-
-See {ref}`operator-page` for the full list of operators.
+- [combine][operator-combine]: emit the combinations of two channels
+- [collect][operator-collect]: collect the values from a channel into a list
+- [filter][operator-filter]: select the values in a channel that satisfy a condition
+- [flatMap][operator-flatMap]: transform each value from a channel into a list and emit each list element separately
+- [groupTuple][operator-grouptuple]: group the values from a channel based on a grouping key
+- [join][operator-join]: join the values from two channels based on a matching key
+- [map][operator-map]: transform each value from a channel with a mapping function
+- [mix][operator-mix]: emit the values from multiple channels
+- [view][operator-view]: print each value in a channel to standard output
+
+See [Operators][operator-page] for the full list of operators.
+
+[channel-of]: /nextflow_docs/nextflow_repo/docs/reference/channel#of
+[channel-path]: /nextflow_docs/nextflow_repo/docs/reference/channel#frompath
+[operator-map]: /nextflow_docs/nextflow_repo/docs/reference/operator#map
+[operator-flatmap]: /nextflow_docs/nextflow_repo/docs/reference/operator#flatmap
+[process-output]: /nextflow_docs/nextflow_repo/docs/process#outputs
+[channel-value]: /nextflow_docs/nextflow_repo/docs/reference/channel#value
+[operator-first]: /nextflow_docs/nextflow_repo/docs/reference/operator#first
+[operator-collect]: /nextflow_docs/nextflow_repo/docs/reference/operator#collect
+[operator-reduce]: /nextflow_docs/nextflow_repo/docs/reference/operator#reduce
+[process-multiple-input-channels]: /nextflow_docs/nextflow_repo/docs/process#multiple-input-channels
+[channel-factory]: /nextflow_docs/nextflow_repo/docs/reference/channel
+[operator-combine]: /nextflow_docs/nextflow_repo/docs/reference/operator#combine
+[operator-filter]: /nextflow_docs/nextflow_repo/docs/reference/operator#filter
+[operator-flatMap]: /nextflow_docs/nextflow_repo/docs/reference/operator#flatmap
+[operator-grouptuple]: /nextflow_docs/nextflow_repo/docs/reference/operator#grouptuple
+[operator-join]: /nextflow_docs/nextflow_repo/docs/reference/operator#join
+[operator-mix]: /nextflow_docs/nextflow_repo/docs/reference/operator#mix
+[operator-view]: /nextflow_docs/nextflow_repo/docs/reference/operator#view
+[operator-page]: /nextflow_docs/nextflow_repo/docs/reference/operator
diff --git a/docs/module.md b/docs/module.md
index 9820ae5434..c4ef0b33a2 100644
--- a/docs/module.md
+++ b/docs/module.md
@@ -1,11 +1,9 @@
-(module-page)=
-
# Modules

Nextflow scripts can include **definitions** (workflows, processes, and functions) from other scripts.
When a script is included in this way, it is referred to as a **module**. Modules can be included by other modules or pipeline scripts and can even be shared across workflows. -:::{note} -Modules were introduced in DSL2. If you are still using DSL1, see the {ref}`dsl1-page` page to learn how to migrate your Nextflow pipelines to DSL2. +:::note +Modules were introduced in DSL2. If you are still using DSL1, see [Migrating from DSL1][dsl1-page] to learn how to migrate your Nextflow pipelines to DSL2. ::: ## Module inclusion @@ -32,11 +30,9 @@ Module includes are subject to the following rules: - Relative paths must begin with the `./` prefix. - Include statements are not allowed from within a workflow. They must occur at the script level. -(module-directory)= - ## Module directory -:::{versionadded} 22.10.0 +:::note{title="Version added 22.10.0"} ::: A module can be defined as a directory with the same name as the module and with a script named `main.nf`. For example: @@ -69,8 +65,6 @@ workflow { } ``` -(module-aliases)= - ## Module aliases When including definition from a module, it's possible to specify an *alias* with the `as` keyword. Aliasing allows you to avoid module name clashes, by assigning them different names in the including context. For example: @@ -96,11 +90,10 @@ workflow { } ``` -(module-params)= - ## Module parameters -:::{deprecated} 24.07.0-edge + +:::note{title="Version depreciated 24.07.0-edge"} As a best practice, parameters should be used in the entry workflow and passed to workflows, processes, and functions as explicit inputs. ::: @@ -134,11 +127,11 @@ The above snippet prints: Hola Mundo ``` -:::{note} +:::note The module inherits the parameters defined *before* the `include` statement, therefore any parameters set afterwards will not be used by the module. ::: -:::{tip} +:::tip It is best to define all pipeline parameters *before* any `include` statements. ::: @@ -180,11 +173,9 @@ The above snippet prints: Ciao world! ``` -(module-templates)= - ## Module templates -Process script {ref}`templates ` can be included alongside a module in the `templates` directory. +Process script [templates][process-template] can be included alongside a module in the `templates` directory. For example, suppose we have a project L with a module that defines two processes, P1 and P2, both of which use templates. The template files can be made available in the local `templates` directory: @@ -244,11 +235,9 @@ baseDir └── P7-template.sh ``` -(module-binaries)= - ## Module binaries -:::{versionadded} 22.10.0 +:::note{title="Version added 22.10.0"} ::: Modules can define binary scripts that are locally scoped to the processes defined by the tasks. @@ -273,8 +262,8 @@ The binary scripts must be placed in the module directory names `/re Those scripts will be made accessible like any other command in the task environment, provided they have been granted the Linux execute permissions. -:::{note} -This feature requires the use of a local or shared file system for the pipeline work directory, or {ref}`wave-page` when using cloud-based executors. +:::note +This feature requires the use of a local or shared file system for the pipeline work directory, or [Wave containers][wave-page] when using cloud-based executors. 
::: ## Sharing modules @@ -284,3 +273,7 @@ Modules are designed to be easy to share and re-use across different pipelines, - Simply copy the module files into your pipeline repository - Use [Git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules) to fetch modules from other Git repositories without maintaining a separate copy - Use the [nf-core](https://nf-co.re/tools#modules) CLI to install and update modules with a standard approach used by the nf-core community + +[dsl1-page]: /nextflow_docs/nextflow_repo/docs/migrations/dsl1.md +[process-template]: /nextflow_docs/nextflow_repo/docs/process#template +[wave-page]: /nextflow_docs/nextflow_repo/docs/wave diff --git a/docs/notifications.md b/docs/notifications.md index 9f58f56fd4..9686de9a84 100644 --- a/docs/notifications.md +++ b/docs/notifications.md @@ -1,15 +1,11 @@ -(mail-page)= +import DefinitionList, { DefinitionTerm, DefinitionDescription } from '@site/src/components/DefinitionList'; # Notifications This page documents how to handle workflow events and send notifications. -(workflow-handlers)= - ## Workflow handlers -(metadata-completion-handler)= - ### Completion handler Due to the asynchronous nature of Nextflow the termination of a script does not correspond to the termination of the running workflow. Thus some information, only available on execution completion, needs to be accessed by using an asynchronous handler. @@ -23,8 +19,6 @@ workflow.onComplete { } ``` -(metadata-error-handler)= - ### Error handler The `onError` event handler is invoked by Nextflow when a runtime or process error caused the pipeline execution to stop. For example: @@ -35,16 +29,14 @@ workflow.onError { } ``` -:::{note} -Both the `onError` and `onComplete` handlers are invoked when an error condition is encountered. The first is called as soon as the error is raised, while the second is called just before the pipeline execution is about to terminate. When using the `finish` {ref}`process-error-strategy`, there may be a significant gap between the two, depending on the time required to complete any pending job. +:::note +Both the `onError` and `onComplete` handlers are invoked when an error condition is encountered. The first is called as soon as the error is raised, while the second is called just before the pipeline execution is about to terminate. When using the `finish` [errorStrategy][process-error-strategy], there may be a significant gap between the two, depending on the time required to complete any pending job. ::: ## Mail The built-in function `sendMail` allows you to send a mail message from a workflow script. -(mail-basic)= - ### Basic mail The mail attributes are specified as named parameters or an equivalent map. For example: @@ -73,41 +65,85 @@ sendMail(mail) The following parameters can be specified: -`to` -: *Multiple email addresses can be specified separating them with a comma.* -: The mail target recipients. - -`cc` -: *Multiple email addresses can be specified separating them with a comma.* -: The mail CC recipients. - -`bcc` -: *Multiple email addresses can be specified separating them with a comma.* -: The mail BCC recipients. - -`from` -: *Multiple email addresses can be specified separating them with a comma.* -: The mail sender address. - -`subject` -: The mail subject. - -`charset` -: The mail content charset (default: `UTF-8`). - -`text` -: The mail plain text content. - -`body` -: The mail body content. It can be either plain text or HTML content. - -`type` -: The mail body mime type. 
If not specified it's automatically detected. - -`attach` -: Single file or a list of files to be included as mail attachments. - -(mail-advanced)= + + + `to` + + + *Multiple email addresses can be specified separating them with a comma.* + + The mail target recipients. + + + + `cc` + + + *Multiple email addresses can be specified separating them with a comma.* + + The mail CC recipients. + + + + `bcc` + + + *Multiple email addresses can be specified separating them with a comma.* + + The mail BCC recipients. + + + + `from` + + + *Multiple email addresses can be specified separating them with a comma.* + + The mail sender address. + + + + `subject` + + + The mail subject. + + + + `charset` + + + The mail content charset (default: `UTF-8`). + + + + `text` + + + The mail plain text content. + + + + `body` + + + The mail body content. It can be either plain text or HTML content. + + + + `type` + + + The mail body mime type. If not specified it's automatically detected. + + + + `attach` + + + Single file or a list of files to be included as mail attachments. + + ### Advanced mail @@ -131,33 +167,49 @@ sendMail { The same attributes listed in the table in the previous section are allowed. -:::{tip} +:::tip A string expression at the end is implicitly interpreted as the mail body content, therefore the `body` parameter can be omitted as shown above. ::: -:::{tip} +:::tip To send an email that includes text and HTML content, use both the `text` and `body` attributes. The first is used for the plain text content, while the second is used for the rich HTML content. ::: -(mail-attachments)= - ### Mail attachments When using the curly brackets syntax, the `attach` parameter can be repeated two or more times to include multiple attachments in the mail message. Moreover for each attachment it's possible to specify any of the following options: -`contentId` -: Defines the `Content-ID` header field for the attachment. - -`disposition` -: Defines the `Content-Disposition` header field for the attachment. - -`fileName` -: Defines the `filename` parameter of the `Content-Disposition` header field. - -`description` -: Defines the `Content-Description` header field for the attachment. + + + `contentId` + + + Defines the `Content-ID` header field for the attachment. + + + + `disposition` + + + Defines the `Content-Disposition` header field for the attachment. + + + + `fileName` + + + Defines the `filename` parameter of the `Content-Disposition` header field. + + + + `description` + + + Defines the `Content-Description` header field for the attachment. + + For example: @@ -173,8 +225,6 @@ sendMail { } ``` -(mail-config)= - ### Mail configuration If no mail server configuration is provided, Nextflow tries to send the email by using the external mail command eventually provided by the underlying system (e.g. `sendmail` or `mail`). @@ -189,11 +239,11 @@ mail { } ``` -See the {ref}`mail scope ` section to learn more the mail server configuration options. +See the [mail scope][config-mail] section to learn more the mail server configuration options. ### AWS SES configuration -:::{versionadded} 23.06.0-edge +:::note{title="Version added 23.06.0-edge"} ::: Nextflow supports [AWS SES](https://aws.amazon.com/ses/) native API as an alternative @@ -213,7 +263,7 @@ ses:SendRawEmail ## Mail notification -You can use the `sendMail` function with a {ref}`workflow completion handler ` to notify the completion of a workflow completion. 
For example: +You can use the `sendMail` function with a [workflow completion handler][metadata-completion-handler] to notify the completion of a workflow completion. For example: ```nextflow workflow.onComplete { @@ -245,15 +295,16 @@ To enable simply specify the `-N` option when launching the pipeline execution. nextflow run -N ``` -It will send a notification mail when the execution completes similar to the one shown below: +It will send a notification mail when the execution completes. -```{image} _static/workflow-notification-min.png -``` - -:::{warning} +:::warning By default the notification message is sent with the `sendmail` system tool, which is assumed to be available in the environment where Nextflow is running. Make sure it's properly installed and configured. Alternatively, you can provide the SMTP server configuration settings to use the Nextflow built-in mail support, which doesn't require any external system tool. ::: See the [Mail configuration](#mail-configuration) section to learn about the available mail delivery options and configuration settings. -Read {ref}`Notification scope ` section to learn more about the workflow notification configuration details. +See [Completion handler](#completion-handler) to learn more about the workflow notification configuration details. + +[config-mail]: /nextflow_docs/nextflow_repo/docs/reference/config#mail +[config-notification]: /nextflow_docs/nextflow_repo/docs/reference/config#notification +[process-error-strategy]: /nextflow_docs/nextflow_repo/docs/reference/process#errorstrategy diff --git a/docs/secrets.md b/docs/secrets.md index 1860468913..249ec211d5 100644 --- a/docs/secrets.md +++ b/docs/secrets.md @@ -1,8 +1,6 @@ -(secrets-page)= - # Secrets -:::{versionadded} 22.10.0 +:::note{title="Version added 22.10.0"} Previewed in `21.09.0-edge`. ::: @@ -18,17 +16,35 @@ When the pipeline execution is launched Nextflow inject the secrets in pipeline Nextflow provides a command named `secrets`. This command allows four simple operations: -`list` -: List secrets available in the current store e.g. `nextflow secrets list`. - -`get` -: Retrieve a secret value e.g. `nextflow secrets get FOO`. - -`set` -: Create or update a secret e.g. `nextflow secrets set FOO "Hello world"` - -`delete` -: Delete a secret e.g. `nextflow secrets delete FOO`. + + + `list` + + + List secrets available in the current store. For example, `nextflow secrets list`. + + + + `get` + + + Retrieve a secret value. For example, `nextflow secrets get FOO`. + + + + `set` + + + Create or update a secret. For example, `nextflow secrets set FOO "Hello world"` + + + + `delete` + + + Delete a secret. For example, `nextflow secrets delete FOO`. + + ## Configuration file @@ -43,7 +59,7 @@ aws { The above snippet access the secrets `MY_ACCESS_KEY` and `MY_SECRET_KEY` previously and assign them to the corresponding AWS credentials settings. -:::{warning} +:::warning Secrets **cannot** be assigned to pipeline parameters. ::: @@ -65,19 +81,17 @@ process my_task { The above snippet runs a command in with the variables `MY_ACCESS_KEY` and `MY_SECRET_KEY` are injected in the process execution environment holding the values defines in the secret store. -:::{warning} +:::warning The secrets are made available in the process context running the command script as environment variables. 
Therefore make sure to escape the variable name identifier with a backslash as shown in the example above, otherwise a variable with the same will be evaluated in the Nextflow script context instead of the command script. ::: -:::{note} +:::note This feature is only available when using the local or grid executors (Slurm, Grid Engine, etc). The AWS Batch executor allows the use of secrets when deploying the pipeline execution via [Seqera Platform](https://seqera.io/blog/pipeline-secrets-secure-handling-of-sensitive-information-in-tower/). ::: -(secrets-pipeline-script)= - ## Pipeline script -:::{versionadded} 24.03.0-edge +:::note{title="Version added 24.03.0-edge"} ::: Secrets can be accessed in the pipeline script using the `secrets` variable. For example: @@ -88,6 +102,6 @@ workflow.onComplete { } ``` -:::{note} +:::note This feature is only available when using the local or grid executors (Slurm, Grid Engine, etc). The AWS Batch executor allows the use of secrets when deploying the pipeline execution via [Seqera Platform](https://seqera.io/blog/pipeline-secrets-secure-handling-of-sensitive-information-in-tower/). ::: diff --git a/docs/sharing.md b/docs/sharing.md index 58c836c2ae..9fc0d97b21 100644 --- a/docs/sharing.md +++ b/docs/sharing.md @@ -1,16 +1,14 @@ -(sharing-page)= - # Sharing pipelines Nextflow seamlessly integrates with popular Git providers, including [BitBucket](http://bitbucket.org/), [GitHub](http://github.com), and [GitLab](http://gitlab.com) for managing Nextflow pipelines as version-controlled Git repositories. This feature allows you to easily use other people's Nextflow pipelines and publish your own pipelines. -:::{note} +:::note Nextflow is not meant to completely replace the [Git](https://git-scm.com/) tool. You may still need `git` to create new repositories or commit changes, etc. ::: ## Git configuration -You can configure your credentials for various Git providers in the Git configuration file, located at `$HOME/.nextflow/scm`. See {ref}`git-page` for more information. +You can configure your credentials for various Git providers in the Git configuration file, located at `$HOME/.nextflow/scm`. See [Git][git-page] for more information. ## Using a local repository @@ -34,7 +32,7 @@ Nextflow only requires that the main script in your pipeline project is called ` manifest.mainScript = 'my_very_long_script_name.nf' ``` -To learn more about this and other project metadata information, that can be defined in the Nextflow configuration file, read the {ref}`Manifest ` section on the Nextflow configuration page. +To learn more about this and other project metadata information, that can be defined in the Nextflow configuration file, read the [Manifest][config-manifest] section on the Nextflow configuration page. Once you have uploaded your pipeline project to GitHub other people can execute it simply using the project name or the repository URL. @@ -50,7 +48,7 @@ or nextflow run http://github.com/acme/hello ``` -See the {ref}`CLI ` page to learn how to use the Nextflow command line to run pipelines and manage pipeline projects. +See [Command line][cli-page] to learn how to use the Nextflow command line to run pipelines and manage pipeline projects. ## Managing dependencies @@ -87,9 +85,9 @@ conda.enabled = true This way, when you launch your pipeline, Nextflow will automatically download the necessary dependencies to run your tasks based on this configuration. 
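As a sketch of how the two approaches can be combined (the container image tag and package version below are illustrative), a process can declare both a container and a conda package, and the engine enabled in the configuration determines which one is used:

```nextflow
process align {
    // Used when docker.enabled = true
    container 'quay.io/biocontainers/bwa:0.7.15--1'
    // Used when conda.enabled = true
    conda 'bioconda::bwa=0.7.15'

    script:
    """
    bwa mem ref.fa reads.fq > aligned.sam
    """
}
```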
-Read the {ref}`container-page` page to learn more about how to use containers with Nextflow, and the {ref}`conda-page` page for Conda packages. +Read the [Containers][container-page] page to learn more about how to use containers with Nextflow, and the [Conda environments][conda-page] page for Conda packages. -:::{tip} +:::tip For maximal reproducibility, make sure to define a specific version for each tool. Otherwise, your pipeline might use different versions across subsequent runs, which can introduce subtle differences to your results. ::: @@ -105,7 +103,7 @@ To configure a custom script: 2. Specify a portable shebang (see note below for details). 3. Make the script executable. For example: `chmod a+x bin/my_script.py` -:::{tip} +:::tip To maximize the portability of your bundled script, use `env` to dynamically resolve the location of the interpreter instead of hard-coding it in the shebang line. For example, shebang definitions `#!/usr/bin/python` and `#!/usr/local/bin/python` both hard-code specific paths to the Python interpreter. Instead, the following approach is more portable: @@ -171,7 +169,7 @@ Found sequence 'DNA[ACGTTGCAATGCCGTA]' with melting temperaure 48.0°C Found sequence 'DNA[GCGTACGGTACGTTAC]' with melting temperaure 50.0°C ``` -:::{note} +:::note Package declarations in the `lib` directory are ignored. The package of a class is determined by the directory structure within the `lib` directory. For example, if the above example were defined in `lib/utils/DNASequence.groovy`, the class would need to be referenced in pipeline scripts as `utils.DNASequence`. @@ -222,4 +220,12 @@ env { Similarly, if you use an HPC scheduler like SLURM or a cloud batch service like AWS Batch to execute tasks in a distributed manner, you can use a configuration profile to define the settings for a given environment. -See {ref}`config-page` for more information about Nextflow configuration and {ref}`executor-page` for more information about executors. +See [Configuration][config-page] for more information about Nextflow configuration and [Executors][executor-page] for more information about executors. + +[git-page]: /nextflow_docs/nextflow_repo/docs/git +[config-manifest]: /nextflow_docs/nextflow_repo/docs/reference/config#manifest +[cli-page]: /nextflow_docs/nextflow_repo/docs/cli +[container-page]: /nextflow_docs/nextflow_repo/docs/container +[conda-page]: /nextflow_docs/nextflow_repo/docs/conda +[config-page]: /nextflow_docs/nextflow_repo/docs/config +[executor-page]: /nextflow_docs/nextflow_repo/docs/executor diff --git a/docs/workflow.md b/docs/workflow.md index 0996facf64..1bd53f00c9 100644 --- a/docs/workflow.md +++ b/docs/workflow.md @@ -1,13 +1,11 @@ -(workflow-page)= - # Workflows In Nextflow, a **workflow** is a function that is specialized for composing processes and dataflow logic (i.e. channels and operators). -See {ref}`syntax-workflow` for a full description of the workflow syntax. +See [Workflow][syntax-workflow] for a full description of the workflow syntax. -:::{note} -Workflows were introduced in DSL2. If you are still using DSL1, see {ref}`dsl1-page` for more information about how to migrate your Nextflow pipelines to DSL2. +:::note +Workflows were introduced in DSL2. If you are still using DSL1, see [Migrating from DSL1][dsl1-page] for more information about how to migrate your Nextflow pipelines to DSL2. 
::: ## Entry workflow @@ -38,11 +36,11 @@ workflow { } ``` -:::{note} +:::note As a best practice, params should be used only in the entry workflow and passed to workflows and processes as explicit inputs. ::: -The default value can be overridden by the command line, params file, or config file. Parameters from multiple sources are resolved in the order described in {ref}`cli-params`. +The default value can be overridden by the command line, params file, or config file. Parameters from multiple sources are resolved in the order described in [Pipeline parameters][cli-params]. ## Named workflows @@ -115,12 +113,10 @@ workflow my_workflow { The result of the above workflow can be accessed using `my_workflow.out.my_data`. -:::{note} +:::note Every output must be assigned to a name when multiple outputs are declared. ::: -(workflow-process-invocation)= - ## Calling processes and workflows Processes and workflows are called like functions, passing their inputs as arguments: @@ -168,7 +164,7 @@ Processes and workflows have a few extra rules for how they can be called: - Processes and workflows can only be called by workflows -- A given process or workflow can only be called once in a given workflow. To use a process or workflow multiple times in the same workflow, use {ref}`module-aliases`. +- A given process or workflow can only be called once in a given workflow. To use a process or workflow multiple times in the same workflow, use [Module aliases][module-aliases]. The "return value" of a process or workflow call is the process outputs or workflow emits, respectively. The return value can be assigned to a variable or passed into another call: @@ -234,11 +230,11 @@ workflow { } ``` -:::{note} -Process named outputs are defined using the `emit` option on a process output. See {ref}`naming process outputs ` for more information. +:::note +Process named outputs are defined using the `emit` option on a process output. See [naming process outputs][process-naming-outputs] for more information. ::: -:::{note} +:::note Process and workflow outputs can also be accessed by index (e.g., `hello.out[0]`, `hello.out[1]`, etc.). As a best practice, multiple outputs should be accessed by name. ::: @@ -276,12 +272,12 @@ workflow { } ``` -:::{note} +:::note The same process can be called in different workflows without using an alias, like `tick` in the above example, which is used in both `flow1` and `flow2`. The workflow call stack determines the *fully qualified process name*, which is used to distinguish the different process calls, i.e. `flow1:tick` and `flow2:tick` in the above example. ::: -:::{tip} -The fully qualified process name can be used as a {ref}`process selector ` in a Nextflow configuration file, and it takes priority over the simple process name. +:::tip +The fully qualified process name can be used as a [process selector][config-process-selectors] in a Nextflow configuration file, and it takes priority over the simple process name. ::: ## Special operators @@ -312,7 +308,7 @@ workflow { } ``` -The above snippet defines a process named `greet` and invokes it with the input channel. The result is then piped to the {ref}`operator-map` operator, which converts each string to uppercase, and finally to the {ref}`operator-view` operator which prints it. +The above snippet defines a process named `greet` and invokes it with the input channel. 
The result is then piped to the [map][operator-map] operator, which converts each string to uppercase, and finally to the [view][operator-view] operator which prints it.
 
 The same code can also be written as:
 
@@ -360,7 +356,7 @@ workflow {
 }
 ```
 
-In the above snippet, the initial channel is piped to the {ref}`operator-map` operator, which reverses the string value. Then, the result is passed to the processes `greet` and `to_upper`, which are executed in parallel. Each process outputs a channel, and the two channels are combined using the {ref}`operator-mix` operator. Finally, the result is printed using the {ref}`operator-view` operator.
+In the above snippet, the initial channel is piped to the [map][operator-map] operator, which reverses the string value. Then, the result is passed to the processes `greet` and `to_upper`, which are executed in parallel. Each process outputs a channel, and the two channels are combined using the [mix][operator-mix] operator. Finally, the result is printed using the [view][operator-view] operator.
 
 The same code can also be written as:
 
@@ -373,14 +369,12 @@ workflow {
 }
 ```
 
-(workflow-recursion)=
-
 ## Process and workflow recursion
 
-:::{versionadded} 21.11.0-edge
+:::note{title="Version added 21.11.0-edge"}
 :::
 
-:::{note}
+:::note
 This feature requires the `nextflow.preview.recursion` feature flag to be enabled.
 :::
 
@@ -407,11 +401,11 @@ count_down
 
 Workflows can also be invoked recursively:
 
-```{literalinclude} snippets/recurse-workflow.nf
-:language: nextflow
+```nextflow file=./snippets/recurse-workflow.nf
 ```
 
-```{literalinclude} snippets/recurse-workflow.out
-:language: console
+```console file=./snippets/recurse-workflow.out
 ```
 
@@ -421,26 +415,24 @@
 
 - Recursive workflows cannot use *reduction* operators such as `collect`, `reduce`, and `toList`, because these operators cause the recursion to hang indefinitely after the initial iteration.
 
-(workflow-output-def)=
-
 ## Workflow outputs
 
-:::{versionadded} 24.04.0
+:::note{title="Version added 24.04.0"}
 :::
 
-:::{versionchanged} 24.10.0
-A second preview version was introduced. See the {ref}`migration notes ` for details.
+:::note{title="Version changed 24.10.0"}
+A second preview version was introduced. See the [migration notes][workflow-outputs-second-preview] for details.
 :::
 
-:::{versionchanged} 25.04.0
-A third preview version was introduced. See the {ref}`migration notes ` for details.
+:::note{title="Version changed 25.04.0-edge"}
+A third preview version was introduced. See the [migration notes][workflow-outputs-third-preview] for details.
 :::
 
-:::{note}
+:::note
 This feature requires the `nextflow.preview.output` feature flag to be enabled.
 :::
 
-A script can define an *output block* which declares the top-level outputs of the workflow. Each output should be assigned in the `publish` section of the entry workflow. Any channel in the workflow can be assigned to an output, including process and subworkflow outputs. This approach is intended to replace the {ref}`publishDir ` directive.
+A script can define an *output block* which declares the top-level outputs of the workflow. Each output should be assigned in the `publish` section of the entry workflow. Any channel in the workflow can be assigned to an output, including process and subworkflow outputs. This approach is intended to replace the [publishDir][process-publishdir] directive. 
Here is a basic example:
 
 ```nextflow
 process fetch {
     ...
 
     output:
     path 'sample.txt'
 
     ...
 }
 
 workflow {
     main:
     fetch(params.input)
 
     publish:
     samples = fetch.out
 }
 
 output {
     samples {
         path 'fetch'
     }
 }
 ```
 
 In the above example, the output of process `fetch` is assigned to the `samples` workflow output. How this output is published to a directory structure is described in the next section.
 
-(workflow-publishing-files)=
-
 ### Publishing files
 
 The top-level output directory of a workflow run can be set using the `-output-dir` command-line option or the `outputDir` config option:
 
@@ -654,7 +644,8 @@ The following directives are available for each output in the output block:
 
 `path`
 : Specify the publish path relative to the output directory (default: `'.'`). Can be a path, a closure that defines a custom directory for each published value, or a closure that publishes individual files using the `>>` operator.
 
-Additionally, the following options from the {ref}`workflow ` config scope can be specified as directives:
+Additionally, the following options from the [workflow][config-workflow] config scope can be specified as directives:
+
 - `contentType`
 - `enabled`
 - `ignoreErrors`
@@ -672,3 +663,17 @@ output {
 }
 }
 ```
+
+[cli-params]: /nextflow_docs/nextflow_repo/docs/cli#pipeline-parameters
+[config-process-selectors]: /nextflow_docs/nextflow_repo/docs/config#process-selectors
+[config-workflow]: /nextflow_docs/nextflow_repo/docs/reference/config#workflow
+[dsl1-page]: /nextflow_docs/nextflow_repo/docs/migrations/dsl1
+[workflow-outputs-second-preview]: /nextflow_docs/nextflow_repo/docs/migrations/24-10
+[workflow-outputs-third-preview]: /nextflow_docs/nextflow_repo/docs/migrations/25-04
+[module-aliases]: /nextflow_docs/nextflow_repo/docs/module#module-aliases
+[process-naming-outputs]: /nextflow_docs/nextflow_repo/docs/process#naming-outputs
+[operator-map]: /nextflow_docs/nextflow_repo/docs/reference/operator#map
+[operator-mix]: /nextflow_docs/nextflow_repo/docs/reference/operator#mix
+[operator-view]: /nextflow_docs/nextflow_repo/docs/reference/operator#view
+[process-publishdir]: /nextflow_docs/nextflow_repo/docs/reference/process#publishdir
+[syntax-workflow]: /nextflow_docs/nextflow_repo/docs/reference/syntax#workflow
From 27712ef1efb4788e67f4c8900233415bce7ac794 Mon Sep 17 00:00:00 2001
From: Christopher Hakkaart
Date: Thu, 26 Jun 2025 17:21:38 +1200
Subject: [PATCH 06/38] Migrate VS code

Signed-off-by: Christopher Hakkaart
---
 docs/vscode.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/docs/vscode.md b/docs/vscode.md
index 76f1367594..7e6017f142 100644
--- a/docs/vscode.md
+++ b/docs/vscode.md
@@ -1,5 +1,3 @@
-(vscode-page)=
-
 # VS Code integration
 
 The [Nextflow VS Code extension](https://marketplace.visualstudio.com/items?itemName=nextflow.nextflow) provides language support for Nextflow pipelines in [VS Code](https://code.visualstudio.com/).
@@ -18,8 +16,8 @@ The extension highlights source code in red for errors and yellow for warnings.
 
 To view all diagnostics for the workspace, open the **Problems** tab. Here, you can search for diagnostics by diagnostic message, filename, and so on.
 
-:::{note}
-The language server parses scripts and config files according to the {ref}`Nextflow language specification `, which is more strict than the Nextflow CLI. See {ref}`strict-syntax-page` for more information.
+:::note
+The language server parses scripts and config files according to the [Nextflow language specification][syntax-page], which is more strict than the Nextflow CLI. See [Preparing for strict syntax][strict-syntax-page] for more information. 
::: ### Hover hints @@ -85,10 +83,11 @@ Report issues at [nextflow-io/vscode-language-nextflow](https://github.com/nextf - The language server provides limited support for Groovy scripts in the `lib` directory. Errors in Groovy scripts are not reported as diagnostics, and changing a Groovy script does not automatically re-compile the Nextflow scripts that reference it. Edit the Nextflow script or close and re-open it to refresh the diagnostics. -(vscode-language-server)= - ## Language server The Nextflow language server implements the [Language Server Protocol (LSP)](https://microsoft.github.io/language-server-protocol/) for Nextflow scripts and config files. It is distributed as a standalone Java application and can be integrated with any editor that functions as an LSP client. Currently, only the VS Code integration is officially supported, but community contributions for other editors are welcome. Visit the [GitHub issues](https://github.com/nextflow-io/language-server/issues) page for the latest updates on community-led integrations. + +[syntax-page]: /nextflow_docs/nextflow_repo/docs/reference/syntax.md +[strict-syntax-page]: /nextflow_docs/nextflow_repo/docs/strict-syntax.md From 8c3953a82bce94473c52556f7109c0715e3dbf99 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Fri, 27 Jun 2025 15:32:14 +1200 Subject: [PATCH 07/38] Update admonitions Signed-off-by: Christopher Hakkaart --- docs/cache-and-resume.mdx | 8 +-- docs/config.mdx | 2 +- docs/executor.md | 14 +++--- docs/git.md | 101 +++++++++++++++++++++++--------------- docs/install.md | 4 +- docs/module.md | 6 +-- docs/notifications.md | 2 +- docs/plugins.mdx | 2 +- docs/process.md | 12 ++--- docs/reports.mdx | 8 +-- docs/secrets.md | 4 +- docs/workflow.md | 8 +-- docs/your-first-script.md | 4 +- 13 files changed, 98 insertions(+), 77 deletions(-) diff --git a/docs/cache-and-resume.mdx b/docs/cache-and-resume.mdx index 1f68da5845..9725b56650 100644 --- a/docs/cache-and-resume.mdx +++ b/docs/cache-and-resume.mdx @@ -33,7 +33,7 @@ The task hash is computed from the following metadata: Nextflow also includes an incrementing component in the hash generation process, which allows it to iterate through multiple hash values until it finds one that does not match an existing execution directory. This mechanism typically usually aligns with task retries (i.e., task attempts), however this is not guaranteed. ::: -:::note{title="Version changed 23.09.2-edge"} +:::warning{title="Changed in version 23.09.2-edge"} The [`ext`][process-ext] directive was added to the task hash. ::: @@ -51,7 +51,7 @@ The default cache store uses the `.nextflow/cache` directory, relative to the la Due to the limitations of LevelDB, the database for a given session ID can only be accessed by one reader/writer at a time. This means, for example, that you cannot use `nextflow log` to query the task metadata for a pipeline run while it is still running. -:::note{title="Version added 23.07.0-edge"} +:::note{title="New in version 23.07.0-edge"} ::: The cloud cache is an alternative cache store that uses cloud storage instead of the local cache directory. You can use it by setting the `NXF_CLOUDCACHE_PATH` environment variable to the desired cache path (e.g. `s3://my-bucket/cache`) and providing the necessary credentials. @@ -200,7 +200,7 @@ One way to debug a resumed run is to compare the task hashes of each run using t While some manual effort is required, the final diff can often reveal the exact change that caused a task to be re-executed. 
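A minimal sketch of this manual approach, assuming the default log format in which each task hash is recorded on a line containing `cache hash:` (the file names here are illustrative):

```bash
# Run the pipeline twice with -dump-hashes, keeping a separate log for each run
nextflow -log run_1.log run main.nf -dump-hashes
nextflow -log run_2.log run main.nf -dump-hashes -resume

# Extract the task hash lines from each log and compare them
grep 'cache hash:' run_1.log > run_1.hashes.txt
grep 'cache hash:' run_2.log > run_2.hashes.txt
diff run_1.hashes.txt run_2.hashes.txt
```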
-:::note{title="Version added 23.10.0"} +:::note{title="New in version 23.10.0"} ::: When using `-dump-hashes json`, the task hashes can be more easily extracted into a diff. Here is an example Bash script to perform two runs and produce a diff: @@ -225,7 +225,7 @@ diff run_1.tasks.log run_2.tasks.log You can then view the `diff` output or use a graphical diff viewer to compare `run_1.tasks.log` and `run_2.tasks.log`. -:::note{title="Version added 25.04.0"} +:::note{title="New in version 25.04.0"} Nextflow now has a built-in way to compare two task runs. See the [Data lineage][data-lineage-page] guide for details. ::: diff --git a/docs/config.mdx b/docs/config.mdx index f789c64b02..1eb4b0e2af 100644 --- a/docs/config.mdx +++ b/docs/config.mdx @@ -295,7 +295,7 @@ nextflow run main.nf -profile standard,cloud Config profiles are applied in the order in which they were defined in the config file, regardless of the order they are specified on the command line. -:::note{title="Version added 25.02.0-edge"} +:::note{title="New in version 25.02.0-edge"} When using the [strict config syntax][updating-config-syntax], profiles are applied in the order in which they are specified on the command line. ::: diff --git a/docs/executor.md b/docs/executor.md index fa314fbc77..e6d7473549 100644 --- a/docs/executor.md +++ b/docs/executor.md @@ -57,7 +57,7 @@ See [Azure Batch][azure-batch] for more information. ## Bridge -:::note{title="Version added 22.09.1-edge"} +:::note{title="New in version 22.09.1-edge"} ::: [Bridge](https://github.com/cea-hpc/bridge) is an abstraction layer to ease batch system and resource manager usage in heterogeneous HPC environments. @@ -78,7 +78,7 @@ Resource requests and other job characteristics can be controlled via the follow ## Flux Executor -:::note{title="Version added 22.11.0-edge"} +:::note{title="New in version 22.11.0-edge"} ::: The `flux` executor allows you to run your pipeline script using the [Flux Framework](https://flux-framework.org). @@ -104,7 +104,7 @@ By default, Flux will send all output to the `.command.log` file. To send this o ## Google Cloud Batch -:::note{title="Version added 22.07.1-edge"} +:::note{title="New in version 22.07.1-edge"} ::: [Google Cloud Batch](https://cloud.google.com/batch) is a managed computing service that allows the execution of containerized workloads in the Google Cloud Platform infrastructure. @@ -157,11 +157,11 @@ Resource requests and other job characteristics can be controlled via the follow ## HyperQueue -:::note{title="Version changed 24.06.0-edge"} +:::warning{title="Changed in version 24.06.0-edge"} HyperQueue 0.17.0 or later is required. ::: -:::note{title="Version changed 25.01.0-edge"} +:::warning{title="Changed in version 25.01.0-edge"} HyperQueue 0.20.0 or later is required. ::: @@ -307,7 +307,7 @@ When specifying `clusterOptions` as a string, multiple options must be separated clusterOptions = '-t besteffort;--project myproject' ``` -:::note{title="Version added 24.04.0"} +:::note{title="New in version 24.04.0"} ::: The same behavior can now be achieved using a string list: @@ -398,7 +398,7 @@ SLURM partitions can be specified with the `queue` directive. Nextflow does not provide direct support for SLURM multi-clusters. If you need to submit workflow executions to a cluster other than the current one, specify it with the `SLURM_CLUSTERS` variable in the launch environment. 
::: -:::note{title="Version added 23.07.0-edge"} +:::note{title="New in version 23.07.0-edge"} Some SLURM clusters require memory allocations to be specified with `--mem-per-cpu` instead of `--mem`. You can specify `executor.perCpuMemAllocation = true` in the Nextflow configuration to enable this behavior. Nextflow will automatically compute the memory per CPU for each task (by default 1 CPU is used). ::: diff --git a/docs/git.md b/docs/git.md index cddfcae4c7..fef4f68a71 100644 --- a/docs/git.md +++ b/docs/git.md @@ -1,4 +1,4 @@ -(git-page)= +import DefinitionList, { DefinitionTerm, DefinitionDescription } from '@site/src/components/DefinitionList'; # Git @@ -19,33 +19,59 @@ providers { In the above template replace `` with one of the "default" servers (i.e. `bitbucket`, `github` or `gitlab`) or a custom identifier representing a private SCM server installation. -:::{versionadded} 20.10.0 -A custom location for the SCM file can be specified using the `NXF_SCM_FILE` environment variable. -::: - The following configuration properties are supported for each provider configuration: -`providers..user` -: User name required to access private repositories on the SCM server. - -`providers..password` -: User password required to access private repositories on the SCM server. - -`providers..token` -: *Required only for private Gitlab servers* -: Private API access token. - -`providers..platform` -: *Required only for private SCM servers* -: Git provider name, either: `github`, `gitlab` or `bitbucket`. - -`providers..server` -: *Required only for private SCM servers* -: SCM server name including the protocol prefix e.g. `https://github.com`. - -`providers..endpoint` -: *Required only for private SCM servers* -: SCM API `endpoint` URL e.g. `https://api.github.com` (default: the same as `providers..server`). + + + `providers..user` + + + User name required to access private repositories on the SCM server. + + + + `providers..password` + + + User password required to access private repositories on the SCM server. + + + + `providers..token` + + + *Required only for private Gitlab servers.* + + Private API access token. + + + + `providers..platform` + + + *Required only for private SCM servers.* + + Git provider name, either: `github`, `gitlab`, or `bitbucket`. + + + + `providers..server` + + + *Required only for private SCM servers.* + + SCM server name including the protocol prefix, e.g., `https://github.com`. + + + + `providers..endpoint` + + + *Required only for private SCM servers.* + + SCM API `endpoint` URL, e.g., `https://api.github.com` (default: the same as `providers..server`). + + ## Git providers @@ -62,7 +88,7 @@ providers { } ``` -:::{note} +:::note App passwords are substitute passwords for a user account which you can use for scripts and integrating tools in order to avoid putting your real password into configuration files. Learn more at [this link](https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/). ::: @@ -70,7 +96,7 @@ App passwords are substitute passwords for a user account which you can use for [BitBucket Server](https://confluence.atlassian.com/bitbucketserver) is a self-hosted Git repository and management platform. -:::{note} +:::note BitBucket Server uses a different API from the [BitBucket](https://bitbucket.org/) Cloud service. Make sure to use the right configuration whether you are using the cloud service or a self-hosted installation. 
::: @@ -103,7 +129,7 @@ providers { GitHub requires the use of a personal access token (PAT) in place of a password when accessing APIs. Learn more about PAT and how to create it at [this link](https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token). -:::{versionadded} 23.01.0-edge +:::note{title="New in version 23.01.0-edge"} Nextflow automatically uses the `GITHUB_TOKEN` environment variable to authenticate access to the GitHub repository if no credentials are provided via the `scm` file. This is useful especially when accessing pipeline code from a GitHub Action. Read more about the token authentication in the [GitHub documentation](https://docs.github.com/en/actions/security-guides/automatic-token-authentication). ::: @@ -121,7 +147,7 @@ providers { } ``` -:::{tip} +:::tip The GitLab *token* string can be used as the `password` value in the above setting. When doing that the `token` field can be omitted. ::: @@ -157,17 +183,12 @@ providers { } ``` -:::{tip} +:::tip The Personal access token can be generated in the repository `Clone Repository` dialog. ::: -(aws-codecommit)= - ### AWS CodeCommit -:::{versionadded} 22.06.0-edge -::: - Nextflow supports [AWS CodeCommit](https://aws.amazon.com/codecommit/) as a Git provider to access and to share pipelines code. To access your project hosted on AWS CodeCommit with Nextflow provide the repository credentials using the configuration snippet shown below: @@ -184,7 +205,7 @@ providers { In the above snippet replace `` and `` with your AWS credentials, and `my_aws_repo` with a name of your choice. -:::{tip} +:::tip The `user` and `password` settings are optional. If omitted, the [AWS default credentials provider chain](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html) is used. ::: @@ -196,7 +217,7 @@ nextflow run https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/my-repo In the above example replace `my-repo` with your own repository. Note also that AWS CodeCommit has different URLs depending the region in which you are working. -:::{note} +:::note The support for protocols other than HTTPS is not available at this time. ::: @@ -232,10 +253,10 @@ Or, alternatively, using the Git clone URL: nextflow run http://gitlab.acme.org/acme/hello.git ``` -:::{note} +:::note You must also specify the server API endpoint URL if it differs from the server base URL. For example, for GitHub Enterprise V3, add `endpoint = 'https://git.your-domain.com/api/v3'`. ::: -:::{warning} +:::warning When accessing a private SCM installation over `https` from a server that uses a custom SSL certificate, you may need to import the certificate into your local Java keystore. See [Import the Certificate as a Trusted Certificate](https://docs.oracle.com/javase/tutorial/security/toolsign/rstep2.html) for more information. ::: diff --git a/docs/install.md b/docs/install.md index 5b77541bea..806aeb5101 100644 --- a/docs/install.md +++ b/docs/install.md @@ -14,7 +14,7 @@ Nextflow requires Bash 3.2 (or later) and [Java 17 (or later, up to 24)](http:// java -version ``` -:::warning{title="24.11.0-edge"} +:::warning{title="Depreciated in version 24.11.0-edge"} Support for Java versions prior to 17 was dropped. ::: @@ -153,4 +153,4 @@ Launching from Seqera Platform provides you with: Seqera Cloud Basic is free for small teams. Researchers at qualifying academic institutions can apply for free access to Seqera Cloud Pro. 
See the [Seqera Platform documentation](https://docs.seqera.io/platform) for tutorials to get started. 
 
-[updating-nextflow]: /nextflow_docs/nextflow_repo/docs/upating-nextflow.md \ No newline at end of file
+[updating-nextflow]: /nextflow_docs/nextflow_repo/docs/updating-nextflow \ No newline at end of file
diff --git a/docs/module.md b/docs/module.md
index c4ef0b33a2..c7c7fded48 100644
--- a/docs/module.md
+++ b/docs/module.md
@@ -32,7 +32,7 @@ Module includes are subject to the following rules:
 
 ## Module directory
 
-:::note{title="Version added 22.10.0"}
+:::note{title="New in version 22.10.0"}
 :::
@@ -93,7 +93,7 @@ workflow {
 
 ## Module parameters
 
-:::note{title="Version depreciated 24.07.0-edge"}
+:::danger{title="Deprecated in version 24.07.0-edge"}
 As a best practice, parameters should be used in the entry workflow and passed to workflows, processes, and functions as explicit inputs.
 :::
@@ -237,7 +237,7 @@
 
 ## Module binaries
 
-:::note{title="Version added 22.10.0"}
+:::note{title="New in version 22.10.0"}
 :::
diff --git a/docs/notifications.md b/docs/notifications.md
index 9686de9a84..3929d9a236 100644
--- a/docs/notifications.md
+++ b/docs/notifications.md
@@ -243,7 +243,7 @@ See the [mail scope][config-mail] section to learn more the mail server configur
 
 ### AWS SES configuration
 
-:::note{title="Version added 23.06.0-edge"}
+:::note{title="New in version 23.06.0-edge"}
 :::
diff --git a/docs/plugins.mdx b/docs/plugins.mdx
index a6a750ae58..0e5e260d82 100644
--- a/docs/plugins.mdx
+++ b/docs/plugins.mdx
@@ -39,7 +39,7 @@ The plugin version is optional. If it is not specified, Nextflow will download t
 
 The core plugins are documented in this documentation. For all other plugins, please refer to the plugin's code repository for documentation and support.
 
-:::note{title="Version added 25.02.0-edge"}
+:::note{title="New in version 25.02.0-edge"}
 :::
 
 The plugin version can be prefixed with `~` to pin the major and minor version while allowing the latest patch release to be used. For example, `nf-amazon@~2.9.0` will resolve to the latest version matching `2.9.x`, which is `2.9.2`. When working offline, Nextflow will resolve version ranges against the local plugin cache defined by `NXF_PLUGINS_DIR`.
diff --git a/docs/process.md b/docs/process.md
index e8a737fd6d..b652a1b031 100644
--- a/docs/process.md
+++ b/docs/process.md
@@ -200,7 +200,7 @@ Template scripts are generally discouraged due to the caveats described above. T
 
 ### Shell
 
-:::note{title="Version depreciated 24.11.0-edge"}
+:::danger{title="Deprecated in version 24.11.0-edge"}
 Use the `script` section instead. Consider using the [strict syntax][strict-syntax-page], which provides error checking to help distinguish between Nextflow variables and Bash variables in the process script.
 :::
@@ -264,7 +264,7 @@
 A native process is very similar to a [function][syntax-function]. 
However, it p ## Stub -:::note{title="Version added 20.11.0-edge"} +:::note{title="New in version 20.11.0-edge"} ::: You can define a command *stub*, which replaces the actual process command when the `-stub-run` or `-stub` command-line option is enabled: @@ -536,7 +536,7 @@ workflow { Rewriting input file names according to a named pattern is an extra feature and not at all required. The normal file input syntax introduced in the [Input files (`path`)][process-input-path] section is valid for collections of multiple files as well. To handle multiple input files while preserving the original file names, use a variable identifier or the `*` wildcard. ::: -:::note{title="Version added 23.09.0-edge"} +:::note{title="New in version 23.09.0-edge"} ::: The `arity` option can be used to enforce the expected number of files, either as a number or a range. @@ -943,7 +943,7 @@ Although the input files matching a glob output declaration are not included in Read more about glob syntax at the following link [What is a glob?][glob] -:::note{title="Version added 23.09.0-edge"} +:::note{title="New in version 23.09.0-edge"} ::: The `arity` option can be used to enforce the expected number of files, either as a number or a range. @@ -1009,7 +1009,7 @@ The `stdout` qualifier allows you to output the `stdout` of the executed process ### Eval output (`eval`) -:::note{title="Version added 24.02.0-edge"} +:::note{title="New in version 24.02.0-edge"} ::: The `eval` qualifier allows you to capture the standard output of an arbitrary command evaluated the task shell interpreter context: @@ -1288,7 +1288,7 @@ In this example, each task requests 8 GB of memory, plus the size of the input f ### Dynamic task resources with previous execution trace -:::note{title="Version added 24.10.0"} +:::note{title="New in version 24.10.0"} ::: Task resource requests can be updated relative to the [trace file][trace-report] metrics of the previous task attempt. The metrics can be accessed through the `task.previousTrace` variable. For example: diff --git a/docs/reports.mdx b/docs/reports.mdx index 57660ea51f..7707335306 100644 --- a/docs/reports.mdx +++ b/docs/reports.mdx @@ -416,7 +416,7 @@ The following table shows the fields that can be included in the execution repor `hostname` - :::note{title="Version added 25.05.0-edge"} + :::note{title="New in version 25.05.0-edge"} ::: The host on which the task was executed. Supported only for the Kubernetes executor yet. Activate with `k8s.fetchNodeName = true` in the Nextflow config file. @@ -425,7 +425,7 @@ The following table shows the fields that can be included in the execution repor `cpu_model` - :::note{title="Version added 22.07.0-edge"} + :::note{title="New in version 22.07.0-edge"} ::: The name of the CPU model used to execute the task. This data is read from file `/proc/cpuinfo`. @@ -469,7 +469,7 @@ The workflow DAG can be rendered in a different format by specifying an output f nextflow run main.nf -with-dag flowchart.png ``` -:::note{title="Version changed 23.10.0"} +:::warning{title="Changed in version 23.10.0"} The default output format was changed from DOT to HTML. ::: @@ -494,7 +494,7 @@ The following file formats are supported: HTML file with Mermaid diagram - :::note{title="Version changed 23.10.0"} + :::warning{title="Changed in version 23.10.0"} The HTML format was changed to render a Mermaid diagram instead of a Cytoscape diagram. 
::: diff --git a/docs/secrets.md b/docs/secrets.md index 249ec211d5..33c974ffb9 100644 --- a/docs/secrets.md +++ b/docs/secrets.md @@ -1,6 +1,6 @@ # Secrets -:::note{title="Version added 22.10.0"} +:::note{title="New in version 22.10.0"} Previewed in `21.09.0-edge`. ::: @@ -91,7 +91,7 @@ This feature is only available when using the local or grid executors (Slurm, Gr ## Pipeline script -:::note{title="Version added 24.03.0-edge"} +:::note{title="New in version 24.03.0-edge"} ::: Secrets can be accessed in the pipeline script using the `secrets` variable. For example: diff --git a/docs/workflow.md b/docs/workflow.md index 1bd53f00c9..87f57a675d 100644 --- a/docs/workflow.md +++ b/docs/workflow.md @@ -371,7 +371,7 @@ workflow { ## Process and workflow recursion -:::note{title="Version added 21.11.0-edge"} +:::note{title="New in version 21.11.0-edge"} ::: :::note @@ -417,14 +417,14 @@ Workflows can also be invoked recursively: ## Workflow outputs -:::note{title="Version added 24.04.0"} +:::note{title="New in version 24.04.0"} ::: -:::note{title="Version changed 24.10.0"} +:::warning{title="Changed in version 24.10.0"} A second preview version was introduced. See the [migration notes][workflow-outputs-second-preview] for details. ::: -:::note{title="Version changed 25.04.0-edge"} +:::warning{title="Changed in version 25.04.0-edge"} A third preview version was introduced. See the [migration notes][workflow-outputs-third-preview] for details. ::: diff --git a/docs/your-first-script.md b/docs/your-first-script.md index 34ac23b834..f580d4b305 100644 --- a/docs/your-first-script.md +++ b/docs/your-first-script.md @@ -195,6 +195,6 @@ See [Pipeline parameters][cli-params] for more information about modifying pipel Your first script is a brief introduction to running pipelines, modifying and resuming pipelines, and pipeline parameters. See [training.nextflow.io](https://training.nextflow.io/) for further Nextflow training modules. -[cache-resume-page]: /nextflow_docs/nextflow_repo/docs/cache-and-resume.md -[cli-params]: /nextflow_docs/nextflow_repo/docs/cli.md#pipeline-parameters +[cache-resume-page]: /nextflow_docs/nextflow_repo/docs/cache-and-resume +[cli-params]: /nextflow_docs/nextflow_repo/docs/cli#pipeline-parameters [install-page]: /nextflow_docs/nextflow_repo/docs/install \ No newline at end of file From f0467b6b48e7652a389cccc27035510be5c29da7 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Fri, 27 Jun 2025 16:27:01 +1200 Subject: [PATCH 08/38] Migrate conda, container, and git pages Signed-off-by: Christopher Hakkaart --- docs/conda.md | 76 +++++++++---------- docs/container.md | 157 ++++++++++++++++++--------------------- docs/{git.md => git.mdx} | 0 docs/install.md | 6 +- 4 files changed, 114 insertions(+), 125 deletions(-) rename docs/{git.md => git.mdx} (100%) diff --git a/docs/conda.md b/docs/conda.md index 8b9c7a1faa..893c415157 100644 --- a/docs/conda.md +++ b/docs/conda.md @@ -1,39 +1,37 @@ -(conda-page)= - # Conda environments [Conda](https://conda.io/) is an open source package and environment management system that simplifies the installation and the configuration of complex software packages in a platform agnostic manner. -Nextflow has built-in support for Conda that allows the configuration of workflow dependencies using Conda recipes and environment files. +Nextflow has built-in support for conda that allows the configuration of workflow dependencies using conda recipes and environment files. 
This allows Nextflow applications to use popular tool collections such as [Bioconda](https://bioconda.github.io) and the [Python Package index](https://pypi.org/), whilst taking advantage of the configuration flexibility provided by Nextflow. ## Prerequisites -This feature requires the Conda or [Miniconda](https://conda.io/miniconda.html) package manager to be installed on your system. +This feature requires the conda or [Miniconda](https://conda.io/miniconda.html) package manager to be installed on your system. ## How it works -Nextflow automatically creates and activates the Conda environment(s) given the dependencies specified by each process. +Nextflow automatically creates and activates the conda environment(s) given the dependencies specified by each process. -Dependencies are specified by using the {ref}`process-conda` directive, providing either the names of the required Conda packages, the path of a Conda environment yaml file, or the path of an existing Conda environment directory. +Dependencies are specified by using the [conda][process-conda] directive, providing either the names of the required conda packages, the path of a conda environment yaml file, or the path of an existing conda environment directory. -:::{note} -Conda environments are stored on the file system. By default, Nextflow instructs Conda to save the required environments in the pipeline work directory. The same environment may be created/saved multiple times across multiple executions when using different work directories. +:::note +Conda environments are stored on the file system. By default, Nextflow instructs conda to save the required environments in the pipeline work directory. The same environment may be created/saved multiple times across multiple executions when using different work directories. ::: -You can specify the directory where the Conda environments are stored using the `conda.cacheDir` configuration property. When using a computing cluster, make sure to use a shared file system path accessible from all compute nodes. See the {ref}`configuration page ` for details about Conda configuration. +You can specify the directory where the conda environments are stored using the `conda.cacheDir` configuration property. When using a computing cluster, make sure to use a shared file system path accessible from all compute nodes. See [Configuration][config-conda] for details about conda configuration. -:::{warning} -The Conda environment feature is not supported by executors that use remote object storage as a work directory. For example, AWS Batch. +:::warning +The conda environment feature is not supported by executors that use remote object storage as a work directory. For example, AWS Batch. ::: -### Enabling Conda environment +### Enabling conda environment -:::{versionadded} 22.08.0-edge +:::note{title="Added in version 22.08.0-edge"} ::: -The use of Conda recipes specified using the {ref}`process-conda` directive needs to be enabled explicitly in the pipeline configuration file (i.e. `nextflow.config`): +The use of conda recipes specified using the [conda][process-conda] directive needs to be enabled explicitly in the pipeline configuration file (i.e., `nextflow.config`): ```groovy conda.enabled = true @@ -41,7 +39,7 @@ conda.enabled = true Alternatively, it can be specified by setting the variable `NXF_CONDA_ENABLED=true` in your environment or by using the `-with-conda` command line option. 
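For example, both of the following invocations enable conda for a single run, assuming the pipeline declares its packages with the `conda` directive:

```bash
# Enable conda via the command line option...
nextflow run main.nf -with-conda

# ...or, equivalently, via the environment variable
NXF_CONDA_ENABLED=true nextflow run main.nf
```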
-### Use Conda package names
+### Use conda package names

Conda package names can be specified using the `conda` directive. Multiple package names can be specified by separating them with a blank space. For example:

@@ -56,16 +54,15 @@ process hello {
 }
 ```

-Using the above definition, a Conda environment that includes BWA, Samtools and MultiQC tools is created and activated when the process is executed.
+Using the above definition, a conda environment that includes BWA, Samtools and MultiQC tools is created and activated when the process is executed.

-The usual Conda package syntax and naming conventions can be used. The version of a package can be specified after the package name as shown here `bwa=0.7.15`.
+The usual conda package syntax and naming conventions can be used. The version of a package can be specified after the package name as shown here `bwa=0.7.15`.

The name of the channel where a package is located can be specified by prefixing the package with the channel name as shown here `bioconda::bwa=0.7.15`.

-(conda-env-files)=
-### Use Conda environment files
+### Use conda environment files

-Conda environments can also be defined using one or more Conda environment files. This is a file that lists the required packages and channels structured using the YAML format. For example:
+Conda environments can also be defined using one or more conda environment files. This is a file that lists the required packages and channels structured using the YAML format. For example:

```yaml
name: my-env
@@ -77,7 +74,7 @@ dependencies:
  - bwa=0.7.15
```

-Read the Conda documentation for more details about how to create [environment files](https://conda.io/docs/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually).
+Read the conda documentation for more details about how to create [environment files](https://conda.io/docs/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually).

The path of an environment file can be specified using the `conda` directive:

@@ -92,11 +89,10 @@ process hello {
 }
 ```

-:::{warning}
+:::warning
The environment file name **must** have a `.yml` or `.yaml` extension or else it won't be properly recognised.
:::

-(conda-pypi)=
### Python Packages from PyPI

Conda environment files can also be used to install Python packages from the [PyPI repository](https://pypi.org/), through the `pip` package manager (which must also be explicitly listed as a required package):
@@ -124,15 +120,15 @@
bioconda::bwa=0.7.15
bioconda::multiqc=1.4
```

-:::{note}
+:::note
Dependency files must be a text file with the `.txt` extension.
:::

### Conda lock files

-The final method for providing packages to Conda is by using [Conda lock files](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#identical-conda-envs).
+The final method for providing packages to conda is by using [conda lock files](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#identical-conda-envs).

-To generate a lock file from an existing Conda environment, run the following command:
+To generate a lock file from an existing conda environment, run the following command:

```bash
conda list --explicit > spec-file.txt
```

@@ -144,9 +140,9 @@
If you're using Mamba or Micromamba, use this command instead:

```bash
micromamba env export --explicit > spec-file.txt
```

-You can also download Conda lock files from [Wave](https://seqera.io/wave/) build pages.
+You can also download conda lock files from [Wave](https://seqera.io/wave/) build pages.

-These files list every package and its dependencies, so Conda doesn't need to resolve the environment. This makes environment setup faster and more reproducible.
+These files list every package and its dependencies, so conda doesn't need to resolve the environment. This makes environment setup faster and more reproducible.

Each file includes package URLs and, optionally, an MD5 hash for verifying file integrity:

@@ -163,15 +159,15 @@
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-13.2.0-h77fa898_7.cond
# .. and so on
```

-To use a Conda lock file with Nextflow, set the `conda` directive to the path of the lock file.
+To use a conda lock file with Nextflow, set the `conda` directive to the path of the lock file.

-:::{note}
+:::note
Conda lock files must be a text file with the `.txt` extension.
:::

-### Use existing Conda environments
+### Use existing conda environments

-If you already have a local Conda environment, you can use it in your workflow specifying the installation directory of such environment by using the `conda` directive:
+If you already have a local conda environment, you can use it in your workflow by specifying its installation directory with the `conda` directive:

```nextflow
process hello {
@@ -186,16 +182,16 @@ process hello {

### Use Mamba to resolve packages

-:::{warning} *Experimental: may change in a future release.*
+:::warning{title="Experimental: may change in a future release."}
:::

-It is also possible to use [mamba](https://github.com/mamba-org/mamba) to speed up the creation of conda environments. For more information on how to enable this feature please refer to {ref}`Conda `.
+It is also possible to use [mamba](https://github.com/mamba-org/mamba) to speed up the creation of conda environments. For more information on how to enable this feature, see [conda][config-conda].

## Best practices

-When a `conda` directive is used in any `process` definition within the workflow script, Conda tool is required for the workflow execution.
+When a `conda` directive is used in any `process` definition within the workflow script, the conda tool is required for the workflow execution.

-Specifying the Conda environments in a separate configuration {ref}`profile ` is therefore recommended to allow the execution via a command line option and to enhance the workflow portability. For example:
+Specifying the conda environments in a separate configuration [profile][config-profiles] is therefore recommended to allow the execution via a command line option and to enhance the workflow portability. For example:

```groovy
profiles {
@@ -210,8 +206,12 @@ profiles {
}
```

-The above configuration snippet allows the execution either with Conda or Docker specifying `-profile conda` or `-profile docker` when running the workflow script.
+The above configuration snippet allows execution with either conda or Docker by specifying `-profile conda` or `-profile docker` when running the workflow script.

## Advanced settings

-Conda advanced configuration settings are described in the {ref}`Conda ` section on the Nextflow configuration page.
+Conda advanced configuration settings are described in the [conda][config-conda] section on the Nextflow configuration page.
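As a quick illustration of the kind of settings documented there, a minimal sketch of two commonly tuned options (the `cacheDir` and `useMamba` option names are assumptions taken from the configuration reference; the path is hypothetical):

```groovy
// Sketch only -- option names assumed from the conda configuration reference.
conda {
    enabled  = true
    cacheDir = '/shared/conda-envs'  // hypothetical shared filesystem path
    useMamba = true                  // resolve packages with mamba (experimental, see above)
}
```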
+
+[config-conda]: /nextflow_docs/nextflow_repo/docs/reference/config#conda
+[config-profiles]: /nextflow_docs/nextflow_repo/docs/config#config-profiles
+[process-conda]: /nextflow_docs/nextflow_repo/docs/reference/process#conda
diff --git a/docs/container.md b/docs/container.md
index 5c4257fef3..9f8ac84b5b 100644
--- a/docs/container.md
+++ b/docs/container.md
@@ -1,18 +1,14 @@
-(container-page)=
-
# Containers

Nextflow supports a variety of container runtimes. Containerization allows you to write self-contained and truly reproducible computational pipelines, by packaging the binary dependencies of a script into a standard and portable format that can be executed on any platform that supports a container runtime. Furthermore, the same pipeline can be transparently executed with any of the supported container runtimes, depending on which runtimes are available in the target compute environment.

-:::{note}
-When creating a container image to use with Nextflow, make sure that Bash (3.x or later) and `ps` are installed in the image, along with other tools required for collecting metrics (See {ref}`this section `). Bash should be available on the path `/bin/bash` and it should be the container entrypoint.
+:::note
+When creating a container image to use with Nextflow, make sure that Bash (3.x or later) and `ps` are installed in the image, along with other tools required for collecting metrics (see [Tasks][execution-report-tasks] for more information). Bash should be available on the path `/bin/bash` and it should be the container entrypoint.
:::

-(container-apptainer)=
-
## Apptainer

-:::{versionadded} 22.11.0-edge
+:::note{title="Added in version 22.11.0-edge"}
:::

[Apptainer](https://apptainer.org) is an alternative container runtime to Docker and an open source fork of Singularity. The main advantages of Apptainer are that it can be used without root privileges and it doesn't require a separate daemon process. These, along with other features such as support for autofs mounts, make Apptainer better suited to the requirements of HPC workloads. Apptainer is able to use existing Docker images and can pull from Docker registries.
@@ -27,7 +23,7 @@ Apptainer makes use of a container image file, which physically contains the con

Apptainer allows paths that do not currently exist within the container to be created and mounted dynamically by specifying them on the command line. However, this feature is only supported on hosts that support the [Overlay file system](https://en.wikipedia.org/wiki/OverlayFS) and is not enabled by default.

-:::{note}
+:::note
Nextflow expects that data paths are defined system wide, and your Apptainer images need to be created having the mount paths defined in the container file system.
:::

@@ -38,12 +34,12 @@ If your Apptainer installation support the "user bind control" feature, enable t
The integration for Apptainer follows the same execution model implemented for Docker. You won't need to modify your Nextflow script in order to run it with Apptainer. Simply specify the Apptainer image file from where the containers are started by using the `-with-apptainer` command line option. For example:

```bash
-nextflow run -with-apptainer [apptainer image file]
+nextflow run main.nf -with-apptainer [apptainer image file]
```

Every time your script launches a process execution, Nextflow will run it into an Apptainer container created by using the specified image.
In practice Nextflow will automatically wrap your processes and launch them by running the `apptainer exec` command with the image you have provided.

-:::{note}
+:::note
An Apptainer image can contain any tool or piece of software you may need to carry out a process execution. Moreover, the container is run in such a way that the process result files are created in the host file system, thus it behaves in a completely transparent manner without requiring extra steps or affecting the flow in your pipeline.
:::

@@ -56,17 +52,17 @@ apptainer.enabled = true

In the above example replace `/path/to/apptainer.img` with any Apptainer image of your choice.

-Read the {ref}`config-page` page to learn more about the configuration file and how to use it to configure your pipeline execution.
+See [Configuration][config-page] to learn more about the configuration file and how to use it to configure your pipeline execution.

-:::{note}
-Unlike Docker, Nextflow does not automatically mount host paths in the container when using Apptainer. It expects that the paths are configured and mounted system wide by the Apptainer runtime. If your Apptainer installation allows user defined bind points, read the {ref}`Apptainer configuration ` section to learn how to enable Nextflow auto mounts.
+:::note
+Unlike Docker, Nextflow does not automatically mount host paths in the container when using Apptainer. It expects that the paths are configured and mounted system wide by the Apptainer runtime. If your Apptainer installation allows user defined bind points, see [Apptainer][config-apptainer] to learn how to enable Nextflow auto mounts.
:::

-:::{warning}
+:::warning
When a process input is a *symbolic link* file, make sure the linked file is stored in a host folder that is accessible from a bind path defined in your Apptainer installation. Otherwise the process execution will fail because the launched container won't be able to access the linked file.
:::

-:::{versionchanged} 23.07.0-edge
+:::warning{title="Changed in version 23.07.0-edge"}
Nextflow no longer mounts the home directory when launching an Apptainer container. To re-enable the old behavior, set the environment variable `NXF_APPTAINER_HOME_MOUNT` to `true`.
:::

@@ -88,7 +84,7 @@ apptainer {
}
```

-Read the {ref}`Process scope ` section to learn more about processes configuration.
+See [Process configuration][config-process] to learn more about processes configuration.

### Apptainer & Docker Hub

@@ -103,7 +99,7 @@ process.container = 'file:///path/to/apptainer.img'
apptainer.enabled = true
```

-:::{warning}
+:::warning
Use three `/` slashes to specify an **absolute** file path, otherwise the path will be interpreted as relative to the workflow launch directory.
:::

@@ -116,7 +112,7 @@ apptainer.enabled = true

You do not need to specify `docker://` to pull from a Docker repository. Nextflow will automatically prepend it to your image name when Apptainer is enabled. Additionally, the Docker engine will not work with containers specified as `docker://`.

-:::{note}
+:::note
This feature requires the `apptainer` tool to be installed where the workflow execution is launched (as opposed to the compute nodes).
:::

@@ -126,34 +122,32 @@ Nextflow uses the library directory to determine the location of Apptainer conta
Nextflow first checks the library directory when searching for the image. If the image is not found, it then checks the cache directory.
The main difference between the library directory and the cache directory is that the first is assumed to be a read-only container repository, while the latter is expected to be a writable path where container images can be added for caching purposes.

-:::{warning}
+:::warning
When using a compute cluster, the Apptainer cache directory must reside in a shared filesystem accessible to all compute nodes.
:::

-:::{danger}
-When pulling Docker images, Apptainer may be unable to determine the container size if the image was stored using an old Docker format, resulting in a pipeline execution error. See the Apptainer documentation for details.
+:::danger
+When pulling Docker images, Apptainer may be unable to determine the container size if the image was stored using an old Docker format, resulting in a pipeline execution error. See the Apptainer documentation for details.
:::

### Advanced settings

-Apptainer advanced configuration settings are described in {ref}`config-apptainer` section in the Nextflow configuration page.
-
-(container-charliecloud)=
+Apptainer advanced configuration settings are described in the [Apptainer][config-apptainer] section in the Nextflow configuration page.

## Charliecloud

-:::{versionadded} 20.12.0-edge
+:::note{title="Added in version 20.12.0-edge"}
:::

-:::{versionchanged} 21.03.0-edge
+:::warning{title="Changed in version 21.03.0-edge"}
Requires Charliecloud 0.22 to 0.27.
:::

-:::{versionchanged} 22.09.0-edge
+:::warning{title="Changed in version 22.09.0-edge"}
Requires Charliecloud 0.28 or later.
:::

-:::{warning} *Experimental: not recommended for production environments.*
+:::warning{title="Experimental: not recommended for production environments."}
:::

[Charliecloud](https://hpc.github.io/charliecloud) is an alternative container runtime to Docker that is better suited for use in HPC environments. Its main advantage is that it can be used without root privileges, making use of user namespaces in the Linux kernel. Charliecloud is able to pull from Docker registries.
@@ -167,12 +161,12 @@ You will need Charliecloud installed in your execution environment e.g. on your
You won't need to modify your Nextflow script in order to run it with Charliecloud. Simply specify the docker image from where the containers are started by using the `-with-charliecloud` command line option. For example:

```bash
-nextflow run -with-charliecloud [container]
+nextflow run main.nf -with-charliecloud [container]
```

Every time your script launches a process execution, Nextflow will run it into a Charliecloud container created by using the specified container image. In practice Nextflow will automatically wrap your processes and run them by executing the `ch-run` command with the container you have provided.

-:::{note}
+:::note
A container image can contain any tool or piece of software you may need to carry out a process execution. Moreover, the container is run in such a way that the process result files are created in the host file system, thus it behaves in a completely transparent manner without requiring extra steps or affecting the flow in your pipeline.
:::

@@ -183,13 +177,13 @@
process.container = '/path/to/container'
charliecloud.enabled = true
```

-:::{warning}
-If an absolute path is provided, the container needs to be in the Charliecloud flat directory format. See the section below on compatibility with Docker registries.
+:::warning
+If an absolute path is provided, the container needs to be in the Charliecloud flat directory format.
See below on compatibility with Docker registries. ::: -Read the {ref}`config-page` page to learn more about the configuration file and how to use it to configure your pipeline execution. +See [Configuration][config-page] to learn more about the configuration file and how to use it to configure your pipeline execution. -:::{warning} +:::warning Nextflow automatically manages the file system mounts whenever a container is launched depending on the process input files. However, when a process input is a *symbolic link*, the linked file **must** be stored in the same folder where the symlink is located, or a sub-folder of it. Otherwise the process execution will fail because the launched container won't be able to access the linked file. ::: @@ -232,7 +226,7 @@ charliecloud { } ``` -Read the {ref}`Process scope ` section to learn more about processes configuration. +See [Process configuration][config-process] to learn more about processes configuration. After running your pipeline, you can easily query the container image that each process used with the following command: @@ -242,9 +236,7 @@ nextflow log last -f name,container ### Advanced settings -Charliecloud advanced configuration settings are described in {ref}`config-charliecloud` section in the Nextflow configuration page. - -(container-docker)= +Charliecloud advanced configuration settings are described in the [Charliecloud][config-charliecloud] section in the Nextflow configuration page. ## Docker @@ -261,12 +253,12 @@ If you are running Docker on Mac OSX make sure you are mounting your local `/Use You won't need to modify your Nextflow script in order to run it with Docker. Simply specify the Docker image from where the containers are started by using the `-with-docker` command line option. For example: ```bash -nextflow run -with-docker [docker image] +nextflow run main.nf -with-docker [docker image] ``` Every time your script launches a process execution, Nextflow will run it into a Docker container created by using the specified image. In practice Nextflow will automatically wrap your processes and run them by executing the `docker run` command with the image you have provided. -:::{note} +:::note A Docker image can contain any tool or piece of software you may need to carry out a process execution. Moreover, the container is run in such a way that the process result files are created in the host file system, thus it behaves in a completely transparent manner without requiring extra steps or affecting the flow in your pipeline. ::: @@ -279,9 +271,9 @@ docker.enabled = true In the above example replace `nextflow/examples:latest` with any Docker image of your choice. -Read the {ref}`config-page` page to learn more about the configuration file and how to use it to configure your pipeline execution. +See [Configuration][config-page] to learn more about the configuration file and how to use it to configure your pipeline execution. -:::{warning} +:::warning Nextflow automatically manages the file system mounts whenever a container is launched depending on the process input files. However, when a process input is a *symbolic link*, the linked file **must** be stored in the same folder where the symlink is located, or a sub-folder of it. Otherwise the process execution will fail because the launched container won't be able to access the linked file. ::: @@ -326,17 +318,15 @@ docker { } ``` -Read the {ref}`Process scope ` section to learn more about processes configuration. +See [Process configuration][config-process] to learn more. 
### Advanced settings

-Docker advanced configuration settings are described in {ref}`config-docker` section in the Nextflow configuration page.
-
-(container-podman)=
+Docker advanced configuration settings are described in the [docker][config-docker] section in the Nextflow configuration page.

## Podman

-:::{versionadded} 20.01.0
+:::note{title="Added in version 20.01.0"}
:::

[Podman](http://www.podman.io) is a drop-in replacement for Docker that can run containers with or without root privileges.

@@ -350,12 +340,12 @@ You will need Podman installed on your execution environment e.g. your computer
You won't need to modify your Nextflow script in order to run it with Podman. Simply specify the Podman image from where the containers are started by using the `-with-podman` command line option. For example:

```bash
-nextflow run -with-podman [OCI container image]
+nextflow run main.nf -with-podman [OCI container image]
```

Every time your script launches a process execution, Nextflow will run it into a Podman container created by using the specified image. In practice Nextflow will automatically wrap your processes and run them by executing the `podman run` command with the image you have provided.

-:::{note}
+:::note
An OCI container image can contain any tool or piece of software you may need to carry out a process execution. Moreover, the container is run in such a way that the process result files are created in the host file system, thus it behaves in a completely transparent manner without requiring extra steps or affecting the flow in your pipeline.
:::

@@ -368,9 +358,9 @@ podman.enabled = true

In the above example replace `nextflow/examples:latest` with any Podman image of your choice.

-Read the {ref}`config-page` page to learn more about the configuration file and how to use it to configure your pipeline execution.
+See [Configuration][config-page] to learn more about the configuration file and how to use it to configure your pipeline execution.

-:::{warning}
+:::warning
Nextflow automatically manages the file system mounts whenever a container is launched depending on the process input files. However, when a process input is a *symbolic link*, the linked file **must** be stored in the same folder where the symlink is located, or a sub-folder of it. Otherwise the process execution will fail because the launched container won't be able to access the linked file.
:::

@@ -415,17 +405,15 @@ podman {
}
```

-Read the {ref}`Process scope ` section to learn more about processes configuration.
+See [Process configuration][config-process] to learn more about processes configuration.

### Advanced settings

-Podman advanced configuration settings are described in {ref}`config-podman` section in the Nextflow configuration page.
-
-(container-sarus)=
+Podman advanced configuration settings are described in the [podman][config-podman] section in the Nextflow configuration page.

## Sarus

-:::{versionadded} 22.12.0-edge
+:::note{title="Added in version 22.12.0-edge"}
Requires Sarus 1.5.1 or later.
:::

@@ -450,7 +438,7 @@ sarus.enabled = true

and it will always try to search the Docker Hub registry for the images.

-:::{note}
+:::note
If you do not specify an image tag, the `latest` tag will be fetched by default.
:::

@@ -473,16 +461,10 @@ sarus {
}
```

-Read the {ref}`Process scope ` section to learn more about processes configuration.
-
-(container-shifter)=
+See [Process configuration][config-process] to learn more about processes configuration.
## Shifter -:::{versionadded} 19.10.0 -Requires Shifter 18.03 or later. -::: - [Shifter](https://docs.nersc.gov/programming/shifter/overview/) is an alternative container runtime to Docker. Shifter works by converting Docker images to a common format that can then be distributed and launched on HPC systems. The user interface to Shifter enables a user to select an image from [Docker Hub](https://hub.docker.com/) and then submit jobs which run entirely within the container. ### Prerequisites @@ -523,9 +505,7 @@ shifter { } ``` -Read the {ref}`Process scope ` section to learn more about processes configuration. - -(container-singularity)= +See [Process configuration][config-process] to learn more about processes configuration. ## Singularity @@ -541,7 +521,7 @@ Singularity makes use of a container image file, which physically contains the c Singularity allows paths that do not currently exist within the container to be created and mounted dynamically by specifying them on the command line. However this feature is only supported on hosts that support the [Overlay file system](https://en.wikipedia.org/wiki/OverlayFS) and is not enabled by default. -:::{note} +:::note Nextflow expects that data paths are defined system wide, and your Singularity images need to be created having the mount paths defined in the container file system. ::: @@ -552,12 +532,12 @@ If your Singularity installation support the "user bind control" feature, enable The integration for Singularity follows the same execution model implemented for Docker. You won't need to modify your Nextflow script in order to run it with Singularity. Simply specify the Singularity image file from where the containers are started by using the `-with-singularity` command line option. For example: ```bash -nextflow run -with-singularity [singularity image file] +nextflow run main.nf -with-singularity [singularity image file] ``` Every time your script launches a process execution, Nextflow will run it into a Singularity container created by using the specified image. In practice Nextflow will automatically wrap your processes and launch them by running the `singularity exec` command with the image you have provided. -:::{note} +:::note A Singularity image can contain any tool or piece of software you may need to carry out a process execution. Moreover, the container is run in such a way that the process result files are created in the host file system, thus it behaves in a completely transparent manner without requiring extra steps or affecting the flow in your pipeline. ::: @@ -570,25 +550,25 @@ singularity.enabled = true In the above example replace `/path/to/singularity.img` with any Singularity image of your choice. -Read the {ref}`config-page` page to learn more about the configuration file and how to use it to configure your pipeline execution. +See [Configuration][config-page] to learn more about the configuration file and how to use it to configure your pipeline execution. -:::{note} -Unlike Docker, Nextflow does not automatically mount host paths in the container when using Singularity. It expects that the paths are configure and mounted system wide by the Singularity runtime. If your Singularity installation allows user defined bind points, read the {ref}`Singularity configuration ` section to learn how to enable Nextflow auto mounts. +:::note +Unlike Docker, Nextflow does not automatically mount host paths in the container when using Singularity. 
It expects that the paths are configured and mounted system wide by the Singularity runtime. If your Singularity installation allows user defined bind points, see [singularity][config-singularity] to learn how to enable Nextflow auto mounts.
:::

-:::{warning}
+:::warning
When a process input is a *symbolic link* file, make sure the linked file is stored in a host folder that is accessible from a bind path defined in your Singularity installation. Otherwise the process execution will fail because the launched container won't be able to access the linked file.
:::

-:::{versionchanged} 23.07.0-edge
+:::warning{title="Changed in version 23.07.0-edge"}
Nextflow no longer mounts the home directory when launching a Singularity container. To re-enable the old behavior, set the environment variable `NXF_SINGULARITY_HOME_MOUNT` to `true`.
:::

-:::{versionchanged} 23.09.0-edge
+:::warning{title="Changed in version 23.09.0-edge"}
Nextflow automatically mounts the required host paths in the container. To re-enable the old behavior, set the environment variable `NXF_SINGULARITY_AUTO_MOUNTS` to `false` or set `singularity.autoMounts=false` in the Nextflow configuration file.
:::

-:::{versionchanged} 23.09.0-edge
+:::warning{title="Changed in version 23.09.0-edge"}
The execution command for Singularity/Apptainer containers can be set to `run` by means of the environment variable `NXF_SINGULARITY_RUN_COMMAND` (default command is `exec`).
:::

@@ -611,7 +591,7 @@ singularity {
}
```

-Read the {ref}`Process scope ` section to learn more about processes configuration.
+See [Process configuration][config-process] to learn more about processes configuration.

### Singularity & Docker Hub

@@ -628,7 +608,7 @@ process.container = 'file:///path/to/singularity.img'
singularity.enabled = true
```

-:::{warning}
+:::warning
Use three `/` slashes to specify an **absolute** file path, otherwise the path will be interpreted as relative to the workflow launch directory.
:::

@@ -641,7 +621,7 @@ singularity.enabled = true

You do not need to specify `docker://` to pull from a Docker repository. Nextflow will automatically prepend it to your image name when Singularity is enabled. Additionally, the Docker engine will not work with containers specified as `docker://`.

-:::{versionadded} 19.04.0
+:::note{title="Added in version 19.04.0"}
Requires Singularity 3.0.3 or later.
:::

@@ -659,14 +639,23 @@ Nextflow uses the library directory to determine the location of Singularity ima
Nextflow first checks the library directory when searching for the image. If the image is not found, it then checks the cache directory. The main difference between the library directory and the cache directory is that the first is assumed to be a read-only container repository, while the latter is expected to be a writable path where container images can be added for caching purposes.

-:::{warning}
+:::warning
When using a compute cluster, the Singularity cache directory must reside in a shared filesystem accessible to all compute nodes.
:::

-:::{danger}
-When pulling Docker images, Singularity may be unable to determine the container size if the image was stored using an old Docker format, resulting in a pipeline execution error. See the Singularity documentation for details.
+:::danger
+When pulling Docker images, Singularity may be unable to determine the container size if the image was stored using an old Docker format, resulting in a pipeline execution error. See the Singularity documentation for details.
::: ### Advanced settings -Singularity advanced configuration settings are described in {ref}`config-singularity` section in the Nextflow configuration page. +Singularity advanced configuration settings are described in the [singularity][config-singularity] section in the Nextflow configuration page. + +[execution-report-tasks]: /nextflow_docs/nextflow_repo/docs/reports#tasks +[config-page]: /nextflow_docs/nextflow_repo/docs/config +[config-apptainer]: /nextflow_docs/nextflow_repo/docs/reference/config#apptainer +[config-process]: /nextflow_docs/nextflow_repo/docs/config#process-configuration +[config-charliecloud]: /nextflow_docs/nextflow_repo/docs/reference/config#charliecloud +[config-docker]: /nextflow_docs/nextflow_repo/docs/reference/config#docker +[config-podman]: /nextflow_docs/nextflow_repo/docs/reference/config#podman +[config-singularity]: /nextflow_docs/nextflow_repo/docs/reference/config#singularity diff --git a/docs/git.md b/docs/git.mdx similarity index 100% rename from docs/git.md rename to docs/git.mdx diff --git a/docs/install.md b/docs/install.md index 806aeb5101..88cfbef6cc 100644 --- a/docs/install.md +++ b/docs/install.md @@ -24,7 +24,7 @@ To install Java with SDKMAN: 1. [Install SDKMAN](https://sdkman.io/install): - ``` + ```bash curl -s https://get.sdkman.io | bash ``` @@ -32,13 +32,13 @@ To install Java with SDKMAN: 3. Install Java: - ``` + ```bash sdk install java 17.0.10-tem ``` 4. Confirm that Java is installed correctly: - ``` + ```bash java -version ``` From 49015d90113235b7b92b683a3a45a53e6a320ca6 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Mon, 30 Jun 2025 12:18:28 +1200 Subject: [PATCH 09/38] Migrated compute and storage sections Signed-off-by: Christopher Hakkaart --- docs/amazons3.md | 11 ++++-- docs/aws.md | 95 +++++++++++++++++++++------------------------- docs/azure.md | 52 ++++++++++++------------- docs/fusion.md | 11 ++---- docs/google.md | 79 ++++++++++++++++++++------------------ docs/kubernetes.md | 43 +++++++++++---------- docs/spack.md | 34 +++++++++-------- docs/wave.md | 54 +++++++++++++------------- 8 files changed, 192 insertions(+), 187 deletions(-) diff --git a/docs/amazons3.md b/docs/amazons3.md index 95c6cfb4cb..26dec1d72c 100644 --- a/docs/amazons3.md +++ b/docs/amazons3.md @@ -1,5 +1,3 @@ -(amazons3-page)= - # Amazon S3 Nextflow includes support for AWS S3 storage. Files stored in an S3 bucket can be accessed transparently in your pipeline script like any other file in the local file system. @@ -20,7 +18,7 @@ The usual file operations can be applied to a path handle with the above notatio println file('s3://my-bucket/data/sequences.fa').text ``` -See {ref}`working-with-files` and the {ref}`stdlib-types-path` reference to learn more about available file operations. +See [Working with files][working-with-files] and the [Path][stdlib-types-path] reference to learn more about available file operations. ## Security credentials @@ -96,4 +94,9 @@ aws { ## Advanced configuration -Read {ref}`AWS configuration` section to learn more about advanced S3 client configuration options. +See [AWS configuration][config-aws] for more information about advanced S3 client configuration options. 
+
+
+[config-aws]: /nextflow_docs/nextflow_repo/docs/reference/config#aws
+[stdlib-types-path]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types#path
+[working-with-files]: /nextflow_docs/nextflow_repo/docs/working-with-files
\ No newline at end of file
diff --git a/docs/aws.md b/docs/aws.md
index fce55b3ca2..b26eb3c2ef 100644
--- a/docs/aws.md
+++ b/docs/aws.md
@@ -1,8 +1,6 @@
-(aws-page)=
-
# Amazon Web Services

-:::{tip}
+:::tip
This page describes how to manually set up and use Nextflow with AWS Cloud.

You may be interested in using [Batch Forge](https://docs.seqera.io/platform/latest/compute-envs/aws-batch) in [Seqera Platform](https://seqera.io/platform/), which automatically creates the required AWS infrastructure for you with minimal intervention.
@@ -14,7 +12,7 @@ Nextflow uses the [AWS security credentials](https://docs.aws.amazon.com/general

The AWS credentials are selected from the following sources, in order of descending priority:

-1. Nextflow configuration file - `aws.accessKey` and `aws.secretKey`. See {ref}`AWS configuration` for more details.
+1. Nextflow configuration file - `aws.accessKey` and `aws.secretKey`. See [aws][config-aws] for more details.

2. A custom profile in `$HOME/.aws/credentials` and/or `$HOME/.aws/config`. The profile can be supplied from the `aws.profile` config option, or the `AWS_PROFILE` or `AWS_DEFAULT_PROFILE` environmental variables.

3. Environment variables - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

4. The `default` profile in `~/.aws/credentials` and/or `~/.aws/config`.

5. Single Sign-On (SSO) credentials. See the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/sso-configure-profile-token.html) for more details.

-   :::{versionadded} 23.07.0-edge
+   :::note{title="Added in version 23.07.0-edge"}
   :::

6. EC2 instance profile credentials. See the [AWS documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) and [this blog post](https://aws.amazon.com/blogs/security/granting-permission-to-launch-ec2-instances-with-iam-roles-passrole-permission/) for more details.

The AWS region is selected from the following sources, in order of descending priority:

1. Nextflow configuration file - `aws.region`
2. Environment variables - `AWS_REGION` or `AWS_DEFAULT_REGION`
3. EC2 instance metadata (if Nextflow is running in an EC2 instance)

SSO credentials and instance profile credentials are the recommended options because they don't require you to manage and distribute AWS keys explicitly. SSO credentials are ideal for launching pipelines from outside of AWS (e.g. your laptop), while instance profile credentials are ideal for launching pipelines within AWS (e.g. an EC2 instance).

## AWS IAM policies

[IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html) are the mechanism used by AWS to define permissions for IAM identities. In order to access certain AWS services, the proper policies must be attached to the identity associated with the AWS credentials.

Minimal permissions policies to be attached to the AWS account used by Nextflow are:

- To use AWS Batch:

  ```json
  "batch:CancelJob"
  "batch:DescribeComputeEnvironments"
  "batch:DescribeJobDefinitions"
  "batch:DescribeJobQueues"
  "batch:DescribeJobs"
  "batch:ListJobs"
  "batch:RegisterJobDefinition"
  "batch:SubmitJob"
  "batch:TagResource"
  "batch:TerminateJob"
  ```

- To view [EC2](https://aws.amazon.com/ec2/) instances:

  ```json
  "ec2:DescribeInstanceAttribute"
  "ec2:DescribeInstances"
  "ec2:DescribeInstanceStatus"
  "ec2:DescribeInstanceTypes"
  "ecs:DescribeContainerInstances"
  "ecs:DescribeTasks"
  ```

- To pull container images from [ECR](https://aws.amazon.com/ecr/) repositories:

  ```json
  "ecr:BatchCheckLayerAvailability"
  "ecr:BatchGetImage"
  "ecr:DescribeImages"
  "ecr:DescribeImageScanFindings"
  "ecr:DescribeRepositories"
  "ecr:GetAuthorizationToken"
  "ecr:GetDownloadUrlForLayer"
  "ecr:GetLifecyclePolicy"
  "ecr:GetLifecyclePolicyPreview"
  "ecr:GetRepositoryPolicy"
  "ecr:ListImages"
  "ecr:ListTagsForResource"
  ```

  Alternatively, you can use the AWS-provided `AmazonEC2ContainerRegistryReadOnly` managed policy.

-:::{note}
+:::note
If you are running Fargate or Fargate Spot, you may need the following policies in addition to those listed above:
  ```json
  "ec2:DescribeSubnets"
  "ecs:CreateCluster"
  "ecs:DeleteCluster"
  "ecs:DescribeClusters"
  "ecs:ListClusters"
  ```
:::

### S3 policies

Nextflow also requires policies to access [S3 buckets](https://aws.amazon.com/s3/) in order to use the work directory, pull input data, and publish results.

Depending on the pipeline configuration, the above actions can be done all in a single bucket but, more likely, spread across multiple buckets. Once the list of buckets used by the pipeline is identified, there are two alternative ways to give Nextflow access to these buckets:

1. Grant access to all buckets by attaching the policy `"s3:*"` to the IAM identity. This works only if buckets do not set their own access policies (see point 2);

2. For more fine grained control, assign to each bucket the following policy (replace the placeholders with the actual values):

   ```json
   {
       "Version": "2012-10-17",
       "Id": "<policy id>",
       "Statement": [
           {
               "Sid": "<statement id>",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "<IAM identity ARN>"
               },
               "Action": [
                   "s3:GetObject",
                   "s3:PutObject",
                   "s3:DeleteObject",
                   "s3:PutObjectTagging",
                   "s3:AbortMultipartUpload"
               ],
               "Resource": "arn:aws:s3:::<bucket name>/*"
           },
           {
               "Sid": "AllowSSLRequestsOnly",
               "Effect": "Deny",
               "Principal": "*",
               "Action": "s3:*",
               "Resource": [
                   "arn:aws:s3:::<bucket name>",
                   "arn:aws:s3:::<bucket name>/*"
               ],
               "Condition": {
                   "Bool": {
                       "aws:SecureTransport": "false"
                   }
               }
           }
       ]
   }
   ```

See the [bucket policy documentation](https://docs.aws.amazon.com/config/latest/developerguide/s3-bucket-policy.html) for additional details.

-(aws-batch)=
-
## AWS Batch

[AWS Batch](https://aws.amazon.com/batch/) is a managed computing service that allows the execution of containerised workloads in the AWS cloud infrastructure. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized compute resources) based on the volume and specific resource requirements of the jobs submitted.

Nextflow provides built-in support for AWS Batch, allowing the seamless deployment of Nextflow pipelines in the cloud, in which tasks are offloaded as Batch jobs.

-Read the {ref}`AWS Batch executor ` section to learn more about the `awsbatch` executor in Nextflow.
-
-(aws-batch-cli)=
+See [AWS Batch][awsbatch-executor] to learn more about the `awsbatch` executor in Nextflow.
### AWS CLI

-:::{tip}
+:::tip
The need for the AWS CLI is considered a legacy requirement for the deployment of Nextflow pipelines with AWS Batch.
-Instead, consider using {ref}`wave-page` and {ref}`fusion-page` to facilitate access to S3 without using the AWS CLI.
+Instead, consider using [Wave][wave-page] and [Fusion][fusion-page] to facilitate access to S3 without using the AWS CLI.
:::

Nextflow uses the [AWS command line tool](https://aws.amazon.com/cli/) (`aws`) to stage input files and output files between S3 and the task containers.

-The `aws` command can be made available by either (1) installing it in the container image(s) or (2) installing it in a {ref}`custom AMI ` to be used instead of the default AMI when configuring AWS Batch.
+The `aws` command can be made available by either (1) installing it in the container image(s) or (2) installing it in a [custom AMI][aws-custom-ami] to be used instead of the default AMI when configuring AWS Batch.

### Get started

1. In the AWS Console, navigate to **AWS Batch** and create a [Compute environment](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html) (CE).

    1. If you are using a custom AMI (see following sections), the AMI ID must be specified in the CE configuration
    2. Make sure to select an AMI (either custom or existing) with Docker installed (see following sections)
    3. Make sure the policy `AmazonS3FullAccess` (granting access to S3 buckets) is attached to the instance role configured for the CE
    4. If you plan to use Docker images from Amazon ECS container, make sure the `AmazonEC2ContainerServiceforEC2Role` policy is also attached to the instance role

2. In the AWS Console, create (at least) one [Job Queue](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) and bind it to the Compute environment.

3. In the AWS Console, create an S3 bucket for the work directory (see below). You can also create separate buckets for input data and results, as needed.

-4. Make sure that every process in your pipeline specifies a Docker container with the {ref}`process-container` directive.
+4. Make sure that every process in your pipeline specifies a Docker container with the [container][process-container] directive.

5. Make sure that all of your container images are published in a Docker registry that can be reached by AWS Batch, such as [Docker Hub](https://hub.docker.com/), [Quay](https://quay.io/), or [Elastic Container Registry](https://aws.amazon.com/ecr/).

### Configuration

To configure your pipeline for AWS Batch:

-1. Specify the AWS Batch {ref}`executor `
-2. Specify the AWS Batch queue with the {ref}`process-queue` directive
-3. Specify any Batch job container options with the {ref}`process-containerOptions` directive.
+1. Specify the AWS Batch [executor][awsbatch-executor]
+2. Specify the AWS Batch queue with the [queue][process-queue] directive
+3. Specify any Batch job container options with the [containerOptions][process-containerOptions] directive.

An example `nextflow.config` file is shown below:

```groovy
process {
    executor = 'awsbatch'
    queue = 'my-batch-queue'
    container = 'quay.io/biocontainers/salmon'
    containerOptions = '--shm-size 16000000 --ulimit nofile=1280:2560 --ulimit nproc=16:32'
}

aws {
    batch {
        // NOTE: this setting is only required if the AWS CLI is installed in a custom AMI
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
    region = 'us-east-1'
}
```

-:::{tip}
-Each process can be configured with its own queue by using the {ref}`process-queue` directive in the process definition or via {ref}`config-process-selectors` in your Nextflow configuration.
+:::tip
+Each process can be configured with its own queue by using the [queue][process-queue] directive in the process definition or via [Process selectors][config-process-selectors] in your Nextflow configuration.
:::

## Container Options

-:::{versionadded} 21.12.1-edge
+:::note{title="Added in version 21.12.1-edge"}
:::

-The {ref}`process-containerOptions` directive can be used to control the properties of the container execution associated with each Batch job.
+The [containerOptions][process-containerOptions] directive can be used to control the properties of the container execution associated with each Batch job.

The following container options are currently supported:

```
-e, --env string
    Set environment variables (format: <name> or <name>=<value>)
--init
    Run an init inside the container that forwards signals and reaps processes
--memory-swap int
    The total amount of swap memory (in MiB) the container can use: '-1' to enable unlimited swap
--memory-swappiness int
    Tune container memory swappiness (0 to 100) (default -1)
--privileged
    Give extended privileges to the container
--read-only
    Mount the container's root filesystem as read only
--shm-size int
    Size (in MiB) of /dev/shm
--tmpfs string
    Mount a tmpfs directory (format: <path>:<options>,size=<int>), size is in MiB
-u, --user string
    Username or UID (format: <name|uid>[:<group|gid>])
--ulimit string
    Ulimit options (format: <type>=<soft limit>[:<hard limit>])
```

Container options may be passed in long form (e.g. `--option value`) or short form (e.g. `-o value`) where available.

Few examples:

```nextflow
containerOptions '--tmpfs /run:rw,noexec,nosuid,size=128 --tmpfs /app:ro,size=64'

containerOptions '-e MYVAR1 --env MYVAR2=foo2 --env MYVAR3=foo3 --memory-swap 3240000 --memory-swappiness 20 --shm-size 16000000'

containerOptions '--ulimit nofile=1280:2560 --ulimit nproc=16:32 --privileged'
```

Check the [AWS documentation](https://docs.aws.amazon.com/batch/latest/APIReference/API_ContainerProperties.html) for further details.
-(aws-custom-ami)= - ## Custom AMI There are several reasons why you might need to create your own [AMI (Amazon Machine Image)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) to use in your Compute Environments: @@ -283,7 +275,7 @@ There are several reasons why you might need to create your own [AMI (Amazon Mac From the EC2 Dashboard, select **Launch Instance**, then select **Browse more AMIs**. In the new page, select **AWS Marketplace AMIs**, and then search for **Amazon ECS-Optimized Amazon Linux 2 (AL2) x86_64 AMI**. Select the AMI and continue as usual to configure and launch the instance. -:::{note} +:::note The selected instance has a root volume of 30GB. Make sure to increase its size or add a second EBS volume with enough storage for real genomic workloads. ::: @@ -293,17 +285,16 @@ Finally, select **Create Image** from the EC2 Dashboard to create a new AMI from The new AMI ID needs to be specified when creating the Batch Compute Environment. -:::{warning} +:::warning Any additional software must be installed on the EC2 instance *before* creating the AMI. ::: -(id2)= ### AWS CLI installation -:::{tip} +:::tip The need for the AWS CLI is considered a legacy requirement for the deployment of Nextflow pipelines with AWS Batch. -Instead, consider using {ref}`wave-page` and {ref}`fusion-page` to facilitate access to S3 without using the AWS CLI. +Instead, consider using [Wave][wave-page] and [Fusion][fusion-page] to facilitate access to S3 without using the AWS CLI. ::: The [AWS CLI](https://aws.amazon.com/cli) should be installed in your custom AMI using a self-contained package manager such as [Conda](https://conda.io). That way, you can control which version of Python is used by the AWS CLI (which is written in Python). @@ -328,7 +319,7 @@ $ ./miniconda/bin/aws --version aws-cli/1.29.20 Python/3.11.4 Linux/4.14.318-241.531.amzn2.x86_64 botocore/1.31.20 ``` -:::{note} +:::note The `aws` tool will be placed in a directory named `bin` in the main installation folder. Modifying this directory structure after the tool is installed will cause it to not work properly. ::: @@ -340,11 +331,7 @@ aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws' Replace the path above with the one matching the location where the `aws` tool is installed in your AMI. -:::{versionchanged} 19.07.0 -The `executor.awscli` config option was replaced by `aws.batch.cliPath`. -::: - -:::{warning} +:::warning The grandparent directory of the `aws` tool will be mounted into the container at the same path as the host, e.g. `/home/ec2-user/miniconda`, which will shadow existing files in the container. Make sure you use a path that is not already present in the container. ::: @@ -389,11 +376,11 @@ To test the installation: curl -s http://localhost:51678/v1/metadata | python -mjson.tool (test) ``` -:::{note} +:::note The `AmazonEC2ContainerServiceforEC2Role` policy must be attached to the instance role in order to be able to connect the EC2 instance created by the Compute Environment to the ECS container. ::: -:::{note} +:::note The `AmazonEC2ContainerRegistryReadOnly` policy should be attached to the instance role in order to get read-only access to Amazon EC2 Container Registry repositories. ::: @@ -405,7 +392,7 @@ Nextflow automatically creates the Batch [Job definitions](http://docs.aws.amazo However, sometimes you may still need to specify a custom **Job Definition** to fine tune the configuration of a specific job, for example to define custom mount paths. 
To do that, first create a **Job Definition** in the AWS Console (or by other means). Note the name of the Job definition you created. You can then associate a process execution with this Job definition by using the [container][process-container] directive and specifying, in place of the container image name, the Job definition name prefixed by `job-definition://`, as shown below:

```groovy
process.container = 'job-definition://your-job-definition-name'
```

## Hybrid workloads

Nextflow allows the use of multiple executors in the same workflow application. This feature enables the deployment of hybrid workloads in which some jobs are executed in the local computer or local computing cluster and some jobs are offloaded to AWS Batch.

To enable this feature, use one or more [Process selectors][config-process-selectors] in your Nextflow configuration to apply the AWS Batch configuration to the subset of processes that you want to offload. For example:

```groovy
process {
    withLabel: bigTask {
        executor = 'awsbatch'
        queue = 'my-batch-queue'
        container = 'my/image:tag'
    }
}

aws {
    region = 'us-east-1'
}
```

With the above configuration, processes with the `bigTask` [label][process-label] will run on AWS Batch, while the remaining processes will run in the local computer.

Then launch the pipeline with the `-bucket-dir` option to specify an AWS S3 path for the jobs computed with AWS Batch and, optionally, the `-work-dir` to specify the local storage for the jobs computed locally:

```bash
nextflow run <script or project name> -bucket-dir s3://my-bucket/some/path
```

+
+{% endblock %}
+
+{% block footer %}
{{ super() }}
+
+
+{% endblock %}
diff --git a/new-docs/nextflow-docs/amazons3.mdx b/new-docs/nextflow-docs/amazons3.mdx
new file mode 100644
index 0000000000..611083c760
--- /dev/null
+++ b/new-docs/nextflow-docs/amazons3.mdx
@@ -0,0 +1,102 @@
+# Amazon S3
+
+Nextflow includes support for AWS S3 storage. Files stored in an S3 bucket can be accessed transparently in your pipeline script like any other file in the local file system.
+
+## S3 path
+
+In order to access an S3 file, you only need to prefix the file path with the `s3` schema and the `bucket` name where it is stored.
+
+For example, if you need to access the file `/data/sequences.fa` stored in a bucket named `my-bucket`, that file can be accessed using the following fully qualified path:
+
+```
+s3://my-bucket/data/sequences.fa
+```
+
+The usual file operations can be applied to a path handle with the above notation. For example, the content of an S3 file can be printed as follows:
+
+```nextflow
+println file('s3://my-bucket/data/sequences.fa').text
+```
+
+See [Working with files][working-with-files] and the [Path][stdlib-types-path] reference to learn more about available file operations.
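S3 paths can be used anywhere a local path is accepted, including as pipeline inputs. A minimal sketch (the bucket name and glob pattern are hypothetical):

```nextflow
// Hypothetical bucket and pattern -- S3 globs behave like local ones
channel
    .fromPath('s3://my-bucket/data/*.fa')
    .view()
```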
## Security credentials
+
+AWS access credentials can be provided in two ways:
+
+1. Using AWS access and secret keys in your pipeline configuration.
+2. Using IAM roles to grant access to S3 storage on AWS EC2 instances.
+
+### AWS access and secret keys
+
+The AWS access and secret keys can be specified by using the `aws` section in the `nextflow.config` configuration file as shown below:
+
+```groovy
+aws {
+    accessKey = '<Your AWS access key>'
+    secretKey = '<Your AWS secret key>'
+    region = '<AWS region identifier>'
+}
+```
+
+If the access credentials are not found in the above file, Nextflow looks for AWS credentials in the following order:
+
+1. The `nextflow.config` file in the pipeline execution directory
+2. The environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+3. The environment variables `AWS_ACCESS_KEY` and `AWS_SECRET_KEY`
+4. The profile in the AWS credentials file located at `~/.aws/credentials`
+    - Uses the `default` profile or the environment variable `AWS_PROFILE` if set
+5. The profile in the AWS client configuration file located at `~/.aws/config`
+    - Uses the `default` profile or the environment variable `AWS_PROFILE` if set
+6. The temporary AWS credentials provided by an IAM instance role. See [IAM Roles](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) documentation for details.
+
+More information regarding [AWS Security Credentials](http://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html) is available in the AWS documentation.
+
+### IAM roles with AWS EC2 instances
+
+When running your pipeline in an EC2 instance, IAM roles can be used to grant access to AWS resources.
+
+In this scenario, you only need to launch the EC2 instance with an IAM role which includes the `AmazonS3FullAccess` policy. Nextflow will detect and automatically acquire the permission to access S3 storage, without any further configuration.
+
+Learn more about [Using IAM Roles to Delegate Permissions to Applications that Run on AWS EC2](http://docs.aws.amazon.com/IAM/latest/UserGuide/roles-usingrole-ec2instance.html) in the AWS documentation.
+
+## China regions
+
+To use an AWS China region, make sure to specify the corresponding AWS API S3 endpoint in the Nextflow configuration file as shown below:
+
+```groovy
+aws {
+    client {
+        endpoint = "https://s3.cn-north-1.amazonaws.com.cn"
+    }
+}
+```
+
+Read more about AWS API endpoints in the [AWS documentation](https://docs.aws.amazon.com/general/latest/gr/s3.html).
+
+## S3-compatible storage
+
+To use S3-compatible object storage such as [Ceph](https://ceph.io) or [Minio](https://min.io), specify the endpoint of
+your storage provider and enable the [S3 path style access](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#path-style-access)
+in your Nextflow configuration as shown below:
+
+
+```groovy
+aws {
+    accessKey = '<Your access key>'
+    secretKey = '<Your secret key>'
+    client {
+        endpoint = '<Your storage endpoint URL>'
+        s3PathStyleAccess = true
+    }
+}
+```
+
+## Advanced configuration
+
+See [AWS configuration][config-aws] for more information about advanced S3 client configuration options.
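As a taste of what that section covers, a minimal sketch of two commonly tuned client settings (the `maxConnections` and `connectionTimeout` option names are assumptions based on the configuration reference; the values are illustrative):

```groovy
// Sketch only -- see the AWS configuration reference for the authoritative option list.
aws {
    client {
        maxConnections = 20         // assumed: max parallel connections to S3
        connectionTimeout = 10000   // assumed: connection timeout in milliseconds
    }
}
```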
+
+
+[config-aws]: /nextflow_docs/nextflow_repo/docs/reference/config.mdx#aws
+[stdlib-types-path]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types.mdx#path
+[working-with-files]: /nextflow_docs/nextflow_repo/docs/working-with-files.mdx
\ No newline at end of file
diff --git a/new-docs/nextflow-docs/aws.mdx b/new-docs/nextflow-docs/aws.mdx
new file mode 100644
index 0000000000..aef7ce7d8d
--- /dev/null
+++ b/new-docs/nextflow-docs/aws.mdx
@@ -0,0 +1,545 @@
+import { AddedInVersion, ChangedInVersion, DeprecatedInVersion } from '@site/src/components/VersionedAdmonitions';
+
+# Amazon Web Services
+
+:::tip
+This page describes how to manually set up and use Nextflow with AWS Cloud.
+You may be interested in using [Batch Forge](https://docs.seqera.io/platform/latest/compute-envs/aws-batch) in [Seqera Platform](https://seqera.io/platform/),
+which automatically creates the required AWS infrastructure for you with minimal intervention.
+:::
+
+## AWS security credentials
+
+Nextflow uses the [AWS security credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html) to make programmatic calls to AWS services.
+
+The AWS credentials are selected from the following sources, in order of descending priority:
+
+1. Nextflow configuration file - `aws.accessKey` and `aws.secretKey`. See [aws][config-aws] for more details.
+
+2. A custom profile in `$HOME/.aws/credentials` and/or `$HOME/.aws/config`. The profile can be supplied from the `aws.profile` config option, or the `AWS_PROFILE` or `AWS_DEFAULT_PROFILE` environmental variables.
+
+3. Environment variables - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
+
+4. The `default` profile in `~/.aws/credentials` and/or `~/.aws/config`.
+
+5. Single Sign-On (SSO) credentials. See the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/sso-configure-profile-token.html) for more details.
+
+   <AddedInVersion version="23.07.0-edge" />
+
+6. EC2 instance profile credentials. See the [AWS documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) and [this blog post](https://aws.amazon.com/blogs/security/granting-permission-to-launch-ec2-instances-with-iam-roles-passrole-permission/) for more details.
+
+The AWS region is selected from the following sources, in order of descending priority:
+
+1. Nextflow configuration file - `aws.region`
+2. Environment variables - `AWS_REGION` or `AWS_DEFAULT_REGION`
+3. EC2 instance metadata (if Nextflow is running in an EC2 instance)
+
+SSO credentials and instance profile credentials are the recommended options because they don't require you to manage and distribute AWS keys explicitly. SSO credentials are ideal for launching pipelines from outside of AWS (e.g. your laptop), while instance profile credentials are ideal for launching pipelines within AWS (e.g. an EC2 instance).
+
+## AWS IAM policies
+
+[IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html) are the mechanism used by AWS to define permissions for IAM identities. In order to access certain AWS services, the proper policies must be attached to the identity associated with the AWS credentials.
Minimal permissions policies to be attached to the AWS account used by Nextflow are:

- To use AWS Batch:

  ```json
  "batch:CancelJob"
  "batch:DescribeComputeEnvironments"
  "batch:DescribeJobDefinitions"
  "batch:DescribeJobQueues"
  "batch:DescribeJobs"
  "batch:ListJobs"
  "batch:RegisterJobDefinition"
  "batch:SubmitJob"
  "batch:TagResource"
  "batch:TerminateJob"
  ```

- To view [EC2](https://aws.amazon.com/ec2/) instances:

  ```json
  "ec2:DescribeInstanceAttribute"
  "ec2:DescribeInstances"
  "ec2:DescribeInstanceStatus"
  "ec2:DescribeInstanceTypes"
  "ecs:DescribeContainerInstances"
  "ecs:DescribeTasks"
  ```

- To pull container images from [ECR](https://aws.amazon.com/ecr/) repositories:

  ```json
  "ecr:BatchCheckLayerAvailability"
  "ecr:BatchGetImage"
  "ecr:DescribeImages"
  "ecr:DescribeImageScanFindings"
  "ecr:DescribeRepositories"
  "ecr:GetAuthorizationToken"
  "ecr:GetDownloadUrlForLayer"
  "ecr:GetLifecyclePolicy"
  "ecr:GetLifecyclePolicyPreview"
  "ecr:GetRepositoryPolicy"
  "ecr:ListImages"
  "ecr:ListTagsForResource"
  ```

  Alternatively, you can use the AWS-provided `AmazonEC2ContainerRegistryReadOnly` managed policy.

:::note
If you are running Fargate or Fargate Spot, you may need the following policies in addition to those listed above:
  ```json
  "ec2:DescribeSubnets"
  "ecs:CreateCluster"
  "ecs:DeleteCluster"
  "ecs:DescribeClusters"
  "ecs:ListClusters"
  ```
:::

### S3 policies

Nextflow also requires policies to access [S3 buckets](https://aws.amazon.com/s3/) in order to use the work directory, pull input data, and publish results.

Depending on the pipeline configuration, the above actions can be done all in a single bucket but, more likely, spread across multiple buckets. Once the list of buckets used by the pipeline is identified, there are two alternative ways to give Nextflow access to these buckets:

1. Grant access to all buckets by attaching the policy `"s3:*"` to the IAM identity. This works only if buckets do not set their own access policies (see point 2);

2. For more fine grained control, assign to each bucket the following policy (replace the placeholders with the actual values):

   ```json
   {
       "Version": "2012-10-17",
       "Id": "<policy id>",
       "Statement": [
           {
               "Sid": "<statement id>",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "<IAM identity ARN>"
               },
               "Action": [
                   "s3:GetObject",
                   "s3:PutObject",
                   "s3:DeleteObject",
                   "s3:PutObjectTagging",
                   "s3:AbortMultipartUpload"
               ],
               "Resource": "arn:aws:s3:::<bucket name>/*"
           },
           {
               "Sid": "AllowSSLRequestsOnly",
               "Effect": "Deny",
               "Principal": "*",
               "Action": "s3:*",
               "Resource": [
                   "arn:aws:s3:::<bucket name>",
                   "arn:aws:s3:::<bucket name>/*"
               ],
               "Condition": {
                   "Bool": {
                       "aws:SecureTransport": "false"
                   }
               }
           }
       ]
   }
   ```

See the [bucket policy documentation](https://docs.aws.amazon.com/config/latest/developerguide/s3-bucket-policy.html) for additional details.

## AWS Batch

[AWS Batch](https://aws.amazon.com/batch/) is a managed computing service that allows the execution of containerised workloads in the AWS cloud infrastructure. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized compute resources) based on the volume and specific resource requirements of the jobs submitted.

Nextflow provides built-in support for AWS Batch, allowing the seamless deployment of Nextflow pipelines in the cloud, in which tasks are offloaded as Batch jobs.

See [AWS Batch][awsbatch-executor] to learn more about the `awsbatch` executor in Nextflow.
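The AWS CLI requirement discussed in the next section can often be avoided entirely by enabling Wave and Fusion. A minimal sketch (the queue name and bucket are hypothetical, and the exact behavior of the `wave` and `fusion` scopes is described on their respective pages):

```groovy
// Sketch only -- offload tasks to AWS Batch and let Fusion (delivered via Wave)
// access S3 directly, so no AWS CLI is needed in the image or AMI.
fusion.enabled = true
wave.enabled = true
process.executor = 'awsbatch'
process.queue = 'my-batch-queue'   // hypothetical queue name
workDir = 's3://my-bucket/work'    // hypothetical S3 work directory
```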
+
+### AWS CLI
+
+:::tip
+The need for the AWS CLI is considered a legacy requirement for the deployment of Nextflow pipelines with AWS Batch.
+Instead, consider using [Wave][wave-page] and [Fusion][fusion-page] to facilitate access to S3 without using the AWS CLI.
+:::
+
+Nextflow uses the [AWS command line tool](https://aws.amazon.com/cli/) (`aws`) to stage input files and output files between S3 and the task containers.
+
+The `aws` command can be made available by either (1) installing it in the container image(s) or (2) installing it in a [custom AMI][aws-custom-ami] to be used instead of the default AMI when configuring AWS Batch.
+
+### Get started
+
+1. In the AWS Console, navigate to **AWS Batch** and create a [Compute environment](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html) (CE).
+
+    1. If you are using a custom AMI (see following sections), the AMI ID must be specified in the CE configuration.
+    2. Make sure to select an AMI (either custom or existing) with Docker installed (see following sections).
+    3. Make sure the policy `AmazonS3FullAccess` (granting access to S3 buckets) is attached to the instance role configured for the CE.
+    4. If you plan to use Docker images hosted in the Amazon ECR container registry, make sure the `AmazonEC2ContainerServiceforEC2Role` policy is also attached to the instance role.
+
+2. In the AWS Console, create (at least) one [Job Queue](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) and bind it to the Compute environment.
+
+3. In the AWS Console, create an S3 bucket for the work directory (see below). You can also create separate buckets for input data and results, as needed.
+
+4. Make sure that every process in your pipeline specifies a Docker container with the [container][process-container] directive.
+
+5. Make sure that all of your container images are published in a Docker registry that can be reached by AWS Batch, such as [Docker Hub](https://hub.docker.com/), [Quay](https://quay.io/), or [Elastic Container Registry](https://aws.amazon.com/ecr/).
+
+### Configuration
+
+To configure your pipeline for AWS Batch:
+
+1. Specify the AWS Batch [executor][awsbatch-executor].
+2. Specify the AWS Batch queue with the [queue][process-queue] directive.
+3. Specify any Batch job container options with the [containerOptions][process-containerOptions] directive.
+
+An example `nextflow.config` file is shown below:
+
+```groovy
+process {
+    executor = 'awsbatch'
+    queue = 'my-batch-queue'
+    container = 'quay.io/biocontainers/salmon'
+    containerOptions = '--shm-size 16000000 --ulimit nofile=1280:2560 --ulimit nproc=16:32'
+}
+
+aws {
+    batch {
+        // NOTE: this setting is only required if the AWS CLI is installed in a custom AMI
+        cliPath = '/home/ec2-user/miniconda/bin/aws'
+    }
+    region = 'us-east-1'
+}
+```
+
+:::tip
+Each process can be configured with its own queue by using the [queue][process-queue] directive in the process definition or via [Process selectors][config-process-selectors] in your Nextflow configuration.
+:::
+
+## Container Options
+
+The [containerOptions][process-containerOptions] directive can be used to control the properties of the container execution associated with each Batch job.
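+
+For example, the directive can be set on an individual process. The following is a sketch only: the process name and container image are illustrative:
+
+```nextflow
+process sayHello {
+    container 'quay.io/nextflow/bash'
+    // increase the shared memory available to the Batch job container
+    containerOptions '--shm-size 16000000'
+
+    output:
+    stdout
+
+    script:
+    """
+    echo 'Hello from AWS Batch'
+    """
+}
+```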
+
+The following container options are currently supported:
+
+```
+-e, --env string
+    Set environment variables (format: <name> or <name>=<value>)
+--init
+    Run an init inside the container that forwards signals and reaps processes
+--memory-swap int
+    The total amount of swap memory (in MiB) the container can use: '-1' to enable unlimited swap
+--memory-swappiness int
+    Tune container memory swappiness (0 to 100) (default -1)
+--privileged
+    Give extended privileges to the container
+--read-only
+    Mount the container's root filesystem as read only
+--shm-size int
+    Size (in MiB) of /dev/shm
+--tmpfs string
+    Mount a tmpfs directory (format: <path>:<options>,size=<int>), size is in MiB
+-u, --user string
+    Username or UID (format: <name|uid>[:<group|gid>])
+--ulimit string
+    Ulimit options (format: <type>=<soft limit>[:<hard limit>])
+```
+
+Container options may be passed in long form (e.g. `--option value`) or short form (e.g. `-o value`) where available.
+
+A few examples:
+
+```nextflow
+containerOptions '--tmpfs /run:rw,noexec,nosuid,size=128 --tmpfs /app:ro,size=64'
+
+containerOptions '-e MYVAR1 --env MYVAR2=foo2 --env MYVAR3=foo3 --memory-swap 3240000 --memory-swappiness 20 --shm-size 16000000'
+
+containerOptions '--ulimit nofile=1280:2560 --ulimit nproc=16:32 --privileged'
+```
+
+See the [AWS documentation](https://docs.aws.amazon.com/batch/latest/APIReference/API_ContainerProperties.html) for further details.
+
+## Custom AMI
+
+There are several reasons why you might need to create your own [AMI (Amazon Machine Image)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) to use in your Compute Environments:
+
+- You do not want to install the AWS CLI into each of your Docker images and would rather provide it through the AMI
+- The existing AMI (selected from the marketplace) does not have Docker installed
+- You need to attach more storage to your EC2 instance (the default ECS instance AMI has only a 30 GB EBS volume, which is not enough for most data pipelines)
+- You need to install additional software that is not available in your Docker image
+
+### Create your custom AMI
+
+From the EC2 Dashboard, select **Launch Instance**, then select **Browse more AMIs**. On the new page, select
+**AWS Marketplace AMIs**, and then search for `Amazon ECS-Optimized Amazon Linux 2 (AL2) x86_64 AMI`. Select the AMI and continue as usual to configure and launch the instance.
+
+:::note
+The selected instance has a root volume of 30 GB. Make sure to increase its size or add a second EBS volume with enough storage for real genomic workloads.
+:::
+
+When the instance is running, SSH into it (or connect with the Session Manager service), install the AWS CLI, and install any other tools that may be required (see following sections).
+
+Finally, select **Create Image** from the EC2 Dashboard to create a new AMI from the running instance (you can also do this through the AWS CLI).
+
+The new AMI ID needs to be specified when creating the Batch Compute Environment.
+
+:::warning
+Any additional software must be installed on the EC2 instance *before* creating the AMI.
+:::
+
+### AWS CLI installation
+
+:::tip
+The need for the AWS CLI is considered a legacy requirement for the deployment of Nextflow pipelines with AWS Batch.
+Instead, consider using [Wave][wave-page] and [Fusion][fusion-page] to facilitate access to S3 without using the AWS CLI.
+:::
+
+The [AWS CLI](https://aws.amazon.com/cli) should be installed in your custom AMI using a self-contained package manager such as [Conda](https://conda.io).
That way, you can control which version of Python is used by the AWS CLI (which is written in Python).
+
+If you don't use Conda, the `aws` command will attempt to use the Python interpreter installed in the container and will fail to find its required dependencies.
+
+The following snippet shows how to install the AWS CLI with [Miniconda](https://conda.io/miniconda.html) in the home folder:
+
+```bash
+cd $HOME
+sudo yum install -y bzip2 wget
+wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
+bash Miniconda3-latest-Linux-x86_64.sh -b -f -p $HOME/miniconda
+$HOME/miniconda/bin/conda install -c conda-forge -y awscli
+rm Miniconda3-latest-Linux-x86_64.sh
+```
+
+Afterwards, verify that the AWS CLI package works correctly:
+
+```console
+$ ./miniconda/bin/aws --version
+aws-cli/1.29.20 Python/3.11.4 Linux/4.14.318-241.531.amzn2.x86_64 botocore/1.31.20
+```
+
+:::note
+The `aws` tool will be placed in a directory named `bin` in the main installation folder. Modifying this directory structure after the tool is installed will prevent it from working properly.
+:::
+
+To configure Nextflow to use this installation, specify the `aws.batch.cliPath` option in the Nextflow configuration as shown below:
+
+```groovy
+aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
+```
+
+Replace the path above with the one matching the location where the `aws` tool is installed in your AMI.
+
+:::warning
+The grandparent directory of the `aws` tool will be mounted into the container at the same path as the host, e.g. `/home/ec2-user/miniconda`, which will shadow existing files in the container. Make sure you use a path that is not already present in the container.
+:::
+
+### Docker installation
+
+Docker is required by Nextflow to execute tasks on AWS Batch. The **Amazon ECS-optimized Amazon Linux 2023 AMI** has Docker installed. However, if you create your AMI from a base AMI that does not include Docker, you will need to install it manually.
+
+The following snippet shows how to install Docker on an Amazon EC2 instance:
+
+```bash
+# install Docker
+sudo yum update -y
+sudo yum install docker
+
+# start the Docker service
+sudo service docker start
+
+# allow ec2-user to run Docker without sudo
+sudo usermod -a -G docker ec2-user
+```
+
+You must log out and log back in for the new `ec2-user` permissions to take effect.
+
+These steps must be done *before* creating the AMI from the current EC2 instance.
+
+### Amazon ECS container agent installation
+
+The [ECS container agent](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_agent.html) is a component of Amazon Elastic Container Service (Amazon ECS) and is responsible for managing containers on behalf of ECS. AWS Batch uses ECS to execute containerized jobs, therefore it requires the agent to be installed on EC2 instances within your Compute Environments.
+
+The ECS agent is included in the **Amazon ECS-optimized Amazon Linux 2023 AMI**. If you use a different base AMI, you can also install the agent on any EC2 instance that supports the Amazon ECS specification.
+
+To install the agent, follow these steps:
+
+```bash
+sudo yum install ecs-init
+sudo systemctl enable --now ecs
+```
+
+To test the installation:
+
+```bash
+curl -s http://localhost:51678/v1/metadata | python -m json.tool
+```
+
+:::note
+The `AmazonEC2ContainerServiceforEC2Role` policy must be attached to the instance role in order to connect the EC2 instances launched by the Compute Environment to the ECS cluster.
+:::
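+
+For example, this managed policy can be attached with the AWS CLI. The role name below is the conventional default and may differ in your account:
+
+```bash
+# attach the ECS policy to the Batch instance role
+aws iam attach-role-policy \
+    --role-name ecsInstanceRole \
+    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
+```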
+
+:::note
+The `AmazonEC2ContainerRegistryReadOnly` policy should be attached to the instance role in order to grant read-only access to Amazon ECR repositories.
+:::
+
+## Jobs & Execution
+
+### Custom job definition
+
+Nextflow automatically creates the Batch [Job definitions](http://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) needed to execute tasks in your pipeline, so you don't need to define them beforehand.
+
+However, sometimes you may still need to specify a custom **Job Definition** to fine-tune the configuration of a specific job, for example to define custom mount paths.
+
+To do that, first create a **Job Definition** in the AWS Console (or by other means) and note its name. You can then associate a process execution with this Job Definition by using the [container][process-container] directive and specifying, in place of the container image name, the Job Definition name prefixed by `job-definition://`, as shown below:
+
+```groovy
+process.container = 'job-definition://your-job-definition-name'
+```
+
+### Hybrid workloads
+
+Nextflow allows the use of multiple executors in the same workflow application. This feature enables the deployment of hybrid workloads, in which some jobs are executed on the local computer or local computing cluster, and some jobs are offloaded to AWS Batch.
+
+To enable this feature, use one or more [Process selectors][config-process-selectors] in your Nextflow configuration to apply the AWS Batch configuration to the subset of processes that you want to offload. For example:
+
+```groovy
+process {
+    withLabel: bigTask {
+        executor = 'awsbatch'
+        queue = 'my-batch-queue'
+        container = 'my/image:tag'
+    }
+}
+
+aws {
+    region = 'eu-west-1'
+}
+```
+
+With the above configuration, processes with the `bigTask` [label][process-label] will run on AWS Batch, while the remaining processes will run on the local computer.
+
+Then launch the pipeline with the `-bucket-dir` option to specify an AWS S3 path for the jobs computed with AWS Batch and, optionally, the `-work-dir` option to specify the local storage for the jobs computed locally:
+
+```bash
+nextflow run