diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000000..05cf6fb857 Binary files /dev/null and b/.DS_Store differ diff --git a/docs/amazons3.md b/docs/amazons3.mdx similarity index 88% rename from docs/amazons3.md rename to docs/amazons3.mdx index 95c6cfb4cb..611083c760 100644 --- a/docs/amazons3.md +++ b/docs/amazons3.mdx @@ -1,5 +1,3 @@ -(amazons3-page)= - # Amazon S3 Nextflow includes support for AWS S3 storage. Files stored in an S3 bucket can be accessed transparently in your pipeline script like any other file in the local file system. @@ -20,7 +18,7 @@ The usual file operations can be applied to a path handle with the above notatio println file('s3://my-bucket/data/sequences.fa').text ``` -See {ref}`working-with-files` and the {ref}`stdlib-types-path` reference to learn more about available file operations. +See [Working with files][working-with-files] and the [Path][stdlib-types-path] reference to learn more about available file operations. ## Security credentials @@ -96,4 +94,9 @@ aws { ## Advanced configuration -Read {ref}`AWS configuration` section to learn more about advanced S3 client configuration options. +See [AWS configuration][config-aws] for more information about advanced S3 client configuration options. + + +[config-aws]: /nextflow_docs/nextflow_repo/docs/reference/config.mdx#aws +[stdlib-types-path]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types.mdx#path +[working-with-files]: /nextflow_docs/nextflow_repo/docs/working-with-files.mdx \ No newline at end of file diff --git a/docs/aws.mdx b/docs/aws.mdx new file mode 100644 index 0000000000..aef7ce7d8d --- /dev/null +++ b/docs/aws.mdx @@ -0,0 +1,545 @@ +import { AddedInVersion, ChangedInVersion, DeprecatedInVersion } from '@site/src/components/VersionedAdmonitions'; + +# Amazon Web Services + +:::tip +This page describes how to manually set up and use Nextflow with AWS Cloud. +You may be interested in using [Batch Forge](https://docs.seqera.io/platform/latest/compute-envs/aws-batch) in [Seqera Platform](https://seqera.io/platform/), +which automatically creates the required AWS infrastructure for you with minimal intervention. +::: + +## AWS security credentials + +Nextflow uses the [AWS security credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html) to make programmatic calls to AWS services. + +The AWS credentials are selected from the following sources, in order of descending priority: + +1. Nextflow configuration file - `aws.accessKey` and `aws.secretKey`. See [aws][config-aws] for more details. + +2. A custom profile in `$HOME/.aws/credentials` and/or `$HOME/.aws/config`. The profile can be supplied from the `aws.profile` config option, or the `AWS_PROFILE` or `AWS_DEFAULT_PROFILE` environmental variables. + +3. Environment variables - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. + +4. The `default` profile in `~/.aws/credentials` and/or `~/.aws/config`. + +5. Single Sign-On (SSO) credentials. See the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/sso-configure-profile-token.html) for more details. + + + + +6. EC2 instance profile credentials. See the [AWS documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) and [this blog post](https://aws.amazon.com/blogs/security/granting-permission-to-launch-ec2-instances-with-iam-roles-passrole-permission/) for more details. + +The AWS region is selected from the following sources, in order of descending priority: + +1. 
Nextflow configuration file - `aws.region`
2. Environment variables - `AWS_REGION` or `AWS_DEFAULT_REGION`
3. EC2 instance metadata (if Nextflow is running in an EC2 instance)

SSO credentials and instance profile credentials are the recommended options because they don't require you to manage and distribute AWS keys explicitly. SSO credentials are ideal for launching pipelines from outside of AWS (e.g. your laptop), while instance profile credentials are ideal for launching pipelines within AWS (e.g. an EC2 instance).

## AWS IAM policies

[IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html) are the mechanism used by AWS to define permissions for IAM identities. In order to access certain AWS services, the proper policies must be attached to the identity associated with the AWS credentials.

The minimal permissions policies to be attached to the AWS account used by Nextflow are:

- To use AWS Batch:

  ```json
  "batch:CancelJob"
  "batch:DescribeComputeEnvironments"
  "batch:DescribeJobDefinitions"
  "batch:DescribeJobQueues"
  "batch:DescribeJobs"
  "batch:ListJobs"
  "batch:RegisterJobDefinition"
  "batch:SubmitJob"
  "batch:TagResource"
  "batch:TerminateJob"
  ```

- To view [EC2](https://aws.amazon.com/ec2/) instances:

  ```json
  "ec2:DescribeInstanceAttribute"
  "ec2:DescribeInstances"
  "ec2:DescribeInstanceStatus"
  "ec2:DescribeInstanceTypes"
  "ecs:DescribeContainerInstances"
  "ecs:DescribeTasks"
  ```

- To pull container images from [ECR](https://aws.amazon.com/ecr/) repositories:

  ```json
  "ecr:BatchCheckLayerAvailability"
  "ecr:BatchGetImage"
  "ecr:DescribeImages"
  "ecr:DescribeImageScanFindings"
  "ecr:DescribeRepositories"
  "ecr:GetAuthorizationToken"
  "ecr:GetDownloadUrlForLayer"
  "ecr:GetLifecyclePolicy"
  "ecr:GetLifecyclePolicyPreview"
  "ecr:GetRepositoryPolicy"
  "ecr:ListImages"
  "ecr:ListTagsForResource"
  ```

  Alternatively, you can use the AWS-provided `AmazonEC2ContainerRegistryReadOnly` managed policy.

:::note
If you are running Fargate or Fargate Spot, you may need the following policies in addition to those listed above:
  ```json
  "ec2:DescribeSubnets"
  "ecs:CreateCluster"
  "ecs:DeleteCluster"
  "ecs:DescribeClusters"
  "ecs:ListClusters"
  ```
:::

### S3 policies

Nextflow also requires policies to access [S3 buckets](https://aws.amazon.com/s3/) in order to use the work directory, pull input data, and publish results.

Depending on the pipeline configuration, these operations may all use a single bucket or, more likely, be spread across multiple buckets. Once the list of buckets used by the pipeline is identified, there are two alternative ways to give Nextflow access to these buckets:

1. Grant access to all buckets by attaching the policy `"s3:*"` to the IAM identity. This works only if buckets do not set their own access policies (see point 2);

2. 
For more fine-grained control, assign to each bucket the following policy (replace the placeholders with the actual values):

   ```json
   {
       "Version": "2012-10-17",
       "Id": "<policy-id>",
       "Statement": [
           {
               "Sid": "<statement-id>",
               "Effect": "Allow",
               "Principal": {
                   "AWS": "<ARN of the IAM identity used by Nextflow>"
               },
               "Action": [
                   "s3:GetObject",
                   "s3:PutObject",
                   "s3:DeleteObject",
                   "s3:PutObjectTagging",
                   "s3:AbortMultipartUpload"
               ],
               "Resource": "arn:aws:s3:::<bucket-name>/*"
           },
           {
               "Sid": "AllowSSLRequestsOnly",
               "Effect": "Deny",
               "Principal": "*",
               "Action": "s3:*",
               "Resource": [
                   "arn:aws:s3:::<bucket-name>",
                   "arn:aws:s3:::<bucket-name>/*"
               ],
               "Condition": {
                   "Bool": {
                       "aws:SecureTransport": "false"
                   }
               }
           }
       ]
   }
   ```

See the [bucket policy documentation](https://docs.aws.amazon.com/config/latest/developerguide/s3-bucket-policy.html) for additional details.

## AWS Batch

[AWS Batch](https://aws.amazon.com/batch/) is a managed computing service that allows the execution of containerised workloads in the AWS cloud infrastructure. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU- or memory-optimized instances) based on the volume and specific resource requirements of the jobs submitted.

Nextflow provides built-in support for AWS Batch, allowing the seamless deployment of Nextflow pipelines in the cloud, in which tasks are offloaded as Batch jobs.

See [AWS Batch][awsbatch-executor] to learn more about the `awsbatch` executor in Nextflow.

### AWS CLI

:::tip
The AWS CLI is considered a legacy requirement for the deployment of Nextflow pipelines with AWS Batch.
Instead, consider using [Wave][wave-page] and [Fusion][fusion-page] to facilitate access to S3 without using the AWS CLI.
:::

Nextflow uses the [AWS command line tool](https://aws.amazon.com/cli/) (`aws`) to stage input files and output files between S3 and the task containers.

The `aws` command can be made available by either (1) installing it in the container image(s) or (2) installing it in a [custom AMI][aws-custom-ami] to be used instead of the default AMI when configuring AWS Batch.

### Get started

1. In the AWS Console, navigate to **AWS Batch** and create a [Compute environment](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html) (CE).

    1. If you are using a custom AMI (see following sections), the AMI ID must be specified in the CE configuration
    2. Make sure to select an AMI (either custom or existing) with Docker installed (see following sections)
    3. Make sure the policy `AmazonS3FullAccess` (granting access to S3 buckets) is attached to the instance role configured for the CE
    4. If you plan to use Docker images from the Amazon Elastic Container Registry (ECR), make sure the `AmazonEC2ContainerServiceforEC2Role` policy is also attached to the instance role

2. In the AWS Console, create (at least) one [Job Queue](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) and bind it to the Compute environment.

3. In the AWS Console, create an S3 bucket for the work directory (see below). You can also create separate buckets for input data and results, as needed.

4. Make sure that every process in your pipeline specifies a Docker container with the [container][process-container] directive, as shown in the sketch after this list.

5. Make sure that all of your container images are published in a Docker registry that can be reached by AWS Batch, such as [Docker Hub](https://hub.docker.com/), [Quay](https://quay.io/), or [Elastic Container Registry](https://aws.amazon.com/ecr/).
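For step 4, a process only needs to declare the image it runs in with the `container` directive. A minimal sketch (the process name, image tag, and command are illustrative):

```nextflow
process FASTQC {
    // any image that AWS Batch can pull, e.g. from a public registry
    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'

    input:
    path reads

    output:
    path '*_fastqc.html'

    script:
    """
    fastqc $reads
    """
}
```

Because the task runs as a Batch job, the image must be pullable by the compute environment, which is why step 5 asks for a registry reachable from AWS.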
+ +### Configuration + +To configure your pipeline for AWS Batch: + +1. Specify the AWS Batch [executor][awsbatch-executor] +2. Specify the AWS Batch queue with the [queue][process-queue] directive +3. Specify any Batch job container options with the [containerOptions][process-containerOptions] directive. + +An example `nextflow.config` file is shown below: + +```groovy +process { + executor = 'awsbatch' + queue = 'my-batch-queue' + container = 'quay.io/biocontainers/salmon' + containerOptions = '--shm-size 16000000 --ulimit nofile=1280:2560 --ulimit nproc=16:32' +} + +aws { + batch { + // NOTE: this setting is only required if the AWS CLI is installed in a custom AMI + cliPath = '/home/ec2-user/miniconda/bin/aws' + } + region = 'us-east-1' +} +``` + +:::tip +Each process can be configured with its own queue by using the [queue][process-queue] directive in the process definition or via [Process selectors][config-process-selectors] in your Nextflow configuration. +::: + +## Container Options + + + +The [containerOptions][process-containerOptions] directive can be used to control the properties of the container execution associated with each Batch job. + +The following container options are currently supported: + +``` +-e, --env string + Set environment variables (format: or =) +--init + Run an init inside the container that forwards signals and reaps processes +--memory-swap int + The total amount of swap memory (in MiB) the container can use: '-1' to enable unlimited swap +--memory-swappiness int + Tune container memory swappiness (0 to 100) (default -1) +--privileged + Give extended privileges to the container +--read-only + Mount the container's root filesystem as read only +--shm-size int + Size (in MiB) of /dev/shm +--tmpfs string + Mount a tmpfs directory (format: :,size=), size is in MiB +-u, --user string + Username or UID (format: [:]) +--ulimit string + Ulimit options (format: =[:]) +``` + +Container options may be passed in long form (e.g `--option value`) or short form (e.g. `-o value`) where available. + +Few examples: + +```nextflow +containerOptions '--tmpfs /run:rw,noexec,nosuid,size=128 --tmpfs /app:ro,size=64' + +containerOptions '-e MYVAR1 --env MYVAR2=foo2 --env MYVAR3=foo3 --memory-swap 3240000 --memory-swappiness 20 --shm-size 16000000' + +containerOptions '--ulimit nofile=1280:2560 --ulimit nproc=16:32 --privileged' +``` + +Check the [AWS documentation](https://docs.aws.amazon.com/batch/latest/APIReference/API_ContainerProperties.html) for further details. + +## Custom AMI + +There are several reasons why you might need to create your own [AMI (Amazon Machine Image)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) to use in your Compute Environments: + +- You do not want to install the AWS CLI into each of your Docker images and would rather provide it through the AMI +- The existing AMI (selected from the marketplace) does not have Docker installed +- You need to attach more storage to your EC2 instance (the default ECS instance AMI has only a 30GB EBS volume which is not enough for most data pipelines) +- You need to install additional software that is not available in your Docker image + +### Create your custom AMI + +From the EC2 Dashboard, select **Launch Instance**, then select **Browse more AMIs**. In the new page, select +**AWS Marketplace AMIs**, and then search for `Amazon ECS-Optimized Amazon Linux 2 (AL2) x86_64 AMI`. Select the AMI and continue as usual to configure and launch the instance. 
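The same launch can be scripted with the AWS CLI instead of the console. A sketch, assuming the CLI is configured and that the instance type, key pair, security group, and subnet are replaced with your own values; the SSM parameter below is the documented lookup path for the ECS-optimized Amazon Linux 2 AMI:

```bash
# look up the current ECS-optimized Amazon Linux 2 AMI ID for the configured region
AMI_ID=$(aws ssm get-parameters \
    --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id \
    --query 'Parameters[0].Value' --output text)

# launch it with an enlarged root volume (adjust networking and key pair as needed)
aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type c5.xlarge \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0 \
    --subnet-id subnet-0123456789abcdef0 \
    --block-device-mappings 'DeviceName=/dev/xvda,Ebs={VolumeSize=100,VolumeType=gp3}'
```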
+ +:::note +The selected instance has a root volume of 30GB. Make sure to increase its size or add a second EBS volume with enough storage for real genomic workloads. +::: + +When the instance is running, SSH into it (or connect with the Session Manager service), install the AWS CLI, and install any other tool that may be required (see following sections). + +Finally, select **Create Image** from the EC2 Dashboard to create a new AMI from the running instance (you can also do it through the AWS CLI). + +The new AMI ID needs to be specified when creating the Batch Compute Environment. + +:::warning +Any additional software must be installed on the EC2 instance *before* creating the AMI. +::: + + +### AWS CLI installation + +:::tip +The need for the AWS CLI is considered a legacy requirement for the deployment of Nextflow pipelines with AWS Batch. +Instead, consider using [Wave][wave-page] and [Fusion][fusion-page] to facilitate access to S3 without using the AWS CLI. +::: + +The [AWS CLI](https://aws.amazon.com/cli) should be installed in your custom AMI using a self-contained package manager such as [Conda](https://conda.io). That way, you can control which version of Python is used by the AWS CLI (which is written in Python). + +If you don't use Conda, the `aws` command will attempt to use the version of Python that is installed in the container, and it won't be able to find the necessary dependencies. + +The following snippet shows how to install AWS CLI with [Miniconda](https://conda.io/miniconda.html) in the home folder: + +```bash +cd $HOME +sudo yum install -y bzip2 wget +wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh +bash Miniconda3-latest-Linux-x86_64.sh -b -f -p $HOME/miniconda +$HOME/miniconda/bin/conda install -c conda-forge -y awscli +rm Miniconda3-latest-Linux-x86_64.sh +``` + +Afterwards, verify that the AWS CLI package works correctly: + +```console +$ ./miniconda/bin/aws --version +aws-cli/1.29.20 Python/3.11.4 Linux/4.14.318-241.531.amzn2.x86_64 botocore/1.31.20 +``` + +:::note +The `aws` tool will be placed in a directory named `bin` in the main installation folder. Modifying this directory structure after the tool is installed will cause it to not work properly. +::: + +To configure Nextflow to use this installation, specify the `aws.batch.cliPath` option in the Nextflow configuration as shown below: + +```groovy +aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws' +``` + +Replace the path above with the one matching the location where the `aws` tool is installed in your AMI. + +:::warning +The grandparent directory of the `aws` tool will be mounted into the container at the same path as the host, e.g. `/home/ec2-user/miniconda`, which will shadow existing files in the container. Make sure you use a path that is not already present in the container. +::: + +### Docker installation + +Docker is required by Nextflow to execute tasks on AWS Batch. The **Amazon ECS-optimized Amazon Linux 2023 AMI** has Docker installed, however, if you create your AMI from a different AMI that does not have Docker installed, you will need to install it manually. + +The following snippet shows how to install Docker on an Amazon EC2 instance: + +```bash +# install Docker +sudo yum update -y +sudo yum install docker + +# start the Docker service +sudo service docker start + +# empower your user to run Docker without sudo +sudo usermod -a -G docker ec2-user +``` + +You must log out and log back in again to use the new `ec2-user` permissions. 
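After logging back in, it is worth confirming that `ec2-user` can reach the Docker daemon before you create the AMI. A quick check (the `hello-world` image is just a disposable test container):

```bash
# should print daemon and server details without sudo
docker info

# optional end-to-end test: pull and run a tiny throwaway container
docker run --rm hello-world
```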
+ +These steps must be done *before* creating the AMI from the current EC2 instance. + +### Amazon ECS container agent installation + +The [ECS container agent](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_agent.html) is a component of Amazon Elastic Container Service (Amazon ECS) and is responsible for managing containers on behalf of ECS. AWS Batch uses ECS to execute containerized jobs, therefore it requires the agent to be installed on EC2 instances within your Compute Environments. + +The ECS agent is included in the **Amazon ECS-optimized Amazon Linux 2023 AMI** . If you use a different base AMI, you can also install the agent on any EC2 instance that supports the Amazon ECS specification. + +To install the agent, follow these steps: + +```bash +sudo yum install ecs-init +sudo systemctl enable --now ecs +``` + +To test the installation: + +```bash +curl -s http://localhost:51678/v1/metadata | python -mjson.tool +``` + +:::note +The `AmazonEC2ContainerServiceforEC2Role` policy must be attached to the instance role in order to be able to connect the EC2 instance created by the Compute Environment to the ECS container. +::: + +:::note +The `AmazonEC2ContainerRegistryReadOnly` policy should be attached to the instance role in order to get read-only access to Amazon EC2 Container Registry repositories. +::: + +## Jobs & Execution + +### Custom job definition + +Nextflow automatically creates the Batch [Job definitions](http://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) needed to execute tasks in your pipeline, so you don't need to define them beforehand. + +However, sometimes you may still need to specify a custom **Job Definition** to fine tune the configuration of a specific job, for example to define custom mount paths. + +To do that, first create a **Job Definition** in the AWS Console (or by other means). Note the name of the Job definition you created. You can then associate a process execution with this Job definition by using the [container][process-container] directive and specifying, in place of the container image name, the Job definition name prefixed by `job-definition://`, as shown below: + +```groovy +process.container = 'job-definition://your-job-definition-name' +``` + +### Hybrid workloads + +Nextflow allows the use of multiple executors in the same workflow application. This feature enables the deployment of hybrid workloads in which some jobs are executed in the local computer or local computing cluster and some jobs are offloaded to AWS Batch. + +To enable this feature, use one or more [Process selectors][config-process-selectors] in your Nextflow configuration to apply the AWS Batch configuration to the subset of processes that you want to offload. For example: + +```groovy +process { + withLabel: bigTask { + executor = 'awsbatch' + queue = 'my-batch-queue' + container = 'my/image:tag' + } +} + +aws { + region = 'eu-west-1' +} +``` + +With the above configuration, processes with the `bigTask` [label][process-label] will run on AWS Batch, while the remaining processes will run in the local computer. 
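For reference, a process opts into that selector by declaring the corresponding label in the pipeline script. A minimal sketch (the process name and command are illustrative):

```nextflow
process assembleGenome {
    // picked up by the `withLabel: bigTask` selector in the configuration above
    label 'bigTask'

    input:
    path reads

    output:
    path 'assembly.fa'

    script:
    """
    assemble.sh $reads > assembly.fa
    """
}
```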
+ +Then launch the pipeline with the -bucket-dir option to specify an AWS S3 path for the jobs computed with AWS Batch and, optionally, the -work-dir to specify the local storage for the jobs computed locally: + +```bash +nextflow run + +{% endblock %} + +{% block footer %} + {{ super() }} + + + +{% endblock %} diff --git a/new-docs/nextflow-docs/amazons3.mdx b/new-docs/nextflow-docs/amazons3.mdx new file mode 100644 index 0000000000..611083c760 --- /dev/null +++ b/new-docs/nextflow-docs/amazons3.mdx @@ -0,0 +1,102 @@ +# Amazon S3 + +Nextflow includes support for AWS S3 storage. Files stored in an S3 bucket can be accessed transparently in your pipeline script like any other file in the local file system. + +## S3 path + +In order to access an S3 file, you only need to prefix the file path with the `s3` schema and the `bucket` name where it is stored. + +For example, if you need to access the file `/data/sequences.fa` stored in a bucket named `my-bucket`, that file can be accessed using the following fully qualified path: + +``` +s3://my-bucket/data/sequences.fa +``` + +The usual file operations can be applied to a path handle with the above notation. For example, the content of an S3 file can be printed as follows: + +```nextflow +println file('s3://my-bucket/data/sequences.fa').text +``` + +See [Working with files][working-with-files] and the [Path][stdlib-types-path] reference to learn more about available file operations. + +## Security credentials + +AWS access credentials can be provided in two ways: + +1. Using AWS access and secret keys in your pipeline configuration. +2. Using IAM roles to grant access to S3 storage on AWS EC2 instances. + +### AWS access and secret keys + +The AWS access and secret keys can be specified by using the `aws` section in the `nextflow.config` configuration file as shown below: + +```groovy +aws { + accessKey = '' + secretKey = '' + region = '' +} +``` + +If the access credentials are not found in the above file, Nextflow looks for AWS credentials in the following order: + +1. The `nextflow.config` file in the pipeline execution directory +2. The environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` +3. The environment variables `AWS_ACCESS_KEY` and `AWS_SECRET_KEY` +4. The profile in the AWS credentials file located at `~/.aws/credentials` + - Uses the `default `profile or the environment variable `AWS_PROFILE` if set +5. The profile in the AWS client configuration file located at `~/.aws/config` + - Uses the `default `profile or the environment variable `AWS_PROFILE` if set +6. The temporary AWS credentials provided by an IAM instance role. See [IAM Roles](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) documentation for details. + +More information regarding [AWS Security Credentials](http://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html) are available in the AWS documentation. + +### IAM roles with AWS EC2 instances + +When running your pipeline in an EC2 instance, IAM roles can be used to grant access to AWS resources. + +In this scenario, you only need to launch the EC2 instance with an IAM role which includes the `AmazonS3FullAccess` policy. Nextflow will detect and automatically acquire the permission to access S3 storage, without any further configuration. + +Learn more about [Using IAM Roles to Delegate Permissions to Applications that Run on AWS EC2](http://docs.aws.amazon.com/IAM/latest/UserGuide/roles-usingrole-ec2instance.html) in the AWS documentation. 
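If you keep several named profiles in `~/.aws/credentials`, you can also point Nextflow at a specific profile instead of relying on the `default` one, using the `aws.profile` configuration option. A minimal sketch (the profile name and region are illustrative):

```groovy
aws {
    profile = 'my-lab-profile'
    region = 'eu-west-1'
}
```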
+ +## China regions + +To use an AWS China region, make sure to specify the corresponding AWS API S3 endpoint in the Nextflow configuration file as shown below: + +```groovy +aws { + client { + endpoint = "https://s3.cn-north-1.amazonaws.com.cn" + } +} +``` + +Read more about AWS API endpoints in the [AWS documentation](https://docs.aws.amazon.com/general/latest/gr/s3.html) + +## S3-compatible storage + +To use S3-compatible object storage such as [Ceph](https://ceph.io) or [Minio](https://min.io) specify the endpoint of +your storage provider and enable the [S3 path style access](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#path-style-access) +in your Nextflow configuration as shown below: + + +```groovy +aws { + accessKey = '' + secretKey = '' + client { + endpoint = '' + s3PathStyleAccess = true + } +} +``` + +## Advanced configuration + +See [AWS configuration][config-aws] for more information about advanced S3 client configuration options. + + +[config-aws]: /nextflow_docs/nextflow_repo/docs/reference/config.mdx#aws +[stdlib-types-path]: /nextflow_docs/nextflow_repo/docs/reference/stdlib-types.mdx#path +[working-with-files]: /nextflow_docs/nextflow_repo/docs/working-with-files.mdx \ No newline at end of file diff --git a/docs/aws.md b/new-docs/nextflow-docs/aws.mdx similarity index 88% rename from docs/aws.md rename to new-docs/nextflow-docs/aws.mdx index cb1c2e4888..fba195e282 100644 --- a/docs/aws.md +++ b/new-docs/nextflow-docs/aws.mdx @@ -1,8 +1,8 @@ -(aws-page)= + # Amazon Web Services -:::{tip} +:::tip This page describes how to manually set up and use Nextflow with AWS Cloud. You may be interested in using [Batch Forge](https://docs.seqera.io/platform/latest/compute-envs/aws-batch) in [Seqera Platform](https://seqera.io/platform/), which automatically creates the required AWS infrastructure for you with minimal intervention. @@ -14,7 +14,7 @@ Nextflow uses the [AWS security credentials](https://docs.aws.amazon.com/general The AWS credentials are selected from the following sources, in order of descending priority: -1. Nextflow configuration file - `aws.accessKey` and `aws.secretKey`. See {ref}`AWS configuration` for more details. +1. Nextflow configuration file - `aws.accessKey` and `aws.secretKey`. See [aws][config-aws] for more details. 2. A custom profile in `$HOME/.aws/credentials` and/or `$HOME/.aws/config`. The profile can be supplied from the `aws.profile` config option, or the `AWS_PROFILE` or `AWS_DEFAULT_PROFILE` environmental variables. @@ -24,8 +24,8 @@ The AWS credentials are selected from the following sources, in order of descend 5. Single Sign-On (SSO) credentials. See the [AWS documentation](https://docs.aws.amazon.com/cli/latest/userguide/sso-configure-profile-token.html) for more details. - :::{versionadded} 23.07.0-edge - ::: + + 6. EC2 instance profile credentials. See the [AWS documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) and [this blog post](https://aws.amazon.com/blogs/security/granting-permission-to-launch-ec2-instances-with-iam-roles-passrole-permission/) for more details. @@ -88,7 +88,7 @@ Minimal permissions policies to be attached to the AWS account used by Nextflow Alternatively, you can use AWS provided `AmazonEC2ContainerRegistryReadOnly` managed policy. 
-:::{note} +:::note If you are running Fargate or Fargate Spot, you may need the following policies in addition to the listed above: ```json "ec2:DescribeSubnets" @@ -150,28 +150,24 @@ Depending on the pipeline configuration, the above actions can be done all in a See the [bucket policy documentation](https://docs.aws.amazon.com/config/latest/developerguide/s3-bucket-policy.html) for additional details. -(aws-batch)= - ## AWS Batch [AWS Batch](https://aws.amazon.com/batch/) is a managed computing service that allows the execution of containerised workloads in the AWS cloud infrastructure. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized compute resources) based on the volume and specific resource requirements of the jobs submitted. Nextflow provides built-in support for AWS Batch, allowing the seamless deployment of Nextflow pipelines in the cloud, in which tasks are offloaded as Batch jobs. -Read the {ref}`AWS Batch executor ` section to learn more about the `awsbatch` executor in Nextflow. - -(aws-batch-cli)= +See [AWS Batch][awsbatch-executor] to learn more about the `awsbatch` executor in Nextflow. ### AWS CLI -:::{tip} +:::tip The need for the AWS CLI is considered a legacy requirement for the deployment of Nextflow pipelines with AWS Batch. -Instead, consider using {ref}`wave-page` and {ref}`fusion-page` to facilitate access to S3 without using the AWS CLI. +Instead, consider using [Wave][wave-page] and [Fusion][fusion-page] to facilitate access to S3 without using the AWS CLI. ::: Nextflow uses the [AWS command line tool](https://aws.amazon.com/cli/) (`aws`) to stage input files and output files between S3 and the task containers. -The `aws` command can be made available by either (1) installing it in the container image(s) or (2) installing it in a {ref}`custom AMI ` to be used instead of the default AMI when configuring AWS Batch. +The `aws` command can be made available by either (1) installing it in the container image(s) or (2) installing it in a [custom AMI][aws-custom-ami]` to be used instead of the default AMI when configuring AWS Batch. ### Get started @@ -186,7 +182,7 @@ The `aws` command can be made available by either (1) installing it in the conta 3. In the AWS Console, create an S3 bucket for the work directory (see below). You can also create separate buckets for input data and results, as needed. -4. Make sure that every process in your pipeline specifies a Docker container with the {ref}`process-container` directive. +4. Make sure that every process in your pipeline specifies a Docker container with the [container][process-container] directive. 5. Make sure that all of your container images are published in a Docker registry that can be reached by AWS Batch, such as [Docker Hub](https://hub.docker.com/), [Quay](https://quay.io/), or [Elastic Container Registry](https://aws.amazon.com/ecr/). @@ -194,9 +190,9 @@ The `aws` command can be made available by either (1) installing it in the conta To configure your pipeline for AWS Batch: -1. Specify the AWS Batch {ref}`executor ` -2. Specify the AWS Batch queue with the {ref}`process-queue` directive -3. Specify any Batch job container options with the {ref}`process-containerOptions` directive. +1. Specify the AWS Batch [executor][awsbatch-executor] +2. Specify the AWS Batch queue with the [queue][process-queue] directive +3. Specify any Batch job container options with the [containerOptions][process-containerOptions] directive. 
An example `nextflow.config` file is shown below: @@ -217,16 +213,15 @@ aws { } ``` -:::{tip} -Each process can be configured with its own queue by using the {ref}`process-queue` directive in the process definition or via {ref}`config-process-selectors` in your Nextflow configuration. +:::tip +Each process can be configured with its own queue by using the [queue][process-queue] directive in the process definition or via [Process selectors][config-process-selectors] in your Nextflow configuration. ::: ## Container Options -:::{versionadded} 21.12.1-edge -::: + -The {ref}`process-containerOptions` directive can be used to control the properties of the container execution associated with each Batch job. +The [containerOptions][process-containerOptions] directive can be used to control the properties of the container execution associated with each Batch job. The following container options are currently supported: @@ -267,8 +262,6 @@ containerOptions '--ulimit nofile=1280:2560 --ulimit nproc=16:32 --privileged' Check the [AWS documentation](https://docs.aws.amazon.com/batch/latest/APIReference/API_ContainerProperties.html) for further details. -(aws-custom-ami)= - ## Custom AMI There are several reasons why you might need to create your own [AMI (Amazon Machine Image)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) to use in your Compute Environments: @@ -283,7 +276,7 @@ There are several reasons why you might need to create your own [AMI (Amazon Mac From the EC2 Dashboard, select **Launch Instance**, then select **Browse more AMIs**. In the new page, select **AWS Marketplace AMIs**, and then search for `Amazon ECS-Optimized Amazon Linux 2 (AL2) x86_64 AMI`. Select the AMI and continue as usual to configure and launch the instance. -:::{note} +:::note The selected instance has a root volume of 30GB. Make sure to increase its size or add a second EBS volume with enough storage for real genomic workloads. ::: @@ -293,17 +286,16 @@ Finally, select **Create Image** from the EC2 Dashboard to create a new AMI from The new AMI ID needs to be specified when creating the Batch Compute Environment. -:::{warning} +:::warning Any additional software must be installed on the EC2 instance *before* creating the AMI. ::: -(id2)= ### AWS CLI installation -:::{tip} +:::tip The need for the AWS CLI is considered a legacy requirement for the deployment of Nextflow pipelines with AWS Batch. -Instead, consider using {ref}`wave-page` and {ref}`fusion-page` to facilitate access to S3 without using the AWS CLI. +Instead, consider using [Wave][wave-page] and [Fusion][fusion-page] to facilitate access to S3 without using the AWS CLI. ::: The [AWS CLI](https://aws.amazon.com/cli) should be installed in your custom AMI using a self-contained package manager such as [Conda](https://conda.io). That way, you can control which version of Python is used by the AWS CLI (which is written in Python). @@ -328,7 +320,7 @@ $ ./miniconda/bin/aws --version aws-cli/1.29.20 Python/3.11.4 Linux/4.14.318-241.531.amzn2.x86_64 botocore/1.31.20 ``` -:::{note} +:::note The `aws` tool will be placed in a directory named `bin` in the main installation folder. Modifying this directory structure after the tool is installed will cause it to not work properly. ::: @@ -340,11 +332,7 @@ aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws' Replace the path above with the one matching the location where the `aws` tool is installed in your AMI. 
-:::{versionchanged} 19.07.0 -The `executor.awscli` config option was replaced by `aws.batch.cliPath`. -::: - -:::{warning} +:::warning The grandparent directory of the `aws` tool will be mounted into the container at the same path as the host, e.g. `/home/ec2-user/miniconda`, which will shadow existing files in the container. Make sure you use a path that is not already present in the container. ::: @@ -389,11 +377,11 @@ To test the installation: curl -s http://localhost:51678/v1/metadata | python -mjson.tool ``` -:::{note} +:::note The `AmazonEC2ContainerServiceforEC2Role` policy must be attached to the instance role in order to be able to connect the EC2 instance created by the Compute Environment to the ECS container. ::: -:::{note} +:::note The `AmazonEC2ContainerRegistryReadOnly` policy should be attached to the instance role in order to get read-only access to Amazon EC2 Container Registry repositories. ::: @@ -405,7 +393,7 @@ Nextflow automatically creates the Batch [Job definitions](http://docs.aws.amazo However, sometimes you may still need to specify a custom **Job Definition** to fine tune the configuration of a specific job, for example to define custom mount paths. -To do that, first create a **Job Definition** in the AWS Console (or by other means). Note the name of the Job definition you created. You can then associate a process execution with this Job definition by using the {ref}`process-container` directive and specifying, in place of the container image name, the Job definition name prefixed by `job-definition://`, as shown below: +To do that, first create a **Job Definition** in the AWS Console (or by other means). Note the name of the Job definition you created. You can then associate a process execution with this Job definition by using the [container][process-container] directive and specifying, in place of the container image name, the Job definition name prefixed by `job-definition://`, as shown below: ```groovy process.container = 'job-definition://your-job-definition-name' @@ -415,7 +403,7 @@ process.container = 'job-definition://your-job-definition-name' Nextflow allows the use of multiple executors in the same workflow application. This feature enables the deployment of hybrid workloads in which some jobs are executed in the local computer or local computing cluster and some jobs are offloaded to AWS Batch. -To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration to apply the AWS Batch configuration to the subset of processes that you want to offload. For example: +To enable this feature, use one or more [Process selectors][config-process-selectors] in your Nextflow configuration to apply the AWS Batch configuration to the subset of processes that you want to offload. For example: ```groovy process { @@ -431,7 +419,7 @@ aws { } ``` -With the above configuration, processes with the `bigTask` {ref}`process-label` will run on AWS Batch, while the remaining processes will run in the local computer. +With the above configuration, processes with the `bigTask` [label][process-label] will run on AWS Batch, while the remaining processes will run in the local computer. Then launch the pipeline with the -bucket-dir option to specify an AWS S3 path for the jobs computed with AWS Batch and, optionally, the -work-dir to specify the local storage for the jobs computed locally: @@ -439,19 +427,16 @@ Then launch the pipeline with the -bucket-dir option to specify an AWS S3 path f nextflow run