This repository was archived by the owner on Aug 9, 2023. It is now read-only.

Commit 93125fd

update sfn documentation
1 parent d71abec commit 93125fd

File tree

1 file changed

+45
-17
lines changed

docs/orchestration/step-functions/step-functions-overview.md

Lines changed: 45 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -6,13 +6,50 @@ The [AWS Step Functions](https://aws.amazon.com/step-functions/) service allows
66

77
In the context of genomics workflows, the combination of AWS Step Functions with Batch and Lambda constitutes a robust, scalable, and serverless task orchestration solution.
88

9-
## Prerequisites
9+
## Requirements
1010

1111
To get started using AWS Step Functions for genomics workflows you'll need the following setup in your AWS account:
1212

13-
* The core set of resources (S3 Bucket, IAM Roles, AWS Batch) described in the [Getting Started](../../../core-env/introduction/) section.
13+
* A VPC with at least 2 subnets (preferably ones that are **private**)
14+
* The Genomics Workflow [Core Environment](../../core-env/introduction.md)
15+
* Containerized tools for your workflow steps (e.g. BWA-MEM, Samtools, BCFtools) with custom entrypoint scripts that use environment variables supplied by AWS Batch for configuration and data handling
16+
* AWS Batch Job Definitions for your tools
17+
* An IAM Role for AWS Step Functions that allows it to submit AWS Batch jobs
1418

15-
## AWS Step Functions Execution Role
19+
20+
The following sections will help you deploy these components.
21+
22+
### VPC
23+
24+
If you are handling sensitive data in your genomics pipelines, we recommend using at least 2 **private** subnets for AWS Batch compute jobs. EC2 instances launched into **private** subnets do not have public IP addresses, and therefore cannot be directly accessed from the public internet. They can still retain internet access from within the VPC - e.g. to pull source code, retrieve public datasets, or install required software - if networking is configured appropriately. If the target VPC you want to deploy into already meets these requirements, you can skip ahead. If not, you can use the CloudFormation template below, which uses the [AWS VPC Quickstart](https://aws.amazon.com/quickstart/architecture/vpc/), to create one that does.
25+
26+
| Name | Description | Source | Launch Stack |
27+
| -- | -- | :--: | :--: |
28+
{{ cfn_stack_row("VPC (Optional)", "GenomicsVPC", "https://aws-quickstart.s3.amazonaws.com/quickstart-aws-vpc/templates/aws-vpc.template", "Creates a new Virtual Private Cloud to use for your genomics workflow resources.") }}
29+
30+
### Genomics Workflow Core
31+
32+
To launch the Genomics Workflow Core in your AWS account, use the CloudFormation template below.
33+
34+
| Name | Description | Source | Launch Stack |
35+
| -- | -- | :--: | :--: |
36+
{{ cfn_stack_row("Genomics Workflow Core", "GWFCore", "gwfcore/gwfcore-root.template.yaml", "Create EC2 Launch Templates, AWS Batch Job Queues and Compute Environments, a secure Amazon S3 bucket, and IAM policies and roles within an **existing** VPC. _NOTE: You must provide VPC ID, and subnet IDs_.") }}
37+
38+
The core is agnostic of the workflow orchestrator you intend to use, and can be installed multiple times in your account if needed (e.g. for use by different projects). Each installation uses a `Namespace` value to group resources accordingly. By default, the `Namespace` is set to the stack name, which must be unique within an AWS region.
39+
40+
See the [Core Environment](../../core-env/introduction.md) section for more details on the core's architecture.
41+
42+
### Step Functions Resources
43+
44+
The following CloudFormation template creates an AWS Step Functions State Machine that defines an example variant calling workflow using BWA-MEM, Samtools, and BCFtools; container images and AWS Batch Job Definitions for the tooling; and an IAM Role that allows AWS Step Functions to call AWS Batch during State Machine executions:
45+
46+
| Name | Description | Source | Launch Stack |
47+
| -- | -- | :--: | :--: |
48+
{{ cfn_stack_row("AWS Step Functions Resources", "SfnResources", "step-functions/sfn-resources.template.yaml", "Create a Step Functions State Machine, Batch Job Definitions, and container images to run an example genomics workflow") }}
49+
50+
## Deployment Details
51+
52+
### AWS Step Functions Execution Role
1653

1754
An AWS Step Functions Execution Role is an IAM role that allows Step Functions to call other AWS services on behalf of the state machine.
1855

@@ -79,11 +116,11 @@ For more complex workflows that use nested workflows or require more complex inp
79116
!!! note
80117
All `Resource` values in the policy statements above can be scoped to be more specific if needed.
81118

82-
## Step Functions State Machine
119+
### Step Functions State Machines
83120

84121
Workflows in AWS Step Functions are built using [Amazon States Language](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) (ASL), a declarative, JSON-based, structured language used to define a "state-machine". An AWS Step Functions State-Machine is a collection of states that can do work (Task states), determine which states to transition to next (Choice states), stop an execution with an error (Fail states), and so on.
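The three state types named above can be sketched as a small state machine built as a Python dictionary and serialized to ASL JSON. This is only an illustration: the state names, the `$.fastq` input key, and the job definition and job queue names are hypothetical and are not taken from the example workflow in this repository.

```python
import json

# Hypothetical state names and AWS Batch identifiers, for illustration only.
state_machine = {
    "Comment": "Minimal sketch: Choice -> (Task | Fail)",
    "StartAt": "CheckInput",
    "States": {
        # Choice state: decides which state to transition to next.
        "CheckInput": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.fastq", "IsPresent": True, "Next": "RunTool"}
            ],
            "Default": "MissingInput",
        },
        # Task state: does work, here by submitting an AWS Batch job
        # and waiting for it to finish (.sync).
        "RunTool": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "run-tool",
                "JobDefinition": "example-job-definition",  # assumed name
                "JobQueue": "example-job-queue",  # assumed name
            },
            "End": True,
        },
        # Fail state: stops the execution with an error.
        "MissingInput": {
            "Type": "Fail",
            "Error": "MissingInput",
            "Cause": "No FASTQ location was provided",
        },
    },
}


def transition_targets(states):
    """Collect every state name referenced by Next, Default, or a Choice rule."""
    targets = set()
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets


# Sanity check: every transition must point at a state that is defined.
assert transition_targets(state_machine["States"]) <= set(state_machine["States"])

print(json.dumps(state_machine, indent=2))
```

The printed JSON has the same shape as a hand-written ASL definition. The target check is a local convenience and does not replace validation by the Step Functions service.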
85122

86-
### Building workflows with AWS Step Functions
123+
#### Building workflows with AWS Step Functions
87124

88125
The overall structure of a state-machine looks like the following:
89126

@@ -136,7 +173,7 @@ ASL supports several task types and simple structures that can be combined to fo
136173
More detailed coverage of ASL state types and structures is provided in the
137174
Step Functions [ASL documentation](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html).
138175

139-
### Batch Job Definitions
176+
#### Batch Job Definitions
140177

141178
[AWS Batch Job Definitions](https://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) are used to define compute resource requirements and parameter defaults for an AWS Batch Job. These are then referenced in state machine `Task` states by their respective ARNs.
142179

@@ -223,7 +260,7 @@ There are three key parts of the above definition to take note of.
223260
Together, **volumes** and **mountPoints** define what you would otherwise pass to a `docker run` command with the `-v hostpath:containerpath` option. These can be used to map host directories with resources (e.g. data or tools) used by all containers. In the example above, a `scratch` volume is mapped so that the container can utilize a larger disk on the host. Also, a version of the AWS CLI installed with `conda` is mapped into the container - giving the container access to it (e.g. so it can transfer data to and from S3) without explicitly building it into the image.
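To make the correspondence with `docker run -v` concrete, the following sketch converts the `volumes` and `mountPoints` entries of a job definition's container properties into the equivalent `-v` flags. The host and container paths in the sample data are assumptions chosen for illustration, and the helper `docker_volume_flags` is hypothetical rather than part of any AWS tooling.

```python
def docker_volume_flags(container_properties):
    """Translate Batch-style volumes/mountPoints into `docker run -v` flags."""
    # Map each volume name to the host path it exposes.
    host_paths = {
        volume["name"]: volume["host"]["sourcePath"]
        for volume in container_properties.get("volumes", [])
    }
    flags = []
    for mount in container_properties.get("mountPoints", []):
        spec = f'{host_paths[mount["sourceVolume"]]}:{mount["containerPath"]}'
        if mount.get("readOnly"):
            spec += ":ro"
        flags += ["-v", spec]
    return flags


# Assumed paths, for illustration only.
example_properties = {
    "volumes": [
        {"name": "scratch", "host": {"sourcePath": "/scratch"}},
        {"name": "aws-cli", "host": {"sourcePath": "/opt/miniconda"}},
    ],
    "mountPoints": [
        {"sourceVolume": "scratch", "containerPath": "/scratch"},
        {"sourceVolume": "aws-cli", "containerPath": "/opt/aws-cli", "readOnly": True},
    ],
}

print(" ".join(docker_volume_flags(example_properties)))
```

Each mount point refers to a volume by name, which is why the two lists have to be read together.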
224261

225262

226-
### State Machine Batch Job Tasks
263+
#### State Machine Batch Job Tasks
227264

228265
AWS Step Functions has built-in integration with AWS Batch (and [several other services](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-connectors.html)), and provides snippets of code to make developing your state-machine tasks easier.
229266

@@ -276,16 +313,7 @@ Inputs to a state machine that uses the above `BwaMemTask` would look like this:
276313

277314
When the Task state completes, Step Functions adds information to a new `status` key under `bwa-mem` in the JSON object. The complete object is then passed on to the next state in the workflow.
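The effect on the state's JSON can be emulated locally. The sketch below assumes the task writes its result to the path `$.bwa-mem.status` (for example through a `ResultPath` setting). The input object and the job result are hypothetical, and `apply_result_path` is a simplified stand-in that only handles plain dotted paths, not the full path syntax supported by Step Functions.

```python
import copy


def apply_result_path(state_input, result_path, result):
    """Return a copy of state_input with result stored at a dotted path like "$.a.b"."""
    if not result_path.startswith("$."):
        raise ValueError("only paths of the form '$.key.subkey' are supported")
    output = copy.deepcopy(state_input)  # leave the original input untouched
    *parents, leaf = result_path[2:].split(".")
    node = output
    for key in parents:
        # Create intermediate objects when they do not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = result
    return output


# Hypothetical input and job result, for illustration only.
state_input = {"bwa-mem": {"input": "s3://example-bucket/sample.fastq.gz"}}
job_result = {"JobId": "example-job-id", "Status": "SUCCEEDED"}

state_output = apply_result_path(state_input, "$.bwa-mem.status", job_result)
print(state_output)
```

The original keys survive alongside the new `status` key, which is what lets later states read both the task's inputs and its outcome.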
278315

279-
## Example state machine
280-
281-
The following CloudFormation template creates container images, AWS Batch Job Definitions, and an AWS Step Functions State Machine for a simple genomics workflow using bwa, samtools, and bcftools.
282-
283-
| Name | Description | Source | Launch Stack |
284-
| -- | -- | :--: | :--: |
285-
{{ cfn_stack_row("AWS Step Functions Example", "SfnExample", "step-functions/sfn-workflow.template.yaml", "Create a Step Functions State Machine, Batch Job Definitions, and container images to run an example genomics workflow") }}
286-
287-
!!! note
288-
The stack above needs to create several IAM Roles. You must have administrative privileges in your AWS Account for this to succeed.
316+
### Example state machine
289317

290318
The example workflow is a simple secondary analysis pipeline that converts raw FASTQ files into VCFs with variants called for a list of chromosomes. It uses the following open-source tools:
291319
