This repository was archived by the owner on Aug 9, 2023. It is now read-only.

Commit 559fbd8

Merge pull request #3 from aws-samples/copy-edit
Copy edit
2 parents: dbd28d5 + 560bc22

File tree: 8 files changed, +61 -63 lines

docs/core-env/create-custom-compute-resources.md
Lines changed: 6 additions & 9 deletions
@@ -1,19 +1,16 @@
 # Creating Custom Compute Resources
 
 Genomics is a data-heavy workload and requires some modification to the defaults
-used for batch job processing. In particular, we need to be able to scale the
-storage used by the instances Tasks/Jobs run on to meet unpredictable runtime
-demands.
+used for batch job processing. In particular, instances running the Tasks/Jobs need scalable storage to meet unpredictable runtime demands.
 
 By default, AWS Batch relies upon the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
-as the image used to for instances it launches to run jobs. What this image
-provides is sufficient in most cases. Specialized needs, such as the large
+to launch container instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large
 storage requirements noted above, require customization of the base AMI.
 
 This section provides two methods for customizing the base ECS-Optimized AMI
-that adds an expandable working directory that the Jobs will use to write data.
-That directory will be monitored by a process that inspects the free space
-available and adds more EBS volumes and expands the filesystem on the fly, like so:
+that adds an expandable working directory for jobs to write data.
+A process will monitor the directory and add more EBS volumes on the fly to expand the free space
+based on the capacity threshold, like so:
 
 ![Autoscaling EBS storage](images/ebs-autoscale.png)
 
@@ -92,4 +89,4 @@ Once your AMI is created, you will need to jot down its unique AMI Id. You will
 need this when creating compute resources in AWS Batch.
 
 !!! note
-    This is considered advanced use. All documentation and CloudFormation templates hereon assumes use of EC2 Launch Templates.
+    This is considered advanced use. All documentation and CloudFormation templates hereon assume use of EC2 Launch Templates.
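
The launch-template route the note refers to amounts to an `AWS::EC2::LaunchTemplate` that attaches scratch storage for the free-space monitor to grow. Below is a minimal CloudFormation sketch of that idea — the logical ID, device name, volume size, and bootstrap step are illustrative assumptions, not the repository's actual template:

```yaml
# Sketch only: a Launch Template that adds an initial scratch EBS volume
# which a free-space monitor (per the docs above) can expand at runtime.
Resources:
  GenomicsLaunchTemplate:            # hypothetical logical ID
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: genomics-scratch   # hypothetical name
      LaunchTemplateData:
        BlockDeviceMappings:
          - DeviceName: /dev/sdb     # assumed device for the scratch volume
            Ebs:
              VolumeSize: 100        # GB to start with; grown on demand
              VolumeType: gp2
              Encrypted: true
        UserData:
          Fn::Base64: |
            #!/bin/bash
            # Placeholder: format/mount the scratch volume and start the
            # monitor that attaches additional EBS volumes as space runs low.
```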

docs/core-env/create-iam-roles.md
Lines changed: 10 additions & 12 deletions
@@ -1,19 +1,19 @@
-# Creating IAM Roles
+# Permissions
 
-Below are IAM roles that your job execution environment in AWS Batch will use:
+## Create IAM Roles
 
-* Batch Service Role:
+IAM roles that your job execution environment in AWS Batch will use include:
+
+* **Batch Service Role (required)**:
 
-    (required)
     Role used by AWS Batch to call other AWS services on its behalf.
     AWS Batch makes calls to other AWS services on your behalf to manage the resources that you use with the service. Before you can use the service, you must have an IAM policy and role that provides the necessary permissions to AWS Batch.
     [(Learn More)](https://docs.aws.amazon.com/batch/latest/userguide/service_IAM_role.html)
 
-* Batch Instance Profile:
+* **Batch Instance Profile (required)**:
 
-    (required)
     Role that defines service permissions for EC2 instances launched by AWS Batch.
-    For example, this is used to specify policies that allow access to specific S3 buckets and modify storage on the intance (shown below).
+    For example, this is used to specify policies that allow access to specific S3 buckets and modify storage on the instance (shown below).
     [(Learn More)](https://docs.aws.amazon.com/batch/latest/userguide/instance_IAM_role.html)
 
    ```yaml
@@ -46,16 +46,14 @@ Below are IAM roles that your job execution environment in AWS Batch will use:
        Resource: "*"
    ```
 
-* Batch SpotFleet Role:
+* **Batch SpotFleet Role (depends)**:
 
-    (depends)
-    This is role is needed if you intend to launch spot instances from AWS Batch.
+    This role is needed if you intend to launch spot instances from AWS Batch.
     If you create a managed compute environment that uses Amazon EC2 Spot Fleet Instances, you must create a role that grants the Spot Fleet permission to bid on, launch, tag, and terminate instances on your behalf.
     [(Learn More)](https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html)
 
-* Batch Job Role:
+* **Batch Job Role (optional)**:
 
-    (optional)
     Role used to provide service permissions to individual jobs.
     Jobs can run without an IAM role. In that case, they inherit the
     permissions of the instance they run on.
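
The policy referenced by "(shown below)" is elided from this diff. As a rough illustration of the two capabilities it describes — access to specific S3 buckets and modifying storage on the instance — an inline policy fragment for the role might look like the following; the policy names, bucket ARN, and action list are assumptions:

```yaml
# Sketch only: inline policy fragment for the Batch Instance Profile role.
Policies:
  - PolicyName: bucket-access-policy           # hypothetical name
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow                        # read and write workflow data
          Action:
            - "s3:GetObject"
            - "s3:PutObject"
            - "s3:ListBucket"
          Resource:
            - "arn:aws:s3:::my-genomics-bucket"      # replace with your bucket
            - "arn:aws:s3:::my-genomics-bucket/*"
  - PolicyName: ebs-autoscale-policy           # hypothetical name
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow                        # grow scratch storage on the fly
          Action:
            - "ec2:CreateVolume"
            - "ec2:AttachVolume"
            - "ec2:ModifyInstanceAttribute"
          Resource: "*"
```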

docs/core-env/create-s3-bucket.md
Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,4 @@
-# Creating an S3 Bucket
+# Data Storage
 
 You will need a robust location to store your input and output data. As mentioned
 previously, genomics data files are fairly large. In addition to input sample
@@ -12,6 +12,8 @@ The following are key criteria for storing data for genomics workflows
 * durable
 * capable of handling large files
 
+## Create an S3 Bucket
+
 Amazon S3 buckets meet all of the above conditions.
 
 You can use an existing bucket for your workflows, or you can create a new one using the CloudFormation template below.
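
The template itself is elided from this diff; a minimal sketch of the core resource such a template would declare (logical and parameter names are assumptions) is:

```yaml
# Sketch only: a versioned, encrypted bucket for workflow inputs and outputs.
Resources:
  GenomicsDataBucket:                          # hypothetical logical ID
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref BucketName              # assumed template parameter
      VersioningConfiguration:
        Status: Enabled                        # guards against accidental overwrites
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
```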

docs/core-env/setup-aws-batch.md
Lines changed: 7 additions & 7 deletions
@@ -14,19 +14,19 @@ A [job definition](http://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html)
 
 Jobs are submitted to [job queues](http://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) where they reside until they can be scheduled to run on Amazon EC2 instances within a compute environment. An AWS account can have multiple job queues, each with varying priority. This gives you the ability to closely align the consumption of compute resources with your organizational requirements.
 
-[Compute environments](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html) provision and manage your EC2 instances and other compute resources that are used to run your AWS Batch jobs. Job queues are mapped to one more compute environments and a given environment can also be mapped to one or more job queues. This many-to-many relationship is defined by the compute environment order and job queue priority properties.
+[Compute environments](http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html) provision and manage your EC2 instances and other compute resources that are used to run your AWS Batch jobs. Job queues are mapped to one or more compute environments and a given environment can also be mapped to one or more job queues. This many-to-many relationship is defined by the compute environment order and job queue priority properties.
 
 The following diagram shows a general overview of how the AWS Batch resources interact.
 
 ![AWS Batch environment](https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2018/04/23/AWSBatchresoucreinteract-diagram.png)
 
 For more information, watch the [How AWS Batch Works](https://www.youtube.com/watch?v=T4aAWrGHmxQ) video.
 
-### Requirements for AWS Batch Jobs
+## AWS Batch Jobs Requirements
 
-AWS Batch does not make assumptions on the structure and requirements that Jobs take with respect to inputs and outputs. Batch Jobs may take data streams, files, or only parameters as input, and produce the same variaty for output, inclusive of files, metadata changes, updates to databases, etc. Batch assumes that each application handles their own input/output requirements.
+AWS Batch does not make assumptions about the structure and requirements that Jobs take with respect to inputs and outputs. Batch Jobs may take data streams, files, or only parameters as input, and produce the same variety for output, inclusive of files, metadata changes, updates to databases, etc. Batch assumes that each application handles its own input/output requirements.
 
-A common pattern for bioinformatics tooling is that files such as genomic sequence data are both inputs and outputs to/from a process. Many bioinformatics tools have also been developed to run in traditional Linux-based compute clusters with shared filesystems, and are not necessarily optimized for cloud computing.
+A common pattern for bioinformatics tooling is that files such as genomic sequence data are both inputs and outputs to/from a process. Many bioinformatics tools have also been developed to run in traditional Linux-based compute clusters with shared filesystems and are not necessarily optimized for cloud computing.
 
 The set of common requirements for genomics on AWS Batch are:
 
@@ -36,19 +36,19 @@ The set of common requirements for genomics on AWS Batch are:
 
 * Multitenancy:
 
-    Multiple container jobs may run concurrently on the same instance. In these situations, it’s essential that your job writes to a unique subdirectory.
+    Multiple container jobs may run concurrently on the same instance. In these situations, it is essential that your job writes to a unique subdirectory.
 
 * Data cleanup:
 
     As your jobs complete and write the output back to S3, it is a good idea to delete the scratch data generated by that job on your instance. This allows you to optimize for cost by reusing EC2 instances if there are jobs remaining in the queue, rather than terminating the EC2 instances.
 
-## What you will need
+## AWS Batch Environment
 
 A complete AWS Batch environment consists of the following:
 
 1. A Compute Environment that utilizes [EC2 Spot instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html) for cost-effective computing
 2. A Compute Environment that utilizes EC2 on-demand (e.g. [public pricing](https://aws.amazon.com/ec2/pricing/on-demand/)) instances for high-priority work that can't risk job interruptions or delays due to insufficient Spot capacity.
-3. A default Job Queue that utilizes the Spot compute environment first, but falls back to the on-demand compute environment if there is spare capacity already there.
+3. A default Job Queue that utilizes the Spot compute environment first, but falls back to the on-demand compute environment if there is spare capacity available.
 4. A high-priority Job Queue that leverages the on-demand and Spot CE's (in that order) and has higher priority than the default queue.
 
 The CloudFormation template below will create all of the above.
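
As a sketch of how item 3's Spot-first ordering is expressed in CloudFormation (logical IDs and names are assumptions, not necessarily what the template below uses):

```yaml
# Sketch only: a job queue that tries the Spot compute environment first
# and falls back to the on-demand environment.
DefaultJobQueue:
  Type: AWS::Batch::JobQueue
  Properties:
    JobQueueName: default                      # hypothetical name
    Priority: 1                                # the high-priority queue would be higher
    ComputeEnvironmentOrder:
      - Order: 1                               # Spot capacity first
        ComputeEnvironment: !Ref SpotComputeEnvironment
      - Order: 2                               # on-demand as fallback
        ComputeEnvironment: !Ref OnDemandComputeEnvironment
```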

docs/orchestration/cromwell/cromwell-overview.md
Lines changed: 10 additions & 9 deletions
@@ -6,7 +6,7 @@
 system for scientific workflows developed by the [Broad Institute](https://broadinstitute.org/)
 and supports job execution using [AWS Batch](https://aws.amazon.com/batch/).
 
-## TL;DR
+## Full Stack Deployment (TL;DR)
 
 If you need a Cromwell server backed by AWS **now** and will worry about the
 details later, use the CloudFormation template below.
@@ -16,7 +16,8 @@ details later, use the CloudFormation template below.
 {{ cfn_stack_row("Cromwell All-in-One", "Cromwell", "cromwell/cromwell-aio.template.yaml", "Create all resources needed to run Cromwell on AWS: an S3 Bucket, AWS Batch Environment, and Cromwell Server Instance") }}
 
 When the above stack is complete, navigate to the `HostName` that is generated
-in the outputs to access Cromwell via its SwaggerUI.
+in the outputs to access Cromwell via its SwaggerUI, which provides a simple web interface for submitting workflows.
+
 
 ![cromwell on aws](images/cromwell-all-in-one.png)
 
@@ -36,7 +37,7 @@ these setup.
 !!! note
     For a Cromwell server that will run multiple workflows, or workflows with many
     steps (e.g. ones with large scatter steps), it is recommended to setup a
-    database to use to store workflow metadata.
+    database to store workflow metadata.
 
 ## Custom Compute Resource with Cromwell Additions
 
@@ -49,7 +50,7 @@ Once complete, you will have a resource ID to give to AWS Batch to setup compute
 
 ## Cromwell Server
 
-To ensure the highest level of security, and robustness for long running workflows,
+To ensure the highest level of security and robustness for long running workflows,
 it is recommended that you use an EC2 instance as your Cromwell server for submitting
 workflows to AWS Batch.
 
@@ -222,7 +223,7 @@ backend {
 
 The above file uses the [default credential provider chain](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html) for authorization.
 
-Replace the following with values appropriate for your accoutn and workload:
+Replace the following with values appropriate for your account and workload:
 
 * `<your region>` : the AWS region your S3 bucket and AWS Batch environment are
   deployed into - e.g. `us-east-1`
@@ -235,7 +236,7 @@ Replace the following with values appropriate for your accoutn and workload:
 
 !!! note
     The CloudFormation template above automatically starts Cromwell on launch.
-    Use the instructions below are if you are provisioning your own EC2 instance.
+    Use the instructions below if you are provisioning your own EC2 instance.
 
 Log into your server using SSH. If you setup a port tunnel, you can interact
 with Cromwell's REST API from your local machine:
@@ -266,13 +267,13 @@ your local machine by navigating to:
 
 ## Running a workflow
 
-To submit a workflow to your Cromwell server, you can use:
+To submit a workflow to your Cromwell server, you can use any of the following:
 
 * Cromwell's SwaggerUI in a web-browser
 * a REST client like [Insomnia](https://insomnia.rest/) or [Postman](https://www.getpostman.com/)
-* or, the command line with `curl`
+* the command line with `curl`
 
 After submitting a workflow, you can monitor the progress of tasks via the
 AWS Batch console.
 
-The next section provides some examples of running Crommwell on AWS.
+The next section provides some examples of running Cromwell on AWS.

docs/orchestration/step-functions/step-functions-overview.md
Lines changed: 9 additions & 9 deletions
@@ -4,7 +4,7 @@
 
 [AWS Step Functions](https://aws.amazon.com/step-functions/) is a service that allows you to orchestrate other AWS services, such as Lambda, Batch, SNS, and Glue, making it easy to coordinate the components of distributed applications as a series of steps in a visual workflow.
 
-In the context of genomics workflows, the combination of Step Functions with Batch and Lambda constitutes a robust and scalable task orchestration solution.
+In the context of genomics workflows, the combination of AWS Step Functions with Batch and Lambda constitutes a robust, scalable, and serverless task orchestration solution.
 
 <!--// not ready for release yet
 // TODO: create the example sfn state-machine cfn template
@@ -14,7 +14,7 @@ If you need something up and running in a hurry, and you've followed all of the
 steps in the [Getting Started](../../../core-env/introduction/) section, you
 already have the majority of what you need setup.
 
-The last component you need is a Step Functions state machine for your workflow.
+The last component you need is an AWS Step Functions state machine for your workflow.
 Below is a CloudFormation template that creates an example state-machine that
 you can modify to suit your needs.
 
@@ -27,15 +27,15 @@ you can modify to suit your needs.
 
 To get started using AWS Step Functions for genomics workflows you'll need the following setup in your AWS account:
 
-1. The core set of resources (S3 Bucket, IAM Roles, AWS Batch) described in the [Getting Started](../../../core-env/introduction/) section.
+* The core set of resources (S3 Bucket, IAM Roles, AWS Batch) described in the [Getting Started](../../../core-env/introduction/) section.
 
-## Step Functions Execution Role
+## AWS Step Functions Execution Role
 
-A Step Functions Execution role is an IAM role that allows Step Functions to execute other AWS services via the state machine.
+An AWS Step Functions Execution role is an IAM role that allows Step Functions to execute other AWS services via the state machine.
 
-This can be created automatically during the "first-run" experience in the Step Functions console when you create your first state machine. The policy attached to the role will depend on the specifc tasks you incorporate into your state machine.
+This can be created automatically during the "first-run" experience in the AWS Step Functions console when you create your first state machine. The policy attached to the role will depend on the specific tasks you incorporate into your state machine.
 
-For state machines that use AWS Batch for job execution and send events to CloudWatch, the should have an Execution role with the following inline policy:
+State machines that use AWS Batch for job execution and send events to CloudWatch should have an Execution role with the following inline policy:
 
 ```json
 {
@@ -67,7 +67,7 @@ For state machines that use AWS Batch for job execution and send events to CloudWatch, the should have an Execution role with the following inline policy:
 
 ## Step Functions State Machine
 
-Workflows in AWS Step Functions are built using [Amazon States Language](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) (ASL), a declarative, JSON-based, structured language used to define your state machine, a collection of states, that can do work (Task states), determine which states to transition to next (Choice states), stop an execution with an error (Fail states), and so on.
+Workflows in AWS Step Functions are built using [Amazon States Language](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) (ASL), a declarative, JSON-based, structured language used to define your state machine, a collection of states that can do work (Task states), determine which states to transition to next (Choice states), stop an execution with an error (Fail states), and so on.
 
 ### Building workflows with AWS Step Functions
 
@@ -185,7 +185,7 @@ An example Job Definition for the `bwa-mem` sequence aligner is shown below:
 
 ### State Machine Batch Job Tasks
 
-Conveniently for genomics workflows, Step Functions has built-in integration with AWS Batch (and [several other services](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-connectors.html)), and provides snippets of code to make developing your state-machine
+Conveniently for genomics workflows, AWS Step Functions has built-in integration with AWS Batch (and [several other services](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-connectors.html)), and provides snippets of code to make developing your state-machine
 Batch tasks easier.
 
 ![Manage a Batch Job Snippet](images/sfn-batch-job-snippet.png)
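
As a rough sketch of the kind of task such a snippet produces — a one-step state machine that submits a Batch job and waits for it to complete, with the ASL definition inline in a CloudFormation template (all logical IDs and references are assumptions):

```yaml
# Sketch only: a single-task state machine using the Step Functions
# service integration for AWS Batch (submitJob.sync waits for the job).
ExampleStateMachine:
  Type: AWS::StepFunctions::StateMachine
  Properties:
    RoleArn: !GetAtt StatesExecutionRole.Arn   # assumed role with a policy like the one above
    DefinitionString: !Sub |
      {
        "StartAt": "BwaMem",
        "States": {
          "BwaMem": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
              "JobName": "bwa-mem",
              "JobDefinition": "${BwaMemJobDefinition}",
              "JobQueue": "${DefaultJobQueue}"
            },
            "End": true
          }
        }
      }
```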
