This repository was archived by the owner on Aug 9, 2023. It is now read-only.

Commit b874046

Release Update (#65)
* Update README
* update nextflow assets
* update container entrypoint script
* handle s3 uris as projects
* sync session cache for resume
* use logdir and workdir as defined in environment variables
* update job definition to match container entrypoint script
* create logdir and workdir environment variables
* update s3 paths to create / use logdir and workdir
* update generated config
* use the new (19.07) config syntax for specifying path to awscli
* update step-functions assets
* optimize codebuild speed
* increase size of build instance
* update container builds
* add apt-get update
* increase thread counts
* handle bam index file staging
* handle non-standard paired read file names
* update example workflow
* use a smaller dataset so that the demo runs faster
* implement ebs-autoscale on docker datavolume for sfn
* implement two stage deployment
* update aio templates
* remove deprecated parameters
* make default values for parameters consistent
* automatically pick 2 AZs in VPCs
* make stack exports consistent
* update details to core-environment
1 parent 73ab29f commit b874046

27 files changed (+980 −546 lines)

.travis.yml

Lines changed: 12 additions & 6 deletions
```diff
@@ -27,9 +27,15 @@ before_deploy:
   - bash _scripts/configure-deploy.sh

 deploy:
-  provider: script
-  script: bash _scripts/deploy.sh
-  skip_cleanup: true
-  on:
-    repo: aws-samples/aws-genomics-workflows
-    branch: master
+  - provider: script
+    script: bash _scripts/deploy.sh production
+    skip_cleanup: true
+    on:
+      repo: aws-samples/aws-genomics-workflows
+      branch: release
+  - provider: script
+    script: bash _scripts/deploy.sh test
+    skip_cleanup: true
+    on:
+      repo: aws-samples/aws-genomics-workflows
+      branch: master
```

README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -11,7 +11,7 @@ The documentation is built using mkdocs.
 Install dependencies:

 ```bash
-$ conda env create --file enviroment.yaml
+$ conda env create --file environment.yaml
 ```

 This will create a `conda` environment called `mkdocs`
````

_scripts/deploy.sh

Lines changed: 66 additions & 21 deletions
```diff
@@ -5,30 +5,75 @@ set -e
 bash _scripts/make-artifacts.sh
 mkdocs build

+ASSET_BUCKET=s3://aws-genomics-workflows
+ASSET_STAGE=${1:-production}

-echo "publishing artifacts:"
-aws s3 sync \
-    --profile asset-publisher \
-    --acl public-read \
-    --delete \
-    ./artifacts \
-    s3://aws-genomics-workflows/artifacts

+function s3_uri() {
+    BUCKET=$1
+    shift

-echo "publishing templates:"
-aws s3 sync \
-    --profile asset-publisher \
-    --acl public-read \
-    --delete \
-    --metadata commit=$(git rev-parse HEAD) \
-    ./src/templates \
-    s3://aws-genomics-workflows/templates
+    IFS=""
+    PREFIX_PARTS=("$@")
+    PREFIX_PARTS=(${PREFIX_PARTS[@]})
+    PREFIX=$(printf '/%s' "${PREFIX_PARTS[@]%/}")
+
+    echo "${BUCKET%/}/${PREFIX:1}"
+}


-echo "publishing site"
-aws s3 sync \
-    --acl public-read \
-    --delete \
-    ./site \
-    s3://docs.opendata.aws/genomics-workflows
+function artifacts() {
+    S3_URI=$(s3_uri $ASSET_BUCKET $ASSET_STAGE_PATH "artifacts")

+    echo "publishing artifacts: $S3_URI"
+    aws s3 sync \
+        --profile asset-publisher \
+        --acl public-read \
+        --delete \
+        ./artifacts \
+        $S3_URI
+}
+
+function templates() {
+    S3_URI=$(s3_uri $ASSET_BUCKET $ASSET_STAGE_PATH "templates")
+
+    echo "publishing templates: $S3_URI"
+    aws s3 sync \
+        --profile asset-publisher \
+        --acl public-read \
+        --delete \
+        --metadata commit=$(git rev-parse HEAD) \
+        ./src/templates \
+        $S3_URI
+}
+
+function site() {
+    echo "publishing site"
+    aws s3 sync \
+        --acl public-read \
+        --delete \
+        ./site \
+        s3://docs.opendata.aws/genomics-workflows
+}
+
+function all() {
+    artifacts
+    templates
+    site
+}
+
+echo "DEPLOYMENT STAGE: $ASSET_STAGE"
+case $ASSET_STAGE in
+production)
+    ASSET_STAGE_PATH=""
+    all
+    ;;
+test)
+    ASSET_STAGE_PATH="test"
+    artifacts
+    templates
+    ;;
+*)
+    echo "unsupported staging level - $ASSET_STAGE"
+    exit 1
+esac
```
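For readers skimming the diff, the path-joining behavior of the new `s3_uri` helper is easiest to see in a standalone sketch. The function body below is copied from the script above; the example calls and URIs are illustrative, not part of the repository:

```shell
#!/bin/bash
# Standalone sketch of the s3_uri helper from _scripts/deploy.sh.
# It joins a bucket and any number of prefix parts into one S3 URI,
# dropping empty parts and trailing slashes.
function s3_uri() {
    BUCKET=$1
    shift

    IFS=""
    PREFIX_PARTS=("$@")
    PREFIX_PARTS=(${PREFIX_PARTS[@]})   # unquoted re-expansion drops empty parts
    PREFIX=$(printf '/%s' "${PREFIX_PARTS[@]%/}")  # strip trailing '/', join with '/'

    echo "${BUCKET%/}/${PREFIX:1}"
}

# Production stage uses an empty stage path, which collapses out of the URI:
s3_uri "s3://aws-genomics-workflows" "" "artifacts"
# -> s3://aws-genomics-workflows/artifacts

# Test stage inserts its stage path between the bucket and the asset type:
s3_uri "s3://aws-genomics-workflows" "test" "artifacts"
# -> s3://aws-genomics-workflows/test/artifacts
```

This is why the same `artifacts` and `templates` functions can publish to either the bucket root or `s3://aws-genomics-workflows/test/...` depending on the stage argument.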
Lines changed: 109 additions & 29 deletions
```diff
@@ -1,20 +1,17 @@
-# Creating Custom Compute Resources
+# Custom Compute Resources

 Genomics is a data-heavy workload and requires some modification to the defaults
-used for batch job processing. In particular, instances running the Tasks/Jobs
-need scalable storage to meet unpredictable runtime demands.
+used by AWS Batch for job processing. To efficiently use resources, AWS Batch places multiple jobs on a worker instance. The data requirements for individual jobs can range from a few MB to 100s of GB. Instances running workflow jobs will not know beforehand how much space is required, and need scalable storage to meet unpredictable runtime demands.

-By default, AWS Batch relies upon the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
-to launch container instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large
-storage requirements noted above, require customization of the base AMI.
-
-This section provides two methods for customizing the base ECS-Optimized AMI
-that adds an expandable working directory for jobs to write data.
-A process will monitor the directory and add more EBS volumes on the fly to expand the free space
-based on the capacity threshold, like so:
+To handle this use case, we can use a process that monitors a scratch directory on an instance and expands free space as needed based on capacity thresholds. This can be done using logical volume management and attaching EBS volumes as needed to the instance, like so:

 ![Autoscaling EBS storage](images/ebs-autoscale.png)

+The above process - "EBS autoscaling" - requires a few small dependencies and a simple daemon installed on the host instance.
+
+By default, AWS Batch uses the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
+to launch instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large storage requirements noted above, require customization of the base AMI. Because the provisioning requirements for EBS autoscaling are fairly simple and lightweight, one can use an EC2 Launch Template to customize instances.
+
 ## EC2 Launch Template

 The simplest method for customizing an instance is to use an EC2 Launch Template.
```
````diff
@@ -43,11 +40,13 @@ packages:
 - python27-pip
 - sed
 - wget
+# add more package names here if you need them

 runcmd:
 - pip install -U awscli boto3
 - cd /opt && wget https://aws-genomics-workflows.s3.amazonaws.com/artifacts/aws-ebs-autoscale.tgz && tar -xzf aws-ebs-autoscale.tgz
 - sh /opt/ebs-autoscale/bin/init-ebs-autoscale.sh /scratch /dev/sdc 2>&1 > /var/log/init-ebs-autoscale.log
+# you can add more commands here if you have additional provisioning steps

 --==BOUNDARY==--
 ```
````
````diff
@@ -58,23 +57,113 @@ If you want this volume to be larger initially, you can specify a bigger one
 mapped to `/dev/sdc` in the Launch Template.

 !!! note
-    The mount point is specific to what orchestration method / engine you intend
-    to use. `/scratch` is considered the default for AWS Step Functions. If you
-    are using a 3rd party workflow orchestration engine this mount point will need
-    to be adjusted to fit that engine's expectations.
+    The mount point is specific to the orchestration method / engine you intend to use. `/scratch` is considered a generic default. If you are using a 3rd party workflow orchestration engine, this mount point will need to be adjusted to fit that engine's expectations.
+
+Also note that the script has MIME multi-part boundaries. This is because AWS Batch will combine this script with others that it uses to provision instances.
+
+## Creating an EC2 Launch Template
+
+Instructions on how to create a launch template are below. Once your Launch Template is created, you can reference it when you set up resources in AWS Batch to ensure that jobs run therein have your customizations available to them.
+
+### Automated via CloudFormation

 You can use the following CloudFormation template to create a Launch Template
 suitable for your needs.

 | Name | Description | Source | Launch Stack |
 | -- | -- | :--: | :--: |
-{{ cfn_stack_row("EC2 Launch Template", "GenomicsWorkflow-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}
+{{ cfn_stack_row("EC2 Launch Template", "GWFCore-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}
+
+### Manually via the AWS CLI
+
+In most cases, EC2 Launch Templates can be created using the AWS EC2 Console.
+In this case, we need to use the AWS CLI.
+
+Create a file named `launch-template-data.json` with the following contents:
+
+```json
+{
+    "TagSpecifications": [
+        {
+            "ResourceType": "instance",
+            "Tags": [
+                {
+                    "Key": "architecture",
+                    "Value": "genomics-workflow"
+                },
+                {
+                    "Key": "solution",
+                    "Value": "nextflow"
+                }
+            ]
+        }
+    ],
+    "BlockDeviceMappings": [
+        {
+            "Ebs": {
+                "DeleteOnTermination": true,
+                "VolumeSize": 50,
+                "VolumeType": "gp2"
+            },
+            "DeviceName": "/dev/xvda"
+        },
+        {
+            "Ebs": {
+                "Encrypted": true,
+                "DeleteOnTermination": true,
+                "VolumeSize": 75,
+                "VolumeType": "gp2"
+            },
+            "DeviceName": "/dev/xvdcz"
+        },
+        {
+            "Ebs": {
+                "Encrypted": true,
+                "DeleteOnTermination": true,
+                "VolumeSize": 20,
+                "VolumeType": "gp2"
+            },
+            "DeviceName": "/dev/sdc"
+        }
+    ],
+    "UserData": "...base64-encoded-string..."
+}
+```

-Once your Launch Template is created, you can reference it when you setup resources
-in AWS Batch to ensure that jobs run therein have your customizations available
-to them.
+The above template will create an instance with three attached EBS volumes.
+
+* `/dev/xvda`: will be used for the root volume
+* `/dev/xvdcz`: will be used for the docker metadata volume
+* `/dev/sdc`: will be the initial volume used for scratch space (more on this below)

-## Custom AMI
+The `UserData` value should be the `base64` encoded version of the UserData script used to provision instances.
+
+Use the command below to create the corresponding launch template:
+
+```bash
+aws ec2 \
+    create-launch-template \
+        --launch-template-name genomics-workflow-template \
+        --launch-template-data file://launch-template-data.json
+```
+
+You should get something like the following as a response:
+
+```json
+{
+    "LaunchTemplate": {
+        "LatestVersionNumber": 1,
+        "LaunchTemplateId": "lt-0123456789abcdef0",
+        "LaunchTemplateName": "genomics-workflow-template",
+        "DefaultVersionNumber": 1,
+        "CreatedBy": "arn:aws:iam::123456789012:user/alice",
+        "CreateTime": "2019-01-01T00:00:00.000Z"
+    }
+}
+```
````
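The `UserData` value in `launch-template-data.json` must be base64-encoded. As a minimal sketch of producing it (the heredoc is a truncated stand-in for the full MIME multi-part script, and `userdata.txt` is a hypothetical filename, not one used by this repository):

```shell
# Stand-in for the MIME multi-part provisioning script shown earlier;
# in real use, paste the full cloud-config script here.
cat > userdata.txt <<'EOF'
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
EOF

# Encode without line wrapping so the result can be pasted directly into
# the "UserData" field of launch-template-data.json.
USERDATA_B64=$(base64 userdata.txt | tr -d '\n')
echo "$USERDATA_B64"
```

`base64 | tr -d '\n'` is used here instead of `base64 -w 0` because the `-w` flag is GNU-specific.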
```diff
+## Custom AMIs

 A slightly more involved method for customizing an instance is
 to create a new AMI based on the ECS Optimized AMI. This is good if you have
```
```diff
@@ -83,14 +172,5 @@ datasets preloaded that will be needed by all your jobs.

 You can learn more about how to [create your own AMIs in the EC2 userguide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html).

-The CloudFormation template below automates the tasks needed to create an AMI and should take about 10-15min to complete.
-
-| Name | Description | Source | Launch Stack |
-| -- | -- | :--: | :--: |
-{{ cfn_stack_row("Custom AMI (Existing VPC)", "GenomicsWorkflow-AMI", "deprecated/aws-genomics-ami.template.yaml", "Creates a custom AMI that EC2 instances can be based on for processing genomics workflow tasks. The creation process will happen in a VPC you specify") }}
-
-Once your AMI is created, you will need to jot down its unique AMI Id. You will
-need this when creating compute resources in AWS Batch.
-
 !!! note
     This is considered advanced use. All documentation and CloudFormation templates hereon assume use of EC2 Launch Templates.
```
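To connect the pieces: once the launch template exists, it can be referenced from an AWS Batch managed compute environment. The sketch below only writes a `compute-resources.json` file and shows the CLI call as a comment; every ID, ARN, and name in it is a placeholder assumption, not a value from this repository:

```shell
# Placeholder compute resources referencing the launch template created
# above. Subnets, roles, and account IDs must be replaced with real ones
# before use.
cat > compute-resources.json <<'EOF'
{
    "type": "EC2",
    "minvCpus": 0,
    "maxvCpus": 256,
    "instanceTypes": ["optimal"],
    "subnets": ["subnet-0123456789abcdef0"],
    "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
    "launchTemplate": {
        "launchTemplateName": "genomics-workflow-template",
        "version": "$Latest"
    }
}
EOF

# With real values in place, the compute environment would be created with:
#   aws batch create-compute-environment \
#       --compute-environment-name genomics-workflow-ce \
#       --type MANAGED \
#       --service-role arn:aws:iam::123456789012:role/AWSBatchServiceRole \
#       --compute-resources file://compute-resources.json
echo "wrote compute-resources.json"
```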
