This repository was archived by the owner on Aug 9, 2023. It is now read-only.
Genomics is a data-heavy workload and requires some modification to the defaults used by AWS Batch for job processing. To use resources efficiently, AWS Batch places multiple jobs on a worker instance. The data requirements for individual jobs can range from a few MB to hundreds of GB. Instances running workflow jobs cannot know beforehand how much space is required, so they need scalable storage to meet unpredictable runtime demands.

To handle this use case, we can use a process that monitors a scratch directory on an instance and expands free space as needed based on capacity thresholds. This can be done with logical volume management, attaching EBS volumes to the instance as needed.

This process, "EBS autoscaling", requires a few small dependencies and a simple daemon installed on the host instance.
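
The threshold check at the heart of this process can be sketched in a few lines of Python. This is a minimal illustration, not the EBS autoscaling daemon itself. It assumes a hypothetical 50% usage threshold and only reports whether a mount point such as `/scratch` needs more space. Creating EBS volumes, attaching them, and adding them to the logical volume are not shown.

```python
import shutil

# Hypothetical threshold: request more space once over half the filesystem is used.
USAGE_THRESHOLD = 0.5


def needs_expansion(total_bytes: int, used_bytes: int,
                    threshold: float = USAGE_THRESHOLD) -> bool:
    """Return True when the used fraction of a filesystem exceeds the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes > threshold


def check_mount(path: str = "/scratch", threshold: float = USAGE_THRESHOLD) -> bool:
    """Inspect a mounted directory and report whether it should be expanded."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return needs_expansion(usage.total, usage.used, threshold)
```

A daemon built along these lines would call `check_mount("/scratch")` on a timer and attach another EBS volume whenever it returns `True`.
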

By default, AWS Batch uses the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html) to launch instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large storage requirements noted above, require customization of the base AMI. Because the provisioning requirements for EBS autoscaling are fairly simple and lightweight, you can use an EC2 Launch Template to customize instances.

## EC2 Launch Template

The simplest method for customizing an instance is to use an EC2 Launch Template.

```
...
packages:
...
- python27-pip
- sed
- wget
# add more package names here if you need them

runcmd:
- pip install -U awscli boto3
- cd /opt && wget https://aws-genomics-workflows.s3.amazonaws.com/artifacts/aws-ebs-autoscale.tgz && tar -xzf aws-ebs-autoscale.tgz
- sh /opt/ebs-autoscale/bin/init-ebs-autoscale.sh /scratch /dev/sdc > /var/log/init-ebs-autoscale.log 2>&1
# you can add more commands here if you have additional provisioning steps

--==BOUNDARY==--
```

If you want the initial scratch volume to be larger, you can specify a bigger one mapped to `/dev/sdc` in the Launch Template.
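
As an illustration, here is a minimal sketch that enlarges the `/dev/sdc` mapping in launch template data. It uses the `BlockDeviceMappings` structure (`DeviceName` plus an `Ebs` dictionary with `VolumeSize` in GiB) that EC2 launch template data uses. The helper name `set_volume_size` and the 100 GiB size in the usage note are hypothetical.

```python
import copy


def set_volume_size(template_data: dict, size_gib: int,
                    device: str = "/dev/sdc") -> dict:
    """Return a copy of launch template data with the EBS volume on `device` resized."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    updated = copy.deepcopy(template_data)  # leave the caller's dict untouched
    for mapping in updated.get("BlockDeviceMappings", []):
        if mapping.get("DeviceName") == device:
            mapping.setdefault("Ebs", {})["VolumeSize"] = size_gib  # size in GiB
            return updated
    raise KeyError(f"no block device mapping for {device}")
```

For example, `set_volume_size(data, 100)` returns a copy of `data` in which the scratch volume starts at 100 GiB. The original dictionary is left unchanged.
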
!!! note
    The mount point is specific to the orchestration method or engine you intend to use. `/scratch` is considered a generic default. If you are using a third-party workflow orchestration engine, this mount point will need to be adjusted to fit that engine's expectations.

    Also note that the script has MIME multi-part boundaries. This is because AWS Batch combines this script with others that it uses to provision instances.

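The sketch below shows what those boundaries look like. It wraps a cloud-config section in a MIME multi-part message using Python's standard `email` package, then base64-encodes the result. Base64 is the form `UserData` takes in launch template data created through the API or CLI. The boundary string `==BOUNDARY==` matches the closing `--==BOUNDARY==--` marker in the script above. The helper names and the shortened cloud-config content are hypothetical, and AWS Batch's own merging of UserData is not reproduced here.

```python
import base64
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(cloud_config: str, boundary: str = "==BOUNDARY==") -> str:
    """Wrap a cloud-config document in a MIME multi-part message."""
    message = MIMEMultipart(boundary=boundary)
    # Subtype "cloud-config" yields a "Content-Type: text/cloud-config" part.
    message.attach(MIMEText(cloud_config, "cloud-config", "us-ascii"))
    return message.as_string()


def encode_user_data(user_data: str) -> str:
    """Base64-encode UserData, as expected in launch template data."""
    return base64.b64encode(user_data.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Shortened placeholder, not the full provisioning script.
    user_data = build_user_data("packages:\n- wget\n")
    print(user_data)
    print(encode_user_data(user_data))
```
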
## Creating an EC2 Launch Template

Instructions on how to create a launch template are below. Once your Launch Template is created, you can reference it when you set up resources in AWS Batch to ensure that jobs run therein have your customizations available to them.
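
In the AWS Batch API, a managed compute environment references a launch template through the `launchTemplate` field of its `computeResources`. The sketch below only builds the keyword arguments for boto3's `create_compute_environment`. Every name, subnet, security group, and role passed to it is a hypothetical placeholder, and `$Latest` selects the newest template version.

```python
def compute_environment_request(name: str, launch_template_name: str,
                                subnets: list, security_group_ids: list,
                                instance_role: str, max_vcpus: int = 256) -> dict:
    """Build keyword arguments for boto3's batch.create_compute_environment()."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "EC2",
            "minvCpus": 0,
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,
            # Instances launched by this environment use the customized template.
            "launchTemplate": {
                "launchTemplateName": launch_template_name,
                "version": "$Latest",
            },
        },
    }
```

Submitting the request would look like `boto3.client("batch").create_compute_environment(**request)`, which requires AWS credentials.
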
### Automated via CloudFormation

You can use the following CloudFormation template to create a Launch Template suitable for your needs.

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | :--: |
{{ cfn_stack_row("EC2 Launch Template", "GWFCore-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}

### Manually via the AWS CLI

In most cases, EC2 Launch Templates can be created using the AWS EC2 Console. For this case, however, we need to use the AWS CLI.

Create a file named `launch-template-data.json` with the following contents:

```json
{
  "TagSpecifications": [
    {
      "ResourceType": "instance",
      "Tags": [
        {
          "Key": "architecture",
          "Value": "genomics-workflow"
        },
        {
          "Key": "solution",
          "Value": "nextflow"
        }
      ]
    }
  ],
  "BlockDeviceMappings": [
    {
      "Ebs": {
        "DeleteOnTermination": true,
        "VolumeSize": 50,
        "VolumeType": "gp2"
      },
      "DeviceName": "/dev/xvda"
    },
    {
      "Ebs": {
        "Encrypted": true,
        "DeleteOnTermination": true,
        "VolumeSize": 75,
        "VolumeType": "gp2"
      },
      "DeviceName": "/dev/xvdcz"
    },
    {
      "Ebs": {
        "Encrypted": true,
        "DeleteOnTermination": true,
        "VolumeSize": 20,
        "VolumeType": "gp2"
      },
      "DeviceName": "/dev/sdc"
    }
  ],
  "UserData": "...base64-encoded-string..."
}
```

The above template will create an instance with three attached EBS volumes:

* `/dev/xvda`: will be used for the root volume
* `/dev/xvdcz`: will be used for the Docker metadata volume
* `/dev/sdc`: will be the initial volume used for scratch space (more on this below)

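This launch template data can also be submitted from Python with boto3's `EC2.Client.create_launch_template`. Here is a minimal sketch. It assumes `launch-template-data.json` already holds a base64-encoded `UserData` value. The EC2 client is passed in so the functions can be exercised without AWS credentials, and the template name in the usage note is a hypothetical placeholder.

```python
import json


def load_template_data(path: str) -> dict:
    """Read launch template data from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def create_launch_template(ec2_client, name: str, template_data: dict) -> str:
    """Create an EC2 launch template and return its ID."""
    response = ec2_client.create_launch_template(
        LaunchTemplateName=name,
        LaunchTemplateData=template_data,
    )
    return response["LaunchTemplate"]["LaunchTemplateId"]
```

For example: `create_launch_template(boto3.client("ec2"), "genomics-workflow-template", load_template_data("launch-template-data.json"))`.
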
The `UserData` value should be the base64-encoded version of the UserData script used to provision instances.

Use the command below to create the corresponding launch template:

## Custom AMI

A slightly more involved method for customizing an instance is to create a new AMI based on the ECS-Optimized AMI. This is good if you have datasets preloaded that will be needed by all your jobs.

You can learn more about how to [create your own AMIs in the EC2 User Guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html).

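Here is a minimal boto3 sketch of two API calls involved. The first looks up the current ECS-Optimized Amazon Linux 2 AMI ID from its public SSM parameter. The second uses `EC2.Client.create_image` to turn an instance into a new AMI, once that instance has been launched from the base AMI and customized (for example, with preloaded datasets). The clients are passed in so the functions can be tested without AWS access. The instance ID and AMI name in the usage note are hypothetical.

```python
# Public SSM parameter that tracks the recommended ECS-Optimized Amazon Linux 2 AMI.
ECS_AMI_PARAMETER = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"


def latest_ecs_optimized_ami(ssm_client, parameter: str = ECS_AMI_PARAMETER) -> str:
    """Return the AMI ID currently published under the ECS-Optimized AMI parameter."""
    response = ssm_client.get_parameter(Name=parameter)
    return response["Parameter"]["Value"]


def create_custom_ami(ec2_client, instance_id: str, name: str,
                      description: str = "ECS-Optimized AMI with customizations") -> str:
    """Create a new AMI from an already-customized instance and return its ID."""
    response = ec2_client.create_image(
        InstanceId=instance_id,
        Name=name,
        Description=description,
    )
    return response["ImageId"]
```

For example, `create_custom_ami(boto3.client("ec2"), "i-0123456789abcdef0", "genomics-ecs-ami")` returns the new AMI ID. You would reference that ID when creating compute resources in AWS Batch.
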
!!! note
    This is considered advanced use. All documentation and CloudFormation templates from here on assume the use of EC2 Launch Templates.