# Genomics Workflows on AWS - CDK code

This folder contains an AWS CDK application that creates AWS resources for
working with large-scale biomedical data, such as genomics data.

To deploy this CDK application, you need an environment with AWS CLI access and
the AWS CDK installed. A quick way to get such an environment is to launch
[AWS Cloud9](https://aws.amazon.com/cloud9/).

AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets
you write, run, and debug your code with just a browser. It includes a code
editor, debugger, and terminal. Cloud9 comes prepackaged with essential tools
for popular programming languages, including JavaScript, Python, PHP, and more,
so you don’t need to install files or configure your development machine to
start new projects.

## Download

Clone the repository to your local or Cloud9 environment, then change into the
CDK application folder:
```
git clone https://github.com/aws-samples/aws-genomics-workflows.git
cd aws-genomics-workflows/src/aws-genomics-cdk
```

## Configure

This CDK application requires an S3 bucket and a VPC. The application can
create both as part of the deployment, or you can configure it to use an
existing S3 bucket and/or an existing VPC.

After cloning the repository, edit and save the application configuration
file, `app.config.json`, which has the following keys:

| Key | Description |
| :--- | :---------- |
| `accountID` | Your [AWS account ID](https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html). |
| `region` | The [AWS Region](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html) to deploy to (e.g., `us-east-1` or `us-west-2`). |
| `projectName` | A name for the project, used as a prefix for the CDK stacks and constructs. |
| `tags` | A list of tags, each with a `name` and a `value`, to apply to the AWS resources created by this app. |
| `S3.existingBucket` | `true` to use an existing bucket, or `false` to create a new one. |
| `S3.bucketName` | The name of the bucket to use or create. |
| `VPC.createVPC` | `true` to create a new VPC, or `false` to use an existing one. |
| `VPC.VPCName` | The name of the VPC to use or create. |
| `VPC.maxAZs` | The maximum number of Availability Zones to use when creating a new VPC. |
| `VPC.cidr` | The [CIDR block](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) for the new VPC. |
| `VPC.cidrMask` | The [subnet mask](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#Subnet_masks) (prefix length) of the subnets in the new VPC. |
| `batch.defaultVolumeSize` | The default size, in GiB, of the EBS volume attached to each EC2 instance that AWS Batch launches. |
| `batch.spotMaxVCPUs` | The maximum number of vCPUs for [Spot Instances](https://aws.amazon.com/ec2/spot/). |
| `batch.onDemendMaxVCPUs` | The maximum number of vCPUs for On-Demand Instances. |
| `batch.instanceTypes` | The [EC2 instance types](https://aws.amazon.com/ec2/instance-types/) that AWS Batch can use. |
| `workflows` | The workflows to launch. Each entry has a `name` and a `spot` flag. Demo workflows are in the `lib/workflows` directory. To add a workflow, update the workflows section of `lib/aws-genomics-cdk-stack.ts`. |

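`batch.spotMaxVCPUs` and `batch.onDemendMaxVCPUs` cap the total number of vCPUs across all instances running in the corresponding compute environment. As a result, the number of instances that can run at once depends on their size. The sketch below works through that arithmetic for a few instance types from the example configuration. `maxInstances` is a hypothetical helper, and the vCPU counts are the published sizes of those types. With some allocation strategies, AWS Batch may exceed the limit by at most one instance, so treat the results as approximate.

```typescript
// vCPU counts for a few instance types from the example configuration.
const VCPUS: Record<string, number> = {
  "c5.large": 2,
  "c5.4xlarge": 16,
  "c5.9xlarge": 36,
  "c5.24xlarge": 96,
};

// Hypothetical helper: how many instances of a single type fit under a vCPU
// limit, assuming the compute environment used only that type.
function maxInstances(maxVCPUs: number, instanceType: string): number {
  const vcpus = VCPUS[instanceType];
  if (vcpus === undefined) {
    throw new Error(`unknown instance type: ${instanceType}`);
  }
  return Math.floor(maxVCPUs / vcpus);
}

// With spotMaxVCPUs = 128 (as in the example), this prints 64, 8, 3, and 1.
for (const instanceType of Object.keys(VCPUS)) {
  console.log(`${instanceType}: up to ${maxInstances(128, instanceType)} instance(s)`);
}
```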
A complete example of `app.config.json`:

```
{
  "accountID": "111111111111",
  "region": "us-west-2",
  "projectName": "genomics",
  "tags": [
    {
      "name": "Environment",
      "value": "production"
    },
    {
      "name": "Project",
      "value": "genomics-pipeline"
    }
  ],
  "S3": {
    "existingBucket": true,
    "bucketName": "YOUR-BUCKET-NAME"
  },
  "VPC": {
    "createVPC": true,
    "VPCName": "genomics-vpc",
    "maxAZs": 2,
    "cidr": "10.0.0.0/16",
    "cidrMask": 24
  },
  "batch": {
    "defaultVolumeSize": 100,
    "spotMaxVCPUs": 128,
    "onDemendMaxVCPUs": 128,
    "instanceTypes": [
      "c4.large",
      "c4.xlarge",
      "c4.2xlarge",
      "c4.4xlarge",
      "c4.8xlarge",
      "c5.large",
      "c5.xlarge",
      "c5.2xlarge",
      "c5.4xlarge",
      "c5.9xlarge",
      "c5.12xlarge",
      "c5.18xlarge",
      "c5.24xlarge"
    ]
  },
  "workflows": [
    {
      "name": "variantCalling",
      "spot": true
    }
  ]
}
```

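To catch configuration mistakes before running `cdk deploy`, you could check the file with a small script. The following is a minimal sketch and is not part of this repository. The `AppConfig` interface mirrors the example configuration, and `validateConfig` and `subnetMath` are hypothetical helpers. The checks rely on two AWS VPC rules: a VPC or subnet CIDR block must be between /16 and /28, and AWS reserves five IP addresses in every subnet. The sketch also assumes that each Availability Zone needs at least one subnet.

```typescript
// Hypothetical TypeScript shape of app.config.json, mirroring the example.
interface AppConfig {
  accountID: string;
  region: string;
  projectName: string;
  tags: { name: string; value: string }[];
  S3: { existingBucket: boolean; bucketName: string };
  VPC: {
    createVPC: boolean;
    VPCName: string;
    maxAZs: number;
    cidr: string;
    cidrMask: number;
  };
  batch: {
    defaultVolumeSize: number;
    spotMaxVCPUs: number;
    onDemendMaxVCPUs: number;
    instanceTypes: string[];
  };
  workflows: { name: string; spot: boolean }[];
}

// AWS reserves the first four and the last IP address of every subnet.
const RESERVED_PER_SUBNET = 5;

// Number of /cidrMask subnets that fit in the VPC block, and the usable
// addresses in each subnet.
function subnetMath(cidr: string, cidrMask: number) {
  const prefix = Number(cidr.split("/")[1]);
  return {
    subnets: 2 ** (cidrMask - prefix),
    usablePerSubnet: 2 ** (32 - cidrMask) - RESERVED_PER_SUBNET,
  };
}

// Returns a list of problems; an empty list means every check passed.
function validateConfig(c: AppConfig): string[] {
  const problems: string[] = [];
  if (!/^\d{12}$/.test(c.accountID)) {
    problems.push("accountID must be a 12-digit AWS account ID");
  }
  if (c.S3.bucketName.trim() === "") {
    problems.push("S3.bucketName must not be empty");
  }
  if (c.VPC.createVPC) {
    const prefix = Number(c.VPC.cidr.split("/")[1]);
    if (!Number.isInteger(prefix) || prefix < 16 || prefix > 28) {
      problems.push("VPC.cidr prefix must be between /16 and /28");
    } else if (c.VPC.cidrMask < prefix || c.VPC.cidrMask > 28) {
      problems.push("VPC.cidrMask must be between the VPC prefix and /28");
    } else if (subnetMath(c.VPC.cidr, c.VPC.cidrMask).subnets < c.VPC.maxAZs) {
      // Assumption: each Availability Zone needs at least one subnet.
      problems.push("VPC.cidrMask leaves fewer subnets than VPC.maxAZs");
    }
  }
  if (c.batch.instanceTypes.length === 0) {
    problems.push("batch.instanceTypes must list at least one type");
  }
  return problems;
}
```

For the example configuration, `validateConfig` returns an empty list, and `subnetMath("10.0.0.0/16", 24)` reports 256 possible /24 subnets with 251 usable addresses each. To check your own configuration, you could pass `validateConfig` the parsed contents of `app.config.json`.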
## Deploy

To deploy the CDK application, open a terminal in the root folder of the CDK
application (`src/aws-genomics-cdk`). First, install the necessary Node.js
modules:
```
npm install
```

Then deploy the application. If this is your first CDK deployment in the
target account and Region, you may need to run `cdk bootstrap` first.
```
# The "--require-approval never" option skips the prompt to approve
# security-sensitive changes, such as creating IAM roles. Remove this option
# if you want to be prompted before these changes are deployed.
cdk deploy --all --require-approval never
```

## Stacks

| File | Description |
| :--- | :---------- |
| `lib/aws-genomics-cdk-stack.ts` | The main stack, which initializes the other stacks |
| `lib/vpc/vpc-stack.ts` | An optional stack that launches a VPC |
| `lib/batch/batch-stack.ts` | An AWS Batch stack with two compute environments (Spot and On-Demand) and two job queues (default and high priority) |
| `lib/batch/batch-iam-stack.ts` | An IAM stack with the roles and policies required to run AWS Batch |
| `lib/workflows` | A folder containing the pipeline stacks |

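The table above implies a simple dependency structure. The VPC stack is only needed when `VPC.createVPC` is `true`, the Batch stack needs the IAM roles and policies, and each configured workflow becomes a pipeline stack. The sketch below lists the stacks the main stack would create for a given configuration. It is hypothetical and is not the code in `lib/aws-genomics-cdk-stack.ts`. It assumes stack names of the form `<projectName>-<suffix>`, and the suffixes are invented for illustration.

```typescript
// Minimal subset of app.config.json needed for this sketch.
interface StackPlanConfig {
  projectName: string;
  VPC: { createVPC: boolean };
  workflows: { name: string; spot: boolean }[];
}

// Hypothetical: returns stack names in creation order, so each stack comes
// after the stacks it depends on.
function planStacks(config: StackPlanConfig): string[] {
  const prefix = config.projectName;
  const stacks: string[] = [];
  if (config.VPC.createVPC) {
    // Only created when the app should launch its own VPC.
    stacks.push(`${prefix}-vpc`);
  }
  // IAM roles and policies must exist before the Batch resources that use them.
  stacks.push(`${prefix}-batch-iam`);
  stacks.push(`${prefix}-batch`);
  // One pipeline stack per configured workflow.
  for (const wf of config.workflows) {
    stacks.push(`${prefix}-workflow-${wf.name}`);
  }
  return stacks;
}

// Prints [ 'genomics-batch-iam', 'genomics-batch', 'genomics-workflow-variantCalling' ]
console.log(
  planStacks({
    projectName: "genomics",
    VPC: { createVPC: false },
    workflows: [{ name: "variantCalling", spot: true }],
  }),
);
```
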

## Constructs

| File | Description |
| :--- | :---------- |
| `lib/batch/batch-compute-environmnet-construct.ts` | A construct for creating an [AWS Batch compute environment](https://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html) |
| `lib/batch/job-queue-construct.ts` | A construct for creating an [AWS Batch job queue](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) |
| `lib/batch/launch-template-construct.ts` | A construct for creating an [EC2 launch template](https://docs.aws.amazon.com/autoscaling/ec2/userguide/LaunchTemplates.html) |
| `lib/workflows/genomics-task-construct.ts` | A construct for creating an AWS Step Functions task that submits an AWS Batch job |
| `lib/workflows/job-definition-construct.ts` | A construct for creating an [AWS Batch job definition](https://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) for use in a Step Functions task |

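`genomics-task-construct.ts` creates a Step Functions task that submits a Batch job. In Amazon States Language, such a task can use the `arn:aws:states:::batch:submitJob.sync` service integration, which waits for the job to finish before moving to the next state. The sketch below builds that state definition as a plain object. `batchSubmitJobState` and its parameters are hypothetical and are not the construct's actual API. The step names, job names, and ARNs in the usage are placeholders.

```typescript
// A single Amazon States Language (ASL) Task state that runs a Batch job.
interface BatchTaskState {
  Type: "Task";
  Resource: string;
  Parameters: { JobName: string; JobDefinition: string; JobQueue: string };
  Next?: string;
  End?: true;
}

// Hypothetical helper: builds a Task state that submits an AWS Batch job and
// waits for it to complete (the ".sync" integration pattern).
function batchSubmitJobState(
  jobName: string,
  jobDefinitionArn: string,
  jobQueueArn: string,
  next?: string,
): BatchTaskState {
  const state: BatchTaskState = {
    Type: "Task",
    Resource: "arn:aws:states:::batch:submitJob.sync",
    Parameters: {
      JobName: jobName,
      JobDefinition: jobDefinitionArn,
      JobQueue: jobQueueArn,
    },
  };
  // A state either transitions to another state or ends the execution.
  if (next !== undefined) {
    state.Next = next;
  } else {
    state.End = true;
  }
  return state;
}

// Hypothetical two-step pipeline; all names and ARNs are placeholders.
const definition = {
  StartAt: "Align",
  States: {
    Align: batchSubmitJobState(
      "align",
      "arn:aws:batch:us-west-2:111111111111:job-definition/align:1",
      "arn:aws:batch:us-west-2:111111111111:job-queue/default",
      "CallVariants",
    ),
    CallVariants: batchSubmitJobState(
      "call-variants",
      "arn:aws:batch:us-west-2:111111111111:job-definition/call-variants:1",
      "arn:aws:batch:us-west-2:111111111111:job-queue/default",
    ),
  },
};

console.log(JSON.stringify(definition, null, 2));
```

The CDK also offers a `BatchSubmitJob` task in the `aws-stepfunctions-tasks` module, which generates this kind of state from typed properties instead of hand-written JSON.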