Commit f1f697e

sabrahax, kaiz-io, and Michael Kaiser authored
Fix(typescript/codepipeline-glue-deploy): Revamped a dated prescriptive guidance CDK code (#1025)
* Add files via upload
* Delete typescript/codepipeline-glue-deploy/jest.config.js
* Delete typescript/codepipeline-glue-deploy/test directory
* Fix: Remove Jest
* Fix: Remove Jest
* Optimize
* Chore: Update comments
* Refactor: Use grant methods and auto role creation where possible
* Fix: Naming in script cdk apps not dirs

---------

Co-authored-by: Michael Kaiser <[email protected]>
Co-authored-by: Michael Kaiser <[email protected]>
Co-authored-by: Michael Kaiser <[email protected]>
1 parent 9d8bbff commit f1f697e

11 files changed: +4731 -1 lines changed

.github/workflows/build-pull-request.yml

Lines changed: 3 additions & 1 deletion
@@ -98,10 +98,12 @@ jobs:
 echo "- $dir"
 done

+# Change to language directory
+cd ./${{ matrix.language }}
+
 # install CDK CLI from npm if not typescript, so that npx can find it later
 # ts will use the one from the particular cdk app
 if [[ ${{ matrix.language }} != 'typescript' ]]; then
-cd ./${{ matrix.language }}
 npm install -g aws-cdk
 fi

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# AWS Glue job with an AWS CodePipeline CI/CD pipeline
+
+## How to Run CDK TypeScript project
+
+The `cdk.json` file tells the CDK Toolkit how to execute your app.
+
+### Useful commands
+
+* `aws configure` configure access to your AWS account
+* `npm run watch` watch for changes and compile
+* `npm run test` perform the jest unit tests
+* `cdk deploy --parameters glueJob="Glue Job Name"` deploy this stack to your default AWS account/region
+* `cdk diff` compare deployed stack with current state
+* `cdk synth` emits the synthesized CloudFormation template
+
+## About Pattern
+
+This pattern demonstrates how you can integrate AWS CodeCommit and AWS CodePipeline with AWS Glue, and use AWS Lambda to launch jobs as soon as a developer pushes their changes to a remote AWS CodeCommit repository.
+
+When a developer commits a change to an extract, transform, and load (ETL) repository and pushes it to AWS CodeCommit, a new pipeline execution is invoked. The pipeline invokes a Lambda function that launches an AWS Glue job with these changes. The AWS Glue job performs the ETL task.
+
+This solution helps businesses, developers, and data engineers who want to launch jobs as soon as changes are committed and pushed to the target repositories. It provides a higher level of automation and reproducibility, which helps avoid errors during job launch and throughout the job lifecycle.
+
+![alt text](image.png)
+
+The process consists of these steps:
+
+1. The developer or data engineer makes a modification in the ETL code, commits, and pushes the change to AWS CodeCommit.
+2. The push initiates the pipeline.
+3. The pipeline invokes a Lambda function, which calls `codecommit:GetFile` on the repository and uploads the file to Amazon Simple Storage Service (Amazon S3).
+4. The Lambda function creates and starts a new AWS Glue job with the ETL code.
+5. The Lambda function reports the result to CodePipeline, which completes the pipeline.
+
+### Automation and scale
+
+The sample attachment demonstrates how you can integrate AWS Glue with AWS CodePipeline. It provides a baseline example that you can customize or extend for your own use.
+
+### References
+* [Adding jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/add-job.html)
+* [Invoke an AWS Lambda function in CodePipeline](https://docs.aws.amazon.com/codepipeline/latest/userguide/actions-invoke-lambda-function.html)
+* [Source action integrations in CodePipeline](https://docs.aws.amazon.com/codepipeline/latest/userguide/integrations-action-type.html#integrations-source)
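
The steps listed in the README can be exercised locally before anything is deployed. The sketch below is not the handler from this commit: `FakeAws` and `run_deploy_step` are hypothetical names, and the fake records which calls would be made instead of contacting AWS. It only illustrates the ordering of steps 3 to 5 and the fact that a failure in any step is reported back to the pipeline rather than a success.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAws:
    """Hypothetical stand-in for the AWS clients; records calls instead of making them."""
    calls: list = field(default_factory=list)
    fail_upload: bool = False

    def get_file(self, repository, commit_id, path):
        self.calls.append("codecommit:GetFile")
        return b"print('etl')"

    def put_object(self, bucket, key, body):
        self.calls.append("s3:PutObject")
        if self.fail_upload:
            raise RuntimeError("upload failed")

    def create_and_start_job(self, name, script_location):
        self.calls.append("glue:CreateJob")
        self.calls.append("glue:StartJobRun")

    def report(self, job_id, succeeded):
        self.calls.append("success" if succeeded else "failure")


def run_deploy_step(aws, job_id, repository, commit_id, bucket, key, job_name):
    """Steps 3 to 5: copy the script to S3, launch the Glue job, report the result."""
    try:
        body = aws.get_file(repository, commit_id, "etl.py")
        aws.put_object(bucket, key, body)
        aws.create_and_start_job(job_name, f"s3://{bucket}/{key}")
        aws.report(job_id, succeeded=True)
    except Exception:
        # Any failure is reported to the pipeline instead of a success
        aws.report(job_id, succeeded=False)


ok = FakeAws()
run_deploy_step(ok, "job-1", "EtlRepository", "abc123", "artifacts", "src/etl.py", "demo")
print(ok.calls)

broken = FakeAws(fail_upload=True)
run_deploy_step(broken, "job-2", "EtlRepository", "abc123", "artifacts", "src/etl.py", "demo")
print(broken.calls)
```

In the second run the Glue calls never happen, because the upload fails first and the step goes straight to the failure report.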
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+#!/usr/bin/env node
+import { App } from 'aws-cdk-lib';
+import { CodepipelineGlueDeployStack } from '../lib/codepipeline-glue-deploy-stack';
+
+const app = new App();
+new CodepipelineGlueDeployStack(app, 'CodepipelineGlueDeploy', {});
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+{
+  "app": "npx ts-node --prefer-ts-exts bin/app.ts",
+  "watch": {
+    "include": [
+      "**"
+    ],
+    "exclude": [
+      "README.md",
+      "cdk*.json",
+      "**/*.d.ts",
+      "**/*.js",
+      "tsconfig.json",
+      "package*.json",
+      "yarn.lock",
+      "node_modules",
+      "test"
+    ]
+  },
+  "context": {
+    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
+    "@aws-cdk/core:checkSecretUsage": true,
+    "@aws-cdk/core:target-partitions": [
+      "aws",
+      "aws-cn"
+    ],
+    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
+    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
+    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
+    "@aws-cdk/aws-iam:minimizePolicies": true,
+    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
+    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
+    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
+    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
+    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
+    "@aws-cdk/core:enablePartitionLiterals": true,
+    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
+    "@aws-cdk/aws-iam:standardizedServicePrincipals": true,
+    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
+    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
+    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
+    "@aws-cdk/aws-route53-patters:useCertificate": true,
+    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
+    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
+    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
+    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
+    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
+    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
+    "@aws-cdk/aws-redshift:columnId": true,
+    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
+    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
+    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
+    "@aws-cdk/aws-kms:aliasNameRef": true,
+    "@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
+    "@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
+    "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true
+  }
+}
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+from awsglue.context import GlueContext
+from awsglue.transforms import *
+from pyspark.context import SparkContext
+
+glueContext = GlueContext(SparkContext.getOrCreate())
+
+print('glueContext:', glueContext)
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+import json
+import os
+import base64
+from os.path import join
+
+import boto3
+
+s3 = boto3.client('s3')
+glue = boto3.client('glue')
+pipeline = boto3.client('codepipeline')
+codecommit = boto3.client('codecommit')
+
+
+def lambda_handler(event, context):
+    # Extract relevant information from the CodePipeline event
+    job = event['CodePipeline.job']
+    try:
+        data = job['data']
+        config = data['actionConfiguration']['configuration']
+        user_params = json.loads(config['UserParameters'])
+
+        print(json.dumps(event))
+
+        input_artifacts = data['inputArtifacts']
+        source_code_artifact = input_artifacts[0]
+
+        # Get the S3 bucket and key for the source code artifact
+        artifact_bucket = source_code_artifact['location']['s3Location']['bucketName']
+        artifact_key = source_code_artifact['location']['s3Location']['objectKey']
+        filename = os.getenv('FILENAME')
+        file_key = join(artifact_key, filename)
+        commit_id = source_code_artifact['revision']
+
+        repository_name = os.getenv('REPOSITORY_NAME')
+        print('repository_name', repository_name)
+
+        # Retrieve the file content from CodeCommit
+        codecommit_resp = codecommit.get_file(
+            repositoryName=repository_name,
+            commitSpecifier=commit_id,
+            filePath=filename
+        )
+        print('codecommit_resp', codecommit_resp)
+
+        # Upload the file to S3
+        s3_resp = s3.put_object(
+            Bucket=artifact_bucket,
+            Key=file_key,
+            Body=codecommit_resp['fileContent']
+        )
+        print('s3_resp', s3_resp)
+
+        # Check the S3 upload status
+        s3_status_code = s3_resp['ResponseMetadata']['HTTPStatusCode']
+        if s3_status_code != 200:
+            raise Exception(f'Failed to send file to S3. StatusCode={s3_status_code}')
+
+        # Construct the S3 script location
+        s3_script_location = f's3://{artifact_bucket}/{file_key}'
+
+        # Construct the Glue job name from the last segment of the artifact key
+        glue_job_name_id = artifact_key.split('/')[-1]
+        glue_job_name = f'{user_params["glue_job_name"]}_{glue_job_name_id}'
+        print('glue_job_name:', glue_job_name)
+
+        # Set additional Glue job arguments if provided
+        default_arguments = {}
+        if 'additional_python_modules' in user_params:
+            default_arguments['--additional-python-modules'] = user_params['additional_python_modules']
+
+        # Create the Glue job
+        create_job_resp = glue.create_job(
+            Name=glue_job_name,
+            Role=user_params['glue_role'],
+            Command={
+                'Name': 'glueetl',
+                'ScriptLocation': s3_script_location
+            },
+            DefaultArguments=default_arguments,
+            GlueVersion='4.0'
+        )
+        print('create_job_resp:', create_job_resp)
+
+        # Start the Glue job run
+        start_job_run_resp = glue.start_job_run(
+            JobName=create_job_resp['Name'],
+            Arguments={
+            }
+        )
+        print('start_job_run_resp:', start_job_run_resp)
+
+        # Report the successful job execution to CodePipeline
+        print('submitting successful job')
+        pipeline.put_job_success_result(jobId=job['id'])
+    except Exception as e:
+        # Report the failed job execution to CodePipeline
+        print('submitting unsuccessful job: ' + str(e))
+        pipeline.put_job_failure_result(
+            jobId=job['id'],
+            failureDetails={
+                'type': 'JobFailed',
+                'message': str(e),
+                'externalExecutionId': context.aws_request_id
+            }
+        )
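
The handler above reads several nested fields from the event that CodePipeline passes to an invoked Lambda function. As a rough illustration, the sketch below pulls the same fields out of a trimmed sample event and derives the script location and Glue job name the way the handler does. `parse_job` and every value in `sample_event` are made up for this example; only the key names follow the handler. It uses `posixpath.join` so the key is always built with forward slashes, which is an assumption of this sketch rather than what the handler imports.

```python
import json
from posixpath import join


def parse_job(event, filename):
    """Extract the fields the handler needs from a CodePipeline invoke event."""
    job = event["CodePipeline.job"]
    data = job["data"]
    # UserParameters arrives as a JSON string, not as a nested object
    user_params = json.loads(data["actionConfiguration"]["configuration"]["UserParameters"])
    artifact = data["inputArtifacts"][0]
    location = artifact["location"]["s3Location"]
    file_key = join(location["objectKey"], filename)
    # The last segment of the artifact key makes the job name unique per execution
    suffix = location["objectKey"].split("/")[-1]
    return {
        "job_id": job["id"],
        "commit_id": artifact["revision"],
        "script_location": f"s3://{location['bucketName']}/{file_key}",
        "glue_job_name": f"{user_params['glue_job_name']}_{suffix}",
        "glue_role": user_params["glue_role"],
    }


# Hypothetical, trimmed event; real events carry more fields
sample_event = {
    "CodePipeline.job": {
        "id": "11111111-2222-3333-4444-555555555555",
        "data": {
            "actionConfiguration": {
                "configuration": {
                    "UserParameters": json.dumps(
                        {"glue_job_name": "my-job", "glue_role": "GlueRole"}
                    )
                }
            },
            "inputArtifacts": [
                {
                    "revision": "abc123",
                    "location": {
                        "s3Location": {
                            "bucketName": "artifact-bucket",
                            "objectKey": "pipeline/Artifact_S/AbCdEf",
                        }
                    },
                }
            ],
        },
    }
}

parsed = parse_job(sample_event, "etl.py")
print(parsed)
```

Because the job name ends with the artifact key's last segment, each pipeline execution creates a separate Glue job instead of updating an existing one.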
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+import { CfnParameter, Stack, StackProps, RemovalPolicy } from 'aws-cdk-lib';
+import * as codecommit from 'aws-cdk-lib/aws-codecommit';
+import { Artifact, Pipeline, PipelineType } from 'aws-cdk-lib/aws-codepipeline';
+import { Function, Code, Runtime } from 'aws-cdk-lib/aws-lambda';
+import { Construct } from 'constructs';
+import { CodeCommitSourceAction, LambdaInvokeAction } from 'aws-cdk-lib/aws-codepipeline-actions';
+import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3';
+import { Key } from 'aws-cdk-lib/aws-kms';
+import { Effect, PolicyStatement, Role, ServicePrincipal } from 'aws-cdk-lib/aws-iam';
+
+export class CodepipelineGlueDeployStack extends Stack {
+  constructor(scope: Construct, id: string, props?: StackProps) {
+    super(scope, id, props);
+
+    // Create a CloudFormation parameter for the Glue job name to use when creating
+    const glueJob = new CfnParameter(this, 'glueJob', {
+      type: 'String',
+      description: 'The name of the Glue job',
+    });
+
+    // Create a CodeCommit repository for the ETL code and upload
+    // code from the etl directory
+    const etlRepository = new codecommit.Repository(this, 'EtlRepository', {
+      repositoryName: 'EtlRepository',
+      code: codecommit.Code.fromDirectory('etl/'),
+      description: 'EtlRepository'
+    });
+
+    // Create a KMS key for encrypting the pipeline artifact store
+    // with key rotation enabled
+    const pipelineArtifactStoreEncryptionKey = new Key(this, 'pipelineArtifactStoreEncryptionKey', {
+      removalPolicy: RemovalPolicy.DESTROY,
+      enableKeyRotation: true
+    });
+
+    // Create an S3 bucket for the pipeline artifact store
+    // using the encryption key we just created
+    // with server-side encryption enabled
+    // and server access logs enabled for the bucket
+    const pipelineArtifactStoreBucket = new Bucket(this, 'XXXXXXXXXXXXXXXXXXXXXXXXXXX', {
+      removalPolicy: RemovalPolicy.DESTROY,
+      encryption: BucketEncryption.KMS,
+      encryptionKey: pipelineArtifactStoreEncryptionKey,
+      serverAccessLogsPrefix: 'access-logs',
+      enforceSSL: true
+    });
+
+    // Create a Glue role so that we can allow lambda to pass
+    // to glue ETL jobs that it creates
+    const glueRole = new Role(this, 'GlueRole', {
+      assumedBy: new ServicePrincipal('glue.amazonaws.com'),
+    });
+
+    // Add the necessary permissions to the Glue role to create and start ETL Jobs
+    glueRole.addToPrincipalPolicy(
+      new PolicyStatement({
+        actions: [
+          'glue:CreateJob',
+          'glue:StartJobRun'
+        ],
+        effect: Effect.ALLOW,
+        resources: ['*']
+      })
+    );
+
+    // Grant the Glue role the ability to encrypt and decrypt the pipeline artifact store encryption key
+    pipelineArtifactStoreEncryptionKey.grantEncryptDecrypt(glueRole);
+    // Grant the Glue role the ability to read and write to the pipeline artifact store bucket
+    pipelineArtifactStoreBucket.grantReadWrite(glueRole);
+
+
+    // Create a Lambda function to create Glue jobs based on the files
+    // in the ETL code repository
+    const lambda = new Function(this, 'lambda', {
+      code: Code.fromAsset('lambda_etl_launch'),
+      handler: 'lambda_etl_launch.lambda_handler',
+      runtime: Runtime.PYTHON_3_12,
+      environment: {
+        'REPOSITORY_NAME': etlRepository.repositoryName,
+        'FILENAME': 'etl.py'
+      }
+    });
+
+    // Add the necessary permissions to the Lambda role to pass the glue role
+    lambda.role?.addToPrincipalPolicy(
+      new PolicyStatement({
+        actions: [
+          'iam:PassRole'
+        ],
+        effect: Effect.ALLOW,
+        resources: [glueRole.roleArn]
+      })
+    );
+    // Add the necessary permissions to the Lambda role to read and write from the pipeline artifact store
+    pipelineArtifactStoreBucket.grantReadWrite(lambda.role!);
+    pipelineArtifactStoreEncryptionKey.grantEncryptDecrypt(lambda.role!);
+
+    // Create a pipeline artifact store
+    const pipelineArtifactStore = new Artifact();
+
+    // Create a CodePipeline pipeline
+    const pipeline = new Pipeline(this, 'Pipeline', {
+      pipelineName: 'pipeline',
+      artifactBucket: pipelineArtifactStoreBucket,
+      enableKeyRotation: true,
+      pipelineType: PipelineType.V2,
+      stages: [
+        {
+          stageName: 'Source',
+          actions: [
+            new CodeCommitSourceAction({
+              actionName: 'Source',
+              repository: etlRepository,
+              branch: 'main',
+              output: pipelineArtifactStore,
+            })
+          ]
+        },
+        {
+          stageName: 'Deploy',
+          actions: [
+            new LambdaInvokeAction({
+              actionName: 'Deploy',
+              lambda: lambda,
+              inputs: [pipelineArtifactStore],
+              userParameters: {
+                glue_job_name: glueJob.valueAsString,
+                glue_role: glueRole.roleName
+              }
+            })
+          ]
+        },
+      ]
+    });
+
+    // Grant the pipeline role the ability to pull from the ETL repository
+    etlRepository.grantPull(pipeline.role);
+    // Grant the pipeline role the ability to invoke the Lambda function
+    lambda.grantInvoke(pipeline.role);
+    // Grant the pipeline role the ability to encrypt and decrypt the pipeline artifact store encryption key
+    pipelineArtifactStoreEncryptionKey.grantEncryptDecrypt(pipeline.role);
+
+
+  }
+}
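
One thing that may be worth checking when reviewing the stack: `glue:CreateJob` and `glue:StartJobRun` are attached to the Glue role, while the Python handler in this commit makes those calls (and `codecommit.get_file`) under the Lambda execution role. The sketch below simply lists the IAM action behind each boto3 call the handler makes and renders them as a policy document. `HANDLER_CALLS`, `build_lambda_policy`, and the role ARN are hypothetical names and values for illustration; this is not a policy taken from the commit, and some of these actions may already be granted by the CDK constructs.

```python
import json

# boto3 call made by the handler -> IAM action that authorizes it
HANDLER_CALLS = {
    "codecommit.get_file": "codecommit:GetFile",
    "s3.put_object": "s3:PutObject",
    "glue.create_job": "glue:CreateJob",
    "glue.start_job_run": "glue:StartJobRun",
    "codepipeline.put_job_success_result": "codepipeline:PutJobSuccessResult",
    "codepipeline.put_job_failure_result": "codepipeline:PutJobFailureResult",
}


def build_lambda_policy(glue_role_arn):
    """Render the handler's calls as an IAM policy document (illustrative only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # "*" keeps the sketch short; real policies would scope resources down
            {"Effect": "Allow", "Action": sorted(HANDLER_CALLS.values()), "Resource": "*"},
            # Passing the Glue role to create_job needs iam:PassRole on that role
            {"Effect": "Allow", "Action": "iam:PassRole", "Resource": glue_role_arn},
        ],
    }


# Hypothetical ARN with a placeholder account ID
policy = build_lambda_policy("arn:aws:iam::123456789012:role/GlueRole")
print(json.dumps(policy, indent=2))
```

Comparing this list against the synthesized template from `cdk synth` is one way to confirm which actions the Lambda role actually ends up with. Access to the KMS-encrypted bucket is handled separately in the stack through the `grantReadWrite` and `grantEncryptDecrypt` calls.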