DRAFT control-service: AWS CodeCommit integration #3304
base: main
Changes from 5 commits (commits in the PR: 3ed9812, 15d4c95, eb1d5df, 22d3f77, a0d13f4, e074075, 8bef395, 505fb09)
**New file: job-builder `Dockerfile`** (+33 lines)

```dockerfile
# Used to trigger a build for a data job image.

# Stage 0: source of the kaniko executor and its certs/docker config.
FROM gcr.io/kaniko-project/executor

# Stage 1: alpine base so git, python and the AWS tooling can be installed.
FROM alpine

COPY --from=0 /kaniko /kaniko

ENV PATH $PATH:/kaniko
ENV SSL_CERT_DIR=/kaniko/ssl/certs
ENV DOCKER_CONFIG /kaniko/.docker/

WORKDIR /workspace

COPY Dockerfile.python.vdk /workspace/Dockerfile
COPY build_image.sh /build_image.sh
RUN chmod +x /build_image.sh

# Setup Python and Git
## Update & Install dependencies
RUN apk add --no-cache --update \
    git \
    bash

RUN apk add --no-cache --repository http://dl-cdn.alpinelinux.org/alpine/v3.10/main python3=3.7.10-r0 py3-pip \
    && pip3 install awscli \
    && pip3 install git-remote-codecommit \
    && apk --purge -v del py3-pip \
    && rm -rf /var/cache/apk/*

ENTRYPOINT ["/build_image.sh"]
```
**New file: `Dockerfile.python.vdk`** (+30 lines) — the per-job Dockerfile that kaniko builds for each data job image:

```dockerfile
# https://docs.docker.com/develop/develop-images/dockerfile_best-practices

ARG base_image=python:3.9-slim

FROM $base_image

ARG UID=1000
ARG GID=1000

# Set the working directory
WORKDIR /job

# Create necessary users and set home directory to /job
RUN groupadd -r -g $GID group && useradd -u $UID -g $GID -r user && chown -R $UID:$GID /job
ENV HOME=/job

# Copy the actual job that has to be executed
ARG job_name
COPY --chown=$UID:$GID $job_name $job_name/

# TODO: this would trigger for any change in the job even if requirements.txt does not change,
# but there's no COPY_IF_EXISTS command in docker to try to copy it.
ARG requirements_file=requirements.txt
RUN if [ -f "$job_name/$requirements_file" ]; then pip3 install --no-cache-dir --disable-pip-version-check -q -r "$job_name/$requirements_file" || ( echo ">requirements_failed<" && exit 1 ) ; fi

ARG job_githash
ENV JOB_NAME $job_name
ENV VDK_JOB_GITHASH $job_githash

USER $UID
```
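For illustration, the build-arg interface of this per-job Dockerfile can be exercised directly with plain `docker build` (job name, hash and tag values here are hypothetical; in the PR, kaniko passes these args, as `build_image.sh` below shows):

```sh
# The build context must contain a directory named after the job (here: hello-job/).
docker build \
  -f Dockerfile.python.vdk \
  --build-arg base_image=python:3.9-slim \
  --build-arg job_name=hello-job \
  --build-arg job_githash=3ed9812 \
  -t hello-job:3ed9812 \
  .
```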
**New file: README** (+1 line): "This package provides a way to configure and build your own Data Job images."
**New file: `build_image.sh`** (+84 lines) — the builder entrypoint: it configures registry credentials, clones the job repository (via git-remote-codecommit when needed), and runs kaniko:

```sh
#!/bin/sh
# Copyright 2023-2024 Broadcom
# SPDX-License-Identifier: Apache-2.0

# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0

# TODO: replace these with env variables
aws_access_key_id=$1
aws_secret_access_key=$2
aws_region=$3
docker_registry=$4
# Note: positional arguments $5 and $6 are not used by this script.
git_repository=$7
registry_type=$8
registry_username=$9
registry_password=${10}
aws_session_token=${11}

# Within this property a docker config entry should be included to connect to the
# registry used to pull the image from. It should be prefixed with a comma.
# Example: ,"ghcr.io/versatile-data-kit-dev/dp/versatiledatakit":{"auth":"dmVyc2F0aWxlLWRhdGEta2l0LWRldjo8bXlUb2tlbj4="}}
extra_auth=${extra_auth:-""}

# Echo selected data to be logged
echo "AWS_REGION=$aws_region"
echo "DOCKER_REGISTRY=$docker_registry"
echo "GIT_REPOSITORY=$git_repository"
echo "REGISTRY_TYPE=$registry_type"

# We default to a generic registry.
# We have special support for ECR because, even though kaniko supports building
# and pushing images to ECR, it doesn't create the repository, nor do its
# maintainers think it should - https://github.com/GoogleContainerTools/kaniko/pull/1537
# ECR requires a separate repository per image and will not create one on docker push,
# so we need to do it manually.
if [ "$registry_type" = "ecr" ] || [ "$registry_type" = "ECR" ] ; then
    # Set up credentials to connect to AWS - the same creds will be used by kaniko as well.
    aws configure set aws_access_key_id $aws_access_key_id
    aws configure set aws_secret_access_key $aws_secret_access_key

    # Check if aws_session_token is set and not empty.
    if [ -n "$aws_session_token" ] ; then
        aws configure set aws_session_token "$aws_session_token"
    fi

    # https://stackoverflow.com/questions/1199613/extract-filename-and-path-from-url-in-bash-script
    repository_prefix=${docker_registry#*/}
    # Create the docker repository if it does not exist
    aws ecr describe-repositories --region $aws_region --repository-names $repository_prefix/${DATA_JOB_NAME} ||
        aws ecr create-repository --region $aws_region --repository-name $repository_prefix/${DATA_JOB_NAME}
    echo '{ "credsStore": "ecr-login" }' > /kaniko/.docker/config.json
elif [ "$registry_type" = "generic" ] || [ "$registry_type" = "GENERIC" ]; then
    export auth=$(echo -n $registry_username:$registry_password | base64 -w 0)
    cat > /kaniko/.docker/config.json <<- EOM
{
    "auths": {
        "$IMAGE_REGISTRY_PATH": {
            "username":"$registry_username",
            "password":"$registry_password",
            "auth": "$auth"
        }
        $extra_auth
    }
}
EOM
    #cat /kaniko/.docker/config.json
fi

# Clone the repo into the ./data-jobs dir to get the job's source
git clone $git_repository ./data-jobs
cd ./data-jobs
git reset --hard $GIT_COMMIT || ( echo ">data-job-not-found<" && exit 1 )
if [ ! -d ${DATA_JOB_NAME} ]; then
    echo ">data-job-not-found<"
    exit 1
fi
cd ..

# kaniko supports building directly from a git repository, but since we are using
# CodeCommit with AWS session credentials, we need to clone it beforehand.
/kaniko/executor \
    --dockerfile=/workspace/Dockerfile \
    --destination="${IMAGE_REGISTRY_PATH}/${DATA_JOB_NAME}:${GIT_COMMIT}" \
    --build-arg=job_githash="$JOB_GITHASH" \
    --build-arg=base_image="$BASE_IMAGE" \
    --build-arg=job_name="$JOB_NAME" \
    --context=./data-jobs $EXTRA_ARGUMENTS
```
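To make the positional-argument and environment-variable contract explicit, a hypothetical invocation might look like this (all values are illustrative; in the PR the control service's builder job wires these up):

```sh
# Environment the script reads (normally injected by the builder job).
export DATA_JOB_NAME=hello-job
export GIT_COMMIT=3ed9812
export JOB_GITHASH=3ed9812
export JOB_NAME=hello-job
export BASE_IMAGE=python:3.9-slim
export IMAGE_REGISTRY_PATH=123456789012.dkr.ecr.us-east-1.amazonaws.com/vdk

# Positional arguments: $5 and $6 are unused; $7 is the git repository.
/build_image.sh \
  "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" us-east-1 \
  123456789012.dkr.ecr.us-east-1.amazonaws.com/vdk \
  unused unused \
  codecommit::us-east-1://data-jobs \
  ecr "" "" "$AWS_SESSION_TOKEN"
```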
**New file: publish script for the job-builder image** (+23 lines; relies on a `docker_push_vdk.sh` helper being on the PATH):

```bash
#!/bin/bash

# Copyright 2023-2024 Broadcom
# SPDX-License-Identifier: Apache-2.0

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
VERSION_TAG=$(cat "$SCRIPT_DIR/version.txt")
VDK_DOCKER_REGISTRY_URL=${VDK_DOCKER_REGISTRY_URL:-"registry.hub.docker.com/versatiledatakit"}

function build_and_push_image() {
    name="$1"
    docker_file="$2"
    arguments="$3"

    image_repo="$VDK_DOCKER_REGISTRY_URL/$name"
    image_tag="$image_repo:$VERSION_TAG"

    docker build -t $image_tag -t $image_repo:latest -f "$SCRIPT_DIR/$docker_file" $arguments "$SCRIPT_DIR"
    docker_push_vdk.sh $image_tag
    docker_push_vdk.sh $image_repo:latest
}

build_and_push_image "job-builder" Dockerfile
```
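Since the registry defaults to Docker Hub but is overridable, usage against a custom registry could look like this (the script filename is not shown in this diff excerpt, so `publish-job-builder.sh` is a placeholder):

```sh
# Push the builder image to a custom registry instead of Docker Hub.
VDK_DOCKER_REGISTRY_URL=ghcr.io/my-org/vdk ./publish-job-builder.sh
```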
**New file: `version.txt`** (+1 line): `1.0.0`
**New file: `CodeCommitCredentialProvider.java`** (+26 lines) — wraps Spring Cloud Config's `AwsCodeCommitCredentialProvider` around the service's `AWSCredentialsService`:

```java
/*
 * Copyright 2023-2024 Broadcom
 * SPDX-License-Identifier: Apache-2.0
 */

package com.vmware.taurus.service.upload;

import com.vmware.taurus.service.credentials.AWSCredentialsService;
import org.eclipse.jgit.transport.CredentialsProvider;
import org.springframework.cloud.config.server.support.AwsCodeCommitCredentialProvider;
import org.springframework.stereotype.Component;

@Component
public class CodeCommitCredentialProvider {
  private final AWSCredentialsService awsCredentialsService;

  public CodeCommitCredentialProvider(AWSCredentialsService awsCredentialsService) {
    this.awsCredentialsService = awsCredentialsService;
  }

  public CredentialsProvider getProvider() {
    AwsCodeCommitCredentialProvider codeCommitCredentialProvider = new AwsCodeCommitCredentialProvider();
    codeCommitCredentialProvider.setAwsCredentialProvider(awsCredentialsService.getCredentialsProvider());
    return codeCommitCredentialProvider;
  }
}
```
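To show where this plugs in, here is a minimal sketch of how JGit might consume the returned provider when cloning over the CodeCommit HTTPS endpoint (the URL and helper are hypothetical; in the PR the provider is consumed by `GitWrapper` via `JobUpload`):

```java
package com.vmware.taurus.service.upload; // hypothetical placement, same package as the provider

import java.io.File;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.transport.CredentialsProvider;

// Hypothetical caller sketch: clone a CodeCommit repository using the
// IAM-derived credentials instead of a static username/password.
class CodeCommitCloneExample {

  // repoUrl and targetDir are illustrative, e.g.
  // https://git-codecommit.us-east-1.amazonaws.com/v1/repos/data-jobs
  static Git cloneJobRepo(CodeCommitCredentialProvider provider,
                          String repoUrl, File targetDir) throws Exception {
    CredentialsProvider creds = provider.getProvider();
    return Git.cloneRepository()
        .setURI(repoUrl)
        .setDirectory(targetDir)
        .setCredentialsProvider(creds) // credentials derived from the assumed IAM role
        .call();
  }
}
```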
**Modified: `JobUpload.java`** — the credentials provider is now chosen once in the constructor (CodeCommit IAM-role provider vs. the existing git username/password provider) instead of being fetched on every operation. The +/- markers below are reconstructed from the hunk headers of the flattened diff:

```diff
@@ -35,7 +35,7 @@ public class JobUpload {
   private static final Logger log = LoggerFactory.getLogger(JobUpload.class);

   private final String datajobsTempStorageFolder;
-  private final GitCredentialsProvider gitCredentialsProvider;
+  private final CredentialsProvider credentialsProvider;
   private final GitWrapper gitWrapper;
   private final FeatureFlags featureFlags;
   private final AuthorizationProvider authorizationProvider;

@@ -45,19 +45,27 @@ public class JobUpload {
   @Autowired
   public JobUpload(
       @Value("${datajobs.temp.storage.folder:}") String datajobsTempStorageFolder,
+      @Value("${datajobs.git.assumeIAMRole}") boolean assumeCodeCommitIAMRole,
       GitCredentialsProvider gitCredentialsProvider,
+      CodeCommitCredentialProvider codeCommitProvider,
       GitWrapper gitWrapper,
       FeatureFlags featureFlags,
       AuthorizationProvider authorizationProvider,
       JobUploadAllowListValidator jobUploadAllowListValidator,
       JobUploadFilterListValidator jobUploadFilterListValidator) {
     this.datajobsTempStorageFolder = datajobsTempStorageFolder;
-    this.gitCredentialsProvider = gitCredentialsProvider;
     this.gitWrapper = gitWrapper;
     this.featureFlags = featureFlags;
     this.authorizationProvider = authorizationProvider;
     this.jobUploadAllowListValidator = jobUploadAllowListValidator;
     this.jobUploadFilterListValidator = jobUploadFilterListValidator;
+
+    if (assumeCodeCommitIAMRole) {
+      this.credentialsProvider = codeCommitProvider.getProvider();
+    } else {
+      this.credentialsProvider = gitCredentialsProvider.getProvider();
+    }
   }

@@ -67,7 +75,6 @@ public JobUpload(
    * @return resource containing data job content in a zip format.
    */
   public Optional<Resource> getDataJob(String jobName) {
-    CredentialsProvider credentialsProvider = gitCredentialsProvider.getProvider();
     try (var tempDirPath =
         new EphemeralFile(datajobsTempStorageFolder, jobName, "get data job source")) {
       Git git =
```
> **Contributor:** you are changing the flow?
>
> **Author:** The `getProvider` function generates a simple `CredentialsProvider` based on the username and password that we supply from helm/properties, so I concluded that it's OK to call it once in the constructor rather than on every operation (Line 30 in 8c67726). The same goes for the AWS CodeCommit credential provider: it only needs the roleARN at startup, which we supply via the `datajobs.aws.roleArn` property.
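For context on that exchange, a minimal sketch of what a username/password-based `getProvider` presumably returns (the actual `GitCredentialsProvider` implementation is not part of this diff, so this is an assumption based on the author's description):

```java
import org.eclipse.jgit.transport.CredentialsProvider;
import org.eclipse.jgit.transport.UsernamePasswordCredentialsProvider;

// Hypothetical reconstruction: a provider built once from static
// configuration (the helm/properties values mentioned above).
public class GitCredentialsProvider {
  private final String gitUsername;  // e.g. ${datajobs.git.username}
  private final String gitPassword;  // e.g. ${datajobs.git.password}

  public GitCredentialsProvider(String gitUsername, String gitPassword) {
    this.gitUsername = gitUsername;
    this.gitPassword = gitPassword;
  }

  public CredentialsProvider getProvider() {
    // JGit's built-in provider; safe to construct once, since the
    // credentials are static for the lifetime of the service.
    return new UsernamePasswordCredentialsProvider(gitUsername, gitPassword);
  }
}
```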
The same removal repeats in `publishDataJob` and `deleteDataJob`:

```diff
@@ -115,7 +122,6 @@ public Optional<Resource> getDataJob(String jobName) {
   public String publishDataJob(String jobName, Resource resource, String reason) {
     log.debug("Publish datajob to git {}", jobName);
     String jobVersion;
-    CredentialsProvider credentialsProvider = gitCredentialsProvider.getProvider();
     try (var tempDirPath = new EphemeralFile(datajobsTempStorageFolder, jobName, "deploy")) {
       File jobFolder =
           FileUtils.unzipDataJob(resource, new File(tempDirPath.toFile(), "job"), jobName);

@@ -155,7 +161,6 @@ public String publishDataJob(String jobName, Resource resource, String reason) {
    * @param reason reason specified by user for deleting the data job
    */
   public void deleteDataJob(String jobName, String reason) {
-    CredentialsProvider credentialsProvider = gitCredentialsProvider.getProvider();
     try (var tempDirPath = new EphemeralFile(datajobsTempStorageFolder, jobName, "delete")) {
       Git git =
           gitWrapper.cloneJobRepository(
```
**Modified: `application.properties`** — two new configuration properties for CodeCommit support:

```diff
@@ -119,6 +119,10 @@ datajobs.notification.owner.name=Versatile Data Kit

 # The gitlab repository and credentials for pulling data jobs code when building their images.
 datajobs.git.url=${GIT_URL}
+datajobs.git.cc.grc=${GIT_GRC_URL}
+
+# datajobs.git.assumeIAMRole tells the control-service if the Service Account pattern should be used for AWS CodeCommit.
+datajobs.git.assumeIAMRole=${DATAJOBS_CC_AWS_ASSUME_IAM_ROLE:false}
 datajobs.git.username=${GIT_USERNAME}
 datajobs.git.password=${GIT_PASSWORD}
 datajobs.git.branch=${GIT_BRANCH:master}
```

> **Contributor** (on lines 121 to +122): We can just document that `datajobs.git.url` supports both a CodeCommit and a normal git URL. It seems unnecessary to have both.
>
> **Contributor** (on `datajobs.git.assumeIAMRole`): New variables would also need to be exposed and documented in https://github.com/vmware/versatile-data-kit/blob/main/projects/control-service/projects/helm_charts/pipelines-control-service/values.yaml. At least that's where we've tried to maintain the documentation and the list of control-service configuration.
> **Reviewer:** why can't you just reuse the git URL above? Then you don't need the if statement below.
>
> **Author:** This is the URL expected by the git-remote-codecommit tool. It follows the format `codecommit::us-east-1://vdkdata-jobs`, and only with this URL format can git fetch from AWS CodeCommit repositories. Source: https://github.com/aws/git-remote-codecommit
>
> **Reviewer:** yes, but can't you just set this through the `datajobs.git.url` property?
>
> **Author:** I tested that, but pushing code through JGit didn't work in that case, so I included both the git and grc URLs. This is an optional property, required only if `datajobs.git.assumeIAMRole` is true; maybe I can add a comment before this field in the properties file to clarify this further.
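To make the resulting configuration concrete, a hypothetical CodeCommit setup might look like this (repository name, region, and role ARN are illustrative, not taken from the PR):

```properties
# Hypothetical example values - illustrative only.
# HTTPS clone URL used by JGit for uploading/downloading job source:
datajobs.git.url=https://git-codecommit.us-east-1.amazonaws.com/v1/repos/data-jobs
# git-remote-codecommit (grc) URL used by the job-builder when cloning:
datajobs.git.cc.grc=codecommit::us-east-1://data-jobs
# Use the IAM-role-based (Service Account) credentials path:
datajobs.git.assumeIAMRole=true
# Role assumed by AWSCredentialsService (mentioned by the author above):
datajobs.aws.roleArn=arn:aws:iam::123456789012:role/vdk-codecommit
```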