Merged
Changes from 5 commits
37 changes: 34 additions & 3 deletions .github/workflows/beam_PostCommit_XVR_GoUsingJava_Dataflow.yml
@@ -75,15 +75,46 @@ jobs:
uses: ./.github/actions/setup-environment-action
with:
python-version: default
- name: Set up writable gcloud config directory
Contributor

Other Dataflow XVR tests also push containers in order to run tests: https://github.com/apache/beam/blob/master/.github/workflows/beam_PostCommit_XVR_PythonUsingJava_Dataflow.yml

Why is this the only one that requires environment setup in the GitHub Actions YAML file? In general we want to keep the GHA YAML minimal and make the Gradle target self-contained, so that developers can test the target locally or in a different environment without relying on a GitHub Actions runner.

Contributor Author

This workflow needs the explicit CLOUDSDK_CONFIG setup because we were hitting specific permission errors that weren't occurring in the PythonUsingJava workflow. The original KUBELET_GCLOUD_CONFIG_PATH points to a read-only directory in the Kubernetes pod, which causes gcloud to crash when it tries to write its config files.
Error from the logs:
WARNING: Could not setup log file in /var/lib/kubelet/pods/.../volumes/kubernetes.io~empty-dir/gcloud/logs, (Error: Could not create directory [...] Permission denied.
ERROR: gcloud crashed (OperationalError): unable to open database file
denied: Permission "artifactregistry.repositories.uploadArtifacts" denied on resource "projects/apache-beam-testing/locations/us/repositories/us.gcr.io"
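
The workaround described above amounts to pointing gcloud at a directory it can actually write to. A minimal shell sketch of that idea follows, assuming a POSIX shell. `pick_writable_dir` is a hypothetical helper (it is not part of this PR or of gcloud), and trying `KUBELET_GCLOUD_CONFIG_PATH` first is an assumption made for illustration; `CLOUDSDK_CONFIG` and `/tmp/gcloud` are taken from the workflow in this PR.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that can be
# created (if missing) and written to. Empty candidates are skipped.
pick_writable_dir() {
  for dir in "$@"; do
    [ -n "$dir" ] || continue
    # Probe with a real write, since permission bits alone can mislead
    # (for example on a read-only mount).
    if mkdir -p "$dir" 2>/dev/null && touch "$dir/.write_probe" 2>/dev/null; then
      rm -f "$dir/.write_probe"
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Assumption: prefer the pod-provided path when it is writable,
# otherwise fall back to /tmp/gcloud as the workflow does.
CLOUDSDK_CONFIG="$(pick_writable_dir "${KUBELET_GCLOUD_CONFIG_PATH:-}" /tmp/gcloud)" || exit 1
export CLOUDSDK_CONFIG
echo "Using CLOUDSDK_CONFIG=$CLOUDSDK_CONFIG"
```

In a GitHub Actions step, the chosen value would still need to be appended to `$GITHUB_ENV` (as the workflow step does) for later steps to see it.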

run: |
mkdir -p /tmp/gcloud
echo "CLOUDSDK_CONFIG=/tmp/gcloud" >> $GITHUB_ENV
- name: Authenticate to GCP
env:
CLOUDSDK_CONFIG: /tmp/gcloud
uses: google-github-actions/auth@v3
with:
service_account: ${{ secrets.GCP_SA_EMAIL }}
credentials_json: ${{ secrets.GCP_SA_KEY }}
- name: Set up Cloud SDK
env:
CLOUDSDK_CONFIG: /tmp/gcloud
uses: google-github-actions/setup-gcloud@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: GCloud Docker credential helper
- name: Configure Docker auth for GCR
env:
CLOUDSDK_CONFIG: /tmp/gcloud
run: |
export CLOUDSDK_CONFIG=/tmp/gcloud
gcloud --quiet auth configure-docker us.gcr.io
gcloud --quiet auth configure-docker gcr.io
gcloud auth list
cat ~/.docker/config.json | grep -A 5 "us.gcr.io" || echo "Docker config check..."
echo "CLOUDSDK_CONFIG=/tmp/gcloud" >> ~/.docker/config.json.env || true
- name: Docker login to GCR (explicit)
env:
CLOUDSDK_CONFIG: /tmp/gcloud
run: |
gcloud auth configure-docker us.gcr.io
export CLOUDSDK_CONFIG=/tmp/gcloud
gcloud auth print-access-token | docker login -u oauth2accesstoken --password-stdin https://us.gcr.io
docker info | grep -i "username" || echo "Docker auth configured"
export CLOUDSDK_CONFIG=/tmp/gcloud
echo "export CLOUDSDK_CONFIG=/tmp/gcloud" >> ~/.bashrc || true
- name: run XVR GoUsingJava Dataflow script
env:
USER: github-actions
CLOUDSDK_CONFIG: ${{ env.KUBELET_GCLOUD_CONFIG_PATH}}
CLOUDSDK_CONFIG: /tmp/gcloud
uses: ./.github/actions/gradle-command-self-hosted-action
with:
gradle-command: :runners:google-cloud-dataflow-java:validatesCrossLanguageRunnerGoUsingJava
47 changes: 46 additions & 1 deletion runners/google-cloud-dataflow-java/build.gradle
@@ -372,11 +372,56 @@ def buildAndPushDockerPythonContainer = tasks.create("buildAndPushDockerPythonCo
root: "apache",
tag: project.sdk_version)
doLast {
def imageExists = false
try {
exec {
commandLine "docker", "inspect", "--type=image", "${defaultDockerImageName}"
ignoreExitValue = false
}
imageExists = true
} catch (Exception e) {
println "Image ${defaultDockerImageName} not found locally: ${e.message}"
Contributor

What caused this? We should fix the underlying cause of the Docker image not getting built, so that the container builds successfully in one pass. This kind of fallback logic is generally not preferred.

Contributor Author

I added this try/catch block to address a specific error we were seeing in the logs:
The Docker build task (:sdks:python:container:py39:docker) was completing successfully, but when the subsequent `docker inspect` command ran, the image wasn't found in the local Docker daemon. This happens because when using docker buildx with the docker-container driver, the image is built into buildx's cache but isn't automatically loaded into the Docker daemon's image store.

The fallback logic tries to load the image from the buildx cache using `docker buildx build --output type=docker`, which forces the image into the local Docker daemon. If that fails, it tries to pull the image from the registry.
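
The check-then-load behavior described in this comment can be sketched in shell as follows. This is an illustration under assumptions, not the code in this PR: `image_in_daemon` and `ensure_image_loaded` are hypothetical function names, and `DOCKER` is a hypothetical override hook so the logic can be exercised with a stub. The sketch uses `--load`, which buildx documents as shorthand for `--output=type=docker`.

```shell
#!/bin/sh
# DOCKER is a hypothetical override hook; it defaults to the real CLI.
DOCKER="${DOCKER:-docker}"

# Succeeds only if the image is in the local daemon's image store.
image_in_daemon() {
  "$DOCKER" image inspect "$1" >/dev/null 2>&1
}

# Hypothetical helper: if the image is missing from the daemon, rebuild
# it with buildx and export the result into the daemon.
# $1 = image name (with tag), $2 = build context directory.
ensure_image_loaded() {
  image="$1"
  context="$2"
  if image_in_daemon "$image"; then
    echo "present"
    return 0
  fi
  # With the docker-container driver, the build result is not loaded
  # into the daemon unless --load (or --output type=docker) is given.
  "$DOCKER" buildx build --load --tag "$image" "$context" && echo "loaded"
}

# Example call (not executed here; the image name is a placeholder):
#   ensure_image_loaded "example/image:tag" .
```

Passing `--load` on the original build would make the image visible to `docker inspect` directly, which is closer to the reviewer's request to fix the underlying cause than a fallback after the fact.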

println "Attempting to load image from buildx cache..."

def pythonContainerProject = project.project(":sdks:python:container:py${pythonVer.replace('.', '')}")
def dockerBuildDir = pythonContainerProject.tasks.getByName("dockerPrepare").destinationDir
def dockerfile = new File(pythonContainerProject.projectDir, "../Dockerfile")

def loadResult = exec {
commandLine "sh", "-c", """
cd ${dockerBuildDir} && \\
docker buildx build --output type=docker \\
--tag ${defaultDockerImageName} \\
--build-arg py_version=${pythonVer} \\
--build-arg pull_licenses=true \\
-f ${dockerfile.absolutePath} \\
.
"""
ignoreExitValue = true
}

if (loadResult.exitValue != 0) {
println "Failed to load from buildx cache. Attempting to pull from registry..."
def pullResult = exec {
commandLine "docker", "pull", "${defaultDockerImageName}"
ignoreExitValue = true
}

if (pullResult.exitValue != 0) {
throw new GradleException(
"Docker image ${defaultDockerImageName} not found locally, in buildx cache, or in registry. " +
"Check the Docker build output for errors."
)
}
}
}

exec {
commandLine "docker", "tag", "${defaultDockerImageName}", "${dockerPythonImageName}"
}
exec {
commandLine "gcloud", "docker", "--", "push", "${dockerPythonImageName}"
environment "CLOUDSDK_CONFIG", System.getenv("CLOUDSDK_CONFIG") ?: System.getProperty("user.home") + "/.config/gcloud"
commandLine "docker", "push", "${dockerPythonImageName}"
}
}
}