Upgrade Debian version in test runner image#5400

Open
parulbajaj01 wants to merge 3 commits into GoogleCloudPlatform:develop from parulbajaj01:golang

@parulbajaj01 parulbajaj01 commented Mar 25, 2026

This PR upgrades the base image of our Cloud Build test runner from Debian 11 (Bullseye) to Debian 13 (Trixie) and resolves the associated system-level and Ansible execution issues.

What changed:

  1. Test Runner Dockerfile Updates:
  • Base Image: Upgraded the test-runner Dockerfile base image to golang:trixie.
  • Apt Key Management: Modernized repository key management by replacing deprecated apt-key add - commands with gpg --dearmor and explicit signed-by directives in .list files for HashiCorp and Google Cloud SDK.
  • Python Dependencies: To comply with the PEP 668 "externally managed environment" restrictions enforced in newer Debian releases, pip dependencies (such as Ansible and Paramiko) are now installed into a Python virtual environment (/opt/venv), which isolates them from the system packages.
  • Dependencies: Added the gnupg package to support the new gpg --dearmor key processing.
  2. Ansible Playbook Fixes (Strict Boolean Evaluation):
  • Upgrading the OS also raised the Python and Ansible versions (ansible-core >= 2.16). The newer Ansible version introduced a breaking change that requires strict boolean evaluation in when conditionals.
  • Updated conditionals across multiple playbooks that previously relied on implicit truthy evaluation.
  • Changed them to explicit boolean checks (e.g., != false, length > 0).
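
The key-management and virtual-environment steps from item 1 can be sketched in shell roughly as follows. This is a minimal sketch, not the contents of the actual Dockerfile: the function names (apt_source_line, add_signed_repo, create_venv) are hypothetical, and the repository name, URLs, suite, and component are placeholders supplied by the caller. Only /opt/venv, Ansible, and Paramiko come from the description above.

```shell
#!/bin/sh
# Sketch only: function names and all arguments are illustrative
# placeholders, not values taken from the PR's Dockerfile.

# Print an APT source entry that ties the repository to a single keyring
# through an explicit signed-by option.
apt_source_line() {
    # $1 keyring path, $2 repository URL, $3 suite, $4 component
    printf 'deb [signed-by=%s] %s %s %s\n' "$1" "$2" "$3" "$4"
}

# Download an ASCII-armored key, convert it to a binary keyring with
# `gpg --dearmor`, and write the matching .list file.
# Needs root, curl, and the gnupg package.
add_signed_repo() {
    # $1 repo name, $2 key URL, $3 repository URL, $4 suite, $5 component
    keyring="/usr/share/keyrings/$1.gpg"
    curl -fsSL "$2" | gpg --dearmor -o "$keyring"
    apt_source_line "$keyring" "$3" "$4" "$5" \
        > "/etc/apt/sources.list.d/$1.list"
}

# Create a virtual environment and install pip packages into it, leaving
# the externally managed system Python untouched (PEP 668).
# On Debian this needs the python3-venv package.
create_venv() {
    # $1 venv directory, remaining arguments: packages to install
    venv_dir=$1
    shift
    python3 -m venv "$venv_dir"
    "$venv_dir/bin/pip" install --no-cache-dir "$@"
}
```

Under these assumptions, a call such as `create_venv /opt/venv ansible paramiko` would install the pip dependencies into the isolated environment, and `add_signed_repo` would replace each former `apt-key add -` step. The functions are only defined here, so sourcing the script has no side effects.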

Why:

  • To keep our CI/CD tooling up to date with modern OS releases and avoid building on top of the older Debian 11 distribution.
  • To ensure compatibility with newer Ansible versions and prevent silent failures caused by implicit boolean evaluation deprecations.

Testing:

  • Successfully built the new trixie test runner image.
  • Manually triggered and verified successful builds for:
    • gke-h4d.yaml
    • ml-a3-ultragpu-onspot-slurm.yaml
    • ml-g4-onspot-slurm.yaml

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@parulbajaj01 parulbajaj01 requested a review from bytetwin March 25, 2026 09:45
@parulbajaj01 parulbajaj01 requested review from a team and samskillman as code owners March 25, 2026 09:45
@parulbajaj01 parulbajaj01 added the release-improvements (Added to release notes under the "Improvements" heading) and release-version-updates (Added to release notes under the "Version Updates" heading) labels Mar 25, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the Cloud Build test runner environment by upgrading its base image from Debian 11 (Bullseye) to Debian 12 (Bookworm). This change addresses system-level and Ansible execution issues arising from the new OS version, ensuring the CI/CD tooling remains current and compatible with modern software practices and preventing potential build failures due to outdated dependencies or implicit boolean evaluation deprecations.

Highlights

  • Test Runner Dockerfile Updates: The base image was upgraded from Debian 11 (Bullseye) to Debian 12 (Bookworm). Apt key management was modernized by replacing deprecated apt-key add commands with gpg --dearmor and explicit signed-by directives for HashiCorp and Google Cloud SDK. The gnupg package was added, and the --break-system-packages flag was included in all pip install commands to address Python environment restrictions.
  • Ansible Playbook Fixes: Conditionals across multiple Ansible playbooks were updated to use explicit boolean evaluations (e.g., != false, length > 0). This change addresses a breaking change in newer Ansible versions (compatible with Python 3.11 on Bookworm) that require strict boolean checks in when conditions, preventing silent failures.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances the robustness of Ansible playbooks by ensuring when conditions correctly handle undefined variables and by improving JSON parsing logic for instance information. It also upgrades the base Docker image for the test runner from golang:bullseye to golang:bookworm, necessitating updates to package management and Python dependency installation commands. A critical issue was identified in the Dockerfile where the URL for requirements.txt in a pip install command is incorrectly split across two lines, which will lead to a build failure.
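
The diff is not reproduced in this conversation, so the exact form of the split URL is unknown. As a purely hypothetical illustration of how a line break inside a URL can change what a shell command receives, consider the sketch below; count_args and the example.invalid URL are invented for the demonstration and do not come from the PR.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments the shell passed in.
count_args() {
    echo "$#"
}

# A backslash-newline with no whitespace on either side joins the two
# halves into a single word, so the URL arrives as one argument.
joined=$(count_args https://example.invalid/path/\
requirements.txt)

# With whitespace around the break, the shell sees two separate words:
# a truncated URL and a stray file name.
split=$(count_args https://example.invalid/path/ \
    requirements.txt)

echo "joined: $joined argument(s), split: $split argument(s)"
```

In the second form a command such as pip install would be handed a URL that ends at the directory, which is one plausible way a split like the one flagged above could fail.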

@parulbajaj01 parulbajaj01 changed the title Upgrade golang version in test runner image Upgrade Debian version in test runner image Mar 27, 2026
bytetwin previously approved these changes Mar 30, 2026