Merged
Changes from 6 commits
69 changes: 9 additions & 60 deletions .evergreen.yml
@@ -61,14 +61,14 @@ variables:
variant: init_test_run
- name: build_test_image
variant: init_test_run
- name: build_agent_images_ubi
variant: init_test_run
- name: build_readiness_probe_image
variant: init_test_run
- name: build_upgrade_hook_image
variant: init_test_run
- name: build_mco_test_image
variant: init_test_run
- name: build_agent_images_ubi
Collaborator Author

We still run this on every patch; the script just checks whether the build is required and skips it if there are no changes.

Why still run it?
On CM and OM bump PRs we still need the agent in ECR first. This ensures it is built and pushed to ECR first.
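
A minimal sketch of such a skip-if-unchanged check, assuming it works by comparing the agent and tools versions in two copies of release.json (base branch versus patch). The helper names `agent_pairs` and `changed_agent_pairs` are hypothetical, and the real `detect_ops_manager_changes` script may decide differently; only the `opsManagerMapping` layout is taken from the code in this diff.

```python
from typing import Dict, List, Set, Tuple

AgentTools = Tuple[str, str]  # (agent version, tools version)


def agent_pairs(release: Dict) -> Set[AgentTools]:
    """Collect every (agent, tools) pair referenced by a release.json dict."""
    mapping = release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]
    pairs = {(mapping["cloud_manager"], mapping["cloud_manager_tools"])}
    for om in mapping["ops_manager"].values():
        pairs.add((om["agent_version"], om["tools_version"]))
    return pairs


def changed_agent_pairs(base: Dict, current: Dict) -> List[AgentTools]:
    """Return the pairs present in `current` but not in `base` (hypothetical helper)."""
    return sorted(agent_pairs(current) - agent_pairs(base))
```

An empty result would mean nothing new needs to be built, so the task can exit early while still being scheduled on every patch.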

variant: init_test_run

- &setup_group
setup_group_can_fail_task: true
@@ -347,19 +347,6 @@ tasks:
image_name: init-ops-manager
include_tags: release

- name: release_agent_operator_release
tags: [ "image_release" ]
allowed_requesters: [ "patch", "github_tag" ]
commands:
- func: clone
- func: setup_building_host
- func: quay_login
- func: setup_docker_sbom
- func: legacy_pipeline
vars:
image_name: agent
include_tags: release

# pct only triggers this variant once a new agent image is out
- name: release_agent
# this enables us to run this variant either manually (patch) which pct does or during an OM bump (github_pr)
@@ -392,21 +379,6 @@ tasks:
working_dir: src/github.com/mongodb/mongodb-kubernetes
binary: scripts/evergreen/precommit_bump.sh

# Pct only triggers this variant once a new agent image is out
# these releases the agent with the operator suffix (not patch id) on ecr to allow for digest pinning to pass.
# For this to work, we rely on skip_tags which is used to determine whether
# we want to release on quay or not, in this case - ecr instead.
# We rely on the init_database from ecr for the agent x operator images.
# This runs on agent releases that are not concurrent with operator releases.
- name: release_agents_on_ecr_conditional
commands:
- func: clone
- func: run_task_conditionally
vars:
condition_script: scripts/evergreen/should_release_agents_on_ecr.sh
variant: init_release_agents_on_ecr
task: release_agents_on_ecr

- name: release_agents_on_ecr
# this enables us to run this variant either manually (patch) which pct does or during an OM bump (github_pr)
allowed_requesters: [ "patch", "github_pr" ]
@@ -1334,8 +1306,7 @@ buildvariants:
variant: init_test_run
- name: build_init_database_image_ubi
variant: init_test_run
- name: build_agent_images_ubi
variant: init_test_run

tasks:
- name: e2e_custom_domain_task_group

@@ -1369,8 +1340,7 @@ buildvariants:
variant: init_test_run
- name: build_init_database_image_ubi
variant: init_test_run
- name: build_agent_images_ubi
variant: init_test_run

run_on:
- ubuntu2204-small
tasks:
@@ -1617,8 +1587,7 @@ buildvariants:
variant: init_tests_with_olm
- name: build_init_database_image_ubi
variant: init_test_run
- name: build_agent_images_ubi
variant: init_test_run

tasks:
- name: e2e_kind_olm_group

@@ -1683,18 +1652,6 @@ buildvariants:
- name: build_upgrade_hook_image
- name: prepare_aws

- name: init_release_agents_on_ecr
display_name: init_release_agents_on_ecr
# this enables us to run this variant either manually (patch) which pct does or during an OM bump (github_pr)
allowed_requesters: [ "patch", "github_pr" ]
tags: [ "release_agents_on_ecr" ]
# We want that to run first and finish asap. Digest pinning depends on this to succeed.
priority: 70
run_on:
- ubuntu2204-large
tasks:
- name: release_agents_on_ecr_conditional

- name: run_pre_commit
priority: 70
display_name: run_pre_commit
@@ -1722,8 +1679,7 @@ buildvariants:
variant: init_test_run
- name: build_init_om_images_ubi
variant: init_test_run
- name: build_agent_images_ubi
variant: init_test_run

run_on:
- ubuntu2204-small
tasks:
@@ -1809,13 +1765,6 @@ buildvariants:
- name: release_init_database
- name: release_init_ops_manager
- name: release_database
# Once we release the operator, we will also release the init databases, we require them to be out first
# such that we can reference them and retrieve those binaries.
# Since we immediately run daily rebuild after creating the image, we can ensure that the init_database is out
# such that the agent image build can use it.
- name: release_agent_operator_release
depends_on:
- name: release_init_database

- name: preflight_release_images
display_name: preflight_release_images
@@ -1847,13 +1796,13 @@ buildvariants:

# It will be called by pct while bumping the agent cloud manager image
- name: release_agent
display_name: (Static Containers) Release Agent matrix
display_name: release_agent
tags: [ "release_agent" ]
run_on:
- release-ubuntu2204-large # This is required for CISA attestation https://jira.mongodb.org/browse/DEVPROD-17780
depends_on:
- variant: init_release_agents_on_ecr
name: '*'
- variant: init_test_run
name: build_agent_images_ubi # this ensures the agent gets released to ECR as well
- variant: e2e_multi_cluster_kind
name: '*'
- variant: e2e_static_multi_cluster_2_clusters
186 changes: 26 additions & 160 deletions pipeline.py
@@ -42,6 +42,7 @@
import docker
from lib.base_logger import logger
from lib.sonar.sonar import process_image
from scripts.detect_ops_manager_changes import detect_ops_manager_changes
from scripts.evergreen.release.agent_matrix import (
get_supported_version_for_image,
)
@@ -1241,22 +1242,11 @@ def build_multi_arch_agent_in_sonar(
)


def build_agent_default_case(build_configuration: BuildConfiguration):
"""
Build the agent only for the latest operator for patches and operator releases.

See more information in the function: build_agent_on_agent_bump
"""
release_json = get_release()

is_release = build_configuration.is_release_step_executed()

# We need to release [all agents x latest operator] on operator releases
if is_release:
agent_versions_to_build = gather_all_supported_agent_versions(release_json)
# We only need [latest agents (for each OM major version and for CM) x patch ID] for patches
else:
agent_versions_to_build = gather_latest_agent_versions(release_json, build_configuration.agent_to_build)
def build_agent(build_configuration: BuildConfiguration):
Collaborator

are we still using the legacy pipeline anywhere for agents?

Collaborator Author

maybe - right now i want things to be consistent

Member

I think this still needs answering. Since you're changing the image build for the agent, there should be a decision on whether the move to the atomic pipeline is happening. The title and the description of the PR imply that we're getting rid of pipeline.py.

Collaborator Author

Yes, I was unclear. What I meant is that we should move to atomic_pipeline so that we are not changing two files at the same time (one for releases and one for patches).

For this reason I've made the agent handling in pipeline.py and atomic_pipeline.py the same, to avoid the edge case where we overlook something and still call pipeline.py.

We plan to migrate to atomic for releases: #344

Collaborator

But that PR does not touch agent images at all.

If we are in fact moving away from building agent images in the legacy pipeline code, we should either not make changes to it or remove the redundant code completely.

Collaborator Author

@nammn nammn Aug 14, 2025

The problem now is that we have duplicated code in pipeline and atomic.

One is for releasing and one is for patches, but for agent releases both contain the same duplicated code. I think it's better to just use atomic with the release scenario (937e953). If you strongly disagree, I can move back to pipeline and keep them duplicated.

Collaborator Author

coming back - atomic doesn't support static, so i reverted the code to just use legacy

agent_versions_to_build = detect_ops_manager_changes()
if not agent_versions_to_build:
logger.info("No changes detected, skipping agent build")
return

logger.info(f"Building Agent versions: {agent_versions_to_build}")
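
The early-return shape of the consolidated `build_agent` can be illustrated with a small, self-contained sketch. It assumes the change detector returns a list of (agent version, tools version) tuples and that an empty list means "nothing to build"; `build_agent_sketch`, `detect_changes` and `build_one` are hypothetical stand-ins, not names from pipeline.py.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("pipeline-sketch")

AgentTools = Tuple[str, str]  # (agent version, tools version), assumed layout


def build_agent_sketch(
    detect_changes: Callable[[], List[AgentTools]],
    build_one: Callable[[str, str], None],
) -> int:
    """Build only what changed and return the number of builds started."""
    versions = detect_changes()
    if not versions:
        logger.info("No changes detected, skipping agent build")
        return 0

    # start=1 gives a one-based progress counter such as (1/3), (2/3), (3/3).
    for idx, (agent_version, tools_version) in enumerate(versions, start=1):
        logger.info("Building Agent %s (%d/%d)", agent_version, idx, len(versions))
        build_one(agent_version, tools_version)
    return len(versions)
```

Passing the detector and the builder in as callables keeps the sketch testable without Docker or a release.json file; the real function reads both from module-level imports.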

@@ -1267,83 +1257,17 @@ def build_agent_default_case(build_configuration: BuildConfiguration):
if build_configuration.parallel_factor > 0:
max_workers = build_configuration.parallel_factor
with ProcessPoolExecutor(max_workers=max_workers) as executor:
logger.info(f"running with factor of {max_workers}")
for agent_version in agent_versions_to_build:
# We don't need to keep create and push the same image on every build.
# It is enough to create and push the non-operator suffixed images only during releases to ecr and quay.
if build_configuration.is_release_step_executed() or build_configuration.all_agents:
tasks_queue.put(
executor.submit(
build_multi_arch_agent_in_sonar,
build_configuration,
agent_version[0],
agent_version[1],
)
)
_add_to_agent_queue(agent_version, build_configuration, executor, tasks_queue)

queue_exception_handling(tasks_queue)


def build_agent_on_agent_bump(build_configuration: BuildConfiguration):
"""
Build the agent matrix (operator version x agent version), triggered by PCT.

We have three cases where we need to build the agent:
- e2e test runs
- operator releases
- OM/CM bumps via PCT

In OM/CM bumps, we release a new agent.
"""
release_json = get_release()
is_release = build_configuration.is_release_step_executed()

if build_configuration.all_agents:
agent_versions_to_build = gather_all_supported_agent_versions(release_json)
else:
# we only need to release the latest images, we don't need to re-push old images, as we don't clean them up anymore.
agent_versions_to_build = gather_latest_agent_versions(release_json, build_configuration.agent_to_build)

legacy_agent_versions_to_build = release_json["supportedImages"]["mongodb-agent"]["versions"]

tasks_queue = Queue()
max_workers = 1
if build_configuration.parallel:
max_workers = None
if build_configuration.parallel_factor > 0:
max_workers = build_configuration.parallel_factor
with ProcessPoolExecutor(max_workers=max_workers) as executor:
logger.info(f"running with factor of {max_workers}")

# We need to regularly push legacy agents, otherwise ecr lifecycle policy will expire them.
# We only need to push them once in a while to ecr, so no quay required
if not is_release:
for legacy_agent in legacy_agent_versions_to_build:
tasks_queue.put(
executor.submit(
build_multi_arch_agent_in_sonar,
build_configuration,
legacy_agent,
# we assume that all legacy agents are build using that tools version
"100.9.4",
)
)

for agent_version in agent_versions_to_build:
logger.info(f"Running with factor of {max_workers}")
for idx, agent_tools_version in enumerate(agent_versions_to_build):
# We don't need to keep create and push the same image on every build.
# It is enough to create and push the non-operator suffixed images only during releases to ecr and quay.
if build_configuration.is_release_step_executed() or build_configuration.all_agents:
tasks_queue.put(
executor.submit(
build_multi_arch_agent_in_sonar,
build_configuration,
agent_version[0],
agent_version[1],
)
)
logger.info(f"Building Agent versions: {agent_version}")
_add_to_agent_queue(agent_version, build_configuration, executor, tasks_queue)
logger.info(f"======= Building Agent {agent_tools_version} ({idx}/{len(agent_versions_to_build)})")
_build_agents(
agent_tools_version,
build_configuration,
executor,
tasks_queue,
)

queue_exception_handling(tasks_queue)
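
The worker-count selection and the queue of futures used in this hunk can be sketched as below. Several things here are assumptions: the `parallel_factor` check is taken to be nested under `parallel` (indentation is lost in this view), `drain` is a hypothetical stand-in for `queue_exception_handling`, whose body is not shown, and `ThreadPoolExecutor` is swapped in for `ProcessPoolExecutor` purely so the sketch runs with lambdas and local functions.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from typing import Callable, Iterable, List, Optional, Tuple


def resolve_max_workers(parallel: bool, parallel_factor: int) -> Optional[int]:
    """1 worker by default; None lets the executor choose; a positive factor wins."""
    max_workers: Optional[int] = 1
    if parallel:
        max_workers = None
        if parallel_factor > 0:
            max_workers = parallel_factor
    return max_workers


def drain(tasks_queue: Queue) -> List[str]:
    """Wait for every queued future and raise once if any of them failed."""
    results, errors = [], []
    while not tasks_queue.empty():
        future = tasks_queue.get()
        exc = future.exception()  # blocks until the future has finished
        if exc is not None:
            errors.append(exc)
        else:
            results.append(future.result())
    if errors:
        raise RuntimeError(f"{len(errors)} build task(s) failed: {errors}")
    return results


def run_builds(
    versions: Iterable[Tuple[str, str]],
    build_one: Callable[[str, str], str],
    parallel: bool = False,
    parallel_factor: int = 0,
) -> List[str]:
    tasks_queue: Queue = Queue()
    workers = resolve_max_workers(parallel, parallel_factor)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for agent_version, tools_version in versions:
            tasks_queue.put(executor.submit(build_one, agent_version, tools_version))
    return drain(tasks_queue)
```

Collecting all failures before raising means one broken agent build does not hide the status of the others.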

@@ -1384,87 +1308,29 @@ def queue_exception_handling(tasks_queue):
)


def _add_to_agent_queue(
agent_version: Tuple[str, str],
def _build_agents(
agent_tools_version: Tuple[str, str],
build_configuration: BuildConfiguration,
executor: ProcessPoolExecutor,
tasks_queue: Queue,
):
tools_version = agent_version[1]
image_version = f"{agent_version[0]}"
agent_version = agent_tools_version[0]
agent_distro = "rhel9_x86_64"
tools_version = agent_tools_version[1]
tools_distro = get_tools_distro(tools_version)["amd"]

tasks_queue.put(
executor.submit(
build_multi_arch_agent_in_sonar,
build_configuration,
image_version,
agent_version,
agent_distro,
tools_version,
tools_distro,
)
)


def gather_all_supported_agent_versions(release: Dict) -> List[Tuple[str, str]]:
# This is a list of a tuples - agent version and corresponding tools version
agent_versions_to_build = list()
agent_versions_to_build.append(
(
release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["cloud_manager"],
release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["cloud_manager_tools"],
)
)
for _, om in release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["ops_manager"].items():
agent_versions_to_build.append((om["agent_version"], om["tools_version"]))

# lets not build the same image multiple times
return sorted(list(set(agent_versions_to_build)))


def gather_latest_agent_versions(release: Dict, agent_to_build: str = "") -> List[Tuple[str, str]]:
"""
This function is used when we release a new agent via OM bump.
That means we will need to release that agent with all supported operators.
Since we don't want to release all agents again, we only release the latest, which will contain the newly added one
:return: the latest agent for each major version
"""
agent_versions_to_build = list()
agent_versions_to_build.append(
(
release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["cloud_manager"],
release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["cloud_manager_tools"],
)
)

latest_versions = {}

for version in release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["ops_manager"].keys():
parsed_version = semver.VersionInfo.parse(version)
major_version = parsed_version.major
if major_version in latest_versions:
latest_parsed_version = semver.VersionInfo.parse(str(latest_versions[major_version]))
latest_versions[major_version] = max(parsed_version, latest_parsed_version)
else:
latest_versions[major_version] = version

for major_version, latest_version in latest_versions.items():
agent_versions_to_build.append(
(
release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["ops_manager"][str(latest_version)][
"agent_version"
],
release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["ops_manager"][str(latest_version)][
"tools_version"
],
)
)

if agent_to_build != "":
for agent_tuple in agent_versions_to_build:
if agent_tuple[0] == agent_to_build:
return [agent_tuple]

return sorted(list(set(agent_versions_to_build)))


def get_builder_function_for_image_name() -> Dict[str, Callable]:
"""Returns a dictionary of image names that can be built."""

@@ -1478,8 +1344,8 @@ def get_builder_function_for_image_name() -> Dict[str, Callable]:
"upgrade-hook": build_upgrade_hook_image,
"operator-quick": build_operator_image_patch,
"database": build_database_image,
"agent-pct": build_agent_on_agent_bump,
"agent": build_agent_default_case,
"agent-pct": build_agent,
"agent": build_agent,
#
# Init images
"init-appdb": build_init_appdb,
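
After this change both the "agent" and "agent-pct" image names resolve to the same builder. A minimal sketch of that name-to-function dispatch follows; the builder bodies and the `build_image` helper are hypothetical placeholders, and only the dictionary shape mirrors `get_builder_function_for_image_name`.

```python
from typing import Callable, Dict


def build_agent(build_configuration: dict) -> str:
    # Placeholder body; the real function builds and pushes agent images.
    return "agent built"


def build_database_image(build_configuration: dict) -> str:
    # Placeholder body for illustration only.
    return "database built"


def get_builder_function_for_image_name() -> Dict[str, Callable]:
    """Returns a dictionary of image names that can be built."""
    return {
        "database": build_database_image,
        # Two image names, one implementation.
        "agent-pct": build_agent,
        "agent": build_agent,
    }


def build_image(image_name: str, build_configuration: dict) -> str:
    """Hypothetical caller: look up the builder by name and invoke it."""
    builders = get_builder_function_for_image_name()
    if image_name not in builders:
        raise ValueError(f"Image '{image_name}' is not defined")
    return builders[image_name](build_configuration)
```

Keeping both keys means existing callers that pass either image name continue to work while sharing one code path.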