Skip to content

Conversation

nammn
Copy link
Collaborator

@nammn nammn commented Aug 29, 2025

Summary

This pull request introduces several improvements to CI/CD configuration, test reliability, and scripting. The main changes are the expansion of allowed requesters for certain build variants, enhancements to the authentication test fixtures for better isolation and reliability, and a simplification of the AWS CLI installation script.

Test Improvements:

  • Changed test fixtures in replica_set_scram_sha_256_connectivity.py from module scope to function scope, ensuring a fresh environment for each test and reducing side effects between tests.
  • Refactored secret creation and resource loading in authentication tests to use try_load and explicit secret management, improving test reliability and clarity.
  • Removed unnecessary attempts parameter from the authentication check in sharded cluster test, simplifying the test logic.
  • Added "commit" as an allowed requester in several buildvariants sections of .evergreen.yml, enabling these builds to be triggered on direct commits in addition to patches and GitHub tags. -> i hope that fixes smoke tests to run on master merges
  • Updated install_aws_cli_pip() in scripts/evergreen/setup_aws.sh to check if AWS CLI is already installed and working before attempting installation, preventing unnecessary reinstalls.
  • Simplified the installation process to always use pip3 and improved error handling and messaging for installation failures.

Proof

  • green ci
  • openshift test passed as well
  • ibm worked manually triggered the build: PATCH
  • in this patch no smoke and ibm stuff is running
  • after merge ibm stuff is triggered

Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you added changelog file?

@nammn nammn changed the title Fix aws setup Fix multi-arch master merge setup Aug 29, 2025
@nammn nammn changed the title Fix multi-arch master merge setup Fix multi-arch smoke setup Aug 29, 2025
@nammn nammn added the skip-changelog Use this label in Pull Request to not require new changelog entry file label Aug 29, 2025
Copy link

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.3.0 Release Notes

New Features

Multi-Architecture Support

We've added comprehensive multi-architecture support for the kubernetes operator. This enhancement enables deployment on IBM Power (ppc64le) and IBM Z (s390x) architectures alongside
existing x86_64 support. Core images (operator, agent, init containers, database, readiness probe) now support multiple architectures. We do not add support IBM and ARM support for Ops-Manager and the init-ops-manager image.

  • MongoDB Agent images have been migrated to new container repository: quay.io/mongodb/mongodb-agent.
    • the agents in the new repository will support the x86-64, ARM64, s390x, and ppc64le architectures. More can be read in the public docs.
    • operator running >=MCK1.3.0 and static cannot use the agent images from the old container repository quay.io/mongodb/mongodb-agent-ubi.
  • quay.io/mongodb/mongodb-agent-ubi should not be used anymore, it's only there for backwards compatibility.

Bug Fixes

  • This change fixes the current complex and difficult-to-maintain architecture for stateful set containers, which relies on an "agent matrix" to map operator and agent versions which led to a sheer amount of images.
  • We solve this by shifting to a 3-container setup. This new design eliminates the need for the operator-version/agent-version matrix by adding one additional container containing all required binaries. This architecture maps to what we already do with the mongodb-database container.
  • Fixed an issue where the readiness probe reported the node as ready even when its authentication mechanism was not in sync with the other nodes, potentially causing premature restarts.

Other Changes

  • Optional permissions for PersistentVolumeClaim moved to a separate role. When managing the operator with Helm it is possible to disable permissions for PersistentVolumeClaim resources by setting operator.enablePVCResize value to false (true by default). When enabled, previously these permissions were part of the primary operator role. With this change, permissions have a separate role.
  • subresourceEnabled Helm value was removed. This setting used to be true by default and made it possible to exclude subresource permissions from the operator role by specifying false as the value. We are removing this configuration option, making the operator roles always have subresource permissions. This setting was introduced as a temporary solution for this OpenShift issue. The issue has since been resolved and the setting is no longer needed.
  • We have deliberately not published the container images for OpsManager versions 7.0.16, 8.0.8, 8.0.9 and 8.0.10 due to a bug in the OpsManager which prevents MCK customers to upgrade their OpsManager deployments to those versions.

@nammn nammn changed the title Fix multi-arch smoke setup Fix multi-arch smoke setup and some flaky test Aug 29, 2025
@@ -40,7 +40,7 @@ def replica_set(namespace: str, custom_mdb_version) -> MongoDB:
return resource.update()


@fixture(scope="module")
@fixture(scope="function")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are editing the user and secret later - module scope means we wouldn't return the "updated" user/secret...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: Doesn't this also mean that we will revert any change made to the user?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, that whole setup is bad. Let me rework that whole structure. We should have function for reads and generally split up read and write...

Copy link
Collaborator Author

@nammn nammn Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adressed here: 06effa5

ugh i remember that we had the idea of reworking all tests to follow a clear pattern such that we are not running into that again..

@@ -117,7 +117,6 @@ def test_user_can_authenticate_with_new_password(self):
password="my-new-password",
username="mms-user-1",
auth_mechanism="SCRAM-SHA-256",
attempts=20,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed that from prior pr where we changed the default to 50...

Copy link
Collaborator Author

@nammn nammn Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1m is way too close to the actual time required to pass the test. When it passed it needed 55 seconds

@@ -1475,7 +1475,7 @@ buildvariants:
run_on:
- rhel9-power-small
- rhel9-power-large
allowed_requesters: [ "patch", "github_tag" ]
allowed_requesters: [ "patch", "github_tag" , "commit"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by this is correct - I mean merging to master will trigger this task

@nammn nammn marked this pull request as ready for review August 29, 2025 13:58
@nammn nammn requested a review from a team as a code owner August 29, 2025 13:58
Copy link
Contributor

@viveksinghggits viveksinghggits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits, looks good overall.

Comment on lines 52 to 54
if command -v aws &> /dev/null && aws --version &> /dev/null 2>&1; then
echo "AWS CLI is already installed and working:"
aws --version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already ran the aws --version command on line 52, so we don't really have to run it on 54 right? if we are just making sure that aws is working properly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

Comment on lines 68 to 70
if command -v aws &> /dev/null && aws --version &> /dev/null 2>&1; then
echo "AWS CLI v1 installed successfully:"
aws --version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

@@ -1520,7 +1520,7 @@ buildvariants:
tags: [ "e2e_test_suite", "e2e_smoke_release_test_suite" ]
run_on:
- ubuntu2204-arm64-large
allowed_requesters: [ "patch", "github_tag" ]
allowed_requesters: [ "patch", "github_tag", "commit"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enabling these builds to be triggered on direct commits in addition to patches and GitHub tags

Can you please explain this a bit, just for my understanding. Do you mean, now this test will also be run when a commit is pushed to a PR branch? If yes, was not does if we just had patch?

Copy link
Collaborator Author

@nammn nammn Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no - only direct commits to master - which means merges to mainline (mainline is master in our case) in evergreen terminology (more on the linked docs).

"Yes, setting allowed_requesters: ["commit"] is correct if you want your Evergreen task to run only on mainline (master) commits—i.e., after a branch is merged into master. This will prevent the task from running on PRs, patches, or other request types."

@@ -40,7 +40,7 @@ def replica_set(namespace: str, custom_mdb_version) -> MongoDB:
return resource.update()


@fixture(scope="module")
@fixture(scope="function")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: Doesn't this also mean that we will revert any change made to the user?

@nammn nammn requested a review from lucian-tosa August 29, 2025 15:10
@nammn nammn merged commit 0da8924 into master Sep 1, 2025
35 of 37 checks passed
@nammn nammn deleted the fix-aws-setup branch September 1, 2025 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip-changelog Use this label in Pull Request to not require new changelog entry file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants