PMM-14347: Add Jenkins job for PMM HA testing on OpenShift (ROSA HCP) #3700
Draft
nogueiraanderson wants to merge 48 commits into master from feature/pmm-ha-rosa
Add Jenkins pipelines for PMM HA testing on ROSA HCP (Red Hat OpenShift Service on AWS with Hosted Control Planes).
ROSA HCP provides up to ~4x faster provisioning (~10-15 min vs ~40 min for self-managed OpenShift), simpler cleanup via the rosa CLI, and an identical OpenShift experience for PMM HA testing.
New files:
- vars/pmmHaRosa.groovy: Shared library with reusable functions
- pmm/v3/pmm3-ha-rosa.groovy: Main pipeline for cluster creation
- pmm/v3/pmm3-ha-rosa-cleanup.groovy: Cleanup pipeline with cron
Library provides: createCluster(), installPmm(), createRoute(), deleteCluster(), and helper functions for QA team reuse.
Ref: PMM-14347
The rosa CLI requires AWS_DEFAULT_REGION for wait, describe, and delete operations. Added the export to:
- createCluster: rosa wait, rosa describe (3 places)
- deleteCluster: rosa describe, delete cluster, delete oidc-provider, delete operator-roles
Fixes build failures after cluster creation completes.
The 'rosa wait' command doesn't exist. Replaced with a polling loop that checks cluster state via 'rosa describe cluster' every 30 seconds for up to 30 minutes.
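The polling loop described here might look roughly like the sketch below. It is not the code from the PR: `get_cluster_state` is a hypothetical stand-in for querying the state through `rosa describe cluster`, and the `ready` and `error` state names are assumptions. The stub lets the loop run without AWS access.

```shell
#!/bin/sh
# Hypothetical stand-in for a real state query such as
#   rosa describe cluster -c "$1" -o json | jq -r .state   (assumed form)
# It reports "installing" twice, then "ready", so the sketch runs offline.
POLL_COUNT_FILE=$(mktemp)
trap 'rm -f "$POLL_COUNT_FILE"' EXIT
echo 0 > "$POLL_COUNT_FILE"

get_cluster_state() {
    n=$(cat "$POLL_COUNT_FILE")
    n=$((n + 1))
    echo "$n" > "$POLL_COUNT_FILE"
    if [ "$n" -ge 3 ]; then echo ready; else echo installing; fi
}

# Usage: wait_for_cluster NAME MAX_ATTEMPTS INTERVAL_SECONDS
# Returns 0 when ready, 1 on an error state, 2 when attempts run out.
wait_for_cluster() {
    name=$1; max_attempts=$2; interval=$3
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        state=$(get_cluster_state "$name")
        echo "attempt $attempt/$max_attempts: $name is $state" >&2
        case "$state" in
            ready) return 0 ;;
            error) return 1 ;;
        esac
        sleep "$interval"
        attempt=$((attempt + 1))
    done
    return 2
}

# 60 attempts x 30 s would give the 30-minute budget; 0 s keeps the demo fast.
wait_for_cluster demo-cluster 60 0 && echo "cluster ready"
```

Returning distinct codes for an error state and a timeout lets the caller decide whether a retry makes sense.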
env.CLUSTER_INFO was being assigned a Groovy Map, but Jenkins environment variables can only hold strings. When trying to access env.CLUSTER_INFO.clusterName, it failed because the Map was converted to a string. Changed to use a local def variable which properly handles Maps.
The rosa create admin and rosa describe cluster commands in configureAccess() were missing AWS_DEFAULT_REGION, causing the password extraction to fail with "ERR: AWS region not set".
The rosa CLI outputs --password <password> (with space) not --password= (with equals sign). Updated grep patterns to handle both formats.
Previous grep patterns failed to extract just the password from rosa create admin output. Now using regex pattern to match the exact ROSA password format: XXXXX-XXXXX-XXXXX-XXXXX
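A minimal sketch of format-based extraction follows. It assumes each of the four groups is five ASCII letters or digits. The two sample lines are made up to show the space and equals-sign variants and are not real rosa output.

```shell
#!/bin/sh
# Print the first token on stdin shaped like XXXXX-XXXXX-XXXXX-XXXXX.
# Assumption: every group is exactly five ASCII letters or digits.
extract_password() {
    grep -oE '[A-Za-z0-9]{5}(-[A-Za-z0-9]{5}){3}' | head -n 1
}

# Made-up sample lines covering both separators.
with_space='oc login https://api.example.invalid:443 --username cluster-admin --password AbCdE-12345-fGhIj-67890'
with_equals='oc login https://api.example.invalid:443 --username=cluster-admin --password=AbCdE-12345-fGhIj-67890'

printf '%s\n' "$with_space"  | extract_password
printf '%s\n' "$with_equals" | extract_password
```

Matching the shape of the password rather than the text around it means the same pattern works whether the flag uses a space or an equals sign.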
- Create custom SCC (pmm-anyuid) for ROSA HCP clusters instead of modifying default SCCs, which is blocked by the admission webhook
- Pre-create pmm-secret with all required passwords before helm install
- Fix password extraction using > instead of tee to avoid stdout pollution
- Add all required helm repos (percona, vm, altinity, haproxytech)
- Add helm dependency update step to download sub-charts
- Increase admin user wait time to 90 seconds
- Use secret.create=false and secret.name=pmm-secret for helm install
- Set default chartBranch to pmmha-v3
- Add dockerHubUser and dockerHubPassword parameters to installPmm
- Create dockerhub-pull-secret in the namespace before installing charts
- Link the pull secret to service accounts for image pulls
- Update pipeline to pass hub.docker.com credentials to installPmm
Switch from namespace-level pull secret to global cluster pull secret in the openshift-config namespace. This is the proper OpenShift approach for avoiding Docker Hub rate limiting cluster-wide.
- Use oc registry login to add Docker Hub credentials
- Update openshift-config/pull-secret instead of creating per-namespace secrets
- Reference: https://access.redhat.com/solutions/6159832
The pmm-ha helm chart templates use b64dec on these keys:
- vmauth.yaml: VMAGENT_remoteWrite_basicAuth_username, VMAGENT_remoteWrite_basicAuth_password
- clickhouse-cluster.yaml: PMM_CLICKHOUSE_USER
When secret.create=false and we pre-create the secret, these keys must exist or helm template rendering fails with 'b64dec: invalid value; expected string'.
Added all required keys to match charts/pmm-ha/templates/secret.yaml:
- PMM_ADMIN_PASSWORD
- PMM_CLICKHOUSE_USER (default: clickhouse_pmm)
- PMM_CLICKHOUSE_PASSWORD
- VMAGENT_remoteWrite_basicAuth_username (default: victoriametrics_pmm)
- VMAGENT_remoteWrite_basicAuth_password
- PG_PASSWORD
- GF_PASSWORD
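As an illustration, pre-creating a secret with every key listed above could be sketched as below. The function only prints the `oc` command so it can be reviewed without a cluster. The namespace `pmm` and the single shared placeholder password are assumptions, and a real job would generate a separate value per key.

```shell
#!/bin/sh
# Print (not run) an oc command that creates pmm-secret with all seven keys.
# Remove the leading "echo" to execute it against a cluster.
# Assumptions: namespace name and one shared placeholder password.
create_pmm_secret_cmd() {
    namespace=$1; password=$2
    echo oc create secret generic pmm-secret \
        --namespace "$namespace" \
        --from-literal=PMM_ADMIN_PASSWORD="$password" \
        --from-literal=PMM_CLICKHOUSE_USER=clickhouse_pmm \
        --from-literal=PMM_CLICKHOUSE_PASSWORD="$password" \
        --from-literal=VMAGENT_remoteWrite_basicAuth_username=victoriametrics_pmm \
        --from-literal=VMAGENT_remoteWrite_basicAuth_password="$password" \
        --from-literal=PG_PASSWORD="$password" \
        --from-literal=GF_PASSWORD="$password"
}

create_pmm_secret_cmd pmm 'change-me'
```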
The pg-secret-init-job pre-install hook uses pmm-ha-secret-generator service account which needs runAsAny permissions to run the PMM image for generating the encryption key.
The rosa whoami command was showing jenkins-pmm-amzn2-worker IAM role instead of pmm-staging-slave credentials, causing rosa list clusters to return empty results because different AWS accounts see different ROSA clusters.
…west
- Add CLUSTER_NAMES param for comma-separated list deletion
- Add SKIP_NEWEST param (default true) to protect active builds
- Add AWS credentials to login stage (fixes empty cluster list)
- Rename DELETE_CLUSTER to DELETE_CLUSTERS for batch operations
- Sort clusters by creation time for SKIP_NEWEST logic
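The SKIP_NEWEST idea can be sketched in shell as follows. This is a shell analogue, not the Groovy code from the pipeline. It assumes creation times are ISO 8601 UTC strings in one fixed format, so sorting them as text matches sorting them by time. Cluster names and timestamps are made up.

```shell
#!/bin/sh
# Read "createdAt name" pairs on stdin and print the names eligible for deletion.
# With skip_newest=true the most recently created cluster is left out.
# Assumption: createdAt is ISO 8601 UTC in a fixed format, so text order is time order.
clusters_to_delete() {
    skip_newest=$1
    sorted=$(sort -r)   # newest first
    if [ "$skip_newest" = "true" ]; then
        printf '%s\n' "$sorted" | tail -n +2 | cut -d' ' -f2
    else
        printf '%s\n' "$sorted" | cut -d' ' -f2
    fi
}

# Made-up sample data.
clusters_to_delete true <<'EOF'
2025-01-10T08:00:00Z pmm-ha-rosa-2
2025-01-12T09:30:00Z pmm-ha-rosa-3
2025-01-09T17:45:00Z pmm-ha-rosa-1
EOF
```

With this sample input the newest cluster, pmm-ha-rosa-3, is protected and the other two are printed.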
- Add rosa whoami check before listing clusters for debugging
- Capture stderr with stdout (2>&1) instead of suppressing (2>/dev/null)
- Extract JSON from mixed output by finding first '[' character
- Log any debug output that appears before JSON array
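Splitting mixed output at the first '[' could be sketched like this. The sample text is made up. The approach assumes the debug lines themselves never contain a '[' character, otherwise the split lands too early.

```shell
#!/bin/sh
# Split mixed CLI output into a debug preamble and a JSON array at the first '['.
# Assumption: nothing before the JSON array contains a '[' character.
extract_json_array() {
    mixed=$1
    case "$mixed" in
        *\[*)
            preamble=${mixed%%\[*}   # everything before the first '['
            rest=${mixed#*\[}        # everything after the first '['
            if [ -n "$preamble" ]; then
                printf 'debug output before JSON: %s\n' "$preamble" >&2
            fi
            printf '[%s\n' "$rest"
            ;;
        *)
            echo "no JSON array found" >&2
            return 1
            ;;
    esac
}

# Made-up mixed output: one warning line followed by the JSON array.
mixed='WARN: made-up debug line
[{"name":"pmm-ha-rosa-1"}]'
extract_json_array "$mixed"
```

The preamble goes to stderr so it appears in the job log while stdout carries only the JSON.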
The listClusters function was failing with 'ERR: AWS region not set' because each shell script runs in its own scope and AWS_DEFAULT_REGION from the login function wasn't available. Now listClusters explicitly sets the region.
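The scoping behaviour can be reproduced outside Jenkins with two separate child shells, each standing in for one shell step. The region value us-east-2 is only an example.

```shell
#!/bin/sh
# Each "sh -c" below stands in for one separate shell step.
unset AWS_DEFAULT_REGION

# Step 1: the export lives only inside this child shell.
sh -c 'export AWS_DEFAULT_REGION=us-east-2; echo "step 1 sees: $AWS_DEFAULT_REGION"'

# Step 2: a fresh child shell starts without the variable.
sh -c 'echo "step 2 sees: ${AWS_DEFAULT_REGION:-<unset>}"'

# Fix: set the region inside every step that needs it.
sh -c 'export AWS_DEFAULT_REGION=us-east-2; echo "step 2 (fixed) sees: $AWS_DEFAULT_REGION"'
```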
- Change default HELM_CHART_BRANCH from 'main' to 'PMM-14420'
- Try theTibi fork first, then fall back to percona repo (like EKS HA)
- Update comments to document that charts are not yet merged to percona main
- This fixes the secret-init-job issue with pmm-encryption-rotation
In this pipeline, the closure-based sort() modified the list in place but returned an Integer rather than the list (a Jenkins CPS quirk; plain Groovy returns the list), while toSorted() returns a new sorted list. This fixes the 'No signature of method: java.lang.Integer.size()' error.
Jenkins CPS transforms Groovy code differently. Using explicit ArrayList and Collections.sort() instead of toSorted() to ensure proper behavior in sandboxed pipeline.
Jenkins sandbox blocks Collections.sort with Closure. Using Groovy's list.sort() which modifies in-place and list.tail() instead of subList for better sandbox compatibility.
Use sort { it.createdAt }.reverse() instead of in-place sort with
comparator. Jenkins CPS doesn't handle the spaceship operator sort
correctly. Added debug output to show sorted order.
Remove --watch flag from rosa delete cluster command so deletions happen asynchronously. This significantly speeds up cleanup jobs when deleting multiple clusters.
Parse ISO date strings to timestamps and negate for descending sort. Previous approach with reverse() didn't work in Jenkins sandbox.
The PMM-14420 branch exists in theTibi/percona-helm-charts, not percona/percona-helm-charts. Added fallback logic matching EKS PR.
The pmm-ha chart's secret init job requires the pmm-encryption-rotation binary, which only exists in pmm-server-fb builds (e.g., PR-4078-fa6adbc).
Changes:
- Default PMM_IMAGE_TAG and PMM_IMAGE_REPOSITORY to empty (use chart defaults)
- Only set image.repository/tag in helm if explicitly provided
- Chart default (perconalab/pmm-server-fb:PR-4078-fa6adbc) has the binary
The multi-line string concatenation with escaped backslashes was breaking the helm command. Using array.join() is cleaner and reliable.
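The same idea, building the command from a list of arguments rather than one long escaped string, can be sketched in shell. This is an analogue of the Groovy array.join() approach, not the pipeline code. The release name `pmm-ha`, the chart path and the namespace `pmm` are illustrative assumptions. The image overrides are appended only when a value is supplied, so empty parameters fall through to the chart defaults.

```shell
#!/bin/sh
# Print a helm command assembled argument by argument.
# Image overrides are added only when non-empty, so chart defaults apply otherwise.
# Assumptions: release name, chart path and namespace are illustrative.
build_helm_cmd() {
    image_repo=$1; image_tag=$2
    set -- helm upgrade --install pmm-ha charts/pmm-ha \
        --namespace pmm \
        --set secret.create=false \
        --set secret.name=pmm-secret
    if [ -n "$image_repo" ]; then
        set -- "$@" --set "image.repository=$image_repo"
    fi
    if [ -n "$image_tag" ]; then
        set -- "$@" --set "image.tag=$image_tag"
    fi
    printf '%s\n' "$*"
}

build_helm_cmd "" ""
build_helm_cmd perconalab/pmm-server-fb PR-4078-fa6adbc
```

Keeping each flag as its own list element avoids the line-continuation escaping that broke the original command.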
Changed single-quoted shell blocks to double-quoted for proper Groovy variable interpolation in deleteRoute53Record and createRoute53Record functions. Used JsonBuilder to construct JSON payloads safely instead of shell heredocs with embedded variables.
Groovy static fields don't support string interpolation at compile time.
Allow manual triggering of DELETE_OLD action to delete clusters older than 24 hours. Previously this was only available via cron trigger.
Summary
Add Jenkins pipelines and shared library for PMM HA testing on Red Hat OpenShift Service on AWS (ROSA) with Hosted Control Plane (HCP).
This implements a complete testing environment for PMM High Availability on OpenShift, addressing the requirements in PMM-14347.
Changes
New Files
- pmm/v3/pmm3-ha-rosa.groovy - Main pipeline for creating ROSA HCP clusters and deploying PMM HA
- pmm/v3/pmm3-ha-rosa-cleanup.groovy - Cleanup pipeline with cron support for cost management
- vars/pmmHaRosa.groovy - Shared library with reusable functions for ROSA operations
Features
Related