@nogueiraanderson nogueiraanderson commented Dec 1, 2025

Summary

Add Jenkins pipelines and shared library for PMM HA testing on Red Hat OpenShift Service on AWS (ROSA) with Hosted Control Plane (HCP).

This implements a complete testing environment for PMM High Availability on OpenShift, addressing the requirements in PMM-14347.

Changes

New Files

  • pmm/v3/pmm3-ha-rosa.groovy - Main pipeline for creating ROSA HCP clusters and deploying PMM HA
  • pmm/v3/pmm3-ha-rosa-cleanup.groovy - Cleanup pipeline with cron support for cost management
  • vars/pmmHaRosa.groovy - Shared library with reusable functions for ROSA operations

Features

  • Creates ROSA HCP clusters with configurable OpenShift version (4.16, 4.17, 4.18)
  • Installs PMM HA using Helm charts from percona-helm-charts repository
  • Supports configurable worker node count and instance types
  • Creates Route53 DNS entries for external access
  • Implements cluster quota management (max 5 clusters)
  • Automated cleanup via cron (twice daily) for clusters older than 24h
  • Custom Security Context Constraints (SCC) for ROSA HCP compatibility
  • Global OpenShift pull secret for Docker Hub authentication

Related

Add Jenkins pipelines for PMM HA testing on ROSA HCP (Red Hat OpenShift
Service on AWS with Hosted Control Planes).

ROSA HCP provides roughly 3-4x faster provisioning (~10-15 min vs ~40 min
for self-managed OpenShift), simpler cleanup via the rosa CLI, and an
identical OpenShift experience for PMM HA testing.

New files:
- vars/pmmHaRosa.groovy: Shared library with reusable functions
- pmm/v3/pmm3-ha-rosa.groovy: Main pipeline for cluster creation
- pmm/v3/pmm3-ha-rosa-cleanup.groovy: Cleanup pipeline with cron

Library provides: createCluster(), installPmm(), createRoute(),
deleteCluster(), and helper functions for QA team reuse.

Ref: PMM-14347

The rosa CLI requires AWS_DEFAULT_REGION for its wait, describe, and
delete operations. Added an export to:
- createCluster: rosa wait, rosa describe (3 places)
- deleteCluster: rosa describe, delete cluster, delete oidc-provider,
  delete operator-roles

Fixes build failures that occurred after cluster creation completed.

The 'rosa wait' command doesn't exist. Replaced it with a polling loop
that checks the cluster state via 'rosa describe cluster' every 30
seconds for up to 30 minutes.
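
The polling approach described above could look roughly like the sketch below. The helper name `wait_for_state` and its arguments are hypothetical, not part of the rosa CLI or the shared library. The command that reports the state is passed in, so the loop does not assume any particular rosa output format.

```shell
# Poll a command until its stdout equals the expected state, or give up.
# Usage: wait_for_state <expected> <interval_seconds> <max_attempts> <command...>
# (hypothetical helper, not part of the rosa CLI or the shared library)
wait_for_state() {
  expected="$1"
  interval="$2"
  max_attempts="$3"
  shift 3
  attempt=1
  state=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # Treat a failing command as "state unknown" and keep polling.
    state="$("$@" 2>/dev/null || true)"
    if [ "$state" = "$expected" ]; then
      echo "state=${state} after ${attempt} attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timed out after ${max_attempts} attempts (last state: ${state:-unknown})" >&2
  return 1
}
```

With a 30-second interval, 60 attempts gives the 30-minute ceiling, for example `wait_for_state ready 30 60 get_cluster_state`. Both the `ready` state name and the `get_cluster_state` wrapper around `rosa describe cluster` are assumptions here.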

env.CLUSTER_INFO was being assigned a Groovy Map, but Jenkins
environment variables can only hold strings. Accessing
env.CLUSTER_INFO.clusterName failed because the Map had already been
converted to a string.

Changed to a local def variable, which holds the Map as-is.

The rosa create admin and rosa describe cluster commands in
configureAccess() were missing AWS_DEFAULT_REGION, which caused
password extraction to fail with "ERR: AWS region not set".

The rosa CLI outputs --password <password> (with a space), not
--password= (with an equals sign). Updated grep patterns to handle both
formats.

Previous grep patterns failed to extract only the password from the
rosa create admin output. Now using a regex that matches the exact
ROSA password format: XXXXX-XXXXX-XXXXX-XXXXX
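
A minimal sketch of this kind of extraction follows. It assumes each of the four groups is five ASCII letters or digits. The function name `extract_admin_password` and the sample text are illustrative only, and real `rosa create admin` output may differ.

```shell
# Print the first token that looks like XXXXX-XXXXX-XXXXX-XXXXX on stdin.
# Assumption: each group is five ASCII letters or digits.
extract_admin_password() {
  grep -oE '[A-Za-z0-9]{5}(-[A-Za-z0-9]{5}){3}' | head -n 1
}

# Illustrative input only; real `rosa create admin` output may differ.
sample_output='I: To login, run the following command:
   oc login https://api.example.invalid:6443 --username cluster-admin --password aB3dE-fG7hI-jK1mN-oP5qR'

printf '%s\n' "$sample_output" | extract_admin_password
```

Matching on the password shape rather than on the text around it means the same pattern works whether the flag is followed by a space or an equals sign.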

- Create a custom SCC (pmm-anyuid) for ROSA HCP clusters instead of
  modifying the default SCCs, which is blocked by the admission webhook
- Pre-create pmm-secret with all required passwords before helm install
- Fix password extraction using > instead of tee to avoid stdout pollution
- Add all required helm repos (percona, vm, altinity, haproxytech)
- Add helm dependency update step to download sub-charts
- Increase admin user wait time to 90 seconds
- Use secret.create=false and secret.name=pmm-secret for helm install
- Set default chartBranch to pmmha-v3

- Add dockerHubUser and dockerHubPassword parameters to installPmm
- Create dockerhub-pull-secret in the namespace before installing charts
- Link the pull secret to service accounts for image pulls
- Update pipeline to pass hub.docker.com credentials to installPmm

Switch from a namespace-level pull secret to the global cluster pull
secret in the openshift-config namespace. This is the proper OpenShift
approach for avoiding Docker Hub rate limiting cluster-wide.

- Use oc registry login to add Docker Hub credentials
- Update openshift-config/pull-secret instead of creating per-namespace secrets
- Reference: https://access.redhat.com/solutions/6159832

The pmm-ha helm chart templates use b64dec on these keys:
- vmauth.yaml: VMAGENT_remoteWrite_basicAuth_username, VMAGENT_remoteWrite_basicAuth_password
- clickhouse-cluster.yaml: PMM_CLICKHOUSE_USER

When secret.create=false and we pre-create the secret, these keys must exist
or helm template rendering fails with 'b64dec: invalid value; expected string'.

Added all required keys to match charts/pmm-ha/templates/secret.yaml:
- PMM_ADMIN_PASSWORD
- PMM_CLICKHOUSE_USER (default: clickhouse_pmm)
- PMM_CLICKHOUSE_PASSWORD
- VMAGENT_remoteWrite_basicAuth_username (default: victoriametrics_pmm)
- VMAGENT_remoteWrite_basicAuth_password
- PG_PASSWORD
- GF_PASSWORD
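
Because a single missing key makes template rendering fail, a pre-flight check before `helm install` may be useful. The sketch below is an assumption, not the pipeline's actual code: `missing_secret_keys` is a hypothetical helper that compares a set of key names against the list above.

```shell
# Keys the list above names as required by charts/pmm-ha/templates/secret.yaml.
REQUIRED_KEYS="PMM_ADMIN_PASSWORD PMM_CLICKHOUSE_USER PMM_CLICKHOUSE_PASSWORD VMAGENT_remoteWrite_basicAuth_username VMAGENT_remoteWrite_basicAuth_password PG_PASSWORD GF_PASSWORD"

# Print every required key that is absent from the given key names.
# Usage: missing_secret_keys <key>...   (hypothetical helper)
missing_secret_keys() {
  for required in $REQUIRED_KEYS; do
    found=no
    for present in "$@"; do
      if [ "$present" = "$required" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      echo "$required"
    fi
  done
}

# Example: a secret that only carries the password keys.
missing_secret_keys PMM_ADMIN_PASSWORD PMM_CLICKHOUSE_PASSWORD \
  VMAGENT_remoteWrite_basicAuth_password PG_PASSWORD GF_PASSWORD
```

The example call prints the two username keys, which are the ones that triggered the b64dec failure.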

The pg-secret-init-job pre-install hook uses the pmm-ha-secret-generator
service account, which needs runAsAny permissions to run the PMM image
that generates the encryption key.

The rosa whoami command was showing the jenkins-pmm-amzn2-worker IAM
role instead of the pmm-staging-slave credentials. As a result, rosa
list clusters returned empty results, because different AWS accounts
see different ROSA clusters.

…west

- Add CLUSTER_NAMES param for comma-separated list deletion
- Add SKIP_NEWEST param (default true) to protect active builds
- Add AWS credentials to login stage (fixes empty cluster list)
- Rename DELETE_CLUSTER to DELETE_CLUSTERS for batch operations
- Sort clusters by creation time for SKIP_NEWEST logic
- Add rosa whoami check before listing clusters for debugging
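
The SKIP_NEWEST idea can be sketched in shell as follows. The pipeline itself does this in Groovy, so this is a swapped-in technique: it relies on ISO 8601 UTC timestamps in one fixed format sorting chronologically as plain strings. The function name, the input layout, and the cluster names are all hypothetical.

```shell
# Read "<createdAt> <name>" lines on stdin, order them newest first,
# drop the newest entry, and print the remaining cluster names.
# Assumption: all timestamps share one ISO 8601 UTC format, so a plain
# string sort is also a chronological sort.
clusters_to_delete() {
  LC_ALL=C sort -r | tail -n +2 | awk '{ print $2 }'
}

# Hypothetical cluster list, deliberately out of order.
printf '%s\n' \
  '2025-12-01T08:00:00Z pmm-ha-rosa-10' \
  '2025-12-02T09:30:00Z pmm-ha-rosa-12' \
  '2025-12-01T20:15:00Z pmm-ha-rosa-11' | clusters_to_delete
```

Here pmm-ha-rosa-12 is the newest cluster and is kept, which protects a build that may still be running.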

- Capture stderr with stdout (2>&1) instead of suppressing (2>/dev/null)
- Extract JSON from mixed output by finding first '[' character
- Log any debug output that appears before JSON array
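
Splitting mixed output at the first '[' can be done with plain parameter expansion, as in this sketch. `extract_json_array` is a hypothetical name. It assumes the first '[' in the output starts the JSON array, which breaks if a log line before the JSON contains a bracket.

```shell
# Split mixed CLI output into debug text (stderr) and the JSON array (stdout).
# Assumption: the first '[' in the output starts the JSON array.
extract_json_array() {
  raw="$1"
  case "$raw" in
    *\[*) ;;
    *) echo "no JSON array found in output" >&2; return 1 ;;
  esac
  prefix="${raw%%\[*}"   # everything before the first '['
  if [ -n "$prefix" ]; then
    printf 'debug output before JSON: %s\n' "$prefix" >&2
  fi
  printf '[%s\n' "${raw#*\[}"   # re-attach the '[' that the expansion removed
}

# Illustrative input: one warning line followed by the JSON array.
mixed='W: region taken from environment
[{"name":"pmm-ha-1"},{"name":"pmm-ha-2"}]'
extract_json_array "$mixed"
```

Because stderr is captured rather than discarded, a failure such as a missing region shows up in the build log instead of producing a silently empty list.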

The listClusters function was failing with 'ERR: AWS region not set'
because each shell step runs in its own process, so the
AWS_DEFAULT_REGION exported in the login function was not available.
listClusters now sets the region explicitly.
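
The scoping problem can be reproduced outside Jenkins. In this sketch each `sh -c` call stands in for one pipeline `sh` step. The region value `us-east-2` and the helper `run_step` are assumptions for illustration.

```shell
# Model each Jenkins `sh` step as a separate `sh -c` process.
# The subshell clears any region inherited from the caller so the demo
# behaves the same everywhere.
run_step() {
  ( unset AWS_DEFAULT_REGION; sh -c "$1" )
}

# Step 1: the login step exports the region, but only for its own process.
run_step 'export AWS_DEFAULT_REGION=us-east-2; echo "login: ${AWS_DEFAULT_REGION}"'
# Step 2: a later step does not see that export.
run_step 'echo "list without export: ${AWS_DEFAULT_REGION:-not set}"'
# Step 3: exporting inside the step that needs it makes the region visible.
run_step 'export AWS_DEFAULT_REGION=us-east-2; echo "list with export: ${AWS_DEFAULT_REGION}"'
```

An export made in one step never reaches the next one, which is why every function that calls rosa has to set the region itself.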

- Change default HELM_CHART_BRANCH from 'main' to 'PMM-14420'
- Try theTibi fork first, then fall back to percona repo (like EKS HA)
- Update comments to document that charts are not yet merged to percona main
- This fixes the secret-init-job issue with pmm-encryption-rotation

In the Jenkins pipeline, sort() with a closure returned an Integer
instead of the sorted list (plain Groovy's sort() sorts in place and
returns the list), while toSorted() returns a new sorted list. Using
toSorted() fixes the 'No signature of method:
java.lang.Integer.size()' error.

Jenkins CPS transforms Groovy code, so some collection methods behave
differently than in plain Groovy. Using an explicit ArrayList and
Collections.sort() instead of toSorted() to ensure proper behavior in
the sandboxed pipeline.

The Jenkins sandbox blocks Collections.sort with a Closure. Using
Groovy's list.sort(), which sorts in place, and list.tail() instead of
subList for better sandbox compatibility.

Use sort { it.createdAt }.reverse() instead of an in-place sort with a
comparator. Jenkins CPS doesn't correctly handle a sort that uses the
spaceship operator. Added debug output to show the sorted order.

Remove the --watch flag from the rosa delete cluster command so
deletions happen asynchronously. This significantly speeds up cleanup
jobs that delete multiple clusters.

Parse ISO date strings to timestamps and negate them for a descending
sort. The previous approach with reverse() didn't work in the Jenkins
sandbox.

The PMM-14420 branch exists in theTibi/percona-helm-charts, not
percona/percona-helm-charts. Added fallback logic matching the EKS PR.

The pmm-ha chart's secret init job requires the pmm-encryption-rotation
binary, which only exists in pmm-server-fb builds (e.g., PR-4078-fa6adbc).

Changes:
- Default PMM_IMAGE_TAG and PMM_IMAGE_REPOSITORY to empty (use chart defaults)
- Only set image.repository/tag in helm if explicitly provided
- Chart default (perconalab/pmm-server-fb:PR-4078-fa6adbc) has the binary

The multi-line string concatenation with escaped backslashes was
breaking the helm command. Using array.join() is cleaner and more
reliable.
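
The same idea, building the command as a list and adding optional flags only when they are set, is sketched here in shell rather than Groovy. The release name, chart path, and namespace are placeholders, and `build_helm_args` is a hypothetical helper. The `secret.*` and `image.*` values are the ones named in these commits.

```shell
# Print one helm argument per line; image overrides are added only when set.
# Release name, chart path, and namespace below are placeholders.
build_helm_args() {
  image_repository="$1"
  image_tag="$2"
  set -- upgrade --install pmm-ha ./charts/pmm-ha
  set -- "$@" --namespace pmm
  set -- "$@" --set secret.create=false
  set -- "$@" --set secret.name=pmm-secret
  if [ -n "$image_repository" ]; then
    set -- "$@" --set "image.repository=${image_repository}"
  fi
  if [ -n "$image_tag" ]; then
    set -- "$@" --set "image.tag=${image_tag}"
  fi
  printf '%s\n' "$@"
}

# With empty overrides the chart defaults for the image are left untouched.
build_helm_args "" ""
```

Each flag is appended as a whole element, so no line depends on a trailing backslash surviving string escaping.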

Changed single-quoted shell blocks to double-quoted ones so that Groovy
variables are interpolated in the deleteRoute53Record and
createRoute53Record functions. Used JsonBuilder to construct JSON
payloads safely instead of shell heredocs with embedded variables.

Groovy static fields don't support string interpolation at compile time.

Allow manual triggering of the DELETE_OLD action, which deletes
clusters older than 24 hours. Previously this was only available via
the cron trigger.