PMM-14346: PMM HA EKS testing pipeline with ALB, Route53, and Access Entries #3693
Open
nogueiraanderson wants to merge 25 commits into master from fix/pmm-ha-eks-access-entries
0f410b8 to 4d0bdaa
aa147f6 to 0ff5e20
…Entries
- Add AWS Load Balancer Controller with IRSA for ALB ingress
- Add ALB Ingress with ACM certificate (*.cd.percona.com wildcard)
- Add Route53 alias records for friendly URLs (pmm-ha-test-N.cd.percona.com)
- Replace ConfigMap-based auth with EKS Access Entries API
- Add pmm-eks-admins IAM group for kubectl access
- Add SSO AdministratorAccess role support
- Add cleanup job with Route53/ALB cleanup before cluster deletion
- Extract shared library vars/pmmHaEks.groovy for reusable functions
Jira: PMM-14346
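The move from ConfigMap-based auth to Access Entries could look roughly like the sketch below. It is not taken from the pipeline: the function name `grant_group_admin` and the choice of `AmazonEKSClusterAdminPolicy` are assumptions. Access entries take IAM user or role ARNs as principals, so the sketch expands the group into its member users first.

```shell
# Hypothetical helper: grant cluster-admin access to every user in an IAM group.
# Assumes the AWS CLI is installed and configured; the policy is an assumption.
grant_group_admin() {
    cluster="$1"
    group="$2"
    policy="arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

    # Access entries accept user or role ARNs, so list the group members.
    for arn in $(aws iam get-group --group-name "$group" \
            --query 'Users[].Arn' --output text); do
        # --output text prints "None" for null values; skip those.
        [ "$arn" = "None" ] && continue
        aws eks create-access-entry \
            --cluster-name "$cluster" --principal-arn "$arn"
        aws eks associate-access-policy \
            --cluster-name "$cluster" --principal-arn "$arn" \
            --policy-arn "$policy" --access-scope type=cluster
    done
}
```

A call such as `grant_group_admin <cluster-name> pmm-eks-admins` would then cover the IAM group named in this commit.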
0ff5e20 to b38da5e
- Remove hardcoded account ID (119175775298), use aws sts get-caller-identity
- Remove hardcoded SSO role suffix, discover via aws iam list-roles
- Skip SSO role gracefully if not found in account
- Revert library branch to feature branch for testing
The JMESPath query returns trailing None values when using --output text, which embeds a newline in the result and causes access entry creation to fail.
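One way to guard against this is to filter the text output before using it. The sketch below is an illustration, not the fix in this PR: the helper names are hypothetical, and the `AWSReservedSSO_AdministratorAccess_` role-name prefix is an assumption about how the SSO role is named in the account.

```shell
# Keep only the first real value from `--output text`, dropping "None"
# placeholders, empty lines, and tab-separated extras.
first_real_value() {
    tr '\t' '\n' | grep -v -x -e 'None' -e '' | head -n 1
}

# Hypothetical lookup of the SSO admin role ARN; prints nothing if absent.
find_sso_admin_role() {
    aws iam list-roles \
        --query "Roles[?starts_with(RoleName, 'AWSReservedSSO_AdministratorAccess_')].Arn" \
        --output text | first_real_value
}
```

An empty result lets the caller skip the SSO role instead of passing a malformed ARN to the access entry call.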
Discover availability zones from AWS at runtime instead of hardcoding. Improves spot instance resilience - if one AZ has interruptions, pods can reschedule to nodes in other AZs.
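A minimal sketch of runtime AZ discovery, assuming the AWS CLI is available on the agent. The helper name `discover_azs` and the comma-separated output format are choices made for illustration only.

```shell
# Hypothetical helper: print the available AZs of a region as a
# comma-separated list, e.g. for passing to a cluster-creation command.
discover_azs() {
    region="$1"
    aws ec2 describe-availability-zones --region "$region" \
        --query "AvailabilityZones[?State=='available'].ZoneName" \
        --output text | tr -s '\t\n' ',,' | sed 's/,$//'
}
```

The result could be handed to something like eksctl's `--zones` flag; whether the pipeline passes it that way is not shown in this PR text.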
Remove comments that merely restate what the code does:
- 'Get AWS account ID dynamically'
- 'Install PMM HA'
- 'Wait for components'
- 'Wait for ALB'
- 'Create Route53 record'
- 'Delete the EKS cluster'
Cleanup requires kubectl and eksctl which may not be available on cli agents.
This reverts commit a3832ce.
CLI agents have kubectl, eksctl, helm, and AWS CLI - same as cleanup.
Removes stale kubeconfig entries from previous builds that could persist in the Jenkins workspace, ensuring the artifact contains only the current cluster configuration.
ClickHouse merge operations failing with MEMORY_LIMIT_EXCEEDED on *.large instances (8GB RAM). Upgrade to *.xlarge (16GB RAM) to provide sufficient memory headroom for the full PMM HA stack.
Date.parse() is not allowed in Jenkins sandbox, causing DELETE_OLD cron jobs to fail with RejectedAccessException. Use shell date -d to convert ISO 8601 timestamps to epoch milliseconds instead. Fixes cron builds #21, #22, #23 failing with: "No such static method found: staticMethod java.util.Date parse"
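The shell-side conversion can be sketched as follows. This assumes GNU `date` (the `-d` option is not available in BSD/macOS `date`), and the function name is hypothetical.

```shell
# Convert an ISO 8601 timestamp to epoch milliseconds.
# Requires GNU date; returns non-zero if the timestamp cannot be parsed.
iso_to_epoch_ms() {
    seconds=$(date -u -d "$1" +%s) || return 1
    echo $(( seconds * 1000 ))
}
```

For example, `iso_to_epoch_ms 2024-01-01T00:00:00Z` prints `1704067200000`. From a pipeline, a function like this would typically be run through an `sh` step with `returnStdout: true`, which keeps the date parsing outside the Groovy sandbox.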
Default 4Gi memory limit causes MEMORY_LIMIT_EXCEEDED errors during merge operations. Increase to 10Gi with 4Gi requests to allow proper merge execution on xlarge nodes.
Move cluster management functions from cleanup pipeline to pmmHaEks.groovy:
- listClusters(): returns clusters sorted newest first (CPS-safe)
- deleteAllClusters(): parallel deletion with SKIP_NEWEST and age filter
- cleanupOrphans(): removes orphaned VPCs and failed CF stacks
Simplify pmm3-ha-eks-cleanup.groovy to high-level orchestration only. Add SKIP_NEWEST parameter and CLEANUP_ORPHANS action.
Replace inline shell cluster discovery with pmmHaEks.listClusters():
- pmm3-ha-eks.groovy: Check Existing Clusters stage
- pmm3-ha-eks-cleanup.groovy: List Clusters stage
Reduces code duplication and ensures consistent behavior.
Replace readJSON (unavailable DSL method) with shell-based jq parsing. Sorting by createdAt is done in shell using sort -r for CPS safety.
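A rough sketch of the shell-based approach, under stated assumptions: the input is a JSON array of objects with `name` and `createdAt` fields (these field names are illustrative), and all timestamps are ISO 8601 in the same timezone, so a reverse lexical sort is also a newest-first sort.

```shell
# Turn a JSON array of {name, createdAt} objects into
# "createdAt<TAB>name" lines. Field names are assumptions.
clusters_to_lines() {
    jq -r '.[] | "\(.createdAt)\t\(.name)"'
}

# Sort lines newest first and keep only the cluster name.
# ISO 8601 timestamps in one timezone order correctly as plain text.
newest_first() {
    sort -r | cut -f 2
}
```

Piping cluster metadata through `clusters_to_lines | newest_first` yields one cluster name per line, which a pipeline can split into a list without any sorting closure on the Groovy side.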
eksctl cannot delete stacks with TerminationProtection enabled. Add step to disable protection on all cluster-related CF stacks before calling eksctl delete.
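The step described here might look like the sketch below. It assumes the cluster's stacks follow eksctl's `eksctl-<cluster>-...` naming scheme; the helper name is hypothetical and the actual implementation in pmmHaEks.groovy may differ.

```shell
# Hypothetical helper: disable termination protection on every
# CloudFormation stack whose name starts with "eksctl-<cluster>-".
disable_stack_protection() {
    cluster="$1"
    for stack in $(aws cloudformation describe-stacks \
            --query "Stacks[?starts_with(StackName, 'eksctl-${cluster}-')].StackName" \
            --output text); do
        aws cloudformation update-termination-protection \
            --stack-name "$stack" --no-enable-termination-protection
    done
}
```

Running this before `eksctl delete cluster --name <cluster>` removes the protection that would otherwise block the stack deletion.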
Add configurable cluster retention (1-7 days) and optional custom PMM admin password per PMM-14613.
Changes:
- RETENTION_DAYS: cluster survives cron cleanup for N days (default: 1)
- PMM_ADMIN_PASSWORD: user-provided or auto-generated 16-char password
- delete-after tag stored in cluster metadata for cleanup job
- deleteAllClusters() checks tag before deleting, falls back to 24h for legacy
Jira: PMM-14613
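The tag check with its legacy fallback can be illustrated with a small predicate. This is a sketch only: it assumes the delete-after tag holds an epoch timestamp in seconds, which the commit message does not specify, and `is_expired` is a hypothetical name.

```shell
# Decide whether a cluster is due for deletion.
#   $1  current time (epoch seconds)
#   $2  cluster creation time (epoch seconds)
#   $3  delete-after tag value (epoch seconds; empty for legacy clusters)
# Returns 0 when the cluster has expired. The tag format is an assumption.
is_expired() {
    now="$1"
    created="$2"
    delete_after="$3"
    if [ -z "$delete_after" ]; then
        # Legacy cluster without the tag: fall back to 24h after creation.
        delete_after=$(( created + 24 * 3600 ))
    fi
    [ "$now" -ge "$delete_after" ]
}
```

With RETENTION_DAYS=3, for instance, the tag would be set to creation time plus 3 * 86400 seconds, and the cron cleanup would leave the cluster alone until that moment has passed.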
Add PostgreSQL, ClickHouse, and VictoriaMetrics credentials to pmm-credentials/access-info.txt artifact for convenience.
Add getCredentials() and writeAccessInfo() to pmmHaEks.groovy. Simplifies main pipeline from ~450 to 387 lines.
…to library
- validateHelmChart(): validates branch exists and contains pmm-ha charts
- resolveR53ZoneId(): resolves Route53 zone ID from zone name (DRY)
- Main pipeline reduced from 387 to 352 lines
- Add named constants: MAX_CLUSTERS, DEFAULT_RETENTION_HOURS, etc.
- Add validateRetentionDays() for Groovy-level input validation
- Move retention validation from shell to Groovy
Reorganize pmmHaEks.groovy into 6 clearly labeled sections:
1. Constants
2. Validation Helpers
3. Credential Management
4. EKS Cluster Setup
5. PMM Installation
6. Cluster Lifecycle (list, delete, cleanup)
Update file header to list sections instead of individual functions. Add visual section dividers for better navigation. No functional changes.
Jenkins pipelines for PMM HA testing on EKS.
Jobs
- pmm3-ha-eks - Create EKS cluster with PMM HA
  - theTibi/percona-helm-charts (PMM-14420 branch)
  - https://pmm-ha-test-{BUILD_NUMBER}.cd.percona.com
- pmm3-ha-eks-cleanup - Delete EKS clusters
Access
Users in the pmm-eks-admins IAM group get kubectl access via EKS Access Entries:
Testing
Validated with temporary jobs (to be deleted after merge):
Jira: PMM-14346