Commit fe3d2c5
refactor(ci): modularize pipeline utilities into lib/ structure (#3817)
* refactor(ci): modularize pipeline utilities into lib/ structure
Extract common functions from utils.sh into focused modules to improve
maintainability and reduce code duplication.
* fix(ci): address all SonarQube code quality issues in lib modules
Fix all 12 issues reported by SonarQube analysis to improve code maintainability
and follow shell scripting best practices.
Changes:
- Add explicit return statements to all functions (11 Medium issues)
  * common.sh: 3 functions (oc_login, is_openshift, sed_inplace)
  * operators.sh: 8 functions (all install_* and check_status)
- Extract magic strings to constants (3 Low issues)
  * k8s-wait.sh: ERR_MISSING_PARAMS (reused 5 times)
  * operators.sh: OPERATOR_STATUS_SUCCEEDED, OPERATOR_NAMESPACE
Benefits:
- Explicit return codes improve error handling and debugging
- Constants reduce duplication and improve maintainability
- Follows SonarQube best practices for shell scripts
Resolves all issues in:
https://sonarcloud.io/project/issues?id=redhat-developer_rhdh&pullRequest=3817
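The two patterns described above (explicit return codes and a shared constant instead of a repeated string literal) might look roughly like the sketch below. The function name `sketch::describe_wait` and the message text assigned to `ERR_MISSING_PARAMS` are assumptions made for illustration; they are not taken from lib/k8s-wait.sh.

```shell
#!/usr/bin/env bash
# Illustrative only: the constant's message text and the function below are
# assumptions, not the actual contents of the lib/ modules.
readonly ERR_MISSING_PARAMS="Missing required parameters"

sketch::describe_wait() {
  local namespace="${1:-}"
  local resource="${2:-}"

  if [[ -z "$namespace" || -z "$resource" ]]; then
    # The shared constant replaces a literal repeated in every function.
    echo "${ERR_MISSING_PARAMS}: <namespace> <resource>" >&2
    return 1
  fi

  echo "Waiting for ${resource} in ${namespace}"
  # Explicit success code, so callers never depend on the status of
  # whichever command happened to run last.
  return 0
}
```

A caller can then branch on the status directly, for example `sketch::describe_wait "$ns" "deployment/demo" || exit 1`.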
* refactor(ci): address zdrapela review comments
- Remove 'set -euo pipefail' from lib modules to avoid conflicts with entrypoint
- Use DIR variable consistently for sourcing instead of SCRIPT_DIR
- Remove unused common::is_openshift function (now hardcoded in openshift/release)
- Fix shellcheck directive position for log.sh sourcing
* fix(ci): skip Tekton installation for K8s deployments (AKS/EKS/GKE)
Tekton tests are not executed in showcase-k8s or showcase-rbac-k8s projects
(see playwright.config.ts lines 149 and 165-170), but the pipeline was still:
- Installing Tekton operator via cluster_setup_k8s_operator/helm functions
- Applying Pipeline/PipelineRun YAMLs via apply_yaml_files function
This caused deployment failures in AKS with error:
'no endpoints available for service tekton-pipelines-webhook'
Changes:
- Skip operator::install_tekton in cluster_setup_k8s_operator()
- Skip operator::install_tekton in cluster_setup_k8s_helm()
- Conditionally skip Pipeline/Topology YAMLs in apply_yaml_files()
  when JOB_NAME contains 'aks', 'eks', or 'gke'
Benefits:
- Fixes AKS deployment error
- Reduces deployment time (skips unnecessary operator installation)
- Aligns deployment with actual test execution
Refs: e2e-tests/playwright.config.ts (Tekton tests excluded from K8s projects)
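A minimal sketch of the JOB_NAME check described above, assuming a simple substring match. The helper names `sketch::is_k8s_job` and `sketch::tekton_step` are hypothetical and do not come from the pipeline scripts; only the three substrings and the JOB_NAME variable are taken from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the job name refers to a plain
# Kubernetes platform (AKS, EKS, or GKE), 1 otherwise. Falls back to the
# JOB_NAME environment variable when no argument is given.
sketch::is_k8s_job() {
  local job_name="${1:-${JOB_NAME:-}}"
  case "$job_name" in
    *aks* | *eks* | *gke*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical caller: reports whether the Tekton-related steps would run.
sketch::tekton_step() {
  if sketch::is_k8s_job "${1:-}"; then
    echo "skip tekton"
  else
    echo "install tekton"
  fi
  return 0
}
```

A substring match is easy to read but also matches any job name that happens to contain those letters, so an exact platform variable would be stricter if one is available.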
* fix(ci): add spot node tolerations for Backstage pods in AKS
The Backstage pods were failing to schedule on AKS spot instances because
they lacked the required tolerations and affinity rules. PostgreSQL pods
already had these configurations, but Backstage pods were missing them.
This caused deployment failures with:
  Warning FailedScheduling pod/rhdh-developer-hub
  0/2 nodes are available:
    1 Insufficient cpu
    1 node(s) had untolerated taint {kubernetes.azure.com/scalesetpriority: spot}
Changes:
- Add tolerations for kubernetes.azure.com/scalesetpriority=spot to Backstage pods
- Add node affinity to prefer spot instances
- Applied to both diff-values_showcase_AKS.yaml and diff-values_showcase-rbac_AKS.yaml
This ensures Backstage pods can be scheduled on spot nodes like PostgreSQL pods.
* fix(ci): comment out undefined EKS verify functions
The functions aws_eks_verify_cluster and aws_eks_get_cluster_info
are called but were never implemented. This is a pre-existing bug
in the upstream codebase (commit 1add61b).
Commenting them out as TODOs until proper implementation is added.
Fixes EKS job failure:
- /tmp/rhdh/.ibm/pipelines/jobs/eks-helm.sh: line 20: aws_eks_verify_cluster: command not found
* fix(ci): wait for OpenShift Pipelines CRDs before applying Tekton YAMLs
The cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions
were calling operator::install_pipelines but not waiting for the CRDs
to be ready before proceeding.
This caused a race condition where apply_yaml_files() tried to create
Tekton Pipeline resources before the CRDs were available, resulting in:
error: no matches for kind "Pipeline" in version "tekton.dev/v1"
Added explicit waits for:
- k8s_wait::deployment for pipelines operator
- k8s_wait::endpoint for tekton-pipelines-webhook
Fixes OCP Helm PR check failures.
* refactor(ci): implement k8s_wait::crd function for CRD availability checks
Added a new function, k8s_wait::crd, to streamline the process of waiting for Custom Resource Definitions (CRDs) to become available. This function is now utilized in both the operator.sh and operators.sh scripts to ensure that the necessary CRDs are ready before proceeding with deployments.
Changes:
- Removed the previous wait_for_backstage_crd function in favor of k8s_wait::crd for consistency.
- Updated deploy_rhdh_operator to verify CRD availability after operator installation.
- Enhanced operator::install_pipelines to wait for Tekton Pipelines CRDs before applying YAMLs.
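The commit does not show the body of k8s_wait::crd, so the following is only a sketch of how a CRD wait could be written as a polling loop around `kubectl get crd`. The function name `sketch::wait_for_crd`, its argument order, and the default timeout and interval are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real k8s_wait::crd.
# Usage: sketch::wait_for_crd <crd-name> [timeout-seconds] [interval-seconds]
sketch::wait_for_crd() {
  local crd_name="${1:-}"
  local timeout="${2:-300}"
  local interval="${3:-10}"
  local elapsed=0

  if [[ -z "$crd_name" ]]; then
    echo "usage: sketch::wait_for_crd <crd-name> [timeout] [interval]" >&2
    return 1
  fi
  # Guard against a zero interval, which would never reach the timeout.
  (( interval > 0 )) || interval=1

  while (( elapsed <= timeout )); do
    # `kubectl get crd <name>` exits non-zero until the CRD is registered.
    if kubectl get crd "$crd_name" > /dev/null 2>&1; then
      echo "CRD ${crd_name} is available"
      return 0
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done

  echo "Timed out after ${timeout}s waiting for CRD ${crd_name}" >&2
  return 1
}
```

For example, `sketch::wait_for_crd pipelines.tekton.dev 300 10 || return 1` would block Tekton YAML application until the Pipeline CRD exists. `kubectl wait --for=condition=Established crd/<name>` is an alternative once the CRD object has been created.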
* fix(ci): ensure Backstage CRD availability checks are consistent
Updated the scripts to use the k8s_wait::crd function for waiting on Backstage CRD availability after operator installation. This change enhances consistency across the operator.sh and auth-providers.sh scripts, ensuring that the necessary CRDs are ready before proceeding with subsequent operations.
Changes:
- Removed unnecessary blank lines for cleaner code.
- Added comments to clarify the purpose of CRD availability checks.
* fix(ci): ensure CRD availability checks return proper status
* fix(ci): enhance error handling in OpenShift authentication and deployment checks
Updated the OpenShift authentication process to log errors if the login fails. Additionally, added return statements to the deployment and endpoint checks in the cluster setup functions to ensure proper error handling and prevent proceeding with operations if the checks fail.
* fix(ci): correct deployment name in OpenShift Pipelines checks
Updated the deployment name in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions to ensure accurate waiting for the OpenShift Pipelines operator. Additionally, refined the pod name retrieval logic in the k8s_wait::deployment function for improved reliability in identifying pods. This change enhances the overall accuracy of the deployment checks.
* fix(ci): increase timeout for Tekton webhook endpoint checks
Updated the timeout for the Tekton Pipelines webhook endpoint checks in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions from 30 seconds to 1800 seconds. This change ensures that the scripts have sufficient time to wait for the endpoint to become available, improving the reliability of the deployment process.
* fix(ci): add missing lib/common.sh imports in job files
* refactor(ci): add constants for repeated string literals in utils.sh
* fix(ci): update OpenShift Pipelines checks to use OPERATOR_NAMESPACE
Replaced the hardcoded OPENSHIFT_OPERATORS_NAMESPACE with OPERATOR_NAMESPACE in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions. This change improves flexibility and consistency in the deployment checks for OpenShift Pipelines.
* fix(ci): improve configmap retrieval logic in utils.sh
Enhanced the configmap retrieval process by implementing a wait mechanism that checks for the existence of the default dynamic plugins configmap created by the operator. This change allows the script to wait for up to 2.5 minutes before failing, improving reliability in scenarios where the configmap may take time to be created. Additionally, updated error logging to provide more context if the configmap is not found after the wait period.
* fix(ci): clean up whitespace and improve readability in utils.sh
Removed unnecessary blank lines and adjusted spacing in the configmap retrieval logic to enhance code readability. These changes contribute to a cleaner codebase without altering functionality.
* fix(ci): enhance diagnostic logging in k8s-wait.sh on deployment timeout
Added detailed diagnostic information to the k8s_wait::deployment function. When a timeout occurs, the script now logs pod status, pod description, pod logs, and recent events in the specified namespace. This improvement aids in troubleshooting deployment issues by providing more context on the state of resources at the time of failure.
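As an illustration of the kind of diagnostics listed above, the sketch below collects pod status, descriptions, logs, and recent events with standard kubectl commands. The function name `sketch::dump_diagnostics`, the label-selector argument, and the line limits are assumptions; the real k8s_wait::deployment may select pods differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a diagnostics dump to call when a wait times out.
# Usage: sketch::dump_diagnostics <namespace> <label-selector>
sketch::dump_diagnostics() {
  local namespace="${1:-}"
  local selector="${2:-}"

  if [[ -z "$namespace" || -z "$selector" ]]; then
    echo "usage: sketch::dump_diagnostics <namespace> <label-selector>" >&2
    return 1
  fi

  # Each command is best-effort: a failure here must not hide the
  # original timeout error, hence the `|| true`.
  echo "--- Pod status ---"
  kubectl get pods -n "$namespace" -l "$selector" -o wide || true

  echo "--- Pod description ---"
  kubectl describe pods -n "$namespace" -l "$selector" || true

  echo "--- Pod logs (last 100 lines per container) ---"
  kubectl logs -n "$namespace" -l "$selector" --all-containers --tail=100 || true

  echo "--- Recent events ---"
  kubectl get events -n "$namespace" --sort-by=.lastTimestamp | tail -n 20 || true

  return 0
}
```

Keeping the dump in its own function means every wait helper can call it on timeout without repeating the kubectl commands.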
* fix(ci): improve plugin merging logic in utils.sh
Refactored the plugin merging process to intelligently combine custom and default plugins, ensuring that custom plugins take precedence while avoiding conflicts. This change enhances the flexibility of plugin management and preserves the operator's default plugin states.
* fix(ci): refine plugin merging logic in utils.sh
Updated the plugin merging process to extract default plugins into a separate array and ensure deduplication by package name. This change improves the clarity of the merging strategy and enhances the robustness of plugin management while maintaining custom plugin precedence.
* refactor(ci): simplify orchestrator plugins enabling logic in utils.sh
Removed the complex merging process for orchestrator plugins and streamlined the function to focus on waiting for the Backstage resource to be ready. Updated logging to reflect the new approach, enhancing clarity and maintainability of the code.
* fix(ci): add wait mechanism for PostgreSQL readiness in rbac_deployment function
Implemented a wait mechanism to ensure that the external PostgreSQL database is fully ready before proceeding with the RBAC instance deployment. This change enhances the reliability of the deployment process by allowing immediate connection for the database creation job, and includes error logging for deployment failures.
* fix(ci): add error handling and wait mechanism for Backstage resource and deployment
Enhanced the `enable_orchestrator_plugins_op` function to include error handling for the Backstage resource check, logging an error if the resource is not found. Additionally, implemented a wait mechanism in the `deploy_rhdh_operator` function to ensure the Backstage deployment is created by the operator, with appropriate logging for success and warnings for potential asynchronous creation.
* fix(ci): add verification and wait mechanism for PostgresCluster resource in operator deployment
Enhanced the `deploy_rhdh_operator` function to verify the availability of the PostgresCluster CRD before deploying the Backstage resource. Implemented a wait mechanism to ensure the PostgresCluster resource is created by the operator, with detailed logging for success and error scenarios. This change improves the reliability of the deployment process and aids in troubleshooting.
* fix(ci): enhance wait mechanism for database resource creation in operator deployment
Updated the `deploy_rhdh_operator` function to wait for either a PostgresCluster or StatefulSet resource to be created by the operator. Improved logging to provide clarity on which resource is being checked and added error handling for cases where neither resource is created within the specified wait time. This change enhances the reliability of the deployment process and aids in troubleshooting.
* fix(ci): streamline wait logic in operator deployment for database resources
Refined the `deploy_rhdh_operator` function to eliminate unnecessary whitespace and improve the clarity of the wait mechanism for database resource creation. This update enhances the readability of the code while maintaining the existing functionality and logging for resource checks.
* fix(ci): enhance orchestrator plugins enabling process in utils.sh
Refined the `enable_orchestrator_plugins_op` function to improve the process of enabling orchestrator plugins. This update includes extracting and merging custom and default dynamic plugins, applying the merged configmap, and restarting the Backstage deployment. Enhanced logging and error handling were added to ensure clarity and reliability during the plugin enabling process.
* fix(ci): improve error handling and logging in operator deployment for Backstage resource
Enhanced the `deploy_rhdh_operator` function to log an error if the Backstage deployment is not created within the specified wait time. Added additional logging to check the status of the Backstage CR and the operator logs for better troubleshooting. This change improves the reliability of the deployment process and aids in identifying issues during the Backstage resource creation.
* fix(ci): refine plugin merging logic in enable_orchestrator_plugins_op function
Updated the plugin merging process in the `enable_orchestrator_plugins_op` function to utilize yq for improved clarity and efficiency. The new implementation merges default and custom plugins while ensuring deduplication by package name, enhancing the robustness of plugin management and maintaining custom plugin precedence.
* fix(ci): update logging for orchestrator plugins enabling process in utils.sh
Removed the wait logic for Backstage deployment readiness after enabling orchestrator plugins. Updated logging to clarify that deployment verification will occur in subsequent calls, enhancing the clarity of the process.
* fix(ci): eliminate log:: function bug in timeout subshells
Replace timeout bash -c subshells with proper polling loops to fix
log:: functions not working inside subprocesses.
Functions fixed:
- wait_for_svc(): rewritten with polling loop
- wait_for_endpoint(): rewritten with polling loop
- check_operator_status(): rewritten with polling loop
- waitfor_crunchy_postgres_*(): now uses k8s_wait::crd
- waitfor_tekton_pipelines(): now uses k8s_wait::crd
- install_pipelines_operator(): now uses k8s_wait::crd
- delete_tekton_pipelines(): rewritten with polling loop
Benefits:
- log::info/success/error now work correctly
- Consistent polling pattern across all wait functions
- Reduced code duplication by using k8s_wait::crd
- Variables now use 'local' keyword (DISPLAY_NAME → display_name)
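The underlying problem is that `bash -c` starts a new shell process, and shell functions defined in the parent script are not visible there unless they are exported. The sketch below reproduces that behavior and shows an in-process polling loop as a replacement. The `log::info` body and the `sketch::wait_for` helper are stand-ins written for this example, not the implementations in lib/log.sh or utils.sh.

```shell
#!/usr/bin/env bash
# Stand-in logger; the real log:: functions live in lib/log.sh.
log::info() { echo "[INFO] $*"; }

# Returns non-zero: the child shell started by `bash -c` has no log::info,
# which is why logging inside `timeout bash -c '...'` fails.
child_can_log() {
  bash -c 'log::info "from child"' > /dev/null 2>&1
}

# In-process polling loop: the check command and the logger both run in the
# current shell, so every function defined by the script stays available.
# Usage: sketch::wait_for <description> <timeout-s> <interval-s> <command...>
sketch::wait_for() {
  local description="$1"
  local timeout="$2"
  local interval="$3"
  shift 3
  local deadline=$(( SECONDS + timeout ))

  until "$@"; do
    if (( SECONDS >= deadline )); then
      log::info "Timed out waiting for ${description}"
      return 1
    fi
    log::info "Waiting for ${description}..."
    sleep "$interval"
  done

  log::info "${description} is ready"
  return 0
}
```

Exporting each function with `export -f` would also make it visible to the child shell, but a plain loop avoids the extra process and keeps local variables in scope.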
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): update Backstage CR to v1alpha5 API with deployment patch
Update all Backstage Custom Resource manifests from v1alpha4 to v1alpha5.
Changes:
- apiVersion: rhdh.redhat.com/v1alpha4 → v1alpha5
- Remove spec.application.image (not supported in v1alpha5)
- Add spec.deployment.patch to override container images
- Configure dynamic-plugins-root volume with 10Gi storage
The v1alpha5 API requires using deployment patches to customize
the container image instead of the direct image field.
Files updated:
- rhdh-start.yaml
- rhdh-start-rbac.yaml
- rhdh-start-runtime.yaml
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): correct misleading comment in enable_orchestrator_plugins_op
Update the function comment to accurately describe behavior:
- The operator DOES create backstage-dynamic-plugins-* configmap
- The function merges operator defaults with custom plugins
- Custom plugins override defaults on package conflicts
The previous comment incorrectly stated the operator doesn't create
the default configmap, which contradicted the actual code behavior.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ci): remove conflicting volume patch from v1alpha5 Backstage CRs
The ephemeral volume patch for dynamic-plugins-root was conflicting
with the operator's default volume configuration. Removed the volume
patch and let the operator handle the volume creation automatically
(as it did in v1alpha4).
This fixes the deployment timeout issue in e2e-ocp-operator-nightly
where the backstage pod failed to become ready after enabling
orchestrator plugins.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* chore: comment out SonarQube dynamic plugin configuration
Temporarily disable the SonarQube dynamic plugin in the showcase sanity values file to address potential issues related to its integration. This change is part of ongoing adjustments to plugin management.
# Conflicts:
# .ibm/pipelines/value_files/diff-values_showcase-sanity-plugins.yaml
* refactor(ci): eliminate code duplication and add reusable helpers
- Add reusable helper functions in lib/common.sh:
  - common::poll_until - generic polling/waiting helper
  - common::base64_encode - cross-platform base64 encoding
  - common::create_configmap_from_file(s) - idempotent ConfigMap creation
  - common::retry - command retry with backoff
  - common::save_artifact - artifact saving helper
- Consolidate duplicate functions in utils.sh:
  - Merge install_crunchy_postgres_ocp/k8s_operator into single parameterized function
  - Replace install_olm/uninstall_olm with delegation shims to lib/operators.sh
  - Use new helpers for polling, base64 encoding, and ConfigMap creation
- Simplify install-methods/operator.sh:
  - Replace retry loops with common::retry helper
- Revert Backstage CR files to v1alpha4 API:
  - Restores spec.application.image support (cleaner than v1alpha5 deployment.patch)
  - Keeps consistency with main branch
- Replace artifact saving patterns across deployment files:
  - aks-helm-deployment.sh, aks-operator-deployment.sh
  - eks-helm-deployment.sh, eks-operator-deployment.sh
  - gke-helm-deployment.sh, gke-operator-deployment.sh
  - ocp-operator.sh
- Add documentation for namespace defaults in env_variables.sh
- Net reduction of ~280 lines of code
- Improved code maintainability and consistency
- Backward compatibility maintained via shim functions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
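Two of the helpers named above can be sketched as follows. These are not the implementations in lib/common.sh: the function names, the argument order, and the choice of a doubling delay for the backoff are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper with a doubling delay between attempts.
# Usage: sketch::retry <max-attempts> <initial-delay-s> <command...>
sketch::retry() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1

  while true; do
    if "$@"; then
      return 0
    fi
    if (( attempt >= max_attempts )); then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Hypothetical cross-platform base64 helper. GNU base64 wraps its output at
# 76 columns by default while the macOS/BSD tool does not; stripping
# newlines yields the same single-line result on both.
sketch::base64_encode() {
  printf '%s' "$1" | base64 | tr -d '\n'
}
```

A call such as `sketch::retry 5 2 kubectl apply -f manifest.yaml` would try up to five times, waiting 2, 4, 8, and 16 seconds between attempts.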
* revert: restore diff-values_showcase-sanity-plugins.yaml to original
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 4a38688 commit fe3d2c5
File tree
25 files changed, +1827 -674 lines changed
- .ibm/pipelines
  - cluster
    - aks
    - eks
    - gke
  - install-methods
  - jobs
  - lib
  - value_files