
Commit fe3d2c5

gustavolira and claude authored
refactor(ci): modularize pipeline utilities into lib/ structure (#3817)
* refactor(ci): modularize pipeline utilities into lib/ structure

  Extract common functions from utils.sh into focused modules to improve maintainability and reduce code duplication.

* fix(ci): address all SonarQube code quality issues in lib modules

  Fix all 12 issues reported by SonarQube analysis to improve code maintainability and follow shell scripting best practices.

  Changes:
  - Add explicit return statements to all functions (11 Medium issues)
    * common.sh: 3 functions (oc_login, is_openshift, sed_inplace)
    * operators.sh: 8 functions (all install_* and check_status)
  - Extract magic strings to constants (3 Low issues)
    * k8s-wait.sh: ERR_MISSING_PARAMS (reused 5 times)
    * operators.sh: OPERATOR_STATUS_SUCCEEDED, OPERATOR_NAMESPACE

  Benefits:
  - Explicit return codes improve error handling and debugging
  - Constants reduce duplication and improve maintainability
  - Follows SonarQube best practices for shell scripts

  Resolves all issues in: https://sonarcloud.io/project/issues?id=redhat-developer_rhdh&pullRequest=3817

* refactor(ci): address zdrapela review comments

  - Remove 'set -euo pipefail' from lib modules to avoid conflicts with entrypoint
  - Use DIR variable consistently for sourcing instead of SCRIPT_DIR
  - Remove unused common::is_openshift function (now hardcoded in openshift/release)
  - Fix shellcheck directive position for log.sh sourcing

* fix(ci): skip Tekton installation for K8s deployments (AKS/EKS/GKE)

  Tekton tests are not executed in showcase-k8s or showcase-rbac-k8s projects (see playwright.config.ts lines 149 and 165-170), but the pipeline was still:
  - Installing Tekton operator via cluster_setup_k8s_operator/helm functions
  - Applying Pipeline/PipelineRun YAMLs via apply_yaml_files function

  This caused deployment failures in AKS with error: 'no endpoints available for service tekton-pipelines-webhook'

  Changes:
  - Skip operator::install_tekton in cluster_setup_k8s_operator()
  - Skip operator::install_tekton in cluster_setup_k8s_helm()
  - Conditionally skip Pipeline/Topology YAMLs in apply_yaml_files() when JOB_NAME contains 'aks', 'eks', or 'gke'

  Benefits:
  - Fixes AKS deployment error
  - Reduces deployment time (skips unnecessary operator installation)
  - Aligns deployment with actual test execution

  Refs: e2e-tests/playwright.config.ts (Tekton tests excluded from K8s projects)

* fix(ci): add spot node tolerations for Backstage pods in AKS

  The Backstage pods were failing to schedule on AKS spot instances because they lacked the required tolerations and affinity rules. PostgreSQL pods already had these configurations, but Backstage pods were missing them.

  This caused deployment failures with:
    Warning FailedScheduling pod/rhdh-developer-hub 0/2 nodes are available: 1 Insufficient cpu 1 node(s) had untolerated taint {kubernetes.azure.com/scalesetpriority: spot}

  Changes:
  - Add tolerations for kubernetes.azure.com/scalesetpriority=spot to Backstage pods
  - Add node affinity to prefer spot instances
  - Applied to both diff-values_showcase_AKS.yaml and diff-values_showcase-rbac_AKS.yaml

  This ensures Backstage pods can be scheduled on spot nodes like PostgreSQL pods.

* fix(ci): comment out undefined EKS verify functions

  The functions aws_eks_verify_cluster and aws_eks_get_cluster_info are called but were never implemented. This is a pre-existing bug in the upstream codebase (commit 1add61b). Commenting them out as TODOs until proper implementation is added.

  Fixes EKS job failure:
  - /tmp/rhdh/.ibm/pipelines/jobs/eks-helm.sh: line 20: aws_eks_verify_cluster: command not found

* fix(ci): wait for OpenShift Pipelines CRDs before applying Tekton YAMLs

  The cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions were calling operator::install_pipelines but not waiting for the CRDs to be ready before proceeding. This caused a race condition where apply_yaml_files() tried to create Tekton Pipeline resources before the CRDs were available, resulting in:
    error: no matches for kind "Pipeline" in version "tekton.dev/v1"

  Added explicit waits for:
  - k8s_wait::deployment for pipelines operator
  - k8s_wait::endpoint for tekton-pipelines-webhook

  Fixes OCP Helm PR check failures.

* refactor(ci): implement k8s_wait::crd function for CRD availability checks

  Added a new function, k8s_wait::crd, to streamline the process of waiting for Custom Resource Definitions (CRDs) to become available. This function is now utilized in both the operator.sh and operators.sh scripts to ensure that the necessary CRDs are ready before proceeding with deployments.

  Changes:
  - Removed the previous wait_for_backstage_crd function in favor of k8s_wait::crd for consistency.
  - Updated deploy_rhdh_operator to verify CRD availability after operator installation.
  - Enhanced operator::install_pipelines to wait for Tekton Pipelines CRDs before applying YAMLs.

* fix(ci): ensure Backstage CRD availability checks are consistent

  Updated the scripts to use the k8s_wait::crd function for waiting on Backstage CRD availability after operator installation. This change enhances consistency across the operator.sh and auth-providers.sh scripts, ensuring that the necessary CRDs are ready before proceeding with subsequent operations.

  Changes:
  - Removed unnecessary blank lines for cleaner code.
  - Added comments to clarify the purpose of CRD availability checks.

* fix(ci): ensure CRD availability checks return proper status

* fix(ci): enhance error handling in OpenShift authentication and deployment checks

  Updated the OpenShift authentication process to log errors if the login fails. Additionally, added return statements to the deployment and endpoint checks in the cluster setup functions to ensure proper error handling and prevent proceeding with operations if the checks fail.

* fix(ci): correct deployment name in OpenShift Pipelines checks

  Updated the deployment name in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions to ensure accurate waiting for the OpenShift Pipelines operator. Additionally, refined the pod name retrieval logic in the k8s_wait::deployment function for improved reliability in identifying pods. This change enhances the overall accuracy of the deployment checks.

* fix(ci): increase timeout for Tekton webhook endpoint checks

  Updated the timeout for the Tekton Pipelines webhook endpoint checks in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions from 30 seconds to 1800 seconds. This change ensures that the scripts have sufficient time to wait for the endpoint to become available, improving the reliability of the deployment process.

* fix(ci): add missing lib/common.sh imports in job files

* refactor(ci): add constants for repeated string literals in utils.sh

* fix(ci): update OpenShift Pipelines checks to use OPERATOR_NAMESPACE

  Replaced the hardcoded OPENSHIFT_OPERATORS_NAMESPACE with OPERATOR_NAMESPACE in the cluster_setup_ocp_helm() and cluster_setup_ocp_operator() functions. This change improves flexibility and consistency in the deployment checks for OpenShift Pipelines.

* fix(ci): improve configmap retrieval logic in utils.sh

  Enhanced the configmap retrieval process by implementing a wait mechanism that checks for the existence of the default dynamic plugins configmap created by the operator. This change allows the script to wait for up to 2.5 minutes before failing, improving reliability in scenarios where the configmap may take time to be created. Additionally, updated error logging to provide more context if the configmap is not found after the wait period.

* fix(ci): clean up whitespace and improve readability in utils.sh

  Removed unnecessary blank lines and adjusted spacing in the configmap retrieval logic to enhance code readability. These changes contribute to a cleaner codebase without altering functionality.

* fix(ci): enhance diagnostic logging in k8s-wait.sh on deployment timeout

  Added detailed diagnostic information to the k8s_wait::deployment function. When a timeout occurs, the script now logs pod status, pod description, pod logs, and recent events in the specified namespace. This improvement aids in troubleshooting deployment issues by providing more context on the state of resources at the time of failure.

* fix(ci): improve plugin merging logic in utils.sh

  Refactored the plugin merging process to intelligently combine custom and default plugins, ensuring that custom plugins take precedence while avoiding conflicts. This change enhances the flexibility of plugin management and preserves the operator's default plugin states.

* fix(ci): refine plugin merging logic in utils.sh

  Updated the plugin merging process to extract default plugins into a separate array and ensure deduplication by package name. This change improves the clarity of the merging strategy and enhances the robustness of plugin management while maintaining custom plugin precedence.

* refactor(ci): simplify orchestrator plugins enabling logic in utils.sh

  Removed the complex merging process for orchestrator plugins and streamlined the function to focus on waiting for the Backstage resource to be ready. Updated logging to reflect the new approach, enhancing clarity and maintainability of the code.

* fix(ci): add wait mechanism for PostgreSQL readiness in rbac_deployment function

  Implemented a wait mechanism to ensure that the external PostgreSQL database is fully ready before proceeding with the RBAC instance deployment. This change enhances the reliability of the deployment process by allowing immediate connection for the database creation job, and includes error logging for deployment failures.

* fix(ci): add error handling and wait mechanism for Backstage resource and deployment

  Enhanced the `enable_orchestrator_plugins_op` function to include error handling for the Backstage resource check, logging an error if the resource is not found. Additionally, implemented a wait mechanism in the `deploy_rhdh_operator` function to ensure the Backstage deployment is created by the operator, with appropriate logging for success and warnings for potential asynchronous creation.

* fix(ci): add verification and wait mechanism for PostgresCluster resource in operator deployment

  Enhanced the `deploy_rhdh_operator` function to verify the availability of the PostgresCluster CRD before deploying the Backstage resource. Implemented a wait mechanism to ensure the PostgresCluster resource is created by the operator, with detailed logging for success and error scenarios. This change improves the reliability of the deployment process and aids in troubleshooting.

* fix(ci): enhance wait mechanism for database resource creation in operator deployment

  Updated the `deploy_rhdh_operator` function to wait for either a PostgresCluster or StatefulSet resource to be created by the operator. Improved logging to provide clarity on which resource is being checked and added error handling for cases where neither resource is created within the specified wait time. This change enhances the reliability of the deployment process and aids in troubleshooting.

* fix(ci): streamline wait logic in operator deployment for database resources

  Refined the `deploy_rhdh_operator` function to eliminate unnecessary whitespace and improve the clarity of the wait mechanism for database resource creation. This update enhances the readability of the code while maintaining the existing functionality and logging for resource checks.

* fix(ci): enhance orchestrator plugins enabling process in utils.sh

  Refined the `enable_orchestrator_plugins_op` function to improve the process of enabling orchestrator plugins. This update includes extracting and merging custom and default dynamic plugins, applying the merged configmap, and restarting the Backstage deployment. Enhanced logging and error handling were added to ensure clarity and reliability during the plugin enabling process.

* fix(ci): improve error handling and logging in operator deployment for Backstage resource

  Enhanced the `deploy_rhdh_operator` function to log an error if the Backstage deployment is not created within the specified wait time. Added additional logging to check the status of the Backstage CR and the operator logs for better troubleshooting. This change improves the reliability of the deployment process and aids in identifying issues during the Backstage resource creation.

* fix(ci): refine plugin merging logic in enable_orchestrator_plugins_op function

  Updated the plugin merging process in the `enable_orchestrator_plugins_op` function to utilize yq for improved clarity and efficiency. The new implementation merges default and custom plugins while ensuring deduplication by package name, enhancing the robustness of plugin management and maintaining custom plugin precedence.

* fix(ci): update logging for orchestrator plugins enabling process in utils.sh

  Removed the wait logic for Backstage deployment readiness after enabling orchestrator plugins. Updated logging to clarify that deployment verification will occur in subsequent calls, enhancing the clarity of the process.

* fix(ci): eliminate log:: function bug in timeout subshells

  Replace timeout bash -c subshells with proper polling loops to fix log:: functions not working inside subprocesses.

  Functions fixed:
  - wait_for_svc(): rewritten with polling loop
  - wait_for_endpoint(): rewritten with polling loop
  - check_operator_status(): rewritten with polling loop
  - waitfor_crunchy_postgres_*(): now uses k8s_wait::crd
  - waitfor_tekton_pipelines(): now uses k8s_wait::crd
  - install_pipelines_operator(): now uses k8s_wait::crd
  - delete_tekton_pipelines(): rewritten with polling loop

  Benefits:
  - log::info/success/error now work correctly
  - Consistent polling pattern across all wait functions
  - Reduced code duplication by using k8s_wait::crd
  - Variables now use 'local' keyword (DISPLAY_NAME → display_name)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ci): update Backstage CR to v1alpha5 API with deployment patch

  Update all Backstage Custom Resource manifests from v1alpha4 to v1alpha5.

  Changes:
  - apiVersion: rhdh.redhat.com/v1alpha4 → v1alpha5
  - Remove spec.application.image (not supported in v1alpha5)
  - Add spec.deployment.patch to override container images
  - Configure dynamic-plugins-root volume with 10Gi storage

  The v1alpha5 API requires using deployment patches to customize the container image instead of the direct image field.

  Files updated:
  - rhdh-start.yaml
  - rhdh-start-rbac.yaml
  - rhdh-start-runtime.yaml

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ci): correct misleading comment in enable_orchestrator_plugins_op

  Update the function comment to accurately describe behavior:
  - The operator DOES create backstage-dynamic-plugins-* configmap
  - The function merges operator defaults with custom plugins
  - Custom plugins override defaults on package conflicts

  The previous comment incorrectly stated the operator doesn't create the default configmap, which contradicted the actual code behavior.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ci): remove conflicting volume patch from v1alpha5 Backstage CRs

  The ephemeral volume patch for dynamic-plugins-root was conflicting with the operator's default volume configuration. Removed the volume patch and let the operator handle the volume creation automatically (as it did in v1alpha4).

  This fixes the deployment timeout issue in e2e-ocp-operator-nightly where the backstage pod failed to become ready after enabling orchestrator plugins.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: comment out SonarQube dynamic plugin configuration

  Temporarily disable the SonarQube dynamic plugin in the showcase sanity values file to address potential issues related to its integration. This change is part of ongoing adjustments to plugin management.

  # Conflicts:
  # .ibm/pipelines/value_files/diff-values_showcase-sanity-plugins.yaml

* refactor(ci): eliminate code duplication and add reusable helpers

  - Add reusable helper functions in lib/common.sh:
    - common::poll_until - generic polling/waiting helper
    - common::base64_encode - cross-platform base64 encoding
    - common::create_configmap_from_file(s) - idempotent ConfigMap creation
    - common::retry - command retry with backoff
    - common::save_artifact - artifact saving helper
  - Consolidate duplicate functions in utils.sh:
    - Merge install_crunchy_postgres_ocp/k8s_operator into single parameterized function
    - Replace install_olm/uninstall_olm with delegation shims to lib/operators.sh
    - Use new helpers for polling, base64 encoding, and ConfigMap creation
  - Simplify install-methods/operator.sh:
    - Replace retry loops with common::retry helper
  - Revert Backstage CR files to v1alpha4 API:
    - Restores spec.application.image support (cleaner than v1alpha5 deployment.patch)
    - Keeps consistency with main branch
  - Replace artifact saving patterns across deployment files:
    - aks-helm-deployment.sh, aks-operator-deployment.sh
    - eks-helm-deployment.sh, eks-operator-deployment.sh
    - gke-helm-deployment.sh, gke-operator-deployment.sh
    - ocp-operator.sh
  - Add documentation for namespace defaults in env_variables.sh
  - Net reduction of ~280 lines of code
  - Improved code maintainability and consistency
  - Backward compatibility maintained via shim functions

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* revert: restore diff-values_showcase-sanity-plugins.yaml to original

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
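
The `timeout bash -c` fix in the commit message rests on a general Bash behavior: a function defined in the calling script is not visible inside a fresh `bash -c` process unless it has been exported, so `log::*` calls there fail with "command not found". The following is a minimal sketch of the polling-loop alternative. The names `poll_until` and the stand-in `log::info` are hypothetical and chosen for illustration; the actual `common::poll_until` signature in `lib/common.sh` is not shown on this page and may differ.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the logger provided by lib/log.sh.
log::info() { echo "[INFO] $*"; }

# poll_until <timeout_seconds> <interval_seconds> <command...>
# Re-runs the command in the current shell until it succeeds or the
# timeout is reached. No child bash process is spawned, so log::*
# functions defined in the caller remain in scope.
poll_until() {
  local timeout="$1" interval="$2"
  shift 2
  local elapsed=0
  while ! "$@"; do
    if (( elapsed >= timeout )); then
      log::info "Timed out after ${elapsed}s waiting for: $*"
      return 1
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  log::info "Condition met after ${elapsed}s: $*"
  return 0
}
```

A caller might use it as `poll_until 300 5 kubectl get crd pipelines.tekton.dev`, where the timeout, interval, and CRD name are illustrative values rather than the ones used by the pipeline.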
1 parent 4a38688 commit fe3d2c5

25 files changed, +1827 -674 lines changed

.ibm/pipelines/README.md

Lines changed: 49 additions & 0 deletions
@@ -121,3 +121,52 @@ retrieve ephemeral environment credentials.
 - `KEYCLOAK_AUTH_CLIENT_SECRET`
 - `KEYCLOAK_AUTH_LOGIN_REALM`
 - `KEYCLOAK_AUTH_REALM`
+
+---
+
+## Development Guidelines
+
+### Code Quality
+
+The `.ibm` directory contains linting and formatting tools for pipeline scripts.
+
+Install dependencies:
+
+```bash
+cd .ibm
+yarn install
+```
+
+Available commands:
+
+- `yarn shellcheck` - Lint shell scripts (must pass with zero warnings)
+- `yarn prettier:check` - Check file formatting
+- `yarn prettier:fix` - Auto-format files
+
+Before submitting a PR:
+
+```bash
+cd .ibm
+yarn prettier:fix
+yarn shellcheck
+```
+
+### Modular Architecture
+
+Pipeline utilities are organized into modules in `.ibm/pipelines/lib/`:
+
+- `log.sh` - Logging functions
+- `common.sh` - Common utilities (oc_login, sed_inplace, etc.)
+- `k8s-wait.sh` - Kubernetes wait/polling operations
+- `operators.sh` - Operator installations
+
+Usage example:
+
+```bash
+# Using modular functions
+k8s_wait::deployment "namespace" "deployment"
+common::oc_login
+operator::install_pipelines
+```
+
+See `lib/README.md` for module details.
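
The usage example in the README addition calls the module functions but does not show how an entry-point script loads them. The sketch below illustrates the conventions named in the commit message (strict mode set only in the entry point, modules sourced through a `DIR` variable, namespaced function names, explicit return codes, message strings held in constants). It is self-contained for demonstration: the `demo.sh` module, the `demo::greet` function, and the temporary directory are hypothetical and are not part of the repository.

```bash
#!/usr/bin/env bash
# Entry point: strict mode is set here, not inside the lib modules.
set -euo pipefail

# Assumption: DIR stands in for the pipelines directory; a temporary
# directory is used so the sketch runs anywhere.
DIR="$(mktemp -d)"
mkdir -p "${DIR}/lib"

# Write a toy module that follows the lib/ conventions.
cat > "${DIR}/lib/demo.sh" <<'EOF'
# No 'set -euo pipefail' here: the entry point owns the shell options.
DEMO_ERR_MISSING_PARAMS="Missing required parameters"

demo::greet() {
  local name="${1:-}"
  if [[ -z "$name" ]]; then
    echo "$DEMO_ERR_MISSING_PARAMS" >&2
    return 1
  fi
  echo "hello, ${name}"
  return 0
}
EOF

# shellcheck source=/dev/null
source "${DIR}/lib/demo.sh"
rm -rf "${DIR}"   # functions stay defined after the file is removed

demo::greet "pipeline"   # prints "hello, pipeline"
```

In the real pipeline the `source` lines would presumably point at `${DIR}/lib/log.sh`, `${DIR}/lib/common.sh`, and the other modules listed in the README.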
