diff --git a/AUTONODE_TESTING_GUIDE.md b/AUTONODE_TESTING_GUIDE.md new file mode 100644 index 0000000..60c173a --- /dev/null +++ b/AUTONODE_TESTING_GUIDE.md @@ -0,0 +1,368 @@ +# AutoNode Testing Guide for automation-capi + +## Overview + +This guide explains how to test PR #5686 AutoNode (Karpenter) functionality using the automation-capi repository. The AutoNode feature enables automatic node scaling in ROSA HCP clusters through Karpenter integration. + +## Prerequisites + +### 1. Environment Setup + +- **OCP Hub Cluster:** Running with ACM/MCE installed +- **AWS Credentials:** Valid AWS access keys with appropriate permissions +- **ROSA CLI:** Logged into ROSA stage environment +- **OCM Credentials:** Valid OCM client ID and secret + +### 2. IAM Role Requirements + +Create a Karpenter IAM role with the required permissions (see `KARPENTER_IAM_PERMISSIONS.md`): + +```bash +# Create the role with trust relationship for your ROSA cluster's OIDC provider +aws iam create-role --role-name KarpenterNodeRole \ + --assume-role-policy-document file://karpenter-trust-policy.json + +# Attach required permissions policy +aws iam attach-role-policy --role-name KarpenterNodeRole \ + --policy-arn arn:aws:iam::YOUR-ACCOUNT:policy/KarpenterPolicy +``` + +### 3. CAPA Controller Update + +Ensure your CAPA controller includes PR #5686 AutoNode support: + +```bash +# Check current CAPA controller image +oc get deployment -n multicluster-engine capa-controller-manager \ + -o jsonpath='{.spec.template.spec.containers[0].image}' + +# Update to image with AutoNode support (if needed) +# This step depends on your environment and CAPA build process +``` + +## Configuration + +### 1. Update User Variables + +Copy and customize the AutoNode configuration: + +```bash +cp vars/user_vars_autonode_example.yml vars/user_vars.yml +``` + +Edit `vars/user_vars.yml` with your specific values: + +```yaml +# Required for AutoNode testing +AUTONODE_TESTING_ENABLED: true +AUTONODE_DEFAULT_MODE: "enabled" +AUTONODE_KARPENTER_ROLE_ARN: "arn:aws:iam::YOUR-ACCOUNT:role/KarpenterNodeRole" +AWS_ACCOUNT_ID: "YOUR-ACCOUNT-ID" +ROSA_OIDC_PROVIDER_URL: "your-cluster-oidc-provider-url" + +# Your existing AWS and OCP credentials +OCP_HUB_API_URL: "https://api.your-cluster.domain.com:6443" +AWS_ACCESS_KEY_ID: "your-access-key" +AWS_SECRET_ACCESS_KEY: "your-secret-key" +# ... 
other required variables +``` + +## Testing Methods + +### Method 1: Full End-to-End Testing + +Run comprehensive AutoNode testing with multiple scenarios: + +```bash +ansible-playbook end2end_tests_autonode.yaml -e "skip_ansible_runner=true" +``` + +This will: +- Validate AutoNode prerequisites +- Test AutoNode enabled cluster creation +- Test AutoNode disabled cluster creation (baseline) +- Validate Karpenter functionality +- Generate detailed test reports +- Clean up test resources + +### Method 2: Individual Scenario Testing + +Test specific AutoNode configurations: + +```bash +# Test AutoNode enabled cluster +ansible-playbook create_rosa_hcp_cluster_with_autonode.yaml \ + -e "AUTONODE_DEFAULT_MODE=enabled" \ + -e "skip_ansible_runner=true" + +# Test AutoNode disabled cluster (baseline) +ansible-playbook create_rosa_hcp_cluster_with_autonode.yaml \ + -e "AUTONODE_DEFAULT_MODE=disabled" \ + -e "skip_ansible_runner=true" +``` + +### Method 3: Manual Configuration Testing + +Use pre-configured cluster templates: + +```bash +# Test with AutoNode enabled template +oc apply -f rosa-control-plane-autonode-enabled.yaml + +# Test with AutoNode disabled template +oc apply -f rosa-control-plane-autonode-disabled.yaml +``` + +## Validation Steps + +### 1. Pre-Test Validation + +Run validation tasks separately: + +```bash +ansible-playbook -i localhost, tasks/validate_autonode_setup.yml \ + -e "skip_ansible_runner=true" +``` + +Expected output: +``` +โœ… AWS CLI available +โœ… ROSA CLI logged in +โœ… Role ARN format valid +โœ… IAM Role exists +Ready for AutoNode Testing: YES +``` + +### 2. Monitor Cluster Creation + +```bash +# Watch cluster creation progress +oc get clusters -n ns-rosa-hcp -w + +# Check control plane status +oc get rosacontrolplanes -n ns-rosa-hcp -o yaml + +# View AutoNode configuration +oc get rosacontrolplanes -n ns-rosa-hcp -o yaml | grep -A 10 autoNode +``` + +### 3. Validate AutoNode Functionality + +For AutoNode enabled clusters: + +```bash +# Check if cluster is ready +rosa describe cluster rosa-autonode-test + +# Verify Karpenter installation (once cluster is ready) +oc get pods -n karpenter + +# Test workload scaling +kubectl apply -f - < --output json | jq '.aws.sts.oidc_endpoint_url' + +# Update role trust relationship with correct OIDC provider +aws iam update-assume-role-policy --role-name KarpenterNodeRole \ + --policy-document file://updated-trust-policy.json +``` + +#### 3. Cluster Creation Failures +``` +Error: AutoNode configuration validation failed +``` + +**Solution:** +```bash +# Check CAPA controller logs +oc logs -n multicluster-engine -l app.kubernetes.io/name=cluster-api-provider-aws --tail=100 + +# Verify cluster configuration +oc get rosacontrolplanes -n ns-rosa-hcp -o yaml + +# Check cluster events +oc get events -n ns-rosa-hcp --sort-by='.lastTimestamp' +``` + +#### 4. 
Karpenter Not Installing +``` +Cluster created but Karpenter pods not found +``` + +**Solution:** +```bash +# Wait for cluster to be fully ready +rosa describe cluster + +# Check cluster version compatibility +rosa describe cluster --output json | jq '.version' + +# Manual Karpenter installation (if needed) +# This should be automatic but may need manual intervention +``` + +### Diagnostic Commands + +```bash +# Check AutoNode configuration +oc get rosacontrolplanes -n ns-rosa-hcp -o yaml | grep -A 10 autoNode + +# Monitor CAPA controller +oc logs -n multicluster-engine -l app.kubernetes.io/name=cluster-api-provider-aws -f + +# Check cluster status +oc get clusters,rosacontrolplanes,rosaclusters -n ns-rosa-hcp + +# AWS resource validation +aws ec2 describe-instances --filters "Name=tag:karpenter.sh/cluster,Values=" +aws iam get-role --role-name KarpenterNodeRole +``` + +## Test Reports + +### Generated Reports + +After running end-to-end tests, reports are generated in: + +- **Detailed Report:** `results/autonode-tests/autonode-test-report-.md` +- **JSON Results:** `results/autonode-tests/autonode-test-results-.json` +- **Cleanup Report:** `results/autonode-tests/cleanup-report-.md` + +### Report Contents + +1. **Executive Summary:** Pass/fail rates, duration, environment details +2. **Scenario Results:** Individual test outcomes and metrics +3. **AutoNode Analysis:** Feature-specific validation results +4. **Troubleshooting:** Failed test analysis and remediation steps +5. **Next Steps:** Recommendations based on test outcomes + +## Cleanup + +### Automatic Cleanup + +End-to-end tests clean up automatically (if configured): + +```yaml +# In user_vars.yml +AUTONODE_TESTING: + cleanup_on_success: true + cleanup_on_failure: false +``` + +### Manual Cleanup + +```bash +# Remove test clusters +oc delete cluster rosa-autonode-test rosa-autonode-disabled -n ns-rosa-hcp + +# Remove test workloads +oc delete deployment karpenter-test autonode-scale-test -n default + +# Clean up AWS resources (if needed) +aws ec2 describe-instances --filters "Name=tag:AutoNodeTesting,Values=true" +aws ec2 terminate-instances --instance-ids +``` + +## Best Practices + +1. **Test Both Modes:** Always test both enabled and disabled AutoNode modes +2. **Validate Prerequisites:** Run validation tasks before cluster creation +3. **Monitor Resources:** Watch AWS costs during testing +4. **Document Issues:** Capture logs and configurations for any failures +5. **Clean Up:** Remove test resources to avoid unnecessary AWS charges + +## Getting Help + +If you encounter issues during testing: + +1. **Check Logs:** CAPA controller and cluster events +2. **Validate Configuration:** Role ARNs, OIDC providers, AWS permissions +3. **Review Documentation:** `KARPENTER_IAM_PERMISSIONS.md`, `AUTONODE_TROUBLESHOOTING_GUIDE.md` +4. 
**Contact Support:** Provide test reports and logs for assistance + +--- + +*This guide is specific to testing PR #5686 AutoNode functionality in the automation-capi environment.* \ No newline at end of file diff --git a/create_rosa_hcp_cluster_with_autonode.yaml b/create_rosa_hcp_cluster_with_autonode.yaml new file mode 100644 index 0000000..e72b008 --- /dev/null +++ b/create_rosa_hcp_cluster_with_autonode.yaml @@ -0,0 +1,45 @@ +- name: Create ROSA HCP cluster with AutoNode support + hosts: localhost + any_errors_fatal: true + vars_files: + - vars/vars.yml + - vars/user_vars.yml + tasks: + - set_fact: + ocp_user: "{{ OCP_HUB_CLUSTER_USER }}" + ocp_password: "{{ OCP_HUB_CLUSTER_PASSWORD }}" + api_url: "{{ OCP_HUB_API_URL }}" + mce_namespace: "{{ MCE_NAMESPACE }}" + + - name: Prepare ansible runner host + include_tasks: tasks/prepare_ansible_runner.yml + when: not skip_ansible_runner | bool + + - name: Login OCP + include_tasks: tasks/login_ocp.yml + + - name: Validate AutoNode setup (if enabled) + include_tasks: tasks/validate_autonode_setup.yml + when: AUTONODE_TESTING_ENABLED | default(false) + + - name: Determine cluster configuration file + set_fact: + cluster_config_file: | + {% if AUTONODE_DEFAULT_MODE | default('disabled') == 'enabled' %} + rosa-control-plane-autonode-enabled.yaml + {% else %} + rosa-control-plane-autonode-disabled.yaml + {% endif %} + + - name: Display selected configuration + debug: + msg: | + ๐Ÿš€ Creating ROSA HCP cluster with configuration: + - Config file: {{ cluster_config_file }} + - AutoNode mode: {{ AUTONODE_DEFAULT_MODE | default('disabled') }} + {% if AUTONODE_DEFAULT_MODE | default('disabled') == 'enabled' %} + - Karpenter role: {{ AUTONODE_KARPENTER_ROLE_ARN | default('Not configured') }} + {% endif %} + + - name: Create the ROSA HCP Cluster with AutoNode + include_tasks: tasks/create_rosa_control_plane_autonode.yml \ No newline at end of file diff --git a/end2end_tests_autonode.yaml b/end2end_tests_autonode.yaml new file mode 100644 index 0000000..0e1c244 --- /dev/null +++ b/end2end_tests_autonode.yaml @@ -0,0 +1,80 @@ +--- +- name: End-to-End AutoNode Testing for PR #5686 + hosts: localhost + any_errors_fatal: true + vars_files: + - vars/vars.yml + - vars/user_vars.yml + vars: + test_results: [] + autonode_test_scenarios: + - name: "autonode-enabled" + description: "Test ROSA HCP cluster with AutoNode/Karpenter enabled" + autonode_mode: "enabled" + expected_outcome: "success" + cluster_name: "rosa-e2e-autonode" + config_file: "rosa-control-plane-autonode-enabled.yaml" + + - name: "autonode-disabled" + description: "Test ROSA HCP cluster with AutoNode disabled (baseline)" + autonode_mode: "disabled" + expected_outcome: "success" + cluster_name: "rosa-e2e-traditional" + config_file: "rosa-control-plane-autonode-disabled.yaml" + + tasks: + - name: Initialize test environment + block: + - name: Set test start time + set_fact: + test_start_time: "{{ ansible_date_time.iso8601 }}" + + - name: Create test results directory + file: + path: "{{ AUTONODE_TESTING.reports_dir | default('results/autonode-tests') }}" + state: directory + + - name: Display test plan + debug: + msg: | + ๐Ÿงช AutoNode End-to-End Test Plan + ================================ + + Test Scenarios: {{ autonode_test_scenarios | length }} + {% for scenario in autonode_test_scenarios %} + {{ loop.index }}. 
{{ scenario.name }}: {{ scenario.description }} + {% endfor %} + + Environment: + - AWS Region: {{ AWS_REGION }} + - CAPI Namespace: {{ capi_namespace }} + - MCE Namespace: {{ mce_namespace }} + + - name: Prepare test environment + block: + - name: Prepare ansible runner host + include_tasks: tasks/prepare_ansible_runner.yml + when: not skip_ansible_runner | bool + + - name: Login to OCP + include_tasks: tasks/login_ocp.yml + + - name: Validate AutoNode prerequisites + include_tasks: tasks/validate_autonode_setup.yml + + - name: Execute AutoNode test scenarios + include_tasks: tasks/autonode_test_scenario.yml + vars: + current_scenario: "{{ item }}" + scenario_index: "{{ ansible_loop.index }}" + loop: "{{ autonode_test_scenarios }}" + loop_control: + loop_var: item + extended: yes + + - name: Generate test report + include_tasks: tasks/generate_autonode_test_report.yml + + - name: Cleanup test resources + include_tasks: tasks/cleanup_autonode_test_resources.yml + when: AUTONODE_TESTING.cleanup_on_success | default(true) \ No newline at end of file diff --git a/rosa-control-plane-autonode-disabled.yaml b/rosa-control-plane-autonode-disabled.yaml new file mode 100644 index 0000000..c45124f --- /dev/null +++ b/rosa-control-plane-autonode-disabled.yaml @@ -0,0 +1,92 @@ +--- +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: "rosa-autonode-disabled" + namespace: "ns-rosa-hcp" +spec: + clusterNetwork: + pods: + cidrBlocks: ["192.168.0.0/16"] + infrastructureRef: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 + kind: ROSACluster + name: "rosa-autonode-disabled" + namespace: "ns-rosa-hcp" + controlPlaneRef: + apiVersion: controlplane.cluster.x-k8s.io/v1beta2 + kind: ROSAControlPlane + name: "rosa-cp-no-autonode" + namespace: "ns-rosa-hcp" +--- +apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 +kind: ROSACluster +metadata: + name: "rosa-autonode-disabled" + namespace: "ns-rosa-hcp" +spec: {} +--- +apiVersion: controlplane.cluster.x-k8s.io/v1beta2 +kind: ROSAControlPlane +metadata: + name: "rosa-cp-no-autonode" + namespace: "ns-rosa-hcp" +spec: + rosaClusterName: rosa-autonode-disabled + domainPrefix: rosa-no-autonode + version: "4.20.0" + channelGroup: candidate + ## The region should match the aws region used to create the VPC and subnets + region: "us-west-2" + + ## Replace the IAM account roles below with the IAM roles created in the prerequisite steps + ## List the IAM account roles using command 'rosa list account-roles' + installerRoleARN: "arn:aws:iam::471112697682:role/rt3-HCP-ROSA-Installer-Role" + supportRoleARN: "arn:aws:iam::471112697682:role/rt3-HCP-ROSA-Support-Role" + workerRoleARN: "arn:aws:iam::471112697682:role/rt3-HCP-ROSA-Worker-Role" + + ## Replace the oidc config below with the oidc config created in the prerequisite steps + ## List the oidc config using command `rosa list oidc-providers` + oidcID: "2j1ob5s4mvqq9ra6fnnrdogi4l0c7dhq" + + ## Replace IAM operator roles below with the IAM roles created in the prerequisite steps + ## List the operator roles using command `rosa list operator-roles --prefix your-prefix` + rolesRef: + ingressARN: "arn:aws:iam::471112697682:role/rt3-openshift-ingress-operator-cloud-credentials" + imageRegistryARN: "arn:aws:iam::471112697682:role/rt3-openshift-image-registry-installer-cloud-credentials" + storageARN: "arn:aws:iam::471112697682:role/rt3-openshift-cluster-csi-drivers-ebs-cloud-credentials" + networkARN: "arn:aws:iam::471112697682:role/rt3-openshift-cloud-network-config-controller-cloud-credentials" + 
kubeCloudControllerARN: "arn:aws:iam::471112697682:role/rt3-kube-system-kube-controller-manager" + nodePoolManagementARN: "arn:aws:iam::471112697682:role/rt3-kube-system-capa-controller-manager" + controlPlaneOperatorARN: "arn:aws:iam::471112697682:role/rt3-kube-system-control-plane-operator" + + ## Replace the subnets and availabilityZones with the subnets created in the prerequisite steps + subnets: + - "subnet-062e797b5126b599a" + - "subnet-0bbe3b8c424bcc607" + + availabilityZones: + - "us-west-2b" + network: + machineCIDR: "10.0.0.0/16" + podCIDR: "10.128.0.0/14" + serviceCIDR: "172.30.0.0/16" + + ## AutoNode Configuration - Explicitly disabled + ## This cluster will use traditional machine pools instead of Karpenter + autoNode: + mode: disabled + # roleARN can be omitted when disabled, but including it for comparison testing + + ## Traditional machine pool configuration required when AutoNode is disabled + defaultMachinePoolSpec: + instanceType: "m5.xlarge" + autoscaling: + maxReplicas: 3 + minReplicas: 2 + + additionalTags: + env: "demo" + profile: "hcp" + autonode: "disabled" + scaling: "traditional" \ No newline at end of file diff --git a/rosa-control-plane-autonode-enabled.yaml b/rosa-control-plane-autonode-enabled.yaml new file mode 100644 index 0000000..e75d81c --- /dev/null +++ b/rosa-control-plane-autonode-enabled.yaml @@ -0,0 +1,93 @@ +--- +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: "rosa-autonode-test" + namespace: "ns-rosa-hcp" +spec: + clusterNetwork: + pods: + cidrBlocks: ["192.168.0.0/16"] + infrastructureRef: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 + kind: ROSACluster + name: "rosa-autonode-test" + namespace: "ns-rosa-hcp" + controlPlaneRef: + apiVersion: controlplane.cluster.x-k8s.io/v1beta2 + kind: ROSAControlPlane + name: "rosa-cp-autonode" + namespace: "ns-rosa-hcp" +--- +apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 +kind: ROSACluster +metadata: + name: "rosa-autonode-test" + namespace: "ns-rosa-hcp" +spec: {} +--- +apiVersion: controlplane.cluster.x-k8s.io/v1beta2 +kind: ROSAControlPlane +metadata: + name: "rosa-cp-autonode" + namespace: "ns-rosa-hcp" +spec: + rosaClusterName: rosa-autonode-test + domainPrefix: rosa-autonode + version: "4.20.0" + channelGroup: candidate + ## The region should match the aws region used to create the VPC and subnets + region: "us-west-2" + + ## Replace the IAM account roles below with the IAM roles created in the prerequisite steps + ## List the IAM account roles using command 'rosa list account-roles' + installerRoleARN: "arn:aws:iam::471112697682:role/rt3-HCP-ROSA-Installer-Role" + supportRoleARN: "arn:aws:iam::471112697682:role/rt3-HCP-ROSA-Support-Role" + workerRoleARN: "arn:aws:iam::471112697682:role/rt3-HCP-ROSA-Worker-Role" + + ## Replace the oidc config below with the oidc config created in the prerequisite steps + ## List the oidc config using command `rosa list oidc-providers` + oidcID: "2j1ob5s4mvqq9ra6fnnrdogi4l0c7dhq" + + ## Replace IAM operator roles below with the IAM roles created in the prerequisite steps + ## List the operator roles using command `rosa list operator-roles --prefix your-prefix` + rolesRef: + ingressARN: "arn:aws:iam::471112697682:role/rt3-openshift-ingress-operator-cloud-credentials" + imageRegistryARN: "arn:aws:iam::471112697682:role/rt3-openshift-image-registry-installer-cloud-credentials" + storageARN: "arn:aws:iam::471112697682:role/rt3-openshift-cluster-csi-drivers-ebs-cloud-credentials" + networkARN: 
"arn:aws:iam::471112697682:role/rt3-openshift-cloud-network-config-controller-cloud-credentials" + kubeCloudControllerARN: "arn:aws:iam::471112697682:role/rt3-kube-system-kube-controller-manager" + nodePoolManagementARN: "arn:aws:iam::471112697682:role/rt3-kube-system-capa-controller-manager" + controlPlaneOperatorARN: "arn:aws:iam::471112697682:role/rt3-kube-system-control-plane-operator" + + ## Replace the subnets and availabilityZones with the subnets created in the prerequisite steps + subnets: + - "subnet-062e797b5126b599a" + - "subnet-0bbe3b8c424bcc607" + + availabilityZones: + - "us-west-2b" + network: + machineCIDR: "10.0.0.0/16" + podCIDR: "10.128.0.0/14" + serviceCIDR: "172.30.0.0/16" + + ## AutoNode Configuration for Karpenter Integration + ## Enable AutoNode to use Karpenter for automatic node scaling + autoNode: + mode: enabled + roleARN: "arn:aws:iam::471112697682:role/KarpenterNodeRole" + + ## NOTE: When AutoNode is enabled, defaultMachinePoolSpec becomes optional + ## Karpenter will handle node provisioning automatically based on workload demands + defaultMachinePoolSpec: + instanceType: "m5.xlarge" + autoscaling: + maxReplicas: 3 + minReplicas: 2 + + additionalTags: + env: "demo" + profile: "hcp" + autonode: "enabled" + karpenter: "true" \ No newline at end of file diff --git a/tasks/autonode_test_scenario.yml b/tasks/autonode_test_scenario.yml new file mode 100644 index 0000000..d950ef5 --- /dev/null +++ b/tasks/autonode_test_scenario.yml @@ -0,0 +1,267 @@ +--- +# Individual AutoNode test scenario execution +# This task runs a single test scenario for AutoNode functionality + +- name: Initialize scenario variables + set_fact: + scenario_name: "{{ current_scenario.name }}" + scenario_autonode_mode: "{{ current_scenario.autonode_mode }}" + scenario_cluster_name: "{{ current_scenario.cluster_name }}" + scenario_config_file: "{{ current_scenario.config_file }}" + scenario_start_time: "{{ ansible_date_time.iso8601 }}" + scenario_result: + name: "{{ current_scenario.name }}" + status: "running" + start_time: "{{ ansible_date_time.iso8601 }}" + +- name: "Scenario {{ scenario_index }}: {{ current_scenario.description }}" + debug: + msg: | + ๐Ÿงช Starting Test Scenario {{ scenario_index }} + ========================================== + Name: {{ scenario_name }} + Description: {{ current_scenario.description }} + AutoNode Mode: {{ scenario_autonode_mode }} + Expected Outcome: {{ current_scenario.expected_outcome }} + +- name: Execute test scenario + block: + # Phase 1: Pre-test validation + - name: Phase 1 - Pre-test validation + block: + - name: Validate scenario configuration + debug: + msg: | + ๐Ÿ“‹ Validating scenario configuration: + - Config file: {{ scenario_config_file }} + - Cluster name: {{ scenario_cluster_name }} + - AutoNode mode: {{ scenario_autonode_mode }} + + - name: Set AutoNode environment for scenario + set_fact: + AUTONODE_DEFAULT_MODE: "{{ scenario_autonode_mode }}" + cluster_config_file: "{{ scenario_config_file }}" + + - name: Validate AutoNode setup for scenario + include_tasks: tasks/validate_autonode_setup.yml + when: scenario_autonode_mode == "enabled" + + # Phase 2: Cluster creation + - name: Phase 2 - Cluster creation + block: + - name: Create cluster with AutoNode configuration + include_tasks: tasks/create_rosa_control_plane_autonode.yml + + - name: Wait for cluster to be provisioning + shell: | + oc get cluster {{ scenario_cluster_name }} -n {{ capi_namespace }} -o jsonpath='{.status.phase}' + register: cluster_phase_check + until: 
cluster_phase_check.stdout in ["Provisioning", "Provisioned"] + retries: 10 + delay: 30 + failed_when: false + + - name: Record cluster creation status + set_fact: + cluster_creation_success: "{{ cluster_phase_check.rc == 0 }}" + + # Phase 3: AutoNode-specific validation (if enabled) + - name: Phase 3 - AutoNode functionality validation + block: + - name: Wait for control plane to be ready + shell: | + oc get rosacontrolplane {{ scenario_cluster_name }}-cp -n {{ capi_namespace }} -o jsonpath='{.status.ready}' + register: controlplane_ready + until: controlplane_ready.stdout == "true" + retries: 60 + delay: 60 + failed_when: false + + - name: Verify AutoNode configuration in cluster + shell: | + oc get rosacontrolplane {{ scenario_cluster_name }}-cp -n {{ capi_namespace }} -o yaml | grep -A 5 "autoNode:" + register: autonode_config_verify + failed_when: false + + - name: Validate AutoNode mode in cluster spec + assert: + that: + - "'mode: enabled' in autonode_config_verify.stdout" + fail_msg: "AutoNode mode not set to 'enabled' in cluster specification" + success_msg: "โœ… AutoNode mode correctly set to 'enabled'" + when: scenario_autonode_mode == "enabled" + + - name: Check for Karpenter role ARN in cluster spec + assert: + that: + - "AUTONODE_KARPENTER_ROLE_ARN in autonode_config_verify.stdout" + fail_msg: "Karpenter role ARN not found in cluster specification" + success_msg: "โœ… Karpenter role ARN correctly configured" + when: + - scenario_autonode_mode == "enabled" + - AUTONODE_KARPENTER_ROLE_ARN is defined + + - name: Wait for cluster to be fully provisioned + shell: | + oc get cluster {{ scenario_cluster_name }} -n {{ capi_namespace }} -o jsonpath='{.status.phase}' + register: final_cluster_phase + until: final_cluster_phase.stdout == "Provisioned" + retries: 120 # 2 hours max + delay: 60 + failed_when: false + + - name: Check cluster provisioning result + set_fact: + cluster_provisioned: "{{ final_cluster_phase.stdout == 'Provisioned' }}" + + when: scenario_autonode_mode == "enabled" + + # Phase 4: Workload testing (for AutoNode enabled scenarios) + - name: Phase 4 - AutoNode scaling validation + block: + - name: Get cluster kubeconfig + shell: | + rosa describe cluster {{ scenario_cluster_name }} --output json | jq -r '.api.url' + register: cluster_api_url + failed_when: false + + - name: Deploy test workload for AutoNode scaling + shell: | + oc apply -f - <= 3 + retries: 20 + delay: 30 + failed_when: false + + - name: Check if Karpenter provisioned nodes + shell: | + oc get nodes --show-labels | grep karpenter || echo "No Karpenter nodes found" + register: karpenter_nodes_check + failed_when: false + + - name: Record AutoNode scaling test results + set_fact: + autonode_scaling_success: "{{ running_pods_count.stdout|int >= 3 }}" + karpenter_nodes_found: "{{ 'karpenter' in karpenter_nodes_check.stdout }}" + + when: + - scenario_autonode_mode == "enabled" + - cluster_provisioned | default(false) + + # Update scenario result on success + - name: Record successful test result + set_fact: + scenario_result: "{{ scenario_result | combine({ + 'status': 'success', + 'end_time': ansible_date_time.iso8601, + 'cluster_provisioned': cluster_provisioned | default(false), + 'autonode_scaling_tested': autonode_scaling_success | default(false), + 'karpenter_nodes_found': karpenter_nodes_found | default(false), + 'duration_minutes': ((ansible_date_time.epoch|int - (scenario_start_time | to_datetime('%Y-%m-%dT%H:%M:%SZ')).strftime('%s')|int) / 60) | round(2) + }) }}" + + rescue: + # Handle test 
failures + - name: Record failed test result + set_fact: + scenario_result: "{{ scenario_result | combine({ + 'status': 'failed', + 'end_time': ansible_date_time.iso8601, + 'error_message': ansible_failed_result.msg | default('Unknown error'), + 'duration_minutes': ((ansible_date_time.epoch|int - (scenario_start_time | to_datetime('%Y-%m-%dT%H:%M:%SZ')).strftime('%s')|int) / 60) | round(2) + }) }}" + + - name: Display failure information + debug: + msg: | + โŒ Test Scenario {{ scenario_index }} Failed + ========================================= + Error: {{ ansible_failed_result.msg | default('Unknown error') }} + + Troubleshooting commands: + oc get cluster {{ scenario_cluster_name }} -n {{ capi_namespace }} -o yaml + oc get rosacontrolplane {{ scenario_cluster_name }}-cp -n {{ capi_namespace }} -o yaml + oc logs -n {{ capa_system_namespace }} -l app.kubernetes.io/name=cluster-api-provider-aws --tail=50 + + always: + # Always record test results + - name: Add scenario result to test results + set_fact: + test_results: "{{ test_results + [scenario_result] }}" + + - name: Display scenario completion + debug: + msg: | + ๐Ÿ“Š Scenario {{ scenario_index }} Completed + ============================== + Name: {{ scenario_name }} + Status: {{ scenario_result.status }} + Duration: {{ scenario_result.duration_minutes | default('N/A') }} minutes + {% if scenario_result.status == 'success' %} + โœ… Test passed successfully + {% else %} + โŒ Test failed: {{ scenario_result.error_message | default('See logs for details') }} + {% endif %} + + # Cleanup scenario resources (if configured) + - name: Cleanup scenario cluster + block: + - name: Delete test cluster + shell: | + oc delete cluster {{ scenario_cluster_name }} -n {{ capi_namespace }} --timeout=300s + register: cluster_delete_result + failed_when: false + + - name: Wait for cluster deletion + shell: | + oc get cluster {{ scenario_cluster_name }} -n {{ capi_namespace }} + register: cluster_exists_check + until: cluster_exists_check.rc != 0 + retries: 30 + delay: 30 + failed_when: false + + - name: Display cleanup status + debug: + msg: | + ๐Ÿงน Cleanup Status: + {% if cluster_exists_check.rc != 0 %} + โœ… Cluster {{ scenario_cluster_name }} successfully deleted + {% else %} + โš ๏ธ Cluster {{ scenario_cluster_name }} may still exist - manual cleanup may be required + {% endif %} + + when: + - AUTONODE_TESTING.cleanup_on_success | default(true) + - scenario_result.status == 'success' + ignore_errors: yes \ No newline at end of file diff --git a/tasks/cleanup_autonode_test_resources.yml b/tasks/cleanup_autonode_test_resources.yml new file mode 100644 index 0000000..67857a9 --- /dev/null +++ b/tasks/cleanup_autonode_test_resources.yml @@ -0,0 +1,193 @@ +--- +# Cleanup AutoNode test resources +# This task removes test clusters and associated resources after testing + +- name: Initialize cleanup process + debug: + msg: | + ๐Ÿงน Starting AutoNode Test Cleanup + ================================ + Cleanup Policy: {{ 'Enabled' if AUTONODE_TESTING.cleanup_on_success else 'Disabled' }} + +- name: Identify test clusters to cleanup + shell: | + oc get clusters -n {{ capi_namespace }} -o jsonpath='{.items[?(@.metadata.name=="rosa-e2e-autonode" || @.metadata.name=="rosa-e2e-traditional")].metadata.name}' + register: test_clusters_found + failed_when: false + +- name: Display clusters found for cleanup + debug: + msg: | + Test clusters found: {{ test_clusters_found.stdout.split() if test_clusters_found.stdout else 'None' }} + +- name: Cleanup test clusters + 
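  # Note (assumption, hedged): deleting the Cluster objects below is expected to
  # cascade to the owned ROSACluster and ROSAControlPlane resources through the
  # Cluster API deletion flow, so per-resource deletion tasks are not added here.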
block: + - name: Delete AutoNode test clusters + shell: | + oc delete cluster {{ item }} -n {{ capi_namespace }} --timeout=600s + register: cluster_delete_result + failed_when: false + loop: "{{ test_clusters_found.stdout.split() }}" + when: test_clusters_found.stdout != "" + + - name: Wait for cluster deletion completion + shell: | + oc get cluster {{ item }} -n {{ capi_namespace }} + register: cluster_exists_check + until: cluster_exists_check.rc != 0 + retries: 60 # 30 minutes max + delay: 30 + failed_when: false + loop: "{{ test_clusters_found.stdout.split() }}" + when: test_clusters_found.stdout != "" + + - name: Display cleanup results + debug: + msg: | + ๐Ÿงน Cleanup Results: + {% for cluster in test_clusters_found.stdout.split() %} + - {{ cluster }}: {{ 'Deleted' if hostvars[inventory_hostname]['cluster_exists_check']['results'][loop.index0]['rc'] != 0 else 'May still exist' }} + {% endfor %} + + when: + - AUTONODE_TESTING.cleanup_on_success | default(true) + - test_clusters_found.stdout != "" + +- name: Cleanup test workloads + block: + - name: Remove test deployments + shell: | + oc delete deployment autonode-scale-test -n default --ignore-not-found=true + oc delete deployment autonode-test -n default --ignore-not-found=true + register: workload_cleanup + failed_when: false + + - name: Display workload cleanup status + debug: + msg: "Test workloads cleanup: {{ 'Completed' if workload_cleanup.rc == 0 else 'Some issues encountered' }}" + + when: AUTONODE_TESTING.cleanup_on_success | default(true) + +- name: Cleanup temporary files + block: + - name: Find temporary cluster configuration files + find: + paths: "/tmp" + patterns: "*autonode*.yaml" + file_type: file + register: temp_config_files + + - name: Remove temporary configuration files + file: + path: "{{ item.path }}" + state: absent + loop: "{{ temp_config_files.files }}" + + - name: Display file cleanup status + debug: + msg: "Removed {{ temp_config_files.files | length }} temporary configuration files" + +- name: Check for remaining AutoNode resources + block: + - name: Check for remaining ROSAControlPlanes with AutoNode + shell: | + oc get rosacontrolplanes -n {{ capi_namespace }} -o yaml | grep -B 5 -A 5 "autoNode:" || echo "No AutoNode configurations found" + register: remaining_autonode_resources + failed_when: false + + - name: Check for remaining test pods + shell: | + oc get pods -A -l app=autonode-scale-test -o name || echo "No test pods found" + register: remaining_test_pods + failed_when: false + + - name: Display remaining resources check + debug: + msg: | + ๐Ÿ” Remaining Resources Check: + + AutoNode Configurations: + {{ remaining_autonode_resources.stdout | default('None found') }} + + Test Pods: + {{ remaining_test_pods.stdout | default('None found') }} + +- name: Generate cleanup report + copy: + content: | + # AutoNode Test Cleanup Report + + **Cleanup Timestamp:** {{ ansible_date_time.iso8601 }} + **Cleanup Policy:** {{ 'Enabled' if AUTONODE_TESTING.cleanup_on_success else 'Disabled' }} + + ## Cleanup Summary + + ### Test Clusters + {% if test_clusters_found.stdout %} + Clusters processed for cleanup: + {% for cluster in test_clusters_found.stdout.split() %} + - {{ cluster }} + {% endfor %} + {% else %} + No test clusters found for cleanup. 
+ {% endif %} + + ### Temporary Files + - Configuration files removed: {{ temp_config_files.files | length }} + + ### Test Workloads + - Test deployments cleanup: {{ 'Completed' if workload_cleanup.rc == 0 else 'Issues encountered' }} + + ## Manual Cleanup (if needed) + + If any resources remain, use these commands for manual cleanup: + + ```bash + # Remove any remaining test clusters + oc get clusters -n {{ capi_namespace }} | grep -E "(autonode|e2e)" | awk '{print $1}' | xargs -I {} oc delete cluster {} -n {{ capi_namespace }} + + # Remove test workloads + oc delete deployment autonode-scale-test autonode-test -n default --ignore-not-found=true + + # Check for any AutoNode-related resources + oc get rosacontrolplanes -n {{ capi_namespace }} -o yaml | grep -B 10 -A 10 autoNode + + # Remove temporary files + rm -f /tmp/*autonode*.yaml + ``` + + ## AWS Resources + + Note: Some AWS resources may have been created during testing: + - EC2 instances (should be terminated automatically) + - Launch templates (may need manual cleanup) + - IAM roles (test roles should be cleaned up manually) + + Check AWS console or use AWS CLI: + ```bash + # List EC2 instances with AutoNode tags + aws ec2 describe-instances --filters "Name=tag:AutoNodeTesting,Values=true" --query 'Reservations[*].Instances[*].[InstanceId,State.Name,Tags[?Key==`Name`].Value|[0]]' --output table + + # List launch templates created during testing + aws ec2 describe-launch-templates --filters "Name=tag:AutoNodeTesting,Values=true" --query 'LaunchTemplates[*].[LaunchTemplateName,LaunchTemplateId]' --output table + ``` + + dest: "{{ AUTONODE_TESTING.reports_dir | default('results/autonode-tests') }}/cleanup-report-{{ ansible_date_time.epoch }}.md" + +- name: Final cleanup status + debug: + msg: | + โœ… AutoNode Test Cleanup Completed + ================================= + + {% if AUTONODE_TESTING.cleanup_on_success %} + - Test clusters: {{ 'Cleaned up' if test_clusters_found.stdout else 'None to clean' }} + - Temporary files: Removed + - Test workloads: Removed + {% else %} + - Cleanup was disabled - manual cleanup may be required + {% endif %} + + ๐Ÿ“‹ Cleanup report: {{ AUTONODE_TESTING.reports_dir | default('results/autonode-tests') }}/cleanup-report-{{ ansible_date_time.epoch }}.md + + โš ๏ธ Note: Check AWS console for any remaining AWS resources that may incur costs. 
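# A possible one-shot sweep for leftover EC2 capacity (a sketch, deliberately left
# commented out; it assumes test instances carry the AutoNodeTesting=true tag from
# AUTONODE_TEMPLATES.default_tags — verify the tag before enabling, since
# terminating instances is destructive):
#
# - name: Terminate leftover AutoNode test instances
#   shell: |
#     ids=$(aws ec2 describe-instances \
#       --filters "Name=tag:AutoNodeTesting,Values=true" "Name=instance-state-name,Values=running" \
#       --query 'Reservations[].Instances[].InstanceId' --output text)
#     # Terminate only when at least one matching instance was found
#     [ -n "$ids" ] && aws ec2 terminate-instances --instance-ids $ids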
\ No newline at end of file diff --git a/tasks/create_rosa_control_plane_autonode.yml b/tasks/create_rosa_control_plane_autonode.yml new file mode 100644 index 0000000..2f89e72 --- /dev/null +++ b/tasks/create_rosa_control_plane_autonode.yml @@ -0,0 +1,198 @@ +--- +# Enhanced cluster creation task with AutoNode support +# This task creates ROSA HCP clusters with optional AutoNode/Karpenter integration + +- name: Set cluster configuration facts + set_fact: + cluster_name: "{{ 'rosa-autonode-test' if AUTONODE_DEFAULT_MODE == 'enabled' else 'rosa-autonode-disabled' }}" + config_file: "{{ cluster_config_file }}" + autonode_enabled: "{{ AUTONODE_DEFAULT_MODE | default('disabled') == 'enabled' }}" + +- name: Generate dynamic cluster configuration + block: + - name: Create temporary cluster config from template + template: + src: "{{ config_file }}" + dest: "/tmp/{{ cluster_name }}-config.yaml" + vars: + # Template variables for dynamic configuration + cluster_name_override: "{{ cluster_name }}" + autonode_mode: "{{ AUTONODE_DEFAULT_MODE | default('disabled') }}" + karpenter_role_arn: "{{ AUTONODE_KARPENTER_ROLE_ARN | default('') }}" + aws_account_id: "{{ AWS_ACCOUNT_ID | default('') }}" + aws_region: "{{ AWS_REGION }}" + + - name: Display generated cluster configuration + debug: + msg: | + ๐Ÿ“„ Generated cluster configuration: + - File: /tmp/{{ cluster_name }}-config.yaml + - Cluster: {{ cluster_name }} + - AutoNode: {{ autonode_enabled }} + {% if autonode_enabled %} + - Karpenter Role: {{ AUTONODE_KARPENTER_ROLE_ARN }} + {% endif %} + + rescue: + - name: Fallback to static configuration file + set_fact: + cluster_config_path: "{{ config_file }}" + + - name: Use static configuration + debug: + msg: "Using static configuration file: {{ cluster_config_path }}" + +- name: Pre-cluster creation validation + block: + - name: Validate cluster configuration file exists + stat: + path: "/tmp/{{ cluster_name }}-config.yaml" + register: config_file_stat + + - name: Set final configuration path + set_fact: + final_config_path: "{{ '/tmp/' + cluster_name + '-config.yaml' if config_file_stat.stat.exists else config_file }}" + + - name: Display final configuration path + debug: + msg: "Final configuration file: {{ final_config_path }}" + + - name: Validate AutoNode configuration in YAML + shell: | + grep -A 5 "autoNode:" {{ final_config_path }} + register: autonode_config_check + failed_when: false + + - name: Display AutoNode configuration from file + debug: + msg: | + AutoNode configuration in cluster file: + {{ autonode_config_check.stdout | default('No AutoNode configuration found') }} + +- name: Apply cluster configuration + block: + - name: Create ROSA HCP cluster + shell: | + oc apply -f {{ final_config_path }} + register: cluster_creation_result + + - name: Display cluster creation result + debug: + msg: | + ๐ŸŽ‰ Cluster creation initiated: + {{ cluster_creation_result.stdout }} + + rescue: + - name: Handle cluster creation failure + debug: + msg: | + โŒ Cluster creation failed: + {{ cluster_creation_result.stderr | default('Unknown error') }} + + - name: Provide troubleshooting guidance + debug: + msg: | + ๐Ÿ”ง Troubleshooting steps: + 1. Check cluster configuration: cat {{ final_config_path }} + 2. Verify CAPA controller logs: oc logs -n {{ capa_system_namespace }} -l app.kubernetes.io/name=cluster-api-provider-aws + 3. Check cluster events: oc get events -n {{ capi_namespace }} + {% if autonode_enabled %} + 4. 
Validate AutoNode IAM role: aws iam get-role --role-name {{ extracted_role_name | default('KarpenterRole') }} + 5. Check OIDC provider configuration + {% endif %} + + - fail: + msg: "Cluster creation failed. See troubleshooting guidance above." + +- name: Post-creation monitoring (AutoNode clusters) + block: + - name: Wait for ROSAControlPlane to be created + shell: | + oc get rosacontrolplane {{ cluster_name }}-cp -n {{ capi_namespace }} -o jsonpath='{.status.ready}' + register: controlplane_status + until: controlplane_status.stdout == "true" + retries: 30 + delay: 60 + failed_when: false + + - name: Check AutoNode status in cluster + shell: | + oc get rosacontrolplane {{ cluster_name }}-cp -n {{ capi_namespace }} -o yaml | grep -A 5 autoNode + register: autonode_status_check + failed_when: false + + - name: Display AutoNode status + debug: + msg: | + ๐Ÿ” AutoNode Status Check: + {{ autonode_status_check.stdout | default('Could not retrieve AutoNode status') }} + + - name: Monitor cluster creation progress + shell: | + oc get cluster {{ cluster_name }} -n {{ capi_namespace }} -o jsonpath='{.status.phase}' + register: cluster_phase + failed_when: false + + - name: Display cluster status + debug: + msg: | + ๐Ÿ“Š Cluster Creation Status: + - Cluster Phase: {{ cluster_phase.stdout | default('Unknown') }} + - Control Plane Ready: {{ controlplane_status.stdout | default('Unknown') }} + {% if autonode_enabled %} + - AutoNode Mode: enabled + - Next: Karpenter will handle node provisioning automatically + {% else %} + - AutoNode Mode: disabled + - Next: Traditional machine pools will handle node provisioning + {% endif %} + + when: autonode_enabled + +- name: Create cluster monitoring task + debug: + msg: | + ๐ŸŽฏ Next Steps: + + Monitor cluster creation: + oc get cluster {{ cluster_name }} -n {{ capi_namespace }} -w + + Check control plane status: + oc get rosacontrolplane {{ cluster_name }}-cp -n {{ capi_namespace }} -o yaml + + {% if autonode_enabled %} + Monitor AutoNode/Karpenter activity: + # Once cluster is ready, check Karpenter installation + oc get pods -n karpenter + + # Monitor node provisioning + oc get nodes --show-labels | grep karpenter + + # Test AutoNode scaling + kubectl apply -f - < 0 else 0 }}" + total_duration: "{{ test_results | map(attribute='duration_minutes') | map('default', 0) | list | sum | round(2) }}" + +- name: Create detailed test report + copy: + content: | + # AutoNode Testing Report - PR #5686 + + **Generated:** {{ report_timestamp }} + **Test Environment:** {{ OCP_HUB_API_URL }} + **AWS Region:** {{ AWS_REGION }} + + ## Executive Summary + + | Metric | Value | + |--------|-------| + | Total Scenarios | {{ total_scenarios }} | + | Successful | {{ successful_scenarios }} | + | Failed | {{ failed_scenarios }} | + | Success Rate | {{ success_rate }}% | + | Total Duration | {{ total_duration }} minutes | + + ## Test Environment Details + + ### Configuration + - **CAPI Namespace:** {{ capi_namespace }} + - **CAPA System Namespace:** {{ capa_system_namespace }} + - **MCE Namespace:** {{ mce_namespace }} + - **AutoNode Testing Enabled:** {{ AUTONODE_TESTING_ENABLED | default(false) }} + - **Default AutoNode Mode:** {{ AUTONODE_DEFAULT_MODE | default('disabled') }} + + ### Prerequisites Validation + {% if AUTONODE_KARPENTER_ROLE_ARN is defined %} + - **Karpenter Role ARN:** {{ AUTONODE_KARPENTER_ROLE_ARN }} + {% else %} + - **Karpenter Role ARN:** Not configured + {% endif %} + - **AWS Account ID:** {{ AWS_ACCOUNT_ID | default('Not specified') }} + - **ROSA OIDC 
Provider:** {{ ROSA_OIDC_PROVIDER_URL | default('Not specified') }} + + ## Test Scenarios Results + + {% for result in test_results %} + ### Scenario {{ loop.index }}: {{ result.name }} + + **Status:** {% if result.status == 'success' %}โœ… PASSED{% else %}โŒ FAILED{% endif %} + **Duration:** {{ result.duration_minutes | default('N/A') }} minutes + **Start Time:** {{ result.start_time }} + **End Time:** {{ result.end_time | default('N/A') }} + + {% if result.status == 'success' %} + **Success Metrics:** + - Cluster Provisioned: {{ 'โœ… Yes' if result.cluster_provisioned else 'โŒ No' }} + {% if result.autonode_scaling_tested is defined %} + - AutoNode Scaling Tested: {{ 'โœ… Yes' if result.autonode_scaling_tested else 'โŒ No' }} + - Karpenter Nodes Found: {{ 'โœ… Yes' if result.karpenter_nodes_found else 'โŒ No' }} + {% endif %} + {% else %} + **Failure Details:** + - Error Message: {{ result.error_message | default('No error message recorded') }} + {% endif %} + + --- + {% endfor %} + + ## Detailed Analysis + + ### AutoNode Feature Validation + + {% set autonode_enabled_tests = test_results | selectattr('name', 'search', 'autonode-enabled') | list %} + {% if autonode_enabled_tests | length > 0 %} + #### AutoNode Enabled Tests + {% for test in autonode_enabled_tests %} + - **{{ test.name }}:** {{ 'PASSED' if test.status == 'success' else 'FAILED' }} + {% if test.status == 'success' %} + - Cluster provisioning successful + {% if test.autonode_scaling_tested %} + - AutoNode scaling validation completed + {% endif %} + {% if test.karpenter_nodes_found %} + - Karpenter nodes detected in cluster + {% endif %} + {% endif %} + {% endfor %} + {% else %} + No AutoNode enabled tests were executed. + {% endif %} + + ### Traditional Scaling Baseline + + {% set traditional_tests = test_results | selectattr('name', 'search', 'disabled') | list %} + {% if traditional_tests | length > 0 %} + #### Traditional Machine Pool Tests (Baseline) + {% for test in traditional_tests %} + - **{{ test.name }}:** {{ 'PASSED' if test.status == 'success' else 'FAILED' }} + {% endfor %} + {% else %} + No traditional scaling baseline tests were executed. 
+ {% endif %} + + ## Key Findings + + {% if successful_scenarios | int == total_scenarios | int %} + ### โœ… All Tests Passed + - All AutoNode test scenarios completed successfully + - PR #5686 AutoNode feature is functioning correctly + - Both enabled and disabled AutoNode modes work as expected + {% elif successful_scenarios | int > 0 %} + ### โš ๏ธ Partial Success + - {{ successful_scenarios }}/{{ total_scenarios }} test scenarios passed + - Review failed scenarios for specific issues + {% else %} + ### โŒ All Tests Failed + - Critical issues detected with AutoNode implementation + - Requires immediate attention before merge + {% endif %} + + ### AutoNode Feature Assessment + + {% set autonode_tests = test_results | selectattr('name', 'search', 'autonode') | list %} + {% if autonode_tests | length > 0 %} + {% set autonode_success = autonode_tests | selectattr('status', 'equalto', 'success') | list | length %} + - AutoNode feature success rate: {{ ((autonode_success / (autonode_tests | length)) * 100) | round(1) }}% + {% if autonode_success == autonode_tests | length %} + - **Recommendation:** AutoNode feature is ready for production use + {% elif autonode_success > 0 %} + - **Recommendation:** AutoNode feature needs minor fixes before production + {% else %} + - **Recommendation:** AutoNode feature requires significant fixes + {% endif %} + {% endif %} + + ## Troubleshooting Information + + ### Failed Test Analysis + {% set failed_tests = test_results | selectattr('status', 'equalto', 'failed') | list %} + {% if failed_tests | length > 0 %} + {% for failed_test in failed_tests %} + #### {{ failed_test.name }} + - **Error:** {{ failed_test.error_message | default('No error message') }} + - **Duration before failure:** {{ failed_test.duration_minutes | default('N/A') }} minutes + - **Recommended investigation:** + - Check CAPA controller logs: `oc logs -n {{ capa_system_namespace }} -l app.kubernetes.io/name=cluster-api-provider-aws` + - Review cluster events: `oc get events -n {{ capi_namespace }}` + - Validate IAM permissions for AutoNode role + {% endfor %} + {% else %} + No failed tests to analyze. + {% endif %} + + ### Commands for Manual Verification + + ```bash + # Check CAPA controller status + oc get pods -n {{ capa_system_namespace }} -l app.kubernetes.io/name=cluster-api-provider-aws + + # Monitor cluster creation + oc get clusters -n {{ capi_namespace }} -w + + # Check AutoNode configuration + oc get rosacontrolplanes -n {{ capi_namespace }} -o yaml | grep -A 10 autoNode + + # Validate Karpenter (for enabled clusters) + oc get pods -n karpenter + oc get nodes --show-labels | grep karpenter + + # Check AWS IAM role + aws iam get-role --role-name KarpenterNodeRole + ``` + + ## Next Steps + + {% if failed_scenarios | int == 0 %} + 1. **Ready for Merge:** All tests passed - PR #5686 is ready for integration + 2. **Documentation:** Update user documentation with AutoNode configuration examples + 3. **Monitoring:** Set up production monitoring for AutoNode clusters + {% else %} + 1. **Fix Issues:** Address failed test scenarios before proceeding + 2. **Re-test:** Run failed scenarios individually for debugging + 3. **IAM Validation:** Verify Karpenter IAM role permissions + 4. 
**OIDC Configuration:** Confirm OIDC provider trust relationships + {% endif %} + + --- + *Report generated by automation-capi AutoNode testing framework* + dest: "{{ report_dir }}/autonode-test-report-{{ ansible_date_time.epoch }}.md" + +- name: Create JSON report for programmatic access + copy: + content: | + { + "test_summary": { + "timestamp": "{{ report_timestamp }}", + "environment": { + "ocp_hub_url": "{{ OCP_HUB_API_URL }}", + "aws_region": "{{ AWS_REGION }}", + "capi_namespace": "{{ capi_namespace }}", + "capa_system_namespace": "{{ capa_system_namespace }}" + }, + "statistics": { + "total_scenarios": {{ total_scenarios }}, + "successful_scenarios": {{ successful_scenarios }}, + "failed_scenarios": {{ failed_scenarios }}, + "success_rate_percent": {{ success_rate }}, + "total_duration_minutes": {{ total_duration }} + }, + "configuration": { + "autonode_testing_enabled": {{ AUTONODE_TESTING_ENABLED | default(false) | to_json }}, + "default_autonode_mode": "{{ AUTONODE_DEFAULT_MODE | default('disabled') }}", + "karpenter_role_arn": "{{ AUTONODE_KARPENTER_ROLE_ARN | default('') }}", + "aws_account_id": "{{ AWS_ACCOUNT_ID | default('') }}" + } + }, + "test_results": {{ test_results | to_json }} + } + dest: "{{ report_dir }}/autonode-test-results-{{ ansible_date_time.epoch }}.json" + +- name: Display test summary + debug: + msg: | + ๐ŸŽฏ AutoNode Testing Summary + ========================== + + ๐Ÿ“Š **Results Overview:** + - Total Scenarios: {{ total_scenarios }} + - Successful: {{ successful_scenarios }} + - Failed: {{ failed_scenarios }} + - Success Rate: {{ success_rate }}% + - Total Duration: {{ total_duration }} minutes + + {% if successful_scenarios | int == total_scenarios | int %} + ๐ŸŽ‰ **All tests passed!** PR #5686 AutoNode feature is working correctly. + {% elif failed_scenarios | int > 0 %} + โš ๏ธ **Some tests failed.** Review the detailed report for troubleshooting guidance. + {% endif %} + + ๐Ÿ“‹ **Reports Generated:** + - Detailed Report: {{ report_dir }}/autonode-test-report-{{ ansible_date_time.epoch }}.md + - JSON Results: {{ report_dir }}/autonode-test-results-{{ ansible_date_time.epoch }}.json + + ๐Ÿ” **Quick Analysis:** + {% for result in test_results %} + - {{ result.name }}: {{ 'PASSED' if result.status == 'success' else 'FAILED' }} ({{ result.duration_minutes | default('N/A') }}min) + {% endfor %} + +- name: Set overall test status + set_fact: + autonode_testing_passed: "{{ failed_scenarios | int == 0 }}" + +- name: Final test status + debug: + msg: | + ๐Ÿ **Final Status: {{ 'PASSED' if autonode_testing_passed else 'FAILED' }}** + + {{ 'AutoNode testing completed successfully. PR #5686 is ready for integration!' if autonode_testing_passed else 'AutoNode testing identified issues. Review failed scenarios before proceeding.' }} \ No newline at end of file diff --git a/tasks/validate_autonode_setup.yml b/tasks/validate_autonode_setup.yml new file mode 100644 index 0000000..2d01ef1 --- /dev/null +++ b/tasks/validate_autonode_setup.yml @@ -0,0 +1,227 @@ +--- +# AutoNode (Karpenter) Setup Validation Tasks +# These tasks validate IAM role setup and prerequisites before cluster creation + +- name: Check if AutoNode testing is enabled + debug: + msg: "AutoNode testing enabled: {{ AUTONODE_TESTING_ENABLED | default(false) }}" + +- name: Validate AutoNode configuration + block: + - name: Check if Karpenter role ARN is provided when AutoNode is enabled + fail: + msg: | + AUTONODE_KARPENTER_ROLE_ARN is required when AutoNode mode is 'enabled'. 
+ Please set this variable in vars/user_vars.yml. + Example: AUTONODE_KARPENTER_ROLE_ARN: "arn:aws:iam::123456789012:role/KarpenterNodeRole" + when: + - AUTONODE_DEFAULT_MODE | default('disabled') == 'enabled' + - AUTONODE_KARPENTER_ROLE_ARN is not defined or AUTONODE_KARPENTER_ROLE_ARN == "" + + - name: Validate Karpenter role ARN format + fail: + msg: | + Invalid Karpenter role ARN format: {{ AUTONODE_KARPENTER_ROLE_ARN }} + Expected format: arn:aws:iam::ACCOUNT-ID:role/ROLE-NAME + Example: arn:aws:iam::123456789012:role/KarpenterNodeRole + when: + - AUTONODE_KARPENTER_ROLE_ARN is defined + - AUTONODE_KARPENTER_ROLE_ARN != "" + - not (AUTONODE_KARPENTER_ROLE_ARN | regex_search('^arn:aws:iam::\d{12}:role\/[a-zA-Z0-9+=,.@_-]+$')) + + - name: Extract AWS account ID from Karpenter role ARN + set_fact: + extracted_aws_account: "{{ AUTONODE_KARPENTER_ROLE_ARN.split(':')[4] }}" + extracted_role_name: "{{ AUTONODE_KARPENTER_ROLE_ARN.split('/')[-1] }}" + when: + - AUTONODE_KARPENTER_ROLE_ARN is defined + - AUTONODE_KARPENTER_ROLE_ARN != "" + + - name: Display extracted role information + debug: + msg: | + Karpenter Role Information: + - Account ID: {{ extracted_aws_account | default('Not extracted') }} + - Role Name: {{ extracted_role_name | default('Not extracted') }} + - Full ARN: {{ AUTONODE_KARPENTER_ROLE_ARN | default('Not provided') }} + + when: AUTONODE_TESTING_ENABLED | default(false) + +- name: Validate AWS credentials for AutoNode + block: + - name: Check AWS credentials are provided + fail: + msg: "AWS credentials are required for AutoNode testing. Please set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY." + when: + - AWS_ACCESS_KEY_ID is not defined or AWS_ACCESS_KEY_ID == "" + - AWS_SECRET_ACCESS_KEY is not defined or AWS_SECRET_ACCESS_KEY == "" + + - name: Check AWS region is set + fail: + msg: "AWS_REGION is required for AutoNode testing." + when: AWS_REGION is not defined or AWS_REGION == "" + + - name: Verify AWS CLI availability + shell: which aws + register: aws_cli_check + failed_when: aws_cli_check.rc != 0 + ignore_errors: yes + + - name: Warning about AWS CLI + debug: + msg: | + WARNING: AWS CLI not found. IAM role validation will be limited. 
+ To enable full validation, install AWS CLI: pip install awscli + when: aws_cli_check.rc != 0 + + when: AUTONODE_TESTING_ENABLED | default(false) + +- name: Validate Karpenter IAM role (if AWS CLI available) + block: + - name: Check if Karpenter IAM role exists + shell: | + aws iam get-role --role-name {{ extracted_role_name }} --region {{ AWS_REGION }} + environment: + AWS_ACCESS_KEY_ID: "{{ AWS_ACCESS_KEY_ID }}" + AWS_SECRET_ACCESS_KEY: "{{ AWS_SECRET_ACCESS_KEY }}" + AWS_DEFAULT_REGION: "{{ AWS_REGION }}" + register: karpenter_role_check + failed_when: false + when: + - extracted_role_name is defined + - aws_cli_check.rc == 0 + + - name: Display role validation result + debug: + msg: | + Karpenter Role Validation: + {% if karpenter_role_check.rc == 0 %} + โœ… Role {{ extracted_role_name }} exists in AWS account + {% else %} + โŒ Role {{ extracted_role_name }} not found or access denied + Error: {{ karpenter_role_check.stderr }} + {% endif %} + when: karpenter_role_check is defined + + - name: Validate role trust relationship (basic check) + shell: | + aws iam get-role --role-name {{ extracted_role_name }} \ + --query 'Role.AssumeRolePolicyDocument' --output text --region {{ AWS_REGION }} + environment: + AWS_ACCESS_KEY_ID: "{{ AWS_ACCESS_KEY_ID }}" + AWS_SECRET_ACCESS_KEY: "{{ AWS_SECRET_ACCESS_KEY }}" + AWS_DEFAULT_REGION: "{{ AWS_REGION }}" + register: role_trust_policy + failed_when: false + when: + - karpenter_role_check.rc == 0 + - extracted_role_name is defined + + - name: Check if trust policy includes OIDC + debug: + msg: | + Trust Policy Check: + {% if role_trust_policy.rc == 0 %} + {% if 'oidc-provider' in role_trust_policy.stdout %} + โœ… Role trust policy includes OIDC provider configuration + {% else %} + โš ๏ธ Role trust policy may not include OIDC provider. + Manual verification recommended. + {% endif %} + {% else %} + โŒ Could not retrieve trust policy + {% endif %} + when: role_trust_policy is defined + + when: + - AUTONODE_TESTING_ENABLED | default(false) + - AUTONODE_IAM_VALIDATION.validate_before_cluster_creation | default(true) + - aws_cli_check.rc == 0 + - extracted_role_name is defined + +- name: Validate ROSA CLI for AutoNode + block: + - name: Check ROSA CLI availability + shell: which rosa + register: rosa_cli_check + failed_when: rosa_cli_check.rc != 0 + + - name: Verify ROSA login status + shell: rosa whoami + register: rosa_whoami_check + failed_when: rosa_whoami_check.rc != 0 + + - name: Display ROSA status + debug: + msg: | + ROSA CLI Status: + โœ… ROSA CLI available + โœ… Logged in as: {{ rosa_whoami_check.stdout.split('\n')[0] }} + + rescue: + - name: ROSA CLI validation failed + fail: + msg: | + ROSA CLI validation failed. AutoNode testing requires: + 1. ROSA CLI installed and in PATH + 2. 
Logged into ROSA environment + + Run: rosa login --token= + Or: rosa login + + when: AUTONODE_TESTING_ENABLED | default(false) + +- name: Check CAPA controller readiness for AutoNode + block: + - name: Check if CAPA controller is running + shell: | + oc get pods -n {{ capa_system_namespace }} -l app.kubernetes.io/name=cluster-api-provider-aws + register: capa_controller_check + failed_when: false + + - name: Display CAPA controller status + debug: + msg: | + CAPA Controller Status: + {% if capa_controller_check.rc == 0 and 'Running' in capa_controller_check.stdout %} + โœ… CAPA controller is running + {% else %} + โš ๏ธ CAPA controller status unclear or not running + {% endif %} + + - name: Check CAPA controller version/support for AutoNode + shell: | + oc get deployment -n {{ capa_system_namespace }} capa-controller-manager \ + -o jsonpath='{.spec.template.spec.containers[0].image}' + register: capa_image_check + failed_when: false + + - name: Display CAPA image information + debug: + msg: | + CAPA Controller Image: {{ capa_image_check.stdout | default('Could not retrieve') }} + Note: Ensure this image includes PR #5686 AutoNode support + + when: + - AUTONODE_TESTING_ENABLED | default(false) + - capa_system_namespace is defined + +- name: AutoNode validation summary + debug: + msg: | + ๐ŸŽฏ AutoNode Validation Summary: + ================================ + + Configuration: + - AutoNode Mode: {{ AUTONODE_DEFAULT_MODE | default('disabled') }} + - Karpenter Role: {{ AUTONODE_KARPENTER_ROLE_ARN | default('Not configured') }} + - AWS Region: {{ AWS_REGION | default('Not set') }} + + Prerequisites: + {% if aws_cli_check.rc == 0 %}โœ…{% else %}โŒ{% endif %} AWS CLI + {% if rosa_whoami_check.rc == 0 %}โœ…{% else %}โŒ{% endif %} ROSA Login + {% if extracted_role_name is defined %}โœ…{% else %}โŒ{% endif %} Role ARN Format + {% if karpenter_role_check.rc == 0 %}โœ…{% else %}โš ๏ธ {% endif %} IAM Role Exists + + Ready for AutoNode Testing: {{ 'YES' if (aws_cli_check.rc == 0 and rosa_whoami_check.rc == 0 and extracted_role_name is defined) else 'NO - Fix issues above' }} + when: AUTONODE_TESTING_ENABLED | default(false) \ No newline at end of file diff --git a/ui/frontend/src/pages/WhatCanIHelp.js b/ui/frontend/src/pages/WhatCanIHelp.js index eff1a83..da9200d 100644 --- a/ui/frontend/src/pages/WhatCanIHelp.js +++ b/ui/frontend/src/pages/WhatCanIHelp.js @@ -2535,11 +2535,6 @@ Need detailed help? Click "Help me configure everything" for step-by-step guidan onClick={() => toggleSection('rosa-hcp-resources')} >
diff --git a/ui/frontend/src/pages/WhatCanIHelp.js b/ui/frontend/src/pages/WhatCanIHelp.js
index eff1a83..da9200d 100644
--- a/ui/frontend/src/pages/WhatCanIHelp.js
+++ b/ui/frontend/src/pages/WhatCanIHelp.js
[JSX markup in the hunks below was lost in extraction; the hunk headers and legible changes are preserved as summaries.]

@@ -2535,11 +2535,6 @@ (context: the section header wired to onClick={() => toggleSection('rosa-hcp-resources')})
- Removes the static "Configure ROSA Resources" header markup.
+ {/* Prefix Configuration */}: adds a collapsible section whose "Prefix" header is wired to toggleSection('prefix-configuration'); its body renders only when !collapsedSections.has('prefix-configuration'), showing {savedPrefix} when set, and otherwise "No prefix configured" with the hint "Click \"Enter Prefix\" to set a resource naming prefix".
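Every section below repeats the same collapse pattern. A minimal sketch of its shape, kept in comment form so it is not mistaken for the PR's code; the Set-based behavior of collapsedSections/toggleSection is an assumption inferred from the surviving fragments:

// Editor sketch, not part of the PR diff. Assumes collapsedSections is a
// Set of section ids and toggleSection(id) flips id's membership in that
// Set, triggering a re-render.
//   <div onClick={() => toggleSection('prefix-configuration')}>Prefix</div>
//   {!collapsedSections.has('prefix-configuration') && (
//     savedPrefix ? <div>{savedPrefix}</div> : <div>No prefix configured</div>
//   )}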
+ {/* Account Roles */}: wraps the "Account Roles ({rosaHcpResources.accountRoles.length})" header in the same toggle via toggleSection('account-roles') and renders the role list, including the "No account roles found" empty state, only while the section is expanded.

@@ -2685,28 +2742,37 @@
+ {/* Operator Roles */}: applies the pattern to "Operator Roles ({rosaHcpResources.operatorRoles.length})" via toggleSection('operator-roles'), keeping the "No operator roles found" empty state.

@@ -2781,30 +2849,125 @@
- Removes the static {/* OIDC Configuration */} block that always rendered "OIDC Issuer URL:" followed by {rosaHcpResources.oidcId}.
+ Adds a collapsible "OIDC Configuration" section via toggleSection('oidc-configuration'); when expanded it shows the issuer URL if rosaHcpResources.oidcId is set, otherwise "No OIDC provider configured" with the hint "Click \"Create OIDC Provider\" to set up authentication".
+ {/* Subnets */}: replaces the static "Subnets ({rosaHcpResources.subnets.length})" block with a collapsible one via toggleSection('subnets'), wrapping the existing {rosaHcpResources.subnets.map((subnet, index) => ( ... ))} list in {!collapsedSections.has('subnets') && ( ... )}.

@@ -2824,6 +2987,7 @@
+ Closes the subnets conditional after the map; the {rosaHcpResources.lastChecked && ( ... )} footer is unchanged.

@@ -2837,18 +3001,6 @@
- Removes the now-duplicated {/* Manage ROSA HCP Clusters - Moved below Configure ROSA HCP Resources */} block, which repeated the toggleSection('rosa-hcp-resources') header and the "Configure ROSA Resources" label.
diff --git a/vars/user_vars_autonode_example.yml b/vars/user_vars_autonode_example.yml
new file mode 100644
--- /dev/null
+++ b/vars/user_vars_autonode_example.yml
+# Find your OIDC endpoint with: rosa describe cluster --output json | jq '.aws.sts.oidc_endpoint_url'
+ROSA_OIDC_PROVIDER_URL: "rh-oidc.s3.us-east-1.amazonaws.com/your-oidc-id"
+
+#==== AutoNode Testing Scenarios ====
+# Define different test scenarios for AutoNode
+AUTONODE_TEST_SCENARIOS:
+  - name: "autonode-enabled"
+    description: "Test cluster with AutoNode enabled using Karpenter"
+    autonode_mode: "enabled"
+    karpenter_role_arn: "arn:aws:iam::{{ AWS_ACCOUNT_ID }}:role/KarpenterNodeRole"
+    cluster_name_suffix: "autonode"
+    enable_traditional_machinepool: false
+
+  - name: "autonode-disabled"
+    description: "Test cluster with AutoNode disabled using traditional machine pools"
+    autonode_mode: "disabled"
+    karpenter_role_arn: ""
+    cluster_name_suffix: "traditional"
+    enable_traditional_machinepool: true
+
+  - name: "autonode-transition"
+    description: "Test transitioning a cluster from disabled to enabled AutoNode"
+    autonode_mode: "disabled"  # Start disabled, then enable during the test
+    karpenter_role_arn: "arn:aws:iam::{{ AWS_ACCOUNT_ID }}:role/KarpenterNodeRole"
+    cluster_name_suffix: "transition"
+    enable_traditional_machinepool: true
+    test_mode_transition: true
+
+#==== AutoNode IAM Role Validation ====
+# Settings for validating Karpenter IAM role setup
+AUTONODE_IAM_VALIDATION:
+  # Enable pre-flight IAM role validation
+  validate_before_cluster_creation: true
+
+  # Timeout for IAM role validation (seconds)
+  validation_timeout: 300
+
+  # Required IAM permissions to check (subset for basic validation)
+  required_permissions:
+    - "ec2:RunInstances"
+    - "ec2:TerminateInstances"
+    - "ec2:DescribeInstances"
+    - "iam:PassRole"
+
+  # Trust relationship validation
+  validate_trust_relationship: true
+  expected_oidc_provider: "{{ ROSA_OIDC_PROVIDER_URL }}"
+
+#==== AutoNode Testing Configuration ====
+AUTONODE_TESTING:
+  # Timeout for AutoNode cluster creation (seconds)
+  cluster_creation_timeout: 3600  # 1 hour
+
+  # Timeout for Karpenter node provisioning tests (seconds)
+  node_provisioning_timeout: 600  # 10 minutes
+
+  # Test workload configuration for validating Karpenter
+  test_workload:
+    name: "autonode-test-workload"
+    replicas: 5
+    resource_requests:
+      cpu: "500m"
+      memory: "1Gi"
+
+  # Cleanup configuration
+  cleanup_on_success: true
+  cleanup_on_failure: false
+
+  # Generate detailed test reports
+  generate_reports: true
+  reports_dir: "{{ output_dir }}/autonode-reports"
+
+#==== AutoNode Template Configuration ====
+AUTONODE_TEMPLATES:
+  # Base template directory for AutoNode configurations
+  template_dir: "templates/autonode"
+
+  # Default cluster name prefix for AutoNode tests
+  cluster_name_prefix: "rosa-autonode"
+
+  # Default domain prefix for AutoNode clusters
+  # (uses ansible_date_time, so fact gathering must be enabled)
+  domain_prefix_template: "autonode-{{ ansible_date_time.epoch }}"
+
+  # Default tags for AutoNode test clusters
+  default_tags:
+    AutoNodeTesting: "true"
+    KarpenterEnabled: "{{ 'true' if AUTONODE_DEFAULT_MODE == 'enabled' else 'false' }}"
+    TestType: "PR5686-Validation"
+    CreatedBy: "automation-capi"
+
+#==== Advanced AutoNode Configuration ====
+# These settings are for advanced testing scenarios
+AUTONODE_ADVANCED:
+  # Test different Karpenter configurations
+  test_spot_instances: true
+  test_mixed_instance_types: true
+
+  # Node consolidation testing
+  test_node_consolidation: true
+  consolidation_wait_time: 300  # 5 minutes
+
+  # Scaling behavior testing
+  test_scale_up: true
+  test_scale_down: true
+  scale_test_pod_count: 10
+
+  # Resource limits for Karpenter testing
+  max_nodes: 10
+  max_cpu: "100"
+  max_memory: "400Gi"
\ No newline at end of file
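To illustrate how the scenario list might be consumed, here is a hypothetical loop; `tasks/run_autonode_scenario.yml` is an invented name for the sketch, and `end2end_tests_autonode.yaml` may organize this differently:

```yaml
# Hypothetical consumer of AUTONODE_TEST_SCENARIOS; names are illustrative.
- name: Run each AutoNode test scenario
  include_tasks: tasks/run_autonode_scenario.yml  # invented task file
  loop: "{{ AUTONODE_TEST_SCENARIOS }}"
  loop_control:
    loop_var: scenario
    label: "{{ scenario.name }}"
  when: AUTONODE_TESTING_ENABLED | default(false)
```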