diff --git a/.claude/commands/jetlag-review.md b/.claude/commands/jetlag-review.md
new file mode 100644
index 00000000..d4783c49
--- /dev/null
+++ b/.claude/commands/jetlag-review.md
@@ -0,0 +1,42 @@
+---
+description: Fetch and review a GitHub PR
+---
+
+You are tasked with reviewing a GitHub Pull Request. Follow these steps:
+
+1. **Fetch PR details**: Use `gh pr view {{arg:1}}` to get PR information (title, description, author, files changed)
+
+2. **Checkout the PR**: Use `gh pr checkout {{arg:1}}` to check out the PR branch locally
+
+3. **Analyze the changes**:
+   - Use `gh pr diff {{arg:1}}` to see the full diff
+   - Read the modified files to understand the context
+   - Pay attention to the PR description and any linked issues
+
+4. **Provide a comprehensive review** covering:
+   - **Summary**: Brief overview of what the PR does
+   - **Code Quality**: Architecture, patterns, readability, maintainability
+   - **Potential Issues**: Bugs, edge cases, security concerns, performance issues
+   - **Testing**: Are tests adequate? Are there missing test cases?
+   - **Documentation**: Is documentation updated if needed?
+   - **Ansible-Specific Checks**:
+     - **Idempotency**: Tasks should be idempotent (can be run multiple times safely)
+     - **Module Selection**: Use of appropriate Ansible modules (avoid shell/command when native modules exist)
+     - **Variable Naming**: Follow consistent naming conventions, proper scoping (group_vars, host_vars, role defaults)
+     - **Task Naming**: All tasks have clear, descriptive names
+     - **YAML Formatting**: Proper YAML syntax, consistent indentation, use of multi-line strings where appropriate
+     - **Handlers**: Proper use of handlers for service restarts and notify/listen patterns
+     - **Jinja2 Templates**: Correct usage of filters, tests, and variable references
+     - **Error Handling**: Use of failed_when, changed_when, ignore_errors appropriately
+     - **Tags**: Meaningful tags for task organization and selective execution
+     - **Secrets Management**: No plain-text passwords, proper use of ansible-vault if applicable
+     - **Conditionals**: Proper use of when clauses, check for undefined variables
+     - **Loops**: Efficient use of loop, with_items, etc.
+     - **Role Structure**: Follows standard role directory structure if roles are modified
+     - **Deprecations**: No use of deprecated Ansible features or modules
+     - **Performance**: Consideration of serial, async, poll for long-running tasks
+   - **Suggestions**: Specific, actionable improvements with code examples where helpful
+
+5. **Format your review** in a clear, structured markdown format that's easy to read
+
+Be thorough but constructive. Focus on meaningful feedback that helps improve the code.
\ No newline at end of file
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 00000000..554d7a44
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,201 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+Jetlag is an OpenShift cluster deployment tool that uses Ansible automation to deploy Multi Node OpenShift (MNO) and Single Node OpenShift (SNO) clusters via the Assisted Installer. It supports Red Hat performance labs, Scale Labs, and IBMcloud environments.
+
+## Essential Commands
+
+### Environment Setup
+```bash
+# Bootstrap ansible virtual environment (run from repo root)
+source bootstrap.sh
+
+# Red Hat Labs (Scale Lab/Performance Lab)
+# Copy and edit configuration file
+cp ansible/vars/all.sample.yml ansible/vars/all.yml
+# Edit all.yml with your lab configuration (lab, lab_cloud, cluster_type, etc.)
+
+# Create inventory file for your lab environment
+ansible-playbook ansible/create-inventory.yml
+# Setup bastion host (replace cloud99.local with your inventory file)
+ansible-playbook -i ansible/inventory/cloud99.local ansible/setup-bastion.yml
+
+# IBMcloud
+# Copy and edit configuration file
+cp ansible/vars/ibmcloud.sample.yml ansible/vars/ibmcloud.yml
+# Edit ibmcloud.yml with your IBMcloud configuration (cluster_type, worker_node_count, etc.)
+
+# Create inventory file from IBMcloud CLI data
+ansible-playbook ansible/ibmcloud-create-inventory.yml
+# Setup bastion host for IBMcloud
+ansible-playbook -i ansible/inventory/ibmcloud.local ansible/ibmcloud-setup-bastion.yml
+```
+
+### Cluster Deployment
+```bash
+# Red Hat Labs (Scale Lab/Performance Lab)
+# Deploy Multi Node OpenShift cluster
+ansible-playbook -i ansible/inventory/cloud99.local ansible/mno-deploy.yml
+
+# Deploy Single Node OpenShift clusters
+ansible-playbook -i ansible/inventory/cloud99.local ansible/sno-deploy.yml
+
+# Deploy Virtual Multi Node OpenShift (VMNO) - requires hypervisor setup first
+ansible-playbook -i ansible/inventory/cloud99.local ansible/hv-setup.yml
+ansible-playbook -i ansible/inventory/cloud99.local ansible/hv-vm-create.yml
+ansible-playbook -i ansible/inventory/cloud99.local ansible/mno-deploy.yml
+
+# IBMcloud
+# Deploy Multi Node OpenShift on IBMcloud
+ansible-playbook -i ansible/inventory/ibmcloud.local ansible/ibmcloud-mno-deploy.yml
+
+# Deploy Single Node OpenShift on IBMcloud
+ansible-playbook -i ansible/inventory/ibmcloud.local ansible/ibmcloud-sno-deploy.yml
+```
+
+### Cluster Management
+```bash
+# Scale out MNO cluster
+ansible-playbook ansible/ocp-scale-out.yml
+
+# Setup hypervisor nodes for VMs
+ansible-playbook ansible/hv-setup.yml
+
+# Create VMs on hypervisor nodes
+ansible-playbook ansible/hv-vm-create.yml
+
+# Delete VMs from hypervisor nodes
+ansible-playbook ansible/hv-vm-delete.yml
+
+# Replace VMs on hypervisor nodes (delete + recreate)
+ansible-playbook ansible/hv-vm-replace.yml
+
+# Sync OpenShift releases
+ansible-playbook ansible/sync-ocp-release.yml
+```
+
+## Project Architecture
+
+### Key Configuration Files
+- `ansible/vars/all.yml` - Main configuration for Red Hat labs (copy from `ansible/vars/all.sample.yml`)
+- `ansible/vars/ibmcloud.yml` - IBMcloud-specific configuration (copy from `ansible/vars/ibmcloud.sample.yml`)
+- `pull-secret.txt` - OpenShift pull secret (place in repo
root)
+- `ansible/inventory/$CLOUDNAME.local` - Generated inventory file for your lab
+
+### Critical Variables
+- `lab`: Environment type (`performancelab`, `scalelab`, or `ibmcloud`)
+- `lab_cloud`: Specific cloud allocation (e.g., `cloud42`)
+- `cluster_type`: Either `mno`, `sno`, or `vmno`
+- `worker_node_count`: Number of bare metal worker nodes for MNO clusters
+- `hybrid_worker_count`: Number of virtual worker nodes for hybrid MNO clusters (requires hypervisor setup)
+- `ocp_build`: OpenShift build type (`ga`, `dev`, or `ci`)
+- `ocp_version`: OpenShift version (e.g., `latest-4.20`)
+
+### Ansible Role Structure
+Jetlag uses a modular Ansible role architecture:
+
+- **Bastion roles**: `bastion-*` roles configure the bastion host with services like Assisted Installer, DNS, registry
+- **Installation roles**: `install-cluster`, `sno-post-cluster-install` handle cluster deployment
+- **Hypervisor roles**: `hv-*` roles manage VM infrastructure on hypervisor nodes
+- **Utility roles**: `boot-iso`, `sync-*` roles provide supporting functionality
+
+### Cluster Types
+- **MNO (Multi Node OpenShift)**: 3 control-plane nodes + configurable bare metal worker nodes
+- **SNO (Single Node OpenShift)**: Single node clusters, one per available machine
+- **VMNO (Virtual Multi Node OpenShift)**: MNO cluster using VMs instead of bare metal (Jetlag-specific term)
+- **Hybrid MNO**: MNO cluster with both bare metal and virtual worker nodes
+
+#### Virtual and Hybrid Cluster Details
+- **VMNO clusters** allow multi-node deployment with fewer physical machines (minimum: 1 bastion + 1-2 hypervisors)
+- **Hybrid clusters** combine bare metal workers (`worker_node_count`) with virtual workers (`hybrid_worker_count`)
+- **Hypervisor nodes**: Unused machines become VM hosts for additional clusters or hybrid workers
+- Virtual workers are created as VMs on hypervisor nodes and added to the worker inventory section
+- VM placement distributed across available hypervisors based on
hardware-specific VM count configurations
+
+### Lab Environment Support
+- **Performance Lab**: Dell r750, 740xd hardware
+- **Scale Lab**: Various Dell models (r750, r660, r650, r640, r630, fc640), Supermicro systems
+- **IBMcloud**: Supermicro E5-2620, Lenovo SR630 bare metal
+
+## Development Workflow
+
+### Standard MNO/SNO Deployment (Red Hat Labs)
+1. Edit `ansible/vars/all.yml` with your lab configuration
+2. Run `ansible-playbook ansible/create-inventory.yml` to generate inventory
+3. Run `ansible-playbook -i ansible/inventory/cloud99.local ansible/setup-bastion.yml` to configure bastion host
+4. Run deployment playbook (`ansible/mno-deploy.yml` or `ansible/sno-deploy.yml`)
+5. Access clusters using kubeconfig files in `/root/mno/` or `/root/sno/`
+
+### IBMcloud MNO/SNO Deployment
+1. Edit `ansible/vars/ibmcloud.yml` with your IBMcloud configuration
+2. Run `ansible-playbook ansible/ibmcloud-create-inventory.yml` to generate `ansible/inventory/ibmcloud.local` from IBMcloud CLI data
+3. Run `ansible-playbook -i ansible/inventory/ibmcloud.local ansible/ibmcloud-setup-bastion.yml` to configure bastion host
+4. Run deployment playbook (`ansible-playbook -i ansible/inventory/ibmcloud.local ansible/ibmcloud-mno-deploy.yml` or `ansible/ibmcloud-sno-deploy.yml`)
+5. Access clusters using kubeconfig files in `/root/mno/` or `/root/sno/`
+
+### VMNO Deployment (Red Hat Labs Only)
+1. Edit `ansible/vars/all.yml` with `cluster_type: vmno` and VM-specific settings
+2. Edit `ansible/vars/hv.yml` for hypervisor configuration
+3. Run `ansible-playbook ansible/create-inventory.yml` to generate inventory with VM entries
+4. Run `ansible-playbook -i ansible/inventory/cloud99.local ansible/setup-bastion.yml` to configure bastion host
+5. Run `ansible-playbook -i ansible/inventory/cloud99.local ansible/hv-setup.yml` to configure hypervisor nodes
+6. Run `ansible-playbook -i ansible/inventory/cloud99.local ansible/hv-vm-create.yml` to create VMs
+7.
Run `ansible-playbook -i ansible/inventory/cloud99.local ansible/mno-deploy.yml` to deploy cluster to VMs
+8. Access cluster using kubeconfig in `/root/vmno/`
+
+### Hybrid Cluster Deployment (Red Hat Labs Only)
+1. Configure both `worker_node_count` (bare metal) and `hybrid_worker_count` (VMs) in `ansible/vars/all.yml`
+2. Ensure hypervisor nodes are available in allocation
+3. Follow standard Red Hat Labs MNO workflow - hybrid workers automatically added to inventory
+
+## Special Considerations
+
+- Inventory files are generated, not manually created (except for "Bring Your Own Lab" scenarios)
+- Bastion machine is always the first machine in allocation and hosts Assisted Installer
+- Unused machines in MNO deployments become hypervisor nodes
+- SNO deployments create one cluster per available machine after bastion
+- Public VLAN support available for routable environments (`public_vlan: true`)
+- Disconnected/air-gapped deployments supported with registry mirroring
+
+### Virtual and Hybrid Cluster Considerations
+- **Hardware Requirements**: VMNO requires additional CPU/memory capacity for VM overhead
+- **VM Management**: Use `hv-vm-delete.yml` or `hv-vm-replace.yml` between VMNO deployments to avoid conflicts
+- **Resource Planning**: Configure `hw_vm_counts` per hardware type to optimize VM distribution across hypervisors
+- **Disk Configuration**: VMs can span multiple disks on hypervisors (e.g., default disk + nvme for higher VM counts)
+- **Network Configuration**: VMs use libvirt networking with static IP assignment from controlplane network range
+- **Scale Lab/Performance Lab Only**: VMNO and hybrid deployments only supported in Scale Lab and Performance Lab environments
+
+## Troubleshooting and Tips
+
+When encountering issues with Jetlag deployments, consult these comprehensive documentation resources:
+
+### Primary Troubleshooting Resources
+- **[docs/troubleshooting.md](docs/troubleshooting.md)**: Comprehensive troubleshooting guide covering:
+
  - Common deployment issues and solutions
+  - Hardware-specific problems (Dell, Supermicro)
+  - Bastion configuration and recovery procedures
+  - BMC/iDRAC reset procedures
+  - Virtual media and discovery issues
+
+- **[docs/tips-and-vars.md](docs/tips-and-vars.md)**: Advanced configuration guidance including:
+  - Network interface configuration and overrides
+  - Install disk configuration options
+  - OCP version management
+  - NVMe disk configuration for install and etcd
+  - Post-deployment tasks and optimizations
+  - Bastion registry management
+
+### Common Issues to Check First
+1. **Network Configuration**: Verify `bastion_lab_interface` and `bastion_controlplane_interface` match your hardware
+2. **BMC Access**: Ensure BMC credentials and network connectivity are correct
+3. **DNS Services**: Check bastion DNS services are running and configured correctly
+4. **Disk Selection**: Verify install disk paths and available storage
+5. **Resource Limits**: Ensure sufficient CPU/memory for VM deployments (VMNO/hybrid)
+
+### When to Consult Documentation
+- Before troubleshooting deployment failures, read the relevant sections in `troubleshooting.md`
+- For advanced configuration needs, reference the specific sections in `tips-and-vars.md`
+- When working with specific hardware vendors, check the hardware-specific troubleshooting sections
\ No newline at end of file