| 1 | +# ARO-HCP Development Environment |
| 2 | + |
| 3 | +This directory contains Taskfiles for setting up an AKS management cluster with ARO-HCP (Azure Red Hat OpenShift Hosted Control Plane). |
| 4 | + |
| 5 | +## Prerequisites |
| 6 | + |
| 7 | +- [Task](https://taskfile.dev/) - Install with `brew install go-task/tap/go-task` |
| 8 | +- [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) |
| 9 | +- [ccoctl](https://github.com/openshift/cloud-credential-operator) - Cloud Credential Operator CLI |
| 10 | +- [kubectl](https://kubernetes.io/docs/tasks/tools/) |
| 11 | +- [jq](https://stedolan.github.io/jq/) |
| 12 | +- [gum](https://github.com/charmbracelet/gum) - For styled terminal output |
| 13 | +- [hypershift CLI](https://hypershift-docs.netlify.app/) - Either in PATH or set `HYPERSHIFT_BINARY_PATH` |
| 14 | +- An Azure subscription with appropriate permissions |
| 15 | +- A pull secret from [console.redhat.com](https://console.redhat.com/openshift/install/pull-secret) |
| 16 | + |
| 17 | +## Quick Start |
| 18 | + |
| 19 | +1. **Create Azure credentials file:** |
| 20 | + ```bash |
| 21 | + cp azure-credentials.json.example azure-credentials.json |
| 22 | + # Edit azure-credentials.json with your SP credentials |
| 23 | + ``` |
| 24 | + |
| 25 | +2. **Configure environment:** |
| 26 | + ```bash |
| 27 | + cp config.example.env .envrc |
| 28 | + # Edit .envrc with your values (PREFIX, OIDC_ISSUER_NAME, RELEASE_IMAGE) |
| 28 | + direnv allow # if you don't use direnv, run: source .envrc |
| 30 | + ``` |
| 31 | + |
| 32 | +3. **Login to Azure:** |
| 33 | + ```bash |
| 34 | + task prereq:login |
| 35 | + ``` |
| 36 | + |
| 37 | +4. **Create management cluster (first time):** |
| 38 | + ```bash |
| 39 | + task mgmt:create |
| 40 | + ``` |
| 41 | + |
| 42 | +5. **Create hosted cluster:** |
| 43 | + ```bash |
| 44 | + task cluster:create |
| 45 | + ``` |
| 46 | + |
| 47 | +6. **Destroy hosted cluster:** |
| 48 | + ```bash |
| 49 | + task cluster:destroy |
| 50 | + ``` |
| 51 | + |
| 52 | +7. **Destroy management cluster:** |
| 53 | + ```bash |
| 54 | + task mgmt:destroy |
| 55 | + ``` |
| 56 | + |
| 57 | +## Usage Pattern |
| 58 | + |
| 59 | +The typical workflow is: |
| 60 | + |
| 61 | +1. **Once every few months:** `task mgmt:create` - Creates a long-lived AKS management cluster |
| 62 | +2. **Every few days:** `task cluster:create` / `task cluster:destroy` - Iterate on hosted clusters |
| 63 | +3. **Rarely:** `task mgmt:destroy` - When done with the environment |
| 64 | + |
| 65 | +## Primary Tasks |
| 66 | + |
| 67 | +| Task | Description | |
| 68 | +|------|-------------| |
| 69 | +| `task mgmt:create` | Create management cluster (AKS) with all dependencies | |
| 70 | +| `task mgmt:destroy` | Destroy management cluster | |
| 71 | +| `task cluster:create` | Create hosted cluster (most frequent operation) | |
| 72 | +| `task cluster:destroy` | Destroy hosted cluster | |
| 73 | + |
| 74 | +## Utility Tasks |
| 75 | + |
| 76 | +| Task | Description | |
| 77 | +|------|-------------| |
| 78 | +| `task prereq:login` | Login to Azure using azure-credentials.json | |
| 79 | +| `task prereq:whoami` | Show current Azure identity vs credentials file | |
| 80 | +| `task prereq:validate` | Validate all prerequisites including Azure identity | |
| 81 | +| `task prereq:show-config` | Display current configuration | |
| 82 | +| `task first-time` | One-time setup only (Key Vault, OIDC, identities) | |
| 83 | +| `task teardown-all` | Complete teardown including one-time resources | |
| 84 | +| `task status` | Show status of all components | |
| 85 | + |
| 86 | +## Standard Workflow (Recommended) |
| 87 | + |
| 88 | +For most users, these commands are all you need: |
| 89 | + |
| 90 | +**First-time setup:** |
| 91 | +```bash |
| 92 | +task prereq:login # Login to Azure |
| 92 | +task mgmt:create # Create the management cluster and all dependencies (~20 min) |
| 94 | +``` |
| 95 | + |
| 96 | +**Daily use:** |
| 97 | +```bash |
| 98 | +task cluster:create # Create hosted cluster |
| 99 | +task cluster:destroy # Destroy hosted cluster |
| 100 | +``` |
| 101 | + |
| 102 | +**Cleanup:** |
| 103 | +```bash |
| 103 | +task mgmt:destroy # Destroy the management cluster only |
| 104 | +task teardown-all # Or: complete teardown (hosted cluster, management cluster, and persistent resources) |
| 106 | +``` |
| 107 | + |
| 108 | +## Step-by-Step Workflow (For Debugging) |
| 109 | + |
| 110 | +Use this when you need granular control for debugging or testing individual steps. |
| 111 | + |
| 112 | +**Legend:** |
| 113 | +| Symbol | Meaning | |
| 114 | +|--------|---------| |
| 115 | +| `○` | Aggregator - only orchestrates subtasks, can skip if you run all children manually | |
| 116 | +| `●` | Does work - has actual commands/logic, must run this task | |
| 117 | +| `⚠` | Has internal subtasks - CANNOT skip, must use this parent task | |
| 118 | + |
| 119 | +``` |
| 120 | +● prereq:login # Login using azure-credentials.json |
| 121 | +● prereq:whoami # Verify identity matches |
| 122 | +● prereq:validate # Validate all prerequisites |
| 123 | + |
| 124 | +○ mgmt:create # Aggregator - orchestrates all setup |
| 125 | +├── ● prereq:validate |
| 126 | +├── ○ keyvault:setup # Aggregator |
| 127 | +│ ├── ● keyvault:create |
| 128 | +│ ├── ● keyvault:create-sps ⚠ has internal create-sp |
| 129 | +│ ├── ● keyvault:generate-sp-jsons |
| 130 | +│ ├── ● keyvault:store-creds |
| 131 | +│ └── ● keyvault:generate-cp-json |
| 132 | +├── ● oidc:create ⚠ has internal create-issuer |
| 133 | +│ └── ● oidc:create-keypair |
| 134 | +├── ○ dataplane:create # Aggregator |
| 135 | +│ ├── ● dataplane:create-identities ⚠ has internal |
| 136 | +│ ├── ● dataplane:create-federated-creds ⚠ has internal |
| 137 | +│ └── ● dataplane:generate-dp-json |
| 138 | +├── ● aks:create-identities |
| 139 | +├── ○ aks:create # Aggregator |
| 140 | +│ ├── ● aks:create-rg |
| 141 | +│ ├── ● aks:create-cluster |
| 142 | +│ ├── ● aks:get-kubeconfig |
| 143 | +│ └── ● aks:assign-kv-role |
| 144 | +├── ○ dns:setup # Aggregator |
| 145 | +│ ├── ● dns:create-zone |
| 146 | +│ ├── ● dns:delegate-zone |
| 147 | +│ ├── ● dns:create-sp |
| 148 | +│ └── ● dns:create-secret |
| 149 | +└── ● operator:install |
| 150 | + └── ● operator:apply-crds |
| 151 | + |
| 152 | +● operator:wait # Wait for operator (standalone) |
| 153 | +● operator:verify # Verify operator status (standalone) |
| 154 | +● operator:logs # Show operator logs (standalone) |
| 155 | + |
| 156 | +● status # Show status of all components |
| 157 | + |
| 158 | +○ cluster:create # Aggregator |
| 159 | +├── ● cluster:create-rgs |
| 160 | +├── ● cluster:create-network ⚠ has internal create-nsg, create-vnet |
| 161 | +└── ● cluster:create-hc |
| 162 | + |
| 163 | +● cluster:wait # Wait for cluster ready |
| 164 | +● cluster:get-kubeconfig # Get kubeconfig |
| 165 | +● cluster:show # Show cluster status |
| 166 | + |
| 167 | +○ cluster:destroy # Aggregator |
| 168 | +├── ● cluster:destroy-hc |
| 169 | +└── ● cluster:delete-rgs |
| 170 | + |
| 171 | +○ mgmt:destroy # Aggregator |
| 172 | +├── ● operator:uninstall |
| 173 | +├── ● dns:delete |
| 174 | +└── ● aks:delete |
| 175 | + |
| 176 | +○ teardown-all # Aggregator |
| 177 | +├── ○ cluster:destroy |
| 178 | +├── ○ mgmt:destroy |
| 179 | +├── ● dataplane:delete |
| 180 | +├── ● oidc:delete |
| 181 | +├── ● keyvault:delete |
| 182 | +└── ● aks:delete-identities |
| 183 | +``` |
| 184 | + |
| 185 | +**Important:** |
| 186 | +- Tasks marked with `⚠` have internal subtasks that you CANNOT run directly |
| 187 | +- Example: `oidc:create` calls both `create-keypair` (public) AND `create-issuer` (internal) |
| 188 | +- Running only `oidc:create-keypair` will NOT create the OIDC issuer - you must run `oidc:create` |
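
The legend says an aggregator (`○`) can be skipped if every child is run by hand. A minimal shell sketch of doing that for `keyvault:setup`, using the child order shown in the tree. The `run_tasks` helper and the `TASK_BIN` override are hypothetical conveniences for illustration, not part of the Taskfiles:

```shell
# Hypothetical helper: run tasks one at a time, stopping at the first failure.
# TASK_BIN is an assumed override so the sequence can be dry-run (e.g. TASK_BIN=echo).
run_tasks() {
  for t in "$@"; do
    echo ">>> $t"
    "${TASK_BIN:-task}" "$t" || { echo "failed at $t" >&2; return 1; }
  done
}

# Children of keyvault:setup, in the order listed in the tree.
KEYVAULT_STEPS="keyvault:create keyvault:create-sps keyvault:generate-sp-jsons keyvault:store-creds keyvault:generate-cp-json"

# Usage (commented out so that sourcing this file has no side effects):
#   run_tasks $KEYVAULT_STEPS
```

Stopping at the first failure matters here because later steps (for example `keyvault:store-creds`) presumably depend on the output of earlier ones. Internal subtasks never need to appear in the list: they are reached through their `⚠` parent.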
| 189 | + |
| 190 | +## Task Namespaces |
| 191 | + |
| 192 | +### prereq: - Prerequisites and Azure Authentication |
| 193 | +- `task prereq:login` - Login to Azure using azure-credentials.json |
| 194 | +- `task prereq:whoami` - Show current Azure identity and verify it matches credentials file |
| 195 | +- `task prereq:validate` - Validate tools, environment variables, and Azure identity |
| 196 | +- `task prereq:show-config` - Display current configuration |
| 197 | + |
| 198 | +### keyvault: - Key Vault and Control Plane SPs |
| 199 | +- `task keyvault:setup` - Complete Key Vault setup (idempotent) |
| 200 | +- `task keyvault:rotate-creds` - Rotate all SP credentials |
| 201 | +- `task keyvault:delete` - Delete Key Vault and SPs |
| 202 | + |
| 203 | +### oidc: - OIDC Provider |
| 204 | +- `task oidc:create` - Create OIDC provider (idempotent) |
| 205 | +- `task oidc:delete` - Delete OIDC issuer |
| 206 | + |
| 207 | +### dataplane: - Data Plane Managed Identities |
| 208 | +- `task dataplane:create` - Complete data plane setup (idempotent) |
| 209 | +- `task dataplane:delete` - Delete data plane identities |
| 210 | + |
| 211 | +### aks: - AKS Management Cluster |
| 212 | +- `task aks:create` - Complete AKS setup |
| 213 | +- `task aks:get-kubeconfig` - Get/restore AKS kubeconfig (re-run if file is lost) |
| 214 | +- `task aks:delete` - Delete AKS cluster |
| 215 | +- `task aks:show` - Show AKS status |
| 216 | + |
| 217 | +### dns: - External DNS |
| 218 | +- `task dns:setup` - Complete DNS setup (idempotent) |
| 219 | +- `task dns:delete` - Delete DNS resources |
| 220 | + |
| 221 | +### operator: - HyperShift Operator |
| 222 | +- `task operator:install` - Install HyperShift operator (ARO-HCP mode) |
| 223 | +- `task operator:verify` - Verify operator installation |
| 224 | +- `task operator:uninstall` - Uninstall operator |
| 225 | + |
| 226 | +### cluster: - Hosted Cluster |
| 227 | +- `task cluster:create-hc` - Create hosted cluster |
| 228 | +- `task cluster:destroy-hc` - Destroy hosted cluster |
| 229 | +- `task cluster:get-kubeconfig` - Get hosted cluster kubeconfig |
| 230 | +- `task cluster:show` - Show hosted cluster status |
| 231 | +- `task cluster:wait` - Wait for cluster to be ready |
| 232 | + |
| 233 | +## Required Configuration |
| 234 | + |
| 235 | +| File/Variable | Description | |
| 236 | +|---------------|-------------| |
| 237 | +| `AZURE_CREDS` | Path to azure-credentials.json (contains subscriptionId, tenantId, clientId, clientSecret) | |
| 238 | +| `PULL_SECRET` | Path to pull secret file | |
| 239 | +| `PREFIX` | Unique prefix for all resources | |
| 240 | +| `OIDC_ISSUER_NAME` | Unique name for OIDC storage account | |
| 241 | +| `RELEASE_IMAGE` | OpenShift release image | |
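
As an illustration, a `.envrc` covering the required variables might look like the sketch below. Every value is a placeholder (the file paths, `mydev`, `mydevoidc`, and the release image reference are assumptions, not defaults taken from `config.example.env`), and `missing_vars` is a hypothetical helper rather than one of the tasks:

```shell
# Placeholder values only; replace each with your own.
export AZURE_CREDS="$PWD/azure-credentials.json"
export PULL_SECRET="$HOME/pull-secret.json"
export PREFIX="mydev"
# Used as a storage account name, so keep it to lowercase letters and digits.
export OIDC_ISSUER_NAME="mydevoidc"
export RELEASE_IMAGE="quay.io/openshift-release-dev/ocp-release:<version>"

# Hypothetical helper: print the name of each required variable that is unset or empty.
missing_vars() {
  for name in AZURE_CREDS PULL_SECRET PREFIX OIDC_ISSUER_NAME RELEASE_IMAGE; do
    eval "val=\${$name:-}"
    [ -n "$val" ] || echo "$name"
  done
}
```

Running `missing_vars` after loading the file prints nothing when all five variables are set. `task prereq:validate` remains the project's own check; this helper only catches empty variables before you run it.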
| 242 | + |
| 243 | +## Optional Environment Variables |
| 244 | + |
| 245 | +| Variable | Default | Description | |
| 246 | +|----------|---------|-------------| |
| 247 | +| `LOCATION` | `eastus` | Azure region for resources | |
| 248 | +| `PERSISTENT_RG_NAME` | `os4-common` | Shared resource group | |
| 249 | +| `PARENT_DNS_ZONE` | `hypershift.azure.devcluster.openshift.com` | Parent DNS zone | |
| 250 | +| `AKS_NODE_COUNT` | `3` | Number of AKS nodes | |
| 251 | +| `AKS_NODE_VM_SIZE` | `Standard_D4s_v4` | VM size for AKS nodes | |
| 252 | +| `NODE_POOL_REPLICAS` | `2` | Number of worker nodes | |
| 253 | +| `HYPERSHIFT_IMAGE` | (none) | Override HyperShift operator image | |
| 254 | +| `HYPERSHIFT_BINARY_PATH` | (none) | Path to hypershift binary | |
| 255 | +| `KUBECONFIG` | `./mgmt-kubeconfig` | Path where mgmt cluster kubeconfig will be saved | |
| 256 | + |
| 257 | +## Generated Files |
| 258 | + |
| 259 | +The following files are generated during setup: |
| 260 | + |
| 261 | +| File | Description | |
| 262 | +|------|-------------| |
| 263 | +| `mgmt-kubeconfig` | Management (AKS) cluster kubeconfig - created by `task aks:get-kubeconfig` | |
| 264 | +| `cp-output.json` | Control plane managed identities | |
| 265 | +| `dp-output.json` | Data plane managed identities | |
| 266 | +| `serviceaccount-signer.public` | SA token issuer public key | |
| 267 | +| `serviceaccount-signer.private` | SA token issuer private key | |
| 268 | +| `external-dns-creds.json` | External DNS credentials | |
| 269 | +| `kubeconfig-<cluster-name>` | Hosted cluster kubeconfig | |
| 270 | + |
| 271 | +**Note:** The `KUBECONFIG` environment variable is set in `.envrc` to point to `mgmt-kubeconfig`. With direnv, all `kubectl`, `hypershift`, and `oc` commands automatically use this file. If the file is lost, run `task aks:get-kubeconfig` to restore it. |
| 272 | + |
| 273 | +## Architecture |
| 274 | + |
| 275 | +This setup uses the MIv3 (Managed Identity v3) pattern: |
| 276 | + |
| 277 | +1. **Control Plane Components** use Service Principals with certificates stored in Azure Key Vault |
| 278 | +2. **Data Plane Components** use Managed Identities with federated credentials |
| 279 | +3. **AKS** uses the Key Vault Secrets Provider addon to mount certificates |
| 280 | + |
| 281 | +## Migrating from Shell Scripts |
| 282 | + |
| 283 | +If you were using the shell scripts in `contrib/managed-azure/`: |
| 284 | + |
| 285 | +1. Install Task: `brew install go-task/tap/go-task` |
| 286 | +2. Copy your `user-vars.sh` values to `.envrc` |
| 287 | +3. Run `task mgmt:create` (equivalent to `setup_all.sh --first-time`) |
| 288 | +4. Run `task cluster:create` (equivalent to `create_basic_hosted_cluster.sh`) |
| 289 | + |
| 290 | +## Troubleshooting |
| 291 | + |
| 292 | +### Azure identity mismatch |
| 293 | +If you see "Identity mismatch" or "Forbidden" errors, your Azure CLI is logged in as a different service principal than the one in your credentials file: |
| 294 | +```bash |
| 295 | +# Check current identity |
| 296 | +task prereq:whoami |
| 297 | + |
| 298 | +# Login with correct credentials |
| 299 | +task prereq:login |
| 300 | + |
| 301 | +# Verify |
| 302 | +task prereq:validate |
| 303 | +``` |
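
To see roughly what such a check involves, the sketch below compares the `clientId` in the credentials file with the identity reported by the Azure CLI. It assumes that `az account show --query user.name -o tsv` prints the application ID when you are logged in as a service principal. `same_identity` is a hypothetical helper, and the actual logic in `task prereq:whoami` may differ:

```shell
# Hypothetical helper: succeed only when both IDs are non-empty and equal.
same_identity() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage (commented out; requires az and jq, and AZURE_CREDS set):
#   expected=$(jq -r .clientId "$AZURE_CREDS")
#   actual=$(az account show --query user.name -o tsv)
#   same_identity "$expected" "$actual" || echo "Identity mismatch: run task prereq:login"
```

Treating two empty values as a mismatch is deliberate: an empty result usually means the credentials file is unreadable or the CLI is not logged in at all.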
| 304 | + |
| 305 | +### Clean up after failed setup |
| 306 | +If a task fails partway through (e.g., Key Vault created but SPs failed): |
| 307 | +```bash |
| 308 | +# Clean up Key Vault resources |
| 309 | +task keyvault:delete |
| 310 | + |
| 311 | +# Fix the issue (e.g., login correctly) |
| 312 | +task prereq:login |
| 313 | + |
| 314 | +# Retry |
| 315 | +task keyvault:setup |
| 316 | +``` |
| 317 | + |
| 318 | +### Check operator logs |
| 319 | +```bash |
| 320 | +task operator:logs |
| 321 | +``` |
| 322 | + |
| 323 | +### Check hosted cluster status |
| 324 | +```bash |
| 325 | +task cluster:show |
| 326 | +``` |
| 327 | + |
| 328 | +### Verify all components |
| 329 | +```bash |
| 330 | +task status |
| 331 | +``` |
| 332 | + |
| 333 | +### Re-run with verbose output |
| 334 | +```bash |
| 335 | +task -v mgmt:create |
| 336 | +``` |