Commit 5eb4783

devguyio and claude committed
feat(azure): add ARO-HCP Taskfile automation for dev workflows
Adds Taskfile-based automation for ARO-HCP development under hack/aro-hcp/. Provides modular tasks for managing Azure infrastructure, AKS clusters, HyperShift operator deployment, and hosted cluster lifecycle.

Key features:
- Modular task structure with prereq, keyvault, oidc, dataplane, aks, dns, operator, and cluster task files
- Example configuration files for credentials and environment setup
- Comprehensive README with prerequisites and workflow documentation

Commit-Message-Assisted-by: Claude (via Claude Code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Ahmed Abdalla <aabdelre@redhat.com>
1 parent 6472cb1 commit 5eb4783

16 files changed, +2507 -0 lines changed

hack/aro-hcp/.gitignore

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Generated credentials and keys
cp-output.json
dp-output.json
serviceaccount-signer.public
serviceaccount-signer.private
external-dns-creds.json
azure-credentials.json
pull-secret.json

# Temporary credential files
creds-tmp/

# Kubeconfig files
kubeconfig-*
mgmt-kubeconfig

# OIDC files
jwks
openid-configuration

# TLS keys
tls/

# Screenshots
*.png

# Environment configuration (contains secrets)
.envrc

# SP output files
*.sp-output.json
*.reset-output.json

hack/aro-hcp/README.md

Lines changed: 336 additions & 0 deletions
@@ -0,0 +1,336 @@
# ARO-HCP Development Environment

This directory contains Taskfiles for setting up an AKS management cluster with ARO-HCP (Azure Red Hat OpenShift Hosted Control Plane).

## Prerequisites

- [Task](https://taskfile.dev/) - Install with `brew install go-task/tap/go-task`
- [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
- [ccoctl](https://github.com/openshift/cloud-credential-operator) - Cloud Credential Operator CLI
- [kubectl](https://kubernetes.io/docs/tasks/tools/)
- [jq](https://stedolan.github.io/jq/)
- [gum](https://github.com/charmbracelet/gum) - For styled terminal output
- [hypershift CLI](https://hypershift-docs.netlify.app/) - Either in PATH or set `HYPERSHIFT_BINARY_PATH`
- An Azure subscription with appropriate permissions
- A pull secret from [console.redhat.com](https://console.redhat.com/openshift/install/pull-secret)

## Quick Start

1. **Create Azure credentials file:**
```bash
cp azure-credentials.json.example azure-credentials.json
# Edit azure-credentials.json with your SP credentials
```

2. **Configure environment:**
```bash
cp config.example.env .envrc
# Edit .envrc with your values (PREFIX, OIDC_ISSUER_NAME, RELEASE_IMAGE)
direnv allow # or source .envrc
```

3. **Login to Azure:**
```bash
task prereq:login
```

4. **Create management cluster (first time):**
```bash
task mgmt:create
```

5. **Create hosted cluster:**
```bash
task cluster:create
```

6. **Destroy hosted cluster:**
```bash
task cluster:destroy
```

7. **Destroy management cluster:**
```bash
task mgmt:destroy
```

## Usage Pattern

The typical workflow is:

1. **Once every few months:** `task mgmt:create` - Creates a long-lived AKS management cluster
2. **Every few days:** `task cluster:create` / `task cluster:destroy` - Iterate on hosted clusters
3. **Rarely:** `task mgmt:destroy` - When done with the environment

## Primary Tasks

| Task | Description |
|------|-------------|
| `task mgmt:create` | Create management cluster (AKS) with all dependencies |
| `task mgmt:destroy` | Destroy management cluster |
| `task cluster:create` | Create hosted cluster (most frequent operation) |
| `task cluster:destroy` | Destroy hosted cluster |

## Utility Tasks

| Task | Description |
|------|-------------|
| `task prereq:login` | Login to Azure using azure-credentials.json |
| `task prereq:whoami` | Show current Azure identity vs credentials file |
| `task prereq:validate` | Validate all prerequisites including Azure identity |
| `task prereq:show-config` | Display current configuration |
| `task first-time` | One-time setup only (Key Vault, OIDC, identities) |
| `task teardown-all` | Complete teardown including one-time resources |
| `task status` | Show status of all components |

## Standard Workflow (Recommended)

For most users, these commands are all you need:

**First-time setup:**
```bash
task prereq:login # Login to Azure
task mgmt:create # Create everything (~20 min)
```

**Daily use:**
```bash
task cluster:create # Create hosted cluster
task cluster:destroy # Destroy hosted cluster
```

**Cleanup:**
```bash
task mgmt:destroy # Destroy management cluster
task teardown-all # Complete teardown including persistent resources
```

## Step-by-Step Workflow (For Debugging)

Use this when you need granular control for debugging or testing individual steps.

**Legend:**
| Symbol | Meaning |
|--------|---------|
| `○` | Aggregator - only orchestrates subtasks, can skip if you run all children manually |
| `●` | Does work - has actual commands/logic, must run this task |
| `⚠` | Has internal subtasks - CANNOT skip, must use this parent task |

```
● prereq:login # Login using azure-credentials.json
● prereq:whoami # Verify identity matches
● prereq:validate # Validate all prerequisites

○ mgmt:create # Aggregator - orchestrates all setup
├── ● prereq:validate
├── ○ keyvault:setup # Aggregator
│   ├── ● keyvault:create
│   ├── ● keyvault:create-sps ⚠ has internal create-sp
│   ├── ● keyvault:generate-sp-jsons
│   ├── ● keyvault:store-creds
│   └── ● keyvault:generate-cp-json
├── ● oidc:create ⚠ has internal create-issuer
│   └── ● oidc:create-keypair
├── ○ dataplane:create # Aggregator
│   ├── ● dataplane:create-identities ⚠ has internal
│   ├── ● dataplane:create-federated-creds ⚠ has internal
│   └── ● dataplane:generate-dp-json
├── ● aks:create-identities
├── ○ aks:create # Aggregator
│   ├── ● aks:create-rg
│   ├── ● aks:create-cluster
│   ├── ● aks:get-kubeconfig
│   └── ● aks:assign-kv-role
├── ○ dns:setup # Aggregator
│   ├── ● dns:create-zone
│   ├── ● dns:delegate-zone
│   ├── ● dns:create-sp
│   └── ● dns:create-secret
└── ● operator:install
    └── ● operator:apply-crds

● operator:wait # Wait for operator (standalone)
● operator:verify # Verify operator status (standalone)
● operator:logs # Show operator logs (standalone)

● status # Show status of all components

○ cluster:create # Aggregator
├── ● cluster:create-rgs
├── ● cluster:create-network ⚠ has internal create-nsg, create-vnet
└── ● cluster:create-hc

● cluster:wait # Wait for cluster ready
● cluster:get-kubeconfig # Get kubeconfig
● cluster:show # Show cluster status

○ cluster:destroy # Aggregator
├── ● cluster:destroy-hc
└── ● cluster:delete-rgs

○ mgmt:destroy # Aggregator
├── ● operator:uninstall
├── ● dns:delete
└── ● aks:delete

○ teardown-all # Aggregator
├── ○ cluster:destroy
├── ○ mgmt:destroy
├── ● dataplane:delete
├── ● oidc:delete
├── ● keyvault:delete
└── ● aks:delete-identities
```

**Important:**
- Tasks marked with `⚠` have internal subtasks that you CANNOT run directly
- Example: `oidc:create` calls both `create-keypair` (public) AND `create-issuer` (internal)
- Running only `oidc:create-keypair` will NOT create the OIDC issuer - you must run `oidc:create`

## Task Namespaces

### prereq: - Prerequisites and Azure Authentication
- `task prereq:login` - Login to Azure using azure-credentials.json
- `task prereq:whoami` - Show current Azure identity and verify it matches credentials file
- `task prereq:validate` - Validate tools, environment variables, and Azure identity
- `task prereq:show-config` - Display current configuration

### keyvault: - Key Vault and Control Plane SPs
- `task keyvault:setup` - Complete Key Vault setup (idempotent)
- `task keyvault:rotate-creds` - Rotate all SP credentials
- `task keyvault:delete` - Delete Key Vault and SPs

### oidc: - OIDC Provider
- `task oidc:create` - Create OIDC provider (idempotent)
- `task oidc:delete` - Delete OIDC issuer

### dataplane: - Data Plane Managed Identities
- `task dataplane:create` - Complete data plane setup (idempotent)
- `task dataplane:delete` - Delete data plane identities

### aks: - AKS Management Cluster
- `task aks:create` - Complete AKS setup
- `task aks:get-kubeconfig` - Get/restore AKS kubeconfig (re-run if file is lost)
- `task aks:delete` - Delete AKS cluster
- `task aks:show` - Show AKS status

### dns: - External DNS
- `task dns:setup` - Complete DNS setup (idempotent)
- `task dns:delete` - Delete DNS resources

### operator: - HyperShift Operator
- `task operator:install` - Install HyperShift operator (ARO-HCP mode)
- `task operator:verify` - Verify operator installation
- `task operator:uninstall` - Uninstall operator

### cluster: - Hosted Cluster
- `task cluster:create-hc` - Create hosted cluster
- `task cluster:destroy-hc` - Destroy hosted cluster
- `task cluster:get-kubeconfig` - Get hosted cluster kubeconfig
- `task cluster:show` - Show hosted cluster status
- `task cluster:wait` - Wait for cluster to be ready

## Required Configuration

| File/Variable | Description |
|---------------|-------------|
| `AZURE_CREDS` | Path to azure-credentials.json (contains subscriptionId, tenantId, clientId, clientSecret) |
| `PULL_SECRET` | Path to pull secret file |
| `PREFIX` | Unique prefix for all resources |
| `OIDC_ISSUER_NAME` | Unique name for OIDC storage account |
| `RELEASE_IMAGE` | OpenShift release image |

## Optional Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `LOCATION` | `eastus` | Azure region for resources |
| `PERSISTENT_RG_NAME` | `os4-common` | Shared resource group |
| `PARENT_DNS_ZONE` | `hypershift.azure.devcluster.openshift.com` | Parent DNS zone |
| `AKS_NODE_COUNT` | `3` | Number of AKS nodes |
| `AKS_NODE_VM_SIZE` | `Standard_D4s_v4` | VM size for AKS nodes |
| `NODE_POOL_REPLICAS` | `2` | Number of worker nodes |
| `HYPERSHIFT_IMAGE` | (none) | Override HyperShift operator image |
| `HYPERSHIFT_BINARY_PATH` | (none) | Path to hypershift binary |
| `KUBECONFIG` | `./mgmt-kubeconfig` | Path where mgmt cluster kubeconfig will be saved |

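As a rough illustration of how the required and optional settings fit together, the sketch below reports any required variable that is unset and fills in a few of the documented defaults. The function names `missing_required_vars` and `apply_defaults` are hypothetical and are not taken from the Taskfiles; `task prereq:validate` may perform these checks differently.

```bash
# Sketch only: variable names and defaults come from the tables above;
# the function names are hypothetical and not part of the Taskfiles.

# Print the name of every required variable that is unset or empty.
# Returns non-zero if at least one is missing.
missing_required_vars() {
  missing=0
  for name in AZURE_CREDS PULL_SECRET PREFIX OIDC_ISSUER_NAME RELEASE_IMAGE; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Apply the documented defaults for some optional variables when unset.
apply_defaults() {
  : "${LOCATION:=eastus}"
  : "${AKS_NODE_COUNT:=3}"
  : "${AKS_NODE_VM_SIZE:=Standard_D4s_v4}"
  : "${NODE_POOL_REPLICAS:=2}"
  export LOCATION AKS_NODE_COUNT AKS_NODE_VM_SIZE NODE_POOL_REPLICAS
}
```

One way to use it is to source the snippet after loading `.envrc` and run `missing_required_vars` before `task mgmt:create`; any name it prints still needs a value.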
## Generated Files

The following files are generated during setup:

| File | Description |
|------|-------------|
| `mgmt-kubeconfig` | Management (AKS) cluster kubeconfig - created by `task aks:get-kubeconfig` |
| `cp-output.json` | Control plane managed identities |
| `dp-output.json` | Data plane managed identities |
| `serviceaccount-signer.public` | SA token issuer public key |
| `serviceaccount-signer.private` | SA token issuer private key |
| `external-dns-creds.json` | External DNS credentials |
| `kubeconfig-<cluster-name>` | Hosted cluster kubeconfig |

**Note:** The `KUBECONFIG` environment variable is set in `.envrc` to point to `mgmt-kubeconfig`. With direnv, all `kubectl`, `hypershift`, and `oc` commands automatically use this file. If the file is lost, run `task aks:get-kubeconfig` to restore it.

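To see at a glance which of these files are still missing after a partial setup, a small check along the following lines may help. It is a sketch under assumptions: the function name `missing_generated_files` is hypothetical, the files are assumed to live in one directory, and the hosted cluster kubeconfig is left out because its name depends on the cluster.

```bash
# Sketch only: file names come from the table above; the function name
# is hypothetical and not part of the Taskfiles.

# Print each expected management-cluster file that is absent from the
# directory given as $1 (defaults to the current directory).
# Returns non-zero if any file is missing.
missing_generated_files() {
  dir="${1:-.}"
  status=0
  for f in mgmt-kubeconfig cp-output.json dp-output.json \
           serviceaccount-signer.public serviceaccount-signer.private \
           external-dns-creds.json; do
    if [ ! -f "$dir/$f" ]; then
      printf '%s\n' "$f"
      status=1
    fi
  done
  return "$status"
}
```

If only `mgmt-kubeconfig` is reported, `task aks:get-kubeconfig` should be enough to restore it, as the note above describes.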
## Architecture

This setup uses the MIv3 (Managed Identity v3) pattern:

1. **Control Plane Components** use Service Principals with certificates stored in Azure Key Vault
2. **Data Plane Components** use Managed Identities with federated credentials
3. **AKS** uses the Key Vault Secrets Provider addon to mount certificates

## Migrating from Shell Scripts

If you were using the shell scripts in `contrib/managed-azure/`:

1. Install Task: `brew install go-task/tap/go-task`
2. Copy your `user-vars.sh` values to `.envrc`
3. Run `task mgmt:create` (equivalent to `setup_all.sh --first-time`)
4. Run `task cluster:create` (equivalent to `create_basic_hosted_cluster.sh`)

## Troubleshooting

### Azure identity mismatch
If you see "Identity mismatch" or "Forbidden" errors, your Azure CLI is logged in as a different service principal than the one in your credentials file:
```bash
# Check current identity
task prereq:whoami

# Login with correct credentials
task prereq:login

# Verify
task prereq:validate
```

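For a manual check outside of Task, the comparison can be sketched as below. This is not how `prereq:whoami` is known to be implemented; the function name `identity_matches` is hypothetical, and the wiring in the comments assumes that `az account show` reports the client ID in `user.name` for a service principal login.

```bash
# Sketch only: the real prereq:whoami task may work differently.

# Succeeds when the logged-in identity ($1) equals the clientId from the
# credentials file ($2); otherwise prints a hint to stderr and fails.
identity_matches() {
  current="$1"
  expected="$2"
  if [ "$current" = "$expected" ]; then
    return 0
  fi
  printf 'Identity mismatch: logged in as %s, credentials file has %s\n' \
    "$current" "$expected" >&2
  return 1
}

# Example wiring (requires az and jq; assumes a service principal login):
#   current="$(az account show --query user.name -o tsv)"
#   expected="$(jq -r .clientId "$AZURE_CREDS")"
#   identity_matches "$current" "$expected" || task prereq:login
```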
### Clean up after failed setup
If a task fails partway through (e.g., Key Vault created but SPs failed):
```bash
# Clean up Key Vault resources
task keyvault:delete

# Fix the issue (e.g., login correctly)
task prereq:login

# Retry
task keyvault:setup
```

### Check operator logs
```bash
task operator:logs
```

### Check hosted cluster status
```bash
task cluster:show
```

### Verify all components
```bash
task status
```

### Re-run with verbose output
```bash
task -v mgmt:create
```
