Commit 6a8f231

feat: extend migration using vac approach
1 parent cc9c17b commit 6a8f231

5 files changed, +1143 -303 lines changed

docs/migrate-premiumlrs-to-premiumv2lrs.md

Lines changed: 136 additions & 16 deletions
@@ -1,6 +1,6 @@
 # Premium_LRS → PremiumV2_LRS Migration Guide

-This guide explains how to use the migration scripts in `hack/` to move Azure Disk backed PVCs from Premium_LRS to PremiumV2_LRS. It covers the two supported modes (`inplace` and `dual`), prerequisites, validation steps, safety / rollback, cleanup, and troubleshooting.
+This guide explains how to use the migration scripts in `hack/` to move Azure Disk backed PVCs from Premium_LRS to PremiumV2_LRS. It now covers three supported modes (`inplace`, `dual`, and `attrclass` / VolumeAttributesClass), prerequisites, validation steps, safety / rollback, cleanup, and troubleshooting.

 ---

@@ -22,11 +22,39 @@ They are intended for controlled batches (not fire‑and‑forget across an enti
 |------|--------|---------|------|-------------------|-------------|
 | In-place | `hack/premium-to-premiumv2-migrator-inplace.sh` | Deletes original PVC (keeping original PV), recreates same name PVC pointing to snapshot and PremiumV2 SC | Same name preserved; minimal object sprawl | Short window where PVC is absent; workload must be quiesced/detached; rollback relies on retained PV | Smaller batches, controlled maintenance windows |
 | Dual (pv1→pv2) | `hack/premium-to-premiumv2-migrator-dualpvc.sh` | Creates intermediate CSI PV/PVC (if source was in-tree), snapshots, creates a *pv2* PVC (suffix), monitors migration events | Keeps original PVC around longer (reduced disruption); clearer staged artifacts | More objects (intermediate PV/PVC + target); higher cleanup burden; naming complexity | Migration where minimizing initial disruption matters or need visibility before switch |
+| AttrClass (in-place attribute update) | `hack/premium-to-premiumv2-migrator-vac.sh` | (Optionally) converts in-tree PV to CSI same-name first, then applies a `VolumeAttributesClass` to mutate the disk SKU | No new pv2 PVC; minimal object churn; preserves PVC name; avoids creating SC variants | Requires cluster & driver support for VolumeAttributesClass; rollback of SKU change requires another class or snapshot-based restore | Clusters already CSI-enabled or ready to convert; desire lowest object churn |

 Recommendation:
 1. Pilot on a tiny subset using `inplace` (simpler) in a non-prod namespace.
-2. If operational constraints demand minimal rename churn or extra observation time, use `dual` for broader rollout.
-3. Always label PVCs explicitly to opt them in (staged adoption).
+2. If you need prolonged coexistence / observation, use `dual`.
+3. If your cluster + Azure Disk CSI driver support `VolumeAttributesClass`, prefer `attrclass` for lowest object churn (especially when most PVs are already CSI).
+4. Always label PVCs explicitly to opt them in (staged adoption).
+
+### 2.1 AttrClass Mode Details
+
+`hack/premium-to-premiumv2-migrator-vac.sh`:
+- Ensures (or recreates if forced) a `VolumeAttributesClass` (default `azuredisk-premiumv2`) with `parameters.skuName=PremiumV2_LRS`.
+- For CSI Premium_LRS PVCs: patches `spec.volumeAttributesClassName` only (no new PVC/PV).
+- For in-tree azureDisk PVs: performs a one-time snapshot-based same-name CSI recreation (like a narrowed “inplace” convert) then patches attr class.
+- Central monitoring loop watches both:
+  - PV `.spec.csi.volumeAttributes.skuName|skuname` flip to `PremiumV2_LRS`.
+  - `SKUMigration*` events (if emitted) similar to other modes.
+- Rollback before SKU change: same as inplace (retained original PV + annotation / backup). After successful SKU mutation: must apply a different attr class pointing back to Premium_LRS (not auto-created) or restore from snapshot.
+
+Example:
+```bash
+kubectl label pvc data-app-a -n team-a disk.csi.azure.com/pv2migration=true
+cd hack
+./premium-to-premiumv2-migrator-vac.sh | tee run-attrclass-$(date +%Y%m%d-%H%M%S).log
+```
+
+Additional env (see section 5):
+```
+ATTR_CLASS_NAME=azuredisk-premiumv2
+ATTR_CLASS_API_VERSION=storage.k8s.io/v1beta1 # or storage.k8s.io/v1 when GA
+TARGET_SKU=PremiumV2_LRS
+ATTR_CLASS_FORCE_RECREATE=false
+```

 ---

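Review note on the hunk above: the bullets describe the script ensuring a `VolumeAttributesClass` and then patching `spec.volumeAttributesClassName`, but the object itself is never shown. The following is a minimal sketch of what such a manifest might look like, assuming the defaults listed under "Additional env". The helper name `render_attr_class` is hypothetical and is not taken from the script, and `driverName: disk.csi.azure.com` is assumed from the provisioner name used elsewhere in the guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the migration script): prints a
# VolumeAttributesClass manifest built from the documented env defaults.
render_attr_class() {
  local name="${ATTR_CLASS_NAME:-azuredisk-premiumv2}"
  local api="${ATTR_CLASS_API_VERSION:-storage.k8s.io/v1beta1}"
  local sku="${TARGET_SKU:-PremiumV2_LRS}"
  cat <<EOF
apiVersion: ${api}
kind: VolumeAttributesClass
metadata:
  name: ${name}
driverName: disk.csi.azure.com
parameters:
  skuName: ${sku}
EOF
}

# Usage (needs cluster access, so it is left commented out):
#   render_attr_class | kubectl apply -f -
#   kubectl patch pvc data-app-a -n team-a --type merge \
#     -p '{"spec":{"volumeAttributesClassName":"azuredisk-premiumv2"}}'
```

Rendering to stdout keeps the manifest reviewable before anything is applied. The same helper with `TARGET_SKU=Premium_LRS` and a different `ATTR_CLASS_NAME` would produce the "alternate attr class" that the rollback bullet says is not auto-created.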
@@ -82,6 +110,12 @@ Change the label (or add additional selectors externally) to control scope. Only
 | `MIGRATION_LABEL` | see above | PVC selection. |
 | `AUDIT_ENABLE` | `true` | Enable audit log lines. |
 | `AUDIT_LOG_FILE` | `pv1-pv2-migration-audit.log` | Rolling append log file. |
+| `ATTR_CLASS_NAME` | `azuredisk-premiumv2` | (AttrClass mode) Name of VolumeAttributesClass to apply. |
+| `ATTR_CLASS_API_VERSION` | `storage.k8s.io/v1beta1` | API version for VolumeAttributesClass (adjust if GA). |
+| `TARGET_SKU` | `PremiumV2_LRS` | Target skuName parameter for the VolumeAttributesClass. |
+| `ATTR_CLASS_FORCE_RECREATE` | `false` | Recreate the attr class each run. |
+| `PV_POLL_INTERVAL_SECONDS` | `10` | (AttrClass) Poll interval for sku check. |
+| `SKU_UPDATE_TIMEOUT_MINUTES` | `60` | (AttrClass optional blocking helper) Per-PVC sku update wait if used directly. |

 (See top of `lib-premiumv2-migration-common.sh` for the complete list.)

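Review note: `PV_POLL_INTERVAL_SECONDS` and `SKU_UPDATE_TIMEOUT_MINUTES` interact, since the timeout divided by the interval bounds how many API reads a single PVC can trigger. A small sketch of that arithmetic follows, assuming the table defaults; `max_poll_attempts` is a hypothetical helper, not a function from `lib-premiumv2-migration-common.sh`.

```shell
#!/usr/bin/env bash
# Apply the documented defaults only when the caller has not set them.
: "${PV_POLL_INTERVAL_SECONDS:=10}"
: "${SKU_UPDATE_TIMEOUT_MINUTES:=60}"

# max_poll_attempts TIMEOUT_MINUTES INTERVAL_SECONDS
# Prints the number of polls that fit in the timeout, rounded up.
max_poll_attempts() {
  local timeout_min="$1" interval_s="$2"
  if [ "$interval_s" -le 0 ]; then
    echo "interval must be > 0" >&2
    return 1
  fi
  echo $(( (timeout_min * 60 + interval_s - 1) / interval_s ))
}

# With the defaults (60 minutes, 10 seconds) this prints 360.
max_poll_attempts "$SKU_UPDATE_TIMEOUT_MINUTES" "$PV_POLL_INTERVAL_SECONDS"
```

Multiplying that figure by the batch size (`MAX_PVCS`) gives a rough upper bound on PV reads per run, which may matter on clusters with API rate limits.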
@@ -90,22 +124,74 @@ Change the label (or add additional selectors externally) to control scope. Only
 ## 6. Prerequisites & Validation Checklist

 Before running:
-1. RBAC: Ensure your principal can `get/list/create/patch/delete` PV/PVC/Snapshot/SC as required. Script will abort if critical verbs fail.
-2. Quota: Check PremiumV2 disk quotas in target subscription/region (script does NOT enforce).
-3. StorageClasses: Confirm original SC(s) are Premium_LRS (cachingMode=none, no unsupported encryption combos).
-4. Workload readiness: Plan for pods referencing target PVCs to be idle / safe to pause if using in-place.
-5. Snapshot CRDs: Ensure `VolumeSnapshot` CRDs installed (the script creates a class if absent).
-6. Label small test set:
+1. **RBAC**: Ensure your principal can `get/list/create/patch/delete` PV/PVC/Snapshot/SC as required. Script will abort if critical verbs fail.
+
+2. **Quota**: Check PremiumV2 disk quotas in target subscription/region (script does NOT enforce).
+
+3. **StorageClasses**: Confirm original SC(s) are Premium_LRS (cachingMode=none, no unsupported encryption combos).
+
+4. **⚠️ Zone Topology Requirements (Critical for PremiumV2_LRS)**:
+
+**PremiumV2_LRS disks can only be attached to VMs running in the same Availability Zone.** If your workloads are zone-constrained or you're using topology-aware scheduling, you **must** update your source StorageClasses with `allowedTopologies` before migration.
+
+**Action Required**: Update your existing Premium_LRS StorageClasses to include the correct zone topology constraints:
+
+```yaml
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: managed-premium # Your existing StorageClass name
+provisioner: disk.csi.azure.com
+parameters:
+  skuName: Premium_LRS
+  cachingMode: None
+allowedTopologies:
+- matchLabelExpressions:
+  - key: topology.disk.csi.azure.com/zone
+    values:
+    - eastus2-1 # Replace with your target zone(s)
+    - eastus2-2 # Add multiple zones if needed
+    - eastus2-3
+reclaimPolicy: Delete
+allowVolumeExpansion: true
+volumeBindingMode: WaitForFirstConsumer # Recommended for zone-aware scheduling
 ```
-kubectl label pvc data-app-a -n team-a disk.csi.azure.com/pv2migration=true
+
+**How to determine your zones**:
+```bash
+# Check zones where your nodes are running
+kubectl get nodes -o custom-columns="NAME:.metadata.name,ZONE:.metadata.labels['topology\.kubernetes\.io/zone']"
+
+# Check zones where your existing PVs are located
+kubectl get pv -o custom-columns="NAME:.metadata.name,ZONE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]"
+
+# Check current PVC zones
+kubectl get pvc -A -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,ZONE:.metadata.annotations['volume\.kubernetes\.io/selected-node']" | grep -v '<none>'
 ```
-7. Dry run *logic* (syntax & preflight only):
+
+**Why this matters**:
+- The migration script inherits `allowedTopologies` from your source StorageClass when creating PremiumV2_LRS variants
+- Without proper topology constraints, PremiumV2 PVCs may be provisioned in zones where your workloads cannot access them
+- This can result in pod scheduling failures or volume attachment timeouts
+
+5. **Workload readiness**: Plan for pods referencing target PVCs to be idle / safe to pause if using in-place.
+
+6. **Snapshot CRDs**: Ensure `VolumeSnapshot` CRDs installed (the script creates a class if absent).
+
+7. **Label small test set**:
+```bash
+kubectl label pvc data-app-a -n team-a disk.csi.azure.com/pv2migration=true
 ```
+
+8. **Dry run *logic* (syntax & preflight only)**:
+```bash
 bash -n hack/premium-to-premiumv2-migrator-inplace.sh
 bash -n hack/premium-to-premiumv2-migrator-dualpvc.sh
 ```
-8. Optional: Run with a deliberately empty label selector to validate preflight (set `MIGRATION_LABEL="doesnotexist=true"` temporarily).

+9. **Optional**: Run with a deliberately empty label selector to validate preflight (set `MIGRATION_LABEL="doesnotexist=true"` temporarily).
+
+**Important**: After updating your source StorageClasses with topology constraints, verify that existing workloads can still schedule properly before proceeding with migration. The script will automatically inherit these topology settings when creating the PremiumV2_LRS variant StorageClasses.
 ---

 ## 7. Running the Scripts
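Review note: the dry-run step in the checklist still runs `bash -n` on the in-place and dual scripts only, while this commit adds `premium-to-premiumv2-migrator-vac.sh`. A minimal sketch of a syntax check that covers all three before anything is run might look like the following; `syntax_check_scripts` is a hypothetical wrapper, and the script names are the ones used in the guide.

```shell
#!/usr/bin/env bash
# syntax_check_scripts [DIR]
# Runs `bash -n` (parse only, nothing is executed) over the three migrator
# scripts in DIR and returns non-zero if any of them fails to parse.
syntax_check_scripts() {
  local dir="${1:-hack}" rc=0 script
  for script in \
    premium-to-premiumv2-migrator-inplace.sh \
    premium-to-premiumv2-migrator-dualpvc.sh \
    premium-to-premiumv2-migrator-vac.sh; do
    if bash -n "${dir}/${script}"; then
      echo "ok: ${script}"
    else
      echo "FAILED: ${script}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage from the repository root:
#   syntax_check_scripts hack && echo "all scripts parse"
```

Note that `bash -n` only catches parse errors; it does not exercise the RBAC or prerequisite preflight, which is what the empty-label-selector run in the checklist is for.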
@@ -127,6 +213,20 @@ MAX_PVCS=5 MIG_SUFFIX=csi \
 ./premium-to-premiumv2-migrator-dualpvc.sh 2>&1 | tee run-dual-$(date +%Y%m%d-%H%M%S).log
 ```

+AttrClass example:
+```bash
+cd hack
+MAX_PVCS=5 ATTR_CLASS_NAME=azuredisk-premiumv2 \
+./premium-to-premiumv2-migrator-vac.sh 2>&1 | tee run-attrclass-$(date +%Y%m%d-%H%M%S).log
+```
+
+AttrClass with in-tree presence (override baseline CSI SC):
+```bash
+cd hack
+CSI_BASELINE_SC=csi-azuredisk-premium \
+MAX_PVCS=3 ./premium-to-premiumv2-migrator-vac.sh
+```
+
 Important runtime phases (both):
 1. Pre-req scan (size, SC parameters, binding).
 2. RBAC preflight.
@@ -415,6 +515,9 @@ Summary:
 | No `SKUMigration*` events | Controller not emitting or watch delay | Force in-progress label (script auto after threshold) |
 | Released PV leftovers | Rollback or partial batch | Confirm not needed → delete PV |
 | Rollback failed to rebind | claimRef not cleared or PV reclaimPolicy=Delete | Ensure reclaimPolicy changed to Retain earlier |
+| AttrClass PVC never flips sku | Driver / cluster lacks VolumeAttributesClass update support | Confirm driver version & feature gate; inspect PV `.spec.csi.volumeAttributes` |
+| AttrClass run shows no events | Controller not emitting `SKUMigration*` | Rely on sku attribute polling; consider driver log inspection |
+| AttrClass rollback after sku change | SKU already mutated on disk | Apply alternate attr class (Premium_LRS) or snapshot restore |

 ---

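Review note: two of the new troubleshooting rows fall back on "sku attribute polling" without showing what that looks like outside the script. Below is a rough sketch of such a loop, assuming the PV exposes the SKU under `.spec.csi.volumeAttributes` with either the `skuName` or `skuname` key as described in section 2.1. Both function names are hypothetical, and the optional getter argument exists only so the loop can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# get_pv_sku PV
# Reads the SKU from the PV's CSI volume attributes, trying both key spellings.
get_pv_sku() {
  local pv="$1" sku
  sku="$(kubectl get pv "$pv" -o jsonpath='{.spec.csi.volumeAttributes.skuName}' 2>/dev/null)"
  if [ -z "$sku" ]; then
    sku="$(kubectl get pv "$pv" -o jsonpath='{.spec.csi.volumeAttributes.skuname}' 2>/dev/null)"
  fi
  echo "$sku"
}

# wait_for_sku PV TARGET_SKU TIMEOUT_SECONDS INTERVAL_SECONDS [GETTER]
# Polls until the PV reports TARGET_SKU; returns 1 once the timeout is reached.
wait_for_sku() {
  local pv="$1" target="$2" timeout_s="$3" interval_s="$4"
  local getter="${5:-get_pv_sku}" waited=0 sku
  while :; do
    sku="$("$getter" "$pv")"
    if [ "$sku" = "$target" ]; then
      echo "${pv}: sku=${sku} after ${waited}s"
      return 0
    fi
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "${pv}: still sku='${sku}' after ${waited}s, giving up" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$(( waited + interval_s ))
  done
}

# Usage (needs cluster access):
#   wait_for_sku "$pv" PremiumV2_LRS 3600 10
```

If the loop times out, the table's suggestion still applies: confirm driver version and feature gate before retrying, since a driver that ignores the attribute class will never flip the value.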
@@ -443,6 +546,17 @@ kubectl get pvc data-app-a -n team-a -o wide
 kubectl describe pv $(kubectl get pvc data-app-a -n team-a -o jsonpath='{.spec.volumeName}') | grep -i sku
 ```

+### 15.1 Example AttrClass (CSI-native PVC)
+```bash
+kubectl label pvc data-app-b -n team-b disk.csi.azure.com/pv2migration=true
+cd hack
+./premium-to-premiumv2-migrator-vac.sh | tee mig-attrclass-b.log
+# Verify:
+kubectl get pvc data-app-b -n team-b -o wide
+pv=$(kubectl get pvc data-app-b -n team-b -o jsonpath='{.spec.volumeName}')
+kubectl get pv "$pv" -o jsonpath='{.spec.csi.volumeAttributes.skuName}'; echo
+```
+
 ---

 ## 16. After Everything Looks Good
@@ -488,6 +602,11 @@ export BIND_TIMEOUT_SECONDS=120
 export WORKLOAD_DETACH_TIMEOUT_MINUTES=15
 export BACKUP_ORIGINAL_PVC=true
 export ROLLBACK_ON_TIMEOUT=true
+export ATTR_CLASS_NAME=azuredisk-premiumv2
+export TARGET_SKU=PremiumV2_LRS
+export ATTR_CLASS_FORCE_RECREATE=false
+export PV_POLL_INTERVAL_SECONDS=10
+export MIGRATION_FORCE_INPROGRESS_AFTER_MINUTES=10
 ```

 ---
@@ -498,10 +617,11 @@ export ROLLBACK_ON_TIMEOUT=true
 2. `bash -n` passes
 3. Run script → watch logs until summary
 4. Review cleanup report
-5. Verify data & app workload on PremiumV2
-6. Cleanup intermediate / snapshot artifacts as appropriate
-7. Archive audit + backups
-8. Proceed to next batch
+5. Verify data & app workload on PremiumV2 (PV attributes or events)
+6. (Dual/In-place) Cleanup intermediate / snapshot artifacts
+7. (AttrClass) Confirm attr class applied (PVC.spec.volumeAttributesClassName) & PV sku updated
+8. Archive audit + backups
+9. Proceed to next batch

 ---

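Review note: the new checklist step 7 asks for two confirmations (attr class on the PVC, SKU on the PV) but leaves them as a manual read. One way to turn that step into a pass/fail check is sketched below, assuming the same field paths used in the 15.1 example; `verify_attrclass_migration` is a hypothetical name and the expected values default to the documented env defaults.

```shell
#!/usr/bin/env bash
# verify_attrclass_migration NAMESPACE PVC
# Returns 0 only if the PVC references the expected VolumeAttributesClass
# and its bound PV reports the target SKU.
verify_attrclass_migration() {
  local ns="$1" pvc="$2"
  local want_class="${ATTR_CLASS_NAME:-azuredisk-premiumv2}"
  local want_sku="${TARGET_SKU:-PremiumV2_LRS}"
  local got_class pv got_sku

  got_class="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeAttributesClassName}')"
  pv="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeName}')"
  got_sku="$(kubectl get pv "$pv" -o jsonpath='{.spec.csi.volumeAttributes.skuName}')"

  if [ "$got_class" != "$want_class" ]; then
    echo "FAIL ${ns}/${pvc}: attr class '${got_class}', expected '${want_class}'" >&2
    return 1
  fi
  if [ "$got_sku" != "$want_sku" ]; then
    echo "FAIL ${ns}/${pvc}: PV ${pv} sku '${got_sku}', expected '${want_sku}'" >&2
    return 1
  fi
  echo "OK ${ns}/${pvc}: class=${got_class} pv=${pv} sku=${got_sku}"
}

# Usage (needs cluster access):
#   verify_attrclass_migration team-b data-app-b
```

Checking both fields matters because the PVC field only records intent; the PV attribute is what reflects whether the driver actually applied the change.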