[ARO-25489] Reduce the memory usage of ResourceSKUs querying#4713
[ARO-25489] Reduce the memory usage of ResourceSKUs querying#4713
Conversation
There was a problem hiding this comment.
Pull request overview
This PR reduces memory usage when querying Azure Compute Resource SKUs by switching from building full in-memory SKU lists to streaming/iterating SKUs and only retaining the subset needed by callers (or just names for admin listing).
Changes:
- Changed
ResourceSKUsClient.Listto return an iterator (iter.Seq2) instead of[]*ResourceSKU, enabling streaming pagination. - Added
computeskus.SelectVMSkusInCurrentRegionandcomputeskus.ListUnrestrictedVMSkusInCurrentRegionto avoid loading all SKUs into memory. - Updated frontend/admin/cluster flows and tests to use the new iterator-based SKU querying.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/util/azureclient/azuresdk/armcompute/resourceskus_addons.go | Reworks SKU listing to stream results via iter.Seq2 instead of returning a full slice. |
| pkg/util/mocks/azureclient/azuresdk/armcompute/armcompute.go | Updates generated mock to match the iterator-based List signature. |
| pkg/util/computeskus/computeskus.go | Introduces streaming-based SKU selection/listing helpers and updates error handling accordingly. |
| pkg/util/computeskus/computeskus_test.go | Replaces old FilterVMSizes test with new unit tests for streaming-based helpers. |
| pkg/frontend/sku_validation.go | Uses SelectVMSkusInCurrentRegion to fetch only the SKUs referenced by the cluster doc. |
| pkg/frontend/sku_test.go | Updates mocks and expected errors for iterator-based SKU listing. |
| pkg/frontend/adminactions/azureactions.go | Changes admin VM SKU listing to return []string and adds VMGetSKUs for targeted queries. |
| pkg/util/mocks/adminactions/azureactions.go | Updates generated adminactions mocks for VMSizeList type change and new VMGetSKUs. |
| pkg/frontend/admin_openshiftcluster_vmsizelist.go | Removes local filtering logic; now sorts and returns the already-filtered SKU name list. |
| pkg/frontend/admin_openshiftcluster_vmsizelist_test.go | Updates expectations to reflect []string return type and stable sorting. |
| pkg/frontend/admin_openshiftcluster_vmresize_pre_validation.go | Switches pre-resize validation to query only the desired SKU via VMGetSKUs. |
| pkg/frontend/admin_openshiftcluster_vmresize_pre_validation_test.go | Updates mocks to use VMGetSKUs and map-based SKU lookup. |
| pkg/cluster/validate.go | Switches zone validation to only fetch SKUs needed for master/worker sizing. |
| pkg/cluster/validate_test.go | Updates SKU client mocks to return iterators rather than slices. |
| pkg/cluster/loadbalancerinternal.go | Changes load balancer zonal migration path to fetch only the needed SKU(s). |
| pkg/cluster/loadbalancerinternal_test.go | Updates SKU client mocks to return iterators rather than slices. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/azp run ci |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
|
||
| filteredSkus, err := computeskus.SelectVMSkusInCurrentRegion(ctx, m.armResourceSKUs, location, []string{ | ||
| string(m.doc.OpenShiftCluster.Properties.MasterProfile.VMSize), | ||
| string(m.doc.OpenShiftCluster.Properties.WorkerProfiles[0].VMSize), |
There was a problem hiding this comment.
We'd want to validate all worker profiles here, correct? Not just the first entry in the array?
There was a problem hiding this comment.
This is the bootstrap version, which only supports 1 entry -- the actual sku check below it also only does the first entry
…s the iterators more efficiently
b9ac6f9 to
4e65a1a
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| mrsc.EXPECT().List(gomock.Any(), "location eq northus2", false).Return( | ||
| maps.All(map[*armcompute.ResourceSKU]error{ | ||
| { | ||
| Name: pointerutils.ToPtr("bigmachine_v1"), | ||
| ResourceType: pointerutils.ToPtr("virtualMachines"), | ||
| Locations: pointerutils.ToSlicePtr([]string{"northus2"}), |
There was a problem hiding this comment.
This map literal uses key type *armcompute.ResourceSKU but the keys are written as struct literals ({ ... }) without taking an address. This will not compile; the keys need to be &armcompute.ResourceSKU{...} (and the same applies to the other maps.All(map[*armcompute.ResourceSKU]error{ ... }) blocks in this file).
| mrsc.EXPECT().List(gomock.Any(), "location eq northus2", false).Return( | ||
| maps.All(map[*armcompute.ResourceSKU]error{ | ||
| { | ||
| Name: pointerutils.ToPtr("bigmachine_v1"), | ||
| ResourceType: pointerutils.ToPtr("virtualMachines"), | ||
| Locations: pointerutils.ToSlicePtr([]string{"northus2"}), | ||
| LocationInfo: []*armcompute.ResourceSKULocationInfo{ |
There was a problem hiding this comment.
Same issue here: the maps.All(map[*armcompute.ResourceSKU]error{ ... }) literal is keyed by *armcompute.ResourceSKU but the keys are not address-of expressions. This won’t compile until the keys are changed to &armcompute.ResourceSKU{...}.
| mrsc.EXPECT().List(gomock.Any(), "location eq northus2", false).Return( | ||
| maps.All(map[*armcompute.ResourceSKU]error{ | ||
| { | ||
| Name: pointerutils.ToPtr("bigmachine_v1"), | ||
| ResourceType: pointerutils.ToPtr("virtualMachines"), | ||
| Locations: pointerutils.ToSlicePtr([]string{"northus2"}), |
There was a problem hiding this comment.
Same compile issue: keys in this map[*armcompute.ResourceSKU]error literal need to be pointers (&armcompute.ResourceSKU{...}), not struct literals.
| mrsc.EXPECT().List(gomock.Any(), "location eq northus2", false).Return( | ||
| maps.All(map[*armcompute.ResourceSKU]error{ | ||
| { | ||
| Name: pointerutils.ToPtr("bigmachine_v1"), | ||
| ResourceType: pointerutils.ToPtr("virtualMachines"), | ||
| Locations: pointerutils.ToSlicePtr([]string{"northus2"}), | ||
| LocationInfo: []*armcompute.ResourceSKULocationInfo{ | ||
| { |
There was a problem hiding this comment.
Same compile issue: keys in this map[*armcompute.ResourceSKU]error literal must be pointer values (use &armcompute.ResourceSKU{...}), otherwise the test won’t compile.
| mrsc.EXPECT().List(gomock.Any(), "location eq northus2", false).Return( | ||
| maps.All(map[*armcompute.ResourceSKU]error{ | ||
| { | ||
| Name: pointerutils.ToPtr("bigmachine_v1"), | ||
| ResourceType: pointerutils.ToPtr("virtualMachines"), | ||
| Locations: pointerutils.ToSlicePtr([]string{"northus2"}), | ||
| LocationInfo: []*armcompute.ResourceSKULocationInfo{ | ||
| { |
There was a problem hiding this comment.
Same compile issue: this map[*armcompute.ResourceSKU]error literal uses non-pointer struct literals as keys. Change the keys to &armcompute.ResourceSKU{...} so the test compiles.
| mrsc.EXPECT().List(gomock.Any(), "location eq northus2", false).Return( | ||
| maps.All(map[*armcompute.ResourceSKU]error{ | ||
| { | ||
| Name: pointerutils.ToPtr("bigmachine_v1"), | ||
| ResourceType: pointerutils.ToPtr("virtualMachines"), | ||
| Locations: pointerutils.ToSlicePtr([]string{"northus2"}), | ||
| LocationInfo: []*armcompute.ResourceSKULocationInfo{ |
There was a problem hiding this comment.
Same compile issue: this map[*armcompute.ResourceSKU]error uses struct literals as keys instead of &armcompute.ResourceSKU{...} pointers.
| for _, restriction := range sku.Restrictions { | ||
| for _, restrictedLocation := range restriction.RestrictionInfo.Locations { | ||
| if *restrictedLocation == location { | ||
| return true | ||
| if restriction.RestrictionInfo != nil { | ||
| for _, restrictedLocation := range restriction.RestrictionInfo.Locations { | ||
| if restrictedLocation != nil && strings.EqualFold(*restrictedLocation, location) { | ||
| return true | ||
| } |
There was a problem hiding this comment.
sku.Restrictions is a slice of pointers in the Azure SDK; individual entries can be nil. This code dereferences restriction.RestrictionInfo without checking restriction != nil, which can panic if a nil restriction is returned. Add a nil check for restriction before accessing its fields.
| for sku, err := range skusIter { | ||
| if err != nil { | ||
| return nil, fmt.Errorf("%w: %w", ErrListVMResourceSKUs, err) | ||
| } | ||
|
|
||
| if len(sku.LocationInfo) == 0 { // happened in eastus2euap | ||
| // We only care about VMs and ones with locations/locationinfo | ||
| if *sku.ResourceType != "virtualMachines" || len(sku.Locations) == 0 || len(sku.LocationInfo) == 0 { | ||
| continue | ||
| } | ||
|
|
||
| // We copy only part of the object so we don't have to keep | ||
| // a lot of data in memory. | ||
| vmskus[*sku.Name] = &sdkcompute.ResourceSKU{ | ||
| Name: sku.Name, | ||
| Restrictions: sku.Restrictions, | ||
| LocationInfo: sku.LocationInfo, | ||
| Capabilities: sku.Capabilities, | ||
| // Make sure it's actually in our location | ||
| if !slices.ContainsFunc(sku.Locations, func(s *string) bool { return s != nil && strings.EqualFold(*s, location) }) { | ||
| continue | ||
| } | ||
| } | ||
|
|
||
| return vmskus | ||
| } | ||
| if slices.Contains(skuNames, *sku.Name) { | ||
| vmskus[*sku.Name] = sku | ||
| } |
There was a problem hiding this comment.
Potential nil dereferences: sku can be nil (the iterator’s contract allows yielding nil), and the SDK fields ResourceType / Name are pointers. Dereferencing *sku.ResourceType and *sku.Name without nil checks can panic on unexpected API responses. Add guards for sku == nil, sku.ResourceType == nil, and sku.Name == nil before dereferencing.
Which issue this PR addresses:
Part of [ARO-25489]
What this PR does / why we need it:
I have an inkling that this list in larger regions is consuming a lot of memory. So, let's consume less :)
Test plan for issue:
Unit tests provided, E2E
Is there any documentation that needs to be updated for this PR?
N/A
How do you know this will function as expected in production?
E2E, hopefully