Description
Title
[Azure] Zero-Scale and external VMSS management fixes
Summary
Fixes three related bugs that prevent zero-scale configurations (initial_pool_size=0) and external VMSS management (just_star
Environment: CAPEv2 Latest (main), Azure VMSS, Python 3.13
Bugs Fixed
Bug #1: _add_machines_to_db throws ResourceNotFoundError (capacity=0)
File: modules/machinery/az.py, Line 780
Issue: Method throws unhandled ResourceNotFoundError when iterating over network interfaces of empty VMSS
Fix: Add try-except block around paging iterator with ResourceNotFoundError handler
Bug #2: _delete_machines_from_db_if_missing throws ResourceNotFoundError (capacity=0)
File: modules/machinery/az.py, Line 815
Issue: Method throws unhandled ResourceNotFoundError when listing VMs in empty VMSS
Fix: Add try-except block around paging iterator with ResourceNotFoundError handler
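A minimal sketch of the guard described for Bugs #1 and #2, not the actual patch. It assumes the Azure SDK pager is lazy, so the error surfaces while iterating rather than when the list call returns. The helper name list_or_empty and the fetch_pager argument are hypothetical, and the fallback exception class exists only so the sketch runs without the Azure SDK installed.

```python
import logging

try:
    from azure.core.exceptions import ResourceNotFoundError
except ImportError:
    # Stand-in so this sketch runs without the Azure SDK (assumption).
    class ResourceNotFoundError(Exception):
        pass

log = logging.getLogger(__name__)


def list_or_empty(fetch_pager, vmss_name="<vmss>"):
    """Drain a paged listing, treating a missing parent resource as empty.

    fetch_pager is a hypothetical zero-argument callable returning the
    SDK's paged iterator (for example a lambda wrapping a VMSS VM listing).
    """
    try:
        # Iteration must happen inside the try block: a lazy pager raises
        # while it is being consumed, not when it is created.
        return list(fetch_pager())
    except ResourceNotFoundError:
        log.debug("VMSS %s has no instances (capacity=0); nothing to list", vmss_name)
        return []
```

With this shape, both methods can iterate over the returned list unconditionally; a VMSS at capacity=0 yields an empty list instead of an unhandled exception.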
Bug #3: _process_pre_existing_vmsss ignores just_start parameter
File: modules/machinery/az.py, Line 343
Issue: Method deletes externally created VMSS without checking just_start parameter
Fix: Add just_start check before deleting VMSS, add logging for delete operations
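A rough illustration of the just_start check for Bug #3, under stated assumptions. The function select_vmsss_to_delete and its arguments are hypothetical and do not mirror the real signature of _process_pre_existing_vmsss; the sketch only shows the decision being made and logged before any delete is issued.

```python
import logging

log = logging.getLogger(__name__)


def select_vmsss_to_delete(existing_names, required_names, just_start):
    """Return the names of pre-existing VMSSs that should be deleted.

    Hypothetical helper: existing_names are the VMSSs found in the resource
    group, required_names are the ones the configuration expects.
    """
    if just_start:
        # Externally managed VMSSs (Ansible/Terraform) must be left alone.
        log.info("just_start is set; keeping pre-existing VMSSs: %s", list(existing_names))
        return []

    to_delete = [name for name in existing_names if name not in required_names]
    for name in to_delete:
        log.info("Deleting VMSS %s: not in the required set", name)
    return to_delete
```

Returning the list rather than deleting inside the helper keeps the decision separate from the Azure call, which also makes it easy to test without cloud access.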
Changes
- Add exception handling for capacity=0 in _add_machines_to_db
- Add exception handling for capacity=0 in _delete_machines_from_db_if_missing
- Add just_start check in _process_pre_existing_vmsss
- Add logging for VMSS delete operations
- Ensure ResourceNotFoundError is imported from azure.core.exceptions
Impact
- Zero-scale feature (initial_pool_size=0) works correctly
- External VMSS management (Ansible/Terraform) with just_start=true works
- VMSS remains stable with capacity=0 (no deletion loop)
- Cost savings: ~€40-60/month when using zero-scale vs. always-on VMSS
Testing
Tested scenarios:
- VMSS with capacity=0 (zero-scale): No exceptions, VMSS stable
- VMSS with capacity=1: Normal operation works
- External VMSS with just_start=true: VMSS not deleted on startup
- All scenarios: CAPEv2 starts without crashes, tasks can be submitted
Related Issues
- #2842: Azure: UnboundLocalError in _add_machines_to_db() when VMSS has zero instances (initial_pool_size=0)
- #2848: [Azure] Zero-Scale fails: _delete_machines_from_db_if_missing throws ParentResourceNotFound when VMSS has capacity=0 (Zero-Scale feature non-functional)
- PR #2666: Azure machinery major updates (Zero-Scale feature introduction)