Azure: UnboundLocalError in _add_machines_to_db() when VMSS has zero instances (initial_pool_size=0) #2842

@xtarget

Title

[Azure] Zero-Scale and external VMSS management fixes

Summary

Fixes three related bugs that prevent zero-scale configurations (initial_pool_size=0) and external VMSS management (just_star

Environment: CAPEv2 Latest (main), Azure VMSS, Python 3.13

Bugs Fixed

Bug #1: _add_machines_to_db throws ResourceNotFoundError (capacity=0)

File: modules/machinery/az.py, Line 780
Issue: The method raises an unhandled ResourceNotFoundError when iterating over the network interfaces of an empty VMSS
Fix: Wrap the paging iterator in a try-except block that handles ResourceNotFoundError
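
A minimal sketch of the pattern described in the fix, not the actual patch to az.py. The helper name collect_pages and the fake pager are hypothetical; the only real API assumed is azure.core.exceptions.ResourceNotFoundError, with a stand-in class so the sketch also runs without the Azure SDK installed.

```python
import logging

try:
    from azure.core.exceptions import ResourceNotFoundError
except ImportError:
    # Stand-in so this sketch runs without the Azure SDK installed.
    class ResourceNotFoundError(Exception):
        pass

log = logging.getLogger(__name__)


def collect_pages(paged_iterable, vmss_name):
    """Drain a paging iterator, treating ResourceNotFoundError as 'no more items'.

    Hypothetical helper: the error is raised lazily while iterating, so the
    try block has to wrap the loop itself, not just the call that builds
    the iterator.
    """
    items = []
    try:
        for item in paged_iterable:
            items.append(item)
    except ResourceNotFoundError:
        log.debug("VMSS %s returned no resources (capacity=0?)", vmss_name)
    return items


def _empty_vmss_pager():
    """Fake pager that fails on first access, imitating an empty VMSS."""
    raise ResourceNotFoundError("no network interfaces found")
    yield  # unreachable; makes this function a generator


empty_result = collect_pages(_empty_vmss_pager(), "example-vmss")
normal_result = collect_pages(iter(["nic-0", "nic-1"]), "example-vmss")
```

One caveat with this approach: a VMSS that does not exist at all would presumably raise the same exception, so the sketch cannot tell "empty" apart from "missing".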

Bug #2: _delete_machines_from_db_if_missing throws ResourceNotFoundError (capacity=0)

File: modules/machinery/az.py, Line 815
Issue: The method raises an unhandled ResourceNotFoundError when listing the VMs in an empty VMSS
Fix: Wrap the paging iterator in a try-except block that handles ResourceNotFoundError

Bug #3: _process_pre_existing_vmsss ignores just_start parameter

File: modules/machinery/az.py, Line 343
Issue: The method deletes externally created VMSS without checking the just_start parameter
Fix: Check just_start before deleting any VMSS, and log each delete operation
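
A sketch of the guard described in the fix, under the assumption that just_start means "reuse whatever VMSS already exist". The function vmsss_to_delete and its parameters are hypothetical; the real method works against Azure resources rather than plain name lists.

```python
import logging

log = logging.getLogger(__name__)


def vmsss_to_delete(pre_existing, required, just_start):
    """Decide which pre-existing VMSS names should be deleted.

    pre_existing: names of VMSS found in the resource group (hypothetical input)
    required:     names of VMSS the configuration expects (hypothetical input)
    just_start:   when true, externally managed VMSS must be left alone
    """
    if just_start:
        log.info("just_start is set; keeping %d pre-existing VMSS", len(pre_existing))
        return []

    doomed = [name for name in pre_existing if name not in required]
    for name in doomed:
        # Logging every delete makes unexpected removals traceable.
        log.info("Deleting VMSS %s: not in the required set", name)
    return doomed


kept_all = vmsss_to_delete(["vmss-a", "vmss-b"], {"vmss-a"}, just_start=True)
deleted = vmsss_to_delete(["vmss-a", "vmss-b"], {"vmss-a"}, just_start=False)
```

Returning the list instead of deleting inline keeps the decision testable without touching Azure.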

Changes

  • Add exception handling for capacity=0 in _add_machines_to_db
  • Add exception handling for capacity=0 in _delete_machines_from_db_if_missing
  • Add just_start check in _process_pre_existing_vmsss
  • Add logging for VMSS delete operations
  • Ensure ResourceNotFoundError is imported from azure.core.exceptions

Impact

  • Zero-scale feature (initial_pool_size=0) works correctly
  • External VMSS management (Ansible/Terraform) with just_start=true works
  • VMSS remains stable with capacity=0 (no deletion loop)
  • Cost savings: ~€40-60/month when using zero-scale vs. always-on VMSS

Testing

Tested scenarios:

  • VMSS with capacity=0 (zero-scale): No exceptions, VMSS stable
  • VMSS with capacity=1: Normal operation works
  • External VMSS with just_start=true: VMSS not deleted on startup
  • All scenarios: CAPEv2 starts without crashes, tasks can be submitted

Related Issues
