pool: stop and remove orphan controller services on supervisor startup #933

Merged
aliel merged 2 commits into main from aliel-fix-orphan-controller-cleanup on Apr 14, 2026

@aliel aliel commented Apr 14, 2026

Forgotten VMs left behind stale controller.json files and sometimes
running aleph-vm-controller@<hash>.service units with active qemu
processes that kept consuming host RAM. The admission check iterates
pool.executions and did not see these orphans, so the host's real
free memory was lower than the check computed.
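
To illustrate the accounting gap, here is a minimal sketch with made-up numbers. The `Vm` dataclass, both helper functions, and the memory figures are hypothetical and are not taken from `pool.py`; the only thing assumed from the description is that the admission check sums over tracked executions and nothing else.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    vm_hash: str      # hypothetical identifier
    memory_mib: int   # memory reserved by the VM's qemu process


def free_memory_from_pool(total_mib: int, tracked: list[Vm]) -> int:
    """Free memory as an admission check would compute it from tracked VMs only."""
    return total_mib - sum(vm.memory_mib for vm in tracked)


def free_memory_on_host(total_mib: int, tracked: list[Vm], orphans: list[Vm]) -> int:
    """Free memory once orphan qemu processes are also counted."""
    return total_mib - sum(vm.memory_mib for vm in tracked + orphans)


# Illustrative values only: a 16 GiB host, one tracked VM, one orphan.
tracked = [Vm("aaa", 4096)]
orphans = [Vm("zzz", 8192)]

computed = free_memory_from_pool(16384, tracked)
actual = free_memory_on_host(16384, tracked, orphans)

# A request for 8192 MiB fits the computed figure but not the real one.
request_mib = 8192
admitted = request_mib <= computed
actually_fits = request_mib <= actual
```

With these numbers the check would admit the request even though the host cannot hold it, which is the failure mode the orphans cause.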

On load_persistent_executions, scan EXECUTION_ROOT for controller
configs whose vm_hash is not in the pool, stop_and_disable the
matching systemd service, and delete the config file.
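
The cleanup described above could be sketched roughly as follows. This is not the code merged in `pool.py`: the function names, the `*-controller.json` file naming, the `vm_hash` JSON key layout, and the injected `stop_and_disable` callable are assumptions made so the example is self-contained and testable. The default callable shells out to `systemctl disable --now`, which stops and disables a unit in one call.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def systemctl_stop_and_disable(unit: str) -> None:
    """Stop and disable a systemd unit (assumed stand-in for stop_and_disable)."""
    subprocess.run(["systemctl", "disable", "--now", unit], check=False)


def cleanup_orphan_controllers(
    execution_root: Path,
    known_hashes: Iterable[str],
    stop_and_disable: Callable[[str], None] = systemctl_stop_and_disable,
) -> list[str]:
    """Remove controller configs whose vm_hash is not tracked by the pool.

    Returns the hashes that were cleaned up.
    """
    known = set(known_hashes)
    removed: list[str] = []
    # Assumed naming scheme: one "<name>-controller.json" file per VM.
    for config_path in sorted(execution_root.glob("*-controller.json")):
        try:
            vm_hash = json.loads(config_path.read_text())["vm_hash"]
        except (OSError, ValueError, KeyError, TypeError):
            # Unreadable or malformed config: skip it rather than guess.
            continue
        if not isinstance(vm_hash, str) or vm_hash in known:
            continue
        # Stop the service first so qemu releases its memory, then drop the config.
        stop_and_disable(f"aleph-vm-controller@{vm_hash}.service")
        config_path.unlink(missing_ok=True)
        removed.append(vm_hash)
    return removed
```

Passing the stop function as a parameter keeps the scan testable without a running systemd; a caller would run it once after the pool has finished loading its persistent executions, so that `known_hashes` is complete before anything is deleted.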

aliel added 2 commits April 14, 2026 10:17

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 3.57143% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.35%. Comparing base (5e501fb) to head (a4cbc7e).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/aleph/vm/pool.py | 3.57% | 27 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #933      +/-   ##
==========================================
- Coverage   68.48%   68.35%   -0.13%     
==========================================
  Files         104      104              
  Lines       11955    11983      +28     
  Branches     1020     1023       +3     
==========================================
+ Hits         8187     8191       +4     
- Misses       3503     3527      +24     
  Partials      265      265              


@aliel aliel merged commit c1c17d5 into main Apr 14, 2026
16 of 18 checks passed
@aliel aliel deleted the aliel-fix-orphan-controller-cleanup branch April 14, 2026 08:43