Conversation

@sanderegg
Member

@sanderegg sanderegg commented Aug 29, 2025

@sanderegg sanderegg added this to the Voyager milestone Aug 29, 2025
@sanderegg sanderegg self-assigned this Aug 29, 2025
@sanderegg sanderegg added the a:autoscaling autoscaling service in simcore's stack label Aug 29, 2025
@sanderegg sanderegg marked this pull request as draft August 29, 2025 14:09
@codecov

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.79%. Comparing base (5affe86) to head (cb8d815).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8277      +/-   ##
==========================================
+ Coverage   89.31%   89.79%   +0.48%     
==========================================
  Files        1678     1272     -406     
  Lines       65431    54789   -10642     
  Branches      828      225     -603     
==========================================
- Hits        58438    49199    -9239     
+ Misses       6774     5520    -1254     
+ Partials      219       70     -149     
Flag Coverage Δ
integrationtests 64.08% <ø> (-0.06%) ⬇️
unittests 87.97% <76.92%> (-0.19%) ⬇️
Components Coverage Δ
pkg_aws_library 93.59% <ø> (+0.45%) ⬆️
pkg_celery_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 85.03% <ø> (ø)
agent 93.53% <ø> (ø)
api_server 92.73% <ø> (ø)
autoscaling 95.77% <76.92%> (-0.13%) ⬇️
catalog 92.34% <ø> (ø)
clusters_keeper 99.13% <ø> (ø)
dask_sidecar 92.37% <ø> (+0.22%) ⬆️
datcore_adapter 97.94% <ø> (ø)
director 75.81% <ø> (ø)
director_v2 90.90% <ø> (-0.02%) ⬇️
dynamic_scheduler ∅ <ø> (∅)
dynamic_sidecar 90.46% <ø> (+8.58%) ⬆️
efs_guardian 89.62% <ø> (ø)
invitations 91.44% <ø> (ø)
payments 92.61% <ø> (ø)
resource_usage_tracker 92.18% <ø> (+0.26%) ⬆️
storage 86.44% <ø> (∅)
webclient ∅ <ø> (∅)
webserver 88.07% <ø> (-0.03%) ⬇️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5affe86...cb8d815.

@mergify
Contributor

mergify bot commented Aug 29, 2025

🧪 CI Insights

Here's what we observed from your CI run for cb8d815.

✅ Passed Jobs With Interesting Signals

| Pipeline | Job | Signal | Health on master | Retries |
| --- | --- | --- | --- | --- |
| CI | unit-tests | Base branch is healthy, but retries were needed. Could be early signs of flakiness 👀 | Healthy | 1 |

@sanderegg sanderegg force-pushed the autoscaling/bugfix/8273/ensure-warm-buffer-not-starting-does-not-block-autoscaling branch from 0458399 to 40c258a Compare September 1, 2025 06:10
@sanderegg sanderegg marked this pull request as ready for review September 1, 2025 06:11
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a bug in the autoscaling service where warm buffer instances that couldn't be started due to insufficient capacity would prevent new instances from being launched. The solution ensures that when warm buffer instances fail to start, their assigned tasks are de-assigned and can be fulfilled by launching new cold instances.

Key Changes

  • Modified the warm buffer starting logic to handle EC2InsufficientCapacityError exceptions gracefully
  • Added a mechanism to de-assign tasks from warm buffer instances that cannot be started
  • Updated the autoscaling flow to retry task assignment with cold instances when warm buffers fail
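
The changes listed above can be pictured with a minimal Python sketch. It is not the code from this PR: `WarmBuffer`, `start_warm_buffers`, and `fake_start` are hypothetical names, and the `EC2InsufficientCapacityError` class defined here is only a local stand-in for the exception named above. The assumption is that a warm buffer which cannot be started gives its tasks back, so that a later step can launch cold instances for them.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


class EC2InsufficientCapacityError(Exception):
    """Local stand-in for the exception named in the PR (assumption)."""


@dataclass
class WarmBuffer:
    """Hypothetical model of a stopped instance with tasks assigned to it."""

    instance_id: str
    assigned_tasks: list[str] = field(default_factory=list)


def start_warm_buffers(
    buffers: list[WarmBuffer],
    start_instance: Callable[[str], None],
) -> tuple[list[WarmBuffer], list[str]]:
    """Start each warm buffer; return the started buffers and the de-assigned tasks."""
    started: list[WarmBuffer] = []
    unassigned_tasks: list[str] = []
    for buffer in buffers:
        try:
            start_instance(buffer.instance_id)
        except EC2InsufficientCapacityError:
            # The buffer cannot start: release its tasks instead of letting
            # them stay pinned to an instance that will never come up.
            unassigned_tasks.extend(buffer.assigned_tasks)
            buffer.assigned_tasks = []
            continue
        started.append(buffer)
    return started, unassigned_tasks


def fake_start(instance_id: str) -> None:
    """Hypothetical starter that fails for one specific instance."""
    if instance_id == "i-no-capacity":
        raise EC2InsufficientCapacityError(instance_id)


buffers = [
    WarmBuffer("i-ok", ["task-a"]),
    WarmBuffer("i-no-capacity", ["task-b", "task-c"]),
]
started, pending = start_warm_buffers(buffers, fake_start)
print([b.instance_id for b in started])
print(pending)
```

In this sketch the tasks returned in `pending` are the ones a caller would hand to whatever logic launches new cold instances; how the actual service does that is not shown in this conversation.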

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| services/autoscaling/src/simcore_service_autoscaling/modules/cluster_scaling/_auto_scaling_core.py | Core autoscaling logic updated to handle warm buffer start failures and de-assign tasks for retry with cold instances |
| services/autoscaling/tests/unit/test_modules_cluster_scaling_dynamic.py | Removed xfail marker and increased test warm buffer count to properly test the fix |
| packages/aws-library/tests/test_ec2_client.py | Added test coverage for insufficient capacity scenarios and improved code formatting |

Collaborator

@matusdrobuliak66 matusdrobuliak66 left a comment

👍

Member

@pcrespov pcrespov left a comment

thx

@sanderegg sanderegg force-pushed the autoscaling/bugfix/8273/ensure-warm-buffer-not-starting-does-not-block-autoscaling branch from 7dc87da to cb8d815 Compare September 1, 2025 11:28
@sonarqubecloud

sonarqubecloud bot commented Sep 1, 2025

@sanderegg sanderegg added the 🤖-automerge marks PR as ready to be merged for Mergify label Sep 1, 2025
@sanderegg
Member Author

@mergify queue

@mergify
Contributor

mergify bot commented Sep 1, 2025

queue

🛑 Configuration not compatible with a branch protection setting

The branch protection setting "Require branches to be up to date before merging" is not compatible with max_parallel_checks>1, queue_conditions != merge_conditions and must be unset.

@sanderegg sanderegg merged commit 6ed663d into ITISFoundation:master Sep 1, 2025
144 of 148 checks passed
@sanderegg sanderegg deleted the autoscaling/bugfix/8273/ensure-warm-buffer-not-starting-does-not-block-autoscaling branch September 1, 2025 12:20

Labels

🤖-automerge: marks PR as ready to be merged for Mergify
a:autoscaling: autoscaling service in simcore's stack

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Autoscaling: in case of InsufficientCapacityError while starting warm buffers, this blocks the autoscaling completely

4 participants