🐛 Autoscaling: ensure unstartable warm buffers are replaced by cold instances if available #8277
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #8277 +/- ##
==========================================
+ Coverage 89.31% 89.79% +0.48%
==========================================
Files 1678 1272 -406
Lines 65431 54789 -10642
Branches 828 225 -603
==========================================
- Hits 58438 49199 -9239
+ Misses 6774 5520 -1254
+ Partials 219 70 -149
0458399 to 40c258a
Pull Request Overview
This PR fixes a bug in the autoscaling service where warm buffer instances that couldn't be started due to insufficient capacity would prevent new instances from being launched. The solution ensures that when warm buffer instances fail to start, their assigned tasks are de-assigned and can be fulfilled by launching new cold instances.
Key Changes
- Modified the warm buffer starting logic to handle `EC2InsufficientCapacityError` exceptions gracefully
- Added a mechanism to de-assign tasks from warm buffer instances that cannot be started
- Updated the autoscaling flow to retry task assignment with cold instances when warm buffers fail
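The flow described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the code in `_auto_scaling_core.py`: the `Instance` dataclass, the `start_warm_buffers` and `scale_up` functions, and the `start_instance` / `launch_cold_instance` callbacks are all hypothetical names, and `EC2InsufficientCapacityError` is redefined locally as a stand-in for the exception named above.

```python
from dataclasses import dataclass, field


class EC2InsufficientCapacityError(Exception):
    """Local stand-in (assumption) for the exception named in this PR."""


@dataclass
class Instance:
    """Hypothetical, simplified view of an EC2 instance and its assigned tasks."""

    instance_id: str
    assigned_tasks: list[str] = field(default_factory=list)


def start_warm_buffers(buffers, start_instance):
    """Try to start every warm buffer that has tasks assigned.

    Returns the buffers that started and the tasks that were de-assigned
    from buffers that could not start for lack of capacity.
    """
    started = []
    deassigned_tasks = []
    for buffer in buffers:
        if not buffer.assigned_tasks:
            continue  # nothing is waiting on this buffer, leave it stopped
        try:
            start_instance(buffer)
        except EC2InsufficientCapacityError:
            # The buffer cannot start: free its tasks so they can be retried.
            deassigned_tasks.extend(buffer.assigned_tasks)
            buffer.assigned_tasks = []
        else:
            started.append(buffer)
    return started, deassigned_tasks


def scale_up(buffers, start_instance, launch_cold_instance):
    """Start warm buffers first, then cover de-assigned tasks with cold instances."""
    started, deassigned_tasks = start_warm_buffers(buffers, start_instance)
    # Simplification (assumption): one cold instance per de-assigned task.
    cold = [launch_cold_instance(task) for task in deassigned_tasks]
    return started, cold
```

The point of the sketch is that a capacity failure on one warm buffer no longer blocks its tasks: they are handed back and can be served by cold instances within the same scaling pass.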
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| services/autoscaling/src/simcore_service_autoscaling/modules/cluster_scaling/_auto_scaling_core.py | Core autoscaling logic updated to handle warm buffer start failures and de-assign tasks for retry with cold instances |
| services/autoscaling/tests/unit/test_modules_cluster_scaling_dynamic.py | Removed xfail marker and increased test warm buffer count to properly test the fix |
| packages/aws-library/tests/test_ec2_client.py | Added test coverage for insufficient capacity scenarios and improved code formatting |
👍
thx
7dc87da to cb8d815
@mergify queue
🛑 Configuration not compatible with a branch protection setting
The branch protection setting



What do these changes do?
Related issue/s
How to test
Dev-ops