🐛Autoscaling: Fix lost stopped EC2 instances and missing error logs #7493
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #7493 +/- ##
==========================================
- Coverage 87.43% 87.34% -0.10%
==========================================
Files 1741 1389 -352
Lines 67322 57673 -9649
Branches 1142 640 -502
==========================================
- Hits 58865 50374 -8491
+ Misses 8136 7104 -1032
+ Partials 321 195 -126
matusdrobuliak66
left a comment
Thanks 👍
|
@mergify queue |
🟠 Waiting for conditions to match
|
services/autoscaling/src/simcore_service_autoscaling/modules/auto_scaling_core.py
15ca6ed to
1aa81fb



What do these changes do?
Sometimes a number of stopped EC2 instances show the tags of a started machine in the AWS console (e.g. Name without the -buffer suffix but with the `io.simcore.autoscaling.buffer_machine` tag set to false). Analysis via AWS CloudTrail showed that starting these instances failed due to InsufficientInstanceCapacity.
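
The failure mode described above can be illustrated with a minimal sketch. It is not the code changed in this PR: the `Instance` class, the `InsufficientInstanceCapacity` exception class, and the `start_buffer_instance`, `failing_start` and `working_start` helpers are all hypothetical stand-ins, and only the tag name and the `-buffer` suffix come from the description. The idea shown is one possible approach, assuming the tags were being changed even though the start failed: re-tag only after the start call succeeds.

```python
from dataclasses import dataclass, field

BUFFER_TAG = "io.simcore.autoscaling.buffer_machine"


class InsufficientInstanceCapacity(Exception):
    """Hypothetical stand-in for the AWS error reported in CloudTrail."""


@dataclass
class Instance:
    """Hypothetical in-memory model of an EC2 instance."""

    instance_id: str
    state: str = "stopped"
    tags: dict[str, str] = field(default_factory=dict)


def start_buffer_instance(instance: Instance, start_fn) -> bool:
    """Start a stopped buffer machine and re-tag it only if the start succeeds."""
    try:
        start_fn(instance)
    except InsufficientInstanceCapacity:
        # Tags are left untouched, so the instance is still
        # recognizable as a stopped buffer machine.
        return False
    instance.tags[BUFFER_TAG] = "false"
    instance.tags["Name"] = instance.tags["Name"].removesuffix("-buffer")
    return True


def failing_start(instance: Instance) -> None:
    """Simulates AWS refusing the start for lack of capacity."""
    raise InsufficientInstanceCapacity(instance.instance_id)


def working_start(instance: Instance) -> None:
    """Simulates a successful start."""
    instance.state = "running"
```

With this ordering, a failed start leaves both the state and the tags of the buffer machine as they were, so a later iteration could still find and retry it.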

⚠️ From the Graylog trace, it looks like autoscaling tries to start the instance, but the background task framework in simcore was not logging errors; this was fixed as well.
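
As an illustration of the logging problem, here is a minimal sketch of a background-task iteration that logs errors instead of dropping them silently. It does not reproduce the simcore background task framework; `run_logged`, `failing_iteration` and `working_iteration` are hypothetical names, and only the standard `asyncio` and `logging` modules are used.

```python
import asyncio
import logging

_logger = logging.getLogger(__name__)


async def run_logged(task_name: str, coro_fn) -> bool:
    """Run one iteration of a background task and log any failure."""
    try:
        await coro_fn()
        return True
    except asyncio.CancelledError:
        # Cancellation is part of normal shutdown and must propagate.
        raise
    except Exception:
        # logger.exception records the message together with the traceback.
        _logger.exception("background task %s failed", task_name)
        return False


async def failing_iteration() -> None:
    """Hypothetical task body that fails, e.g. when an instance cannot start."""
    raise RuntimeError("could not start instance")


async def working_iteration() -> None:
    """Hypothetical task body that succeeds."""
    await asyncio.sleep(0)
```

Calling `asyncio.run(run_logged("autoscaling", failing_iteration))` returns `False` and emits the traceback through the logger, so a failed start would leave a visible trace.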
Related issue/s
How to test
Dev-ops checklist