@sanderegg sanderegg commented Apr 9, 2025

What do these changes do?

Sometimes a number of stopped EC2 instances show the tags of a started machine in the AWS console (e.g. a Name without the -buffer suffix and the `io.simcore.autoscaling.buffer_machine` tag set to false).

Analysis via AWS CloudTrail showed that starting these instances failed due to InsufficientInstanceCapacity.
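
To illustrate the failure mode, here is a minimal sketch of one way to keep tags and instance state consistent when a start fails. It is not the code of this PR: `FakeInstance`, `InsufficientInstanceCapacityError`, `start_buffer_instance` and the `start_fn` callback are hypothetical names used only for illustration, and the tag values are assumed from the description above. The idea is to re-tag an instance as a started machine only after the start call succeeded, and to leave the buffer tags untouched otherwise.

```python
from dataclasses import dataclass, field

# Tag key taken from the PR description; the values "true"/"false" are assumed.
BUFFER_TAG = "io.simcore.autoscaling.buffer_machine"


class InsufficientInstanceCapacityError(RuntimeError):
    """Hypothetical stand-in for the EC2 InsufficientInstanceCapacity error.

    With boto3 this would typically surface as a botocore ClientError whose
    response["Error"]["Code"] is "InsufficientInstanceCapacity".
    """


@dataclass
class FakeInstance:
    """Hypothetical in-memory model of an EC2 instance (no AWS calls)."""

    instance_id: str
    state: str = "stopped"
    tags: dict[str, str] = field(default_factory=dict)


def start_buffer_instance(instance: FakeInstance, start_fn) -> bool:
    """Start a stopped buffer instance and re-tag it only if the start worked.

    Returns True when the instance was started, False when capacity was missing.
    """
    try:
        start_fn(instance)
    except InsufficientInstanceCapacityError:
        # The start failed: the instance stays stopped and keeps its buffer tags,
        # so it is still recognisable as a buffer machine.
        return False

    # Only now mark the instance as a regular (non-buffer) machine.
    instance.tags[BUFFER_TAG] = "false"
    instance.tags["Name"] = instance.tags["Name"].removesuffix("-buffer")
    return True
```

With this ordering, a capacity error leaves a stopped instance that still carries its `-buffer` name and buffer tag, instead of a stopped instance that looks like a started one.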
⚠️ From the Graylog trace it looks like autoscaling tries to start the instances, but the background task framework in simcore was not logging errors. This was fixed as well.
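
The missing error logs can be illustrated with a small sketch of a periodic background task that logs exceptions instead of dropping them silently. This is an assumption about the general pattern, not simcore's actual background task framework: `run_periodically`, its parameters and the logger name are hypothetical.

```python
import asyncio
import logging

_logger = logging.getLogger("background_task_sketch")  # hypothetical logger name


async def run_periodically(task, *, interval_s: float, max_runs: int) -> int:
    """Run `task` up to `max_runs` times and log every failure with its traceback.

    Returns the number of failed runs. Cancellation is not swallowed:
    asyncio.CancelledError does not derive from Exception (Python 3.8+),
    so it propagates through the `except Exception` clause below.
    """
    failures = 0
    for _ in range(max_runs):
        try:
            await task()
        except Exception:
            # logger.exception records the message at ERROR level plus the traceback
            _logger.exception(
                "background task %s failed", getattr(task, "__name__", repr(task))
            )
            failures += 1
        await asyncio.sleep(interval_s)
    return failures
```

A failing iteration is then visible in the logs, and the loop keeps running so later iterations can still succeed.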

Related issue/s

How to test

Dev-ops checklist

@sanderegg sanderegg added the a:autoscaling autoscaling service in simcore's stack label Apr 9, 2025
@sanderegg sanderegg added this to the Pauwel Kwak milestone Apr 9, 2025
@sanderegg sanderegg self-assigned this Apr 9, 2025
codecov bot commented Apr 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.34%. Comparing base (0dc7846) to head (1aa81fb).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7493      +/-   ##
==========================================
- Coverage   87.43%   87.34%   -0.10%     
==========================================
  Files        1741     1389     -352     
  Lines       67322    57673    -9649     
  Branches     1142      640     -502     
==========================================
- Hits        58865    50374    -8491     
+ Misses       8136     7104    -1032     
+ Partials      321      195     -126     
Flag Coverage Δ
integrationtests 65.16% <ø> (+0.04%) ⬆️
unittests 86.38% <100.00%> (-0.24%) ⬇️
Components Coverage Δ
api ∅ <ø> (∅)
pkg_aws_library 93.91% <ø> (ø)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library 72.82% <100.00%> (+0.03%) ⬆️
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 85.40% <ø> (ø)
agent 96.46% <ø> (ø)
api_server 90.02% <ø> (ø)
autoscaling 96.08% <100.00%> (ø)
catalog 91.92% <ø> (ø)
clusters_keeper 99.24% <ø> (ø)
dask_sidecar 91.29% <ø> (ø)
datcore_adapter 98.12% <ø> (ø)
director 76.78% <ø> (ø)
director_v2 91.30% <ø> (+0.02%) ⬆️
dynamic_scheduler 97.35% <ø> (ø)
dynamic_sidecar 90.11% <ø> (ø)
efs_guardian 89.79% <ø> (ø)
invitations 93.28% <ø> (ø)
payments 92.66% <ø> (ø)
resource_usage_tracker 89.12% <ø> (-0.11%) ⬇️
storage 87.83% <ø> (+0.14%) ⬆️
webclient ∅ <ø> (∅)
webserver 85.88% <ø> (-0.01%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sanderegg sanderegg marked this pull request as ready for review April 9, 2025 07:46
@sanderegg sanderegg requested a review from pcrespov as a code owner April 9, 2025 07:46
@sanderegg sanderegg changed the title 🐛Autoscaling: Fix lost stopped EC2 instances 🐛Autoscaling: Fix lost stopped EC2 instances and missing error logs Apr 9, 2025
@matusdrobuliak66 matusdrobuliak66 left a comment

Thanks 👍

@sanderegg sanderegg added the 🤖-automerge marks PR as ready to be merged for Mergify label Apr 9, 2025
@sanderegg

@mergify queue

mergify bot commented Apr 9, 2025

queue

🟠 Waiting for conditions to match

  • -closed [📌 queue requirement]
  • any of: [🔀 queue conditions]
    • all of: [📌 queue conditions of queue default]
      • #approved-reviews-by >= 2 [🛡 GitHub branch protection]
      • #approved-reviews-by>=2
      • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
      • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
      • #changes-requested-reviews-by=0
      • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
      • #review-threads-unresolved=0
      • -conflict
      • -draft
      • base=master
      • label!=🤖-do-not-merge
      • label=🤖-automerge
      • any of: [🛡 GitHub branch protection]
        • check-skipped = deploy to dockerhub
        • check-neutral = deploy to dockerhub
        • check-success = deploy to dockerhub
      • any of: [🛡 GitHub branch protection]
        • check-success = system-tests
        • check-neutral = system-tests
        • check-skipped = system-tests
      • any of: [🛡 GitHub branch protection]
        • check-success = unit-tests
        • check-neutral = unit-tests
        • check-skipped = unit-tests
      • any of: [🛡 GitHub branch protection]
        • check-success = check OAS' are up to date
        • check-neutral = check OAS' are up to date
        • check-skipped = check OAS' are up to date
      • any of: [🛡 GitHub branch protection]
        • check-success = integration-tests
        • check-neutral = integration-tests
        • check-skipped = integration-tests
      • any of: [🛡 GitHub branch protection]
        • check-success = [build] docker images (excluding frontend) (3.11, ubuntu-24.04)
        • check-neutral = [build] docker images (excluding frontend) (3.11, ubuntu-24.04)
        • check-skipped = [build] docker images (excluding frontend) (3.11, ubuntu-24.04)
  • -conflict [📌 queue requirement]
  • -draft [📌 queue requirement]
  • any of: [📌 queue -> configuration change requirements]
    • -mergify-configuration-changed
    • check-success = Configuration changed

@sanderegg sanderegg force-pushed the autoscaling/bugfix/ensure-no-stopped-buffer-lying-around branch from 15ca6ed to 1aa81fb Compare April 9, 2025 20:10
sonarqubecloud bot commented Apr 9, 2025

@sanderegg sanderegg merged commit 2aa5080 into ITISFoundation:master Apr 9, 2025
94 checks passed
@sanderegg sanderegg deleted the autoscaling/bugfix/ensure-no-stopped-buffer-lying-around branch April 9, 2025 20:44
Labels

🤖-automerge marks PR as ready to be merged for Mergify a:autoscaling autoscaling service in simcore's stack

Development

Successfully merging this pull request may close these issues.

Autoscaling: machines that are stopped and not set as buffer machines are lying around

3 participants