@bisgaard-itis (Contributor) commented Oct 9, 2025

What do these changes do?

  • This PR follows up on incident https://git.speag.com/oSparc/osparc-infra/-/issues/incident/81. See that report for the motivation behind the changes made here.
  • It changes the polling strategy that the api-server (and its Celery worker) uses when cloning projects. Previously, a request was sent every 0.5 seconds; now an exponential backoff strategy is used.
  • The PR also introduces a new Prometheus Gauge metric that records the number of tasks in the Python asyncio event loop. This is done for both FastAPI and aiohttp services. As a follow-up, I will add Grafana graphs so we can monitor this.
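A minimal sketch of polling a long-running task with exponential backoff, illustrating the idea behind the second change above. This is not the PR's implementation: `poll_until_done`, `backoff_delays`, the `is_done` callback and the default values (0.5 s initial delay doubling up to a 10 s cap, 300 s overall timeout) are hypothetical choices made for illustration.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterator


def backoff_delays(
    initial_delay: float, multiplier: float, max_delay: float
) -> Iterator[float]:
    """Yield an exponentially growing sequence of delays, capped at max_delay."""
    delay = initial_delay
    while True:
        yield delay
        delay = min(delay * multiplier, max_delay)


async def poll_until_done(
    is_done: Callable[[], Awaitable[bool]],
    *,
    initial_delay: float = 0.5,
    multiplier: float = 2.0,
    max_delay: float = 10.0,
    timeout: float = 300.0,
) -> int:
    """Await is_done() until it returns True; return the number of polls made.

    Hypothetical helper: the wait between polls grows exponentially instead of
    staying fixed at initial_delay. Raises TimeoutError once timeout elapses.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    delays = backoff_delays(initial_delay, multiplier, max_delay)
    polls = 0
    while True:
        polls += 1
        if await is_done():
            return polls
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise TimeoutError(f"task still running after {polls} polls")
        # Wait for the next backoff delay, but never sleep past the deadline.
        await asyncio.sleep(min(next(delays), remaining))
```

With these illustrative defaults the waits are 0.5, 1, 2, 4, 8, 10, 10, ... seconds, so (ignoring request latency) a task that runs for one minute is polled about 11 times instead of about 120 times at a fixed 0.5 s interval. The trade-off is that completion may be noticed up to `max_delay` seconds late.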
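The event-loop task metric can be sketched in the same hedged way. The coroutine below is an assumption about how such a value could be sampled, not the code from this PR: it periodically reports `len(asyncio.all_tasks())` to a callback. The names `sample_event_loop_tasks` and `demo` and the 5 s default interval are hypothetical.

```python
import asyncio
from collections.abc import Callable


async def sample_event_loop_tasks(
    report: Callable[[int], None], interval: float = 5.0
) -> None:
    """Periodically report how many unfinished tasks the running loop has."""
    while True:
        # all_tasks() returns the tasks that are not done yet, including
        # this sampler task itself.
        report(len(asyncio.all_tasks()))
        await asyncio.sleep(interval)


async def demo() -> list[int]:
    """Sample while three other tasks are pending; return the samples."""
    samples: list[int] = []
    sleepers = [asyncio.create_task(asyncio.sleep(1)) for _ in range(3)]
    sampler = asyncio.create_task(
        sample_event_loop_tasks(samples.append, interval=0.01)
    )
    await asyncio.sleep(0.05)
    # Stop the sampler and the dummy tasks before returning.
    sampler.cancel()
    for task in sleepers:
        task.cancel()
    await asyncio.gather(sampler, *sleepers, return_exceptions=True)
    return samples
```

Sampling from a coroutine that runs inside the loop keeps the `asyncio.all_tasks()` call on the loop's own thread (called without arguments, it requires a running loop). In a FastAPI or aiohttp service, such a sampler would be started as a background task at startup with, for example, the `set` method of a `prometheus_client` `Gauge` as the callback, and cancelled at shutdown.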

Related issue/s

How to test

  • Unit tests have been added in this PR.

Dev-ops

@bisgaard-itis bisgaard-itis self-assigned this Oct 9, 2025
codecov bot commented Oct 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.43%. Comparing base (e6e1581) to head (bba912f).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8491      +/-   ##
==========================================
- Coverage   87.62%   87.43%   -0.20%     
==========================================
  Files        2001     1571     -430     
  Lines       77893    65180   -12713     
  Branches     1338      683     -655     
==========================================
- Hits        68250    56987   -11263     
+ Misses       9243     7954    -1289     
+ Partials      400      239     -161     
Flag Coverage Δ
integrationtests 64.10% <ø> (+<0.01%) ⬆️
unittests 85.88% <100.00%> (-0.44%) ⬇️
Components Coverage Δ
pkg_aws_library ∅ <ø> (∅)
pkg_celery_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library 70.98% <100.00%> (+0.02%) ⬆️
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 84.89% <ø> (-0.12%) ⬇️
agent 93.10% <ø> (ø)
api_server 91.87% <100.00%> (-0.01%) ⬇️
autoscaling 95.72% <ø> (ø)
catalog 92.06% <ø> (ø)
clusters_keeper 99.14% <ø> (ø)
dask_sidecar 91.81% <ø> (-0.57%) ⬇️
datcore_adapter 97.95% <ø> (ø)
director 75.72% <ø> (ø)
director_v2 90.86% <ø> (-0.05%) ⬇️
dynamic_scheduler 96.80% <ø> (ø)
dynamic_sidecar 90.44% <ø> (ø)
efs_guardian 89.83% <ø> (ø)
invitations 90.90% <ø> (ø)
payments 92.80% <ø> (ø)
resource_usage_tracker 92.11% <ø> (-0.11%) ⬇️
storage 86.41% <ø> (-0.09%) ⬇️
webclient ∅ <ø> (∅)
webserver 87.30% <ø> (+0.02%) ⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e6e1581...bba912f.

mergify bot commented Oct 9, 2025

🧪 CI Insights

Here's what we observed from your CI run for bba912f.

✅ Passed Jobs With Interesting Signals

  • CI build-test-images (frontend) / build-test-images: base branch is healthy, but retries were needed. Could be early signs of flakiness 👀 (health on master: Healthy; retries: 1)
  • integration-tests: base branch is healthy, but retries were needed. Could be early signs of flakiness 👀 (health on master: Healthy; retries: 1)
  • system-tests: base branch is healthy, but retries were needed. Could be early signs of flakiness 👀 (health on master: Healthy; retries: 1)

@bisgaard-itis changed the title from "use exponential backoff when polling long running tasks in api-server" to "Follow up to osparc.io incident" Oct 9, 2025
@bisgaard-itis bisgaard-itis added the t:maintenance Some planned maintenance work label Oct 9, 2025
@bisgaard-itis bisgaard-itis added this to the Cheops milestone Oct 9, 2025
@bisgaard-itis bisgaard-itis marked this pull request as ready for review October 9, 2025 10:27
@wvangeit (Contributor) left a comment

Thanks

@pcrespov (Member) left a comment

thx!

@bisgaard-itis (Contributor Author)

@Mergifyio queue

@bisgaard-itis bisgaard-itis added the 🤖-automerge marks PR as ready to be merged for Mergify label Oct 9, 2025
mergify bot commented Oct 9, 2025

queue

🛑 Configuration not compatible with a branch protection setting

The branch protection setting Require branches to be up to date before merging is not compatible with draft PR checks. To keep this branch protection enabled, update your Mergify configuration to enable in-place checks: set merge_queue.max_parallel_checks: 1, set every queue rule batch_size: 1, and avoid two-step CI (make merge_conditions identical to queue_conditions). Otherwise, disable this branch protection.

@bisgaard-itis bisgaard-itis enabled auto-merge (squash) October 9, 2025 13:49
sonarqubecloud bot commented Oct 9, 2025

@bisgaard-itis (Contributor Author)

@Mergifyio queue

mergify bot commented Oct 9, 2025

queue

🛑 Configuration not compatible with a branch protection setting

The branch protection setting Require branches to be up to date before merging is not compatible with draft PR checks. To keep this branch protection enabled, update your Mergify configuration to enable in-place checks: set merge_queue.max_parallel_checks: 1, set every queue rule batch_size: 1, and avoid two-step CI (make merge_conditions identical to queue_conditions). Otherwise, disable this branch protection.

@bisgaard-itis bisgaard-itis merged commit 4b88d15 into ITISFoundation:master Oct 9, 2025
146 of 148 checks passed
@bisgaard-itis bisgaard-itis deleted the use-backoff-polling-strategy-and-expose-event-loop-tasks-to-prometheus branch October 10, 2025 05:11

Labels

🤖-automerge (marks PR as ready to be merged for Mergify), t:maintenance (Some planned maintenance work)

5 participants