@GitHK GitHK commented Sep 4, 2025

What do these changes do?

Please 🙏 do not panic! There is a lot of code, but the core of it is about 1000 lines, and that is where most of the attention is required.

Suggested review order: start from services/dynamic-scheduler/src/simcore_service_dynamic_scheduler/services/generic_scheduler/ and go through the modules in the following order:

  • start with a quick glance at _api.py to get an idea of the methods the scheduler exposes
  • continue with _core.py and look directly at _on_schedule_even, which introduces the logic behind how the scheduler works
  • to figure out the structure of an Operation, look at _operation.py; it might make more sense to read it from the bottom to the top

There is test coverage for all modules; for some it is pretty extensive.

This is the first of two PRs. The second one will bring a much-needed interface for monitoring the status of the scheduler.

The generic_scheduler module is the core of the dynamic-scheduler service and was built to take into consideration some important requirements:

  • capability to recover in case of service restart
  • capability to recover in case of dependency outage (Redis or RabbitMQ)
  • ability to run in multiple instances
  • ability to define a sequence of small retryable and revertable actions
  • ability to define multiple sequences
  • ability to define actions that can be repeated forever (example usage: service status monitoring)
  • ability to automatically and predictably clean up after errors
  • ability to define custom retry policies for each single defined action
  • ability to explicitly and clearly define and control the sequence of actions to execute
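
To make the "sequence of small retryable and revertable actions" requirement concrete, here is a minimal sketch in Python. It is not the API introduced by this PR: `Step`, `run_sequence`, `make_step` and `demo` are hypothetical names invented for illustration, and the sketch ignores persistence, multi-instance coordination and recovery after restarts. It only shows one plausible shape for per-step retry policies and for reverting in reverse order when a step keeps failing.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

Action = Callable[[], Awaitable[None]]


@dataclass(frozen=True)
class Step:
    """Hypothetical step: one small action plus the action that undoes it."""

    name: str
    execute: Action
    revert: Action
    max_attempts: int = 3  # assumed per-step retry policy
    wait_between_attempts: float = 0.0  # seconds to wait before retrying


async def _run_with_retries(step: Step) -> bool:
    """Run step.execute up to max_attempts times; True on first success."""
    for attempt in range(1, step.max_attempts + 1):
        try:
            await step.execute()
            return True
        except Exception:
            if attempt < step.max_attempts:
                await asyncio.sleep(step.wait_between_attempts)
    return False


async def run_sequence(steps: list[Step]) -> bool:
    """Run steps in order; on a permanent failure revert in reverse order."""
    completed: list[Step] = []
    for step in steps:
        if await _run_with_retries(step):
            completed.append(step)
            continue
        # The failed step may have partially run, so it is reverted as well.
        for done in reversed([*completed, step]):
            await done.revert()
        return False
    return True


def make_step(name: str, log: list[str], failures: int = 0, max_attempts: int = 3) -> Step:
    """Build a demo step that fails `failures` times before succeeding."""
    remaining = {"failures": failures}

    async def execute() -> None:
        if remaining["failures"] > 0:
            remaining["failures"] -= 1
            log.append(f"{name}: failed")
            raise RuntimeError(name)
        log.append(f"{name}: done")

    async def revert() -> None:
        log.append(f"{name}: reverted")

    return Step(name, execute, revert, max_attempts=max_attempts)


def demo() -> tuple[bool, list[str]]:
    log: list[str] = []
    steps = [
        make_step("reserve", log),
        make_step("start", log, failures=1),  # recovers on the retry
        make_step("notify", log, failures=5, max_attempts=2),  # never recovers
    ]
    return asyncio.run(run_sequence(steps)), log


if __name__ == "__main__":
    print(demo())
```

In this sketch the retry policy lives on each step, so a flaky action can be given more attempts without affecting the others, and a permanent failure always triggers the same revert path in reverse order, which keeps cleanup predictable.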

Over the last few years it became apparent that we needed more flexibility when starting and closing services. The idea here is to have a robust system that runs with as little human intervention as possible and that can be easily debugged in case of issues.

Related issue/s

How to test

Dev-ops

@GitHK GitHK self-assigned this Sep 4, 2025
@GitHK GitHK added this to the Cheops milestone Sep 4, 2025
codecov bot commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 97.31243% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.92%. Comparing base (4b41a26) to head (6d78813).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8307      +/-   ##
==========================================
+ Coverage   87.83%   87.92%   +0.09%     
==========================================
  Files        1964     1976      +12     
  Lines       76341    77220     +879     
  Branches     1342     1342              
==========================================
+ Hits        67051    67897     +846     
- Misses       8886     8919      +33     
  Partials      404      404              
Flag | Coverage Δ
integrationtests | 64.15% <ø> (-0.11%) ⬇️
unittests | 86.62% <97.31%> (+0.12%) ⬆️

Components | Coverage Δ
pkg_aws_library | 93.59% <ø> (ø)
pkg_celery_library | 83.41% <ø> (ø)
pkg_dask_task_models_library | 79.33% <ø> (ø)
pkg_models_library | 93.08% <ø> (ø)
pkg_notifications_library | 85.20% <ø> (ø)
pkg_postgres_database | 87.95% <ø> (ø)
pkg_service_integration | 70.19% <ø> (ø)
pkg_service_library | 72.55% <ø> (ø)
pkg_settings_library | 90.19% <ø> (ø)
pkg_simcore_sdk | 84.99% <ø> (ø)
agent | 93.53% <ø> (ø)
api_server | 91.94% <ø> (ø)
autoscaling | 95.74% <ø> (ø)
catalog | 92.36% <ø> (ø)
clusters_keeper | 99.13% <ø> (ø)
dask_sidecar | 91.82% <ø> (ø)
datcore_adapter | 97.94% <ø> (ø)
director | 75.81% <ø> (ø)
director_v2 | 90.90% <ø> (-0.02%) ⬇️
dynamic_scheduler | 96.68% <97.31%> (+0.54%) ⬆️
dynamic_sidecar | 90.43% <ø> (ø)
efs_guardian | 89.62% <ø> (ø)
invitations | 91.44% <ø> (ø)
payments | 92.62% <ø> (ø)
resource_usage_tracker | 92.18% <ø> (+0.05%) ⬆️
storage | 86.74% <ø> (+0.08%) ⬆️
webclient | ∅ <ø> (∅)
webserver | 87.68% <ø> (-0.07%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
mergify bot commented Sep 4, 2025

🧪 CI Insights

Here's what we observed from your CI run for 6d78813.

✅ Passed Jobs With Interesting Signals

Pipeline | Job | Signal | Health on master | Retries 🔁
CI | integration-tests | Base branch is healthy, but retries were needed. Could be early signs of flakiness 👀 | Healthy | 2
 | system-tests | Base branch is broken, but the job passed. Looks like this might be a real fix 💪 | Broken | 0

@GitHK GitHK requested a review from pcrespov September 29, 2025 14:38
@pcrespov pcrespov left a comment

This is too long for me to review in full detail. TBH I don’t fully understand all of the logic, but it seems this is still inactive after this PR, so I don’t think we should delay it further. Hopefully the other colleagues can review the remaining parts.

sonarqubecloud bot commented Oct 1, 2025

@giancarloromeo giancarloromeo changed the title from "✨ Adding generic scheduling capability to to dynamic-schduler [part 1/2]" to "✨ Adding generic scheduling capability to dynamic-schduler [part 1/2]" Oct 1, 2025
@giancarloromeo giancarloromeo changed the title from "✨ Adding generic scheduling capability to dynamic-schduler [part 1/2]" to "✨ Adding generic scheduling capability to dynamic-scheduler [part 1/2]" Oct 1, 2025
@GitHK GitHK enabled auto-merge (squash) October 1, 2025 07:29
@GitHK GitHK merged commit 2eb46b7 into ITISFoundation:master Oct 1, 2025
196 of 201 checks passed
@GitHK GitHK deleted the pr-osparc-migrate-dy-scheduler-part2 branch October 1, 2025 09:45
4 participants