Skip to content

Conversation

@sanderegg
Copy link
Member

@sanderegg sanderegg commented Nov 8, 2024

What do these changes do?

Until now

  • dv-2 internal computational scheduler is run as a background task that concurrently schedules every ongoing computational job:
    • every time a new pipeline is started, it re-schedules all the pipelines even if unnecessary,
    • every time a pipeline is requested to stop, it re-schedules all the pipelines even if unnecessary,
    • with multiple replicas this generates a load of unnecessary scheduling as all the replicas would schedule the same pipelines,

What this PR brings

  • exclusive decorator changes (redis_utils.py, test_redis_utils.py):

    • decorator typing is now fixed
    • decorator can now create the lock_key dynamically via a lock_key_builder callable (a la aiocache),
    • decorator can now get the redis client dynamically via a redis_client_builder callable (a la aiocache),
  • dv-2 scheduling is now running 1 asyncio.Task per computational pipeline:

    • each pipeline is now scheduled independently from the others every 5 seconds,
    • when a pipeline is started, only that very pipeline is scheduled independently of any other pipeline,
    • when a pipeline is requested to stop, only that very pipeline is scheduled independently of any other pipeline,
    • when a computational node inside a pipeline is complete, only the very pipeline it lives in is re-scheduled without waiting for 5 seconds,
    • when a pipeline is complete (success, cancelled or errored), the scheduling task is removed,
    • with multiple replicas of the dv-2, only 1 replica schedule a pipeline at a time, the other replicas will skip it as the scheduling method is made exclusive

What this PR does not yet bring (will come in part 2 and is not part of this PR!!!)

  • mulitple replicas of the dv-2 shall share the load of scheduling pipelines, which will probably be done via RabbitMQ

Related issue/s

How to test

Dev-ops checklist

@sanderegg sanderegg added the a:director-v2 issue related with the director-v2 service label Nov 8, 2024
@sanderegg sanderegg added this to the Event Horizon milestone Nov 8, 2024
@sanderegg sanderegg self-assigned this Nov 8, 2024
@codecov
Copy link

codecov bot commented Nov 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.44%. Comparing base (17f486e) to head (d9df523).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (17f486e) and HEAD (d9df523). Click for more details.

HEAD has 30 uploads less than BASE
Flag BASE (17f486e) HEAD (d9df523)
unittests 31 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6696       +/-   ##
===========================================
- Coverage   87.73%   64.44%   -23.30%     
===========================================
  Files        1571      657      -914     
  Lines       63271    33103    -30168     
  Branches     2118      265     -1853     
===========================================
- Hits        55512    21333    -34179     
- Misses       7433    11709     +4276     
+ Partials      326       61      -265     
Flag Coverage Δ *Carryforward flag
integrationtests 64.85% <ø> (+<0.01%) ⬆️ Carriedforward from 17f486e
unittests 58.38% <ø> (-27.33%) ⬇️

*This pull request uses carry forward flags. Click here to find out more.

Components Coverage Δ
api ∅ <ø> (∅)
pkg_aws_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 77.44% <ø> (-7.84%) ⬇️
agent ∅ <ø> (∅)
api_server ∅ <ø> (∅)
autoscaling ∅ <ø> (∅)
catalog ∅ <ø> (∅)
clusters_keeper ∅ <ø> (∅)
dask_sidecar ∅ <ø> (∅)
datcore_adapter ∅ <ø> (∅)
director 58.38% <ø> (ø)
director_v2 76.42% <ø> (-14.38%) ⬇️
dynamic_scheduler ∅ <ø> (∅)
dynamic_sidecar 59.79% <ø> (-30.01%) ⬇️
efs_guardian ∅ <ø> (∅)
invitations ∅ <ø> (∅)
osparc_gateway_server 79.42% <ø> (-5.73%) ⬇️
payments ∅ <ø> (∅)
resource_usage_tracker ∅ <ø> (∅)
storage ∅ <ø> (∅)
webclient ∅ <ø> (∅)
webserver 59.79% <ø> (-28.95%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17f486e...d9df523. Read the comment docs.

@sanderegg sanderegg force-pushed the computation-backend/release-scheduler-gil branch 2 times, most recently from 236a7d8 to b6428ee Compare November 11, 2024 14:44
@sanderegg sanderegg changed the title 🎨Computation backend: release director-v2 scheduler GIL 🎨Computation backend: director-v2 make computational scheduler share load among replicas Nov 12, 2024
@sanderegg sanderegg force-pushed the computation-backend/release-scheduler-gil branch 2 times, most recently from ef5a649 to 050fb67 Compare November 12, 2024 16:21
@sanderegg sanderegg changed the title 🎨Computation backend: director-v2 make computational scheduler share load among replicas 🎨Computation backend: refactor director-v2 internal computational scheduler to be less resource heavy Nov 12, 2024
@sanderegg sanderegg marked this pull request as ready for review November 12, 2024 16:46
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool! thx

Copy link
Contributor

@bisgaard-itis bisgaard-itis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Just a few comments/questions from my side.

Copy link
Collaborator

@matusdrobuliak66 matusdrobuliak66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool!

@sanderegg sanderegg force-pushed the computation-backend/release-scheduler-gil branch 2 times, most recently from 04a5915 to e0b03da Compare November 14, 2024 09:50
@sanderegg sanderegg force-pushed the computation-backend/release-scheduler-gil branch from dc69b24 to d9df523 Compare November 14, 2024 10:17
@sonarqubecloud
Copy link

@sanderegg sanderegg enabled auto-merge (squash) November 14, 2024 10:50
@sanderegg sanderegg merged commit 0781e63 into ITISFoundation:master Nov 14, 2024
88 of 89 checks passed
@sanderegg sanderegg deleted the computation-backend/release-scheduler-gil branch November 14, 2024 11:16
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Jan 15, 2025
58 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

a:director-v2 issue related with the director-v2 service

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants