🎨Computational backend: DV-2 computational scheduler becomes replicable (🗃️🚨) #6736
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6736 +/- ##
==========================================
+ Coverage 86.93% 88.51% +1.58%
==========================================
Files 1553 1550 -3
Lines 61866 61702 -164
Branches 2110 2108 -2
==========================================
+ Hits 53781 54618 +837
+ Misses 7754 6753 -1001
Partials 331 331
bisgaard-itis left a comment
Very interesting! Looking forward to seeing multiple director-v2 replicas. Unfortunately I am not deep enough in the details to give a more thorough review, but I did discover one little thing. 🙂



What do these changes do?
This PR heavily refactors the director-v2 internal computational scheduler so that multiple director-v2 replicas can share the load of scheduling computational pipelines.
As a reminder, the computational scheduler in the director-v2 is responsible for:
Until this PR, replicating the director-v2 would also duplicate network calls and result in wasted resources.
This PR aims to make the replication of the director-v2 more efficient by:
- simcore.services.director-v2.scheduling,
- COMPUTATIONAL_BACKEND_SCHEDULING_CONCURRENCY is currently hard-coded to 50. Some testing will be necessary to see whether that is too low or too high, which is why it is not an ENV variable at the moment. It will be converted if necessary.
- The comp_runs table is upgraded to contain new nullable scheduled and processed columns. These keep track of when a pipeline was scheduled by the manager and when a worker processed it.
- The comp_runs table is upgraded to contain only timezone-enabled timestamps.
- The comp_tasks table is upgraded to contain only timezone-enabled timestamps.
Schematic
---
config:
  theme: mc
  layout: dagre
  look: handDrawn
---
flowchart LR
    subgraph s1["Director-v2.1"]
        n1["Manager:<br>5s: Schedule all pipelines"]
        n3["Worker1"]
        n4["Worker2"]
        n5["WorkerN"]
        n6["schedule pipeline1"]
        n7["schedule pipeline2"]
        n8["schedule pipeline3"]
    end
    subgraph s2["Cluster-UserX"]
        n9["Dask-Scheduler"]
        n11["Dask-Worker(s)"]
    end
    subgraph s3["Cluster-UserY"]
        n14["Dask-Scheduler"]
        n15["Dask-Worker(s)"]
    end
    subgraph s4["Cluster-UserZ"]
        n19["Dask-Scheduler"]
        n20["Dask-Worker(s)"]
    end
    subgraph s5["Director-v2.2"]
        n21["Manager:<br>5s: Schedule all pipelines"]
        n22["Worker1"]
        n23["Worker2"]
        n24["WorkerN"]
        n25["schedule pipeline4"]
        n26["schedule pipeline5"]
        n27["schedule pipeline6"]
    end
    subgraph s6["Cluster-UserA"]
        n90["Dask-Scheduler"]
        n91["Dask-Worker(s)"]
    end
    subgraph s7["Cluster-UserB"]
        n92["Dask-Scheduler"]
        n93["Dask-Worker(s)"]
    end
    subgraph s8["Cluster-UserC"]
        n94["Dask-Scheduler"]
        n95["Dask-Worker(s)"]
    end
    n1 ==> n2["RabbitMQ"]
    n2 --> n3 & n4 & n5 & n22 & n23 & n24
    n3 --> n6
    n4 --> n7
    n5 --> n8
    n6 --> n9
    n7 --> n14
    n8 --> n19
    n22 --> n25
    n23 --> n26
    n24 --> n27
    n21 ==> n2
    n25 --> n90
    n26 --> n92
    n27 --> n94
    n21@{ shape: rect}
    n22@{ shape: rect}
    n23@{ shape: rect}
    n24@{ shape: rect}
    n25@{ shape: rect}
    n26@{ shape: rect}
    n27@{ shape: rect}
    n2@{ shape: cyl}
    style n1 stroke:#D50000
    style n6 stroke:#D50000
    style n7 stroke:#D50000
    style n8 stroke:#D50000
    style n21 stroke:#D50000,stroke-width:1px,stroke-dasharray: 1
    style n25 stroke:#D50000
    style n26 stroke:#D50000
    style n27 stroke:#D50000
Legend:
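The manager/worker split in the schematic can be illustrated with a small, self-contained sketch. It is not the code from this PR and rests on several assumptions: an in-process `asyncio.Queue` stands in for RabbitMQ, `CompRun` is a hypothetical stand-in for a row of the comp_runs table, the rule in `needs_scheduling` is only a guess at how the scheduled and processed columns might be used, and starting one worker task per unit of COMPUTATIONAL_BACKEND_SCHEDULING_CONCURRENCY is likewise an assumption.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from datetime import datetime, timezone

# Value quoted in the PR description; every other name below is hypothetical.
COMPUTATIONAL_BACKEND_SCHEDULING_CONCURRENCY = 50


@dataclass
class CompRun:
    """Hypothetical in-memory stand-in for one row of the comp_runs table."""

    run_id: int
    scheduled: datetime | None = None  # set by a manager (timezone-aware)
    processed: datetime | None = None  # set by a worker (timezone-aware)


def needs_scheduling(run: CompRun) -> bool:
    """Assumed rule: a run is due if it was never scheduled, or if a worker
    has already processed its latest scheduling."""
    if run.scheduled is None:
        return True
    return run.processed is not None and run.processed >= run.scheduled


async def manager_tick(runs: list[CompRun], queue: asyncio.Queue[int]) -> int:
    """Publish every due run once and return how many were published."""
    published = 0
    for run in runs:
        if needs_scheduling(run):
            run.scheduled = datetime.now(timezone.utc)
            run.processed = None
            await queue.put(run.run_id)
            published += 1
    return published


async def worker(
    name: str,
    runs_by_id: dict[int, CompRun],
    queue: asyncio.Queue[int],
    handled: list[tuple[str, int]],
) -> None:
    """Consume run ids forever; each message is handled by exactly one worker."""
    while True:
        run_id = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for scheduling one pipeline
            runs_by_id[run_id].processed = datetime.now(timezone.utc)
            handled.append((name, run_id))
        finally:
            queue.task_done()


async def demo() -> tuple[int, int, list[tuple[str, int]]]:
    runs = [CompRun(run_id=i) for i in range(6)]
    runs_by_id = {run.run_id: run for run in runs}
    queue: asyncio.Queue[int] = asyncio.Queue()
    handled: list[tuple[str, int]] = []

    # Two managers (one per replica) tick back to back before any worker runs.
    first = await manager_tick(runs, queue)
    second = await manager_tick(runs, queue)

    # Workers of both replicas consume from the same shared queue.
    workers = [
        asyncio.create_task(worker(f"dv2.{r}-worker{w}", runs_by_id, queue, handled))
        for r in (1, 2)
        for w in range(COMPUTATIONAL_BACKEND_SCHEDULING_CONCURRENCY)
    ]
    await queue.join()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return first, second, handled


if __name__ == "__main__":
    first, second, handled = asyncio.run(demo())
    print(f"first manager published {first}, second published {second}")
    print(f"{len(handled)} pipelines handled")
```

In this sketch the second manager finds nothing to publish because every run already carries a scheduled timestamp with no processed timestamp, so adding a replica adds workers without duplicating the scheduling work.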
Related issue/s
How to test
Dev-ops checklist