jpsamaroo

At its core, this PR implements a numerical optimizer-based scheduler for Datadeps. It uses JuMP to implement the scheduler designed by @pszufe and documented at https://github.com/pszufe/DagScheduler. The idea is to aggressively optimize a Datadeps DAG ahead of time, based on all available information. Because it sees the whole DAG, this scheduler can make near-optimal scheduling decisions. This is different from our existing JIT-style schedulers, which don't optimize over the entire DAG and only look at the few tasks currently in front of them.
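To illustrate the difference between whole-DAG optimization and JIT-style decisions, here is a minimal Python sketch. It is not the JuMP model from this PR: the DAG, the cost table, the flat transfer cost, and all function names are hypothetical, and a brute-force search over assignments stands in for the numerical optimizer.

```python
from itertools import product

# Hypothetical toy DAG: task -> dependencies, listed in topological order.
DEPS = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
# Hypothetical estimated execution time of each task on each processor.
COST = {
    "a": {"cpu": 2.0, "gpu": 1.0},
    "b": {"cpu": 3.0, "gpu": 1.0},
    "c": {"cpu": 1.0, "gpu": 2.0},
    "d": {"cpu": 2.0, "gpu": 2.0},
}
PROCS = ("cpu", "gpu")
TRANSFER = 1.5  # assumed flat cost of moving a result between processors


def ready_time(deps, proc, assignment, finish):
    """Earliest time all inputs of a task are available on `proc`."""
    return max(
        (finish[d] + (TRANSFER if assignment[d] != proc else 0.0) for d in deps),
        default=0.0,
    )


def simulate(assignment):
    """Makespan of a full task -> processor assignment."""
    free = {p: 0.0 for p in PROCS}
    finish = {}
    for task, deps in DEPS.items():
        proc = assignment[task]
        start = max(free[proc], ready_time(deps, proc, assignment, finish))
        finish[task] = start + COST[task][proc]
        free[proc] = finish[task]
    return max(finish.values())


def greedy_schedule():
    """JIT-style: place each task where it finishes earliest, one at a time."""
    assignment, finish = {}, {}
    free = {p: 0.0 for p in PROCS}
    for task, deps in DEPS.items():
        end, proc = min(
            (max(free[p], ready_time(deps, p, assignment, finish)) + COST[task][p], p)
            for p in PROCS
        )
        assignment[task], finish[task], free[proc] = proc, end, end
    return assignment


def exhaustive_schedule():
    """Ahead-of-time: consider every assignment of the whole DAG at once."""
    tasks = list(DEPS)
    candidates = (dict(zip(tasks, ps)) for ps in product(PROCS, repeat=len(tasks)))
    return min(candidates, key=simulate)
```

Since the greedy assignment is one of the candidates the exhaustive search evaluates, the whole-DAG result can never have a longer makespan than the greedy one. Exhaustive search grows exponentially with the number of tasks, which is why a real implementation would hand the problem to an optimizer instead.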

To make this scheduler work, some additional improvements were made:

  • A new library, MetricsTracker.jl, was implemented to make it easy to declaratively configure which metrics to collect during task scheduling and execution. It also provides mechanisms to efficiently search the collected metric values for those matching a given combination of target keys, such as the selected processor, the task signature, and more. The scheduler uses this to look up information relevant to each task, like estimated execution time and transfer costs.
  • Schedules generated by Datadeps are now cached and reused, when possible, within the same session. Submitted DAGs are compared for similarity, and if a match is found, the previously generated schedule is reused. This amortizes potentially expensive scheduling operations when the same Datadeps operations are called repeatedly.
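The two improvements above can be sketched together in a few lines of Python. This is an illustration only, not the MetricsTracker.jl or Datadeps API: the metric table, the exact-match DAG key, and the names `ScheduleCache` and `fastest_processor_solver` are all assumptions made for the example. In particular, the PR compares DAGs "for similarity", whereas this sketch only reuses a schedule on an exact structural match.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical task description: (task signature, indices of dependency tasks).
Task = Tuple[str, Tuple[int, ...]]

# Hypothetical metric store keyed by (task signature, processor),
# holding an estimated execution time in seconds.
METRICS: Dict[Tuple[str, str], float] = {
    ("matmul", "cpu"): 4.0,
    ("matmul", "gpu"): 1.0,
    ("add", "cpu"): 0.5,
    ("add", "gpu"): 0.8,
}
PROCS = ("cpu", "gpu")


def fastest_processor_solver(tasks: Sequence[Task]) -> List[str]:
    """Stand-in for an expensive solver: look up each task's metrics
    and pick the processor with the lowest estimated execution time."""
    return [min(PROCS, key=lambda p: METRICS[(sig, p)]) for sig, _ in tasks]


class ScheduleCache:
    """Reuse a previously generated schedule when the same DAG is resubmitted."""

    def __init__(self, solver: Callable[[Sequence[Task]], List[str]]):
        self._solver = solver
        self._cache: Dict[Tuple[Task, ...], List[str]] = {}
        self.solver_calls = 0  # counts how often the solver actually ran

    def schedule(self, tasks: Sequence[Task]) -> List[str]:
        # Canonical, hashable key describing the DAG's structure.
        key = tuple((sig, tuple(deps)) for sig, deps in tasks)
        if key not in self._cache:
            self.solver_calls += 1
            self._cache[key] = self._solver(tasks)
        return self._cache[key]
```

Submitting the same DAG twice runs the solver once; a structurally different DAG triggers a fresh solve. Note that a cached schedule reflects the metrics available when it was generated, which is the stale-metrics concern listed in the todo items.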

Todo:

  • Think about a solution for stale metrics when reusing schedules
  • Add tests
  • Add docs

@jpsamaroo jpsamaroo force-pushed the jps/datadeps-opt-sched branch from c85eed4 to 256057d Compare March 31, 2025 19:41