Planner redesign #1401

milroy · 2025-09-09T02:05:17Z

milroy
Sep 9, 2025
Maintainer

Motivation

Although planner provides O(log n) search complexity for the resource state at a given time and O(log n) search complexity for "the earliest schedulable point where your resource requirement can be satisfied," planner is complex, predisposed to memory mismanagement, and hard to maintain. It provides a large C API that maintains state for the C++ objects it owns, and is unlikely to be used outside Fluxion.

Early testing in 2024 on El Capitan indicated that planner can become a bottleneck on large clusters with a large number of jobs. Unfortunately I don't know where the test data went. A possible cause of the unexpectedly bad scaling is that while the "earliest time" tree indeed performs an O(log n) search, planner must check that the interval starting from the earliest discovered time satisfies the resource request count. That search is linear and involves removing and reinserting nodes in the "earliest time" tree (mt_tree_*):

flux-sched/resource/planner/c/planner_c_interface.cpp

Lines 165 to 179 in 8e1718d

    
           while ((start_point = ctx->plan->mt_tree_get_mintime (request))) { 
        
               at = start_point->at; 
        
               if (at < on_or_after) { 
        
                   ctx->plan->mt_tree_remove (start_point); 
        
                   track_points (ctx->plan->get_avail_time_iter (), start_point); 
        
                   at = -1; 
        
               } else if (span_ok (ctx, start_point, duration, request)) { 
        
                   ctx->plan->mt_tree_remove (start_point); 
        
                   track_points (ctx->plan->get_avail_time_iter (), start_point); 
        
                   if (static_cast<int64_t> (at + duration) > ctx->plan->get_plan_end ()) 
        
                       at = -1; 
        
                   break; 
        
               } 
        
           }

The overall complexity is bounded below by O(n log n).

`Boost.MultiIndex`

With the goal of reproducing planner with a simplified code structure, I developed a prototype "planner2" based on Boost.MultiIndex: https://github.com/milroy/flux-sched/tree/planner-v2. After implementing a large number of planner unit tests, I realized that Boost.MultiIndex doesn't support two-dimensional search on composite keys (e.g., a time, resource count pair). The query is multi-dimensional, so there is no way to sort by time and guarantee a total order on resource count (or vice-versa). Put more simply, the current planner provides an augmented red-black tree search, while Boost.MultiIndex does not.

k-d tree

It possible to perform a single, O(log n) search that returns the earliest time at which a given number of resources is available for a desired interval of time. A k-d tree (specifically, 3-D here) provides this capability, while maintaining O(log n) insertion and removal: https://en.wikipedia.org/wiki/K-d_tree. It should be possible to implement the required functionality of planner with a single k-d tree. nanoflann provides a C++ header-only library with a k-d tree: https://github.com/jlblancoc/nanoflann

`Boost.icl`

Along the course of implementing planner features in "planner2" I discovered that handling searches on open and closed intervals results in iterator edge cases that require complicated gymnastics and conditionals. It would be nice to use an interval library like Boost.icl: https://www.boost.org/doc/libs/latest/libs/icl/doc/html/index.html. Boost.icl could handle interval computations and aggregation, storing a value per interval with the number of free resources, and the max length of time (extending beyond the interval end) for which at least that quantity is available. Then the value can be used together with the interval start time as a 3-D key in the k-d tree.

Handling max intervals to enable a single O(log n) search

Insertion a (time, resource count, max available interval) into Boost.icl would require computing the max length of time for which at least the current request quantity is available. To do so, the insertion must check the next interval to determine if the free count is greater than or equal to the current free count. If so, that max interval is added to the interval length to be inserted. Removals would be more complicated, because multiple earlier intervals could need their max available interval values updated. It would be possible to avoid a linear search for these intervals by embedding start times of max available intervals in each scheduled point time. The additional memory overhead and complexity may be prohibitive.

Proposed work

I propose the following investigation:

Profile the current planner to identify scalability limitations, identify bottlenecks, and understand memory usage.
Create an experimental implementation of a k-d tree-based planner based on nanoflann to determine if all planner functionality can be provided by a single data structure.
Create an experimental scheduled point interval implementation based on Boost.icl and determine if it simplifies the interval computations.

After those items are complete, we can assess whether further work and performance testing is warranted. Note that it may be worthwhile to restructure the current planner if some combination of new data structures can result in a more maintainable planner even without performance or memory-usage improvements.

Let me know what you think!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Planner redesign #1401

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Planner redesign #1401

Uh oh!

milroy Sep 9, 2025 Maintainer

Motivation

Boost.MultiIndex

k-d tree

Boost.icl

Handling max intervals to enable a single O(log n) search

Proposed work

Replies: 0 comments

milroy
Sep 9, 2025
Maintainer

`Boost.MultiIndex`

`Boost.icl`