SCore uses the Stride Scheduling algorithm for process scheduling. Stride Scheduling is a proportional-share scheduling algorithm based on distributed tickets, which achieves fair competition and resource allocation among tasks in a simple and efficient manner.
The core idea of the Stride Scheduling algorithm is to assign a "stride" to each task, which is inversely proportional to its priority. The scheduler also maintains a "pass" value for each task. At each scheduling point, the system selects the task with the smallest pass value to run. After running, the task's pass value is incremented by its stride. This design ensures that tasks with higher priority (and thus smaller strides) are scheduled more frequently, achieving proportional sharing of the CPU while avoiding starvation. Because the algorithm relies solely on integer arithmetic, its computational overhead is low.
Although Stride Scheduling appears sound in theory, it faces some practical issues that require additional mechanisms to resolve.
When a new process joins the scheduler, if its pass value is initialized to 0, it will have a significantly higher priority than existing processes, especially those with large pass values. This can lead to the new process monopolizing the CPU for a period, potentially starving older processes. This situation undermines scheduling fairness, particularly in systems where tasks are frequently created.
To address this, SCore adopts a common strategy: when a process creates a child via fork, the child's pass value is initialized to its parent's current pass value. This ensures the new process joins the scheduling queue fairly.
During runtime, as tasks are continuously scheduled, their pass values gradually increase. When a pass value reaches the upper limit of its data type (e.g., u8), it wraps around to a small value. This overflow can lead to incorrect scheduling decisions.
For example, consider two processes with an initial stride of 10, using an 8-bit unsigned integer for the pass value:
- At a certain point, P1 has a pass value of 255, and P2 has a pass value of 250.
- P2 is scheduled to run. Its pass value should become 250 + 10 = 260. However, due to the 8-bit unsigned integer limit, the value overflows to
260 % 256 = 4. - In the next comparison, P1's pass is 255, and P2's is 4. Since 4 < 255, P2 will be incorrectly scheduled again.
To solve this, SCore implements a custom comparison logic for pass values that correctly handles wrap-around. This logic treats the numbers as if they are on a circle. If the distance between two pass values is large (more than half the circle), it assumes a wrap-around has occurred and inverts the comparison result.
The ProcessManager uses a priority queue (implemented with a BinaryHeap) to store ready processes and efficiently select the one with the smallest pass value for execution.
The core data structures for a process (PCB) and its inner mutable state (PCBInner) are defined as follows:
pub struct PCB {
pub pid: PidHandle,
pub kernel_stack: KernelStack,
inner: UPSafeCell<PCBInner>,
}
pub struct PCBInner {
pub prio: u8,
pub pass: Pass,
// ... other fields like trap_cx_ppn, task_cx, memory_set, etc.
pub process_status: ProcessStatus,
pub parent: Option<Weak<PCB>>,
pub children: Vec<Arc<PCB>>,
pub exit_code: i32,
}The ProcessManager holds the ready queue:
pub struct ProcessManager {
ready_queue: BinaryHeap<Reverse<Arc<PCB>>>,
}The add() method adds a new task (process) to the ProcessManager's ready queue:
pub fn add(&mut self, task: Arc<PCB>) {
self.ready_queue.push(Reverse(task));
}The fetch() method pops the process with the smallest pass value (i.e., the highest priority) from the ready_queue. Since BinaryHeap is a max-heap, we use Reverse when pushing the PCB to invert the sorting order. This makes the heap behave like a min-heap based on the pass value, ensuring that the process with the smallest pass value is popped.
pub fn fetch(&mut self) -> Option<Arc<PCB>> {
if let Some(Reverse(pcb)) = self.ready_queue.pop() {
Some(pcb)
} else {
None
}
}To enable comparison of PCB objects within the priority queue (based on their pass values), we need to implement the comparison traits (PartialEq, Eq, PartialOrd, Ord) for PCB:
impl PartialEq for PCB {
/// Compares two objects based on the size of their `pass` field.
fn eq(&self, other: &Self) -> bool {
self.inner.exclusive_access().pass == other.inner.exclusive_access().pass
}
}
impl Eq for PCB {}
impl PartialOrd for PCB {
/// Compares two objects based on the size of their `pass` field.
fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
Some(self.inner.exclusive_access().pass.cmp(&other.inner.exclusive_access().pass))
}
}
impl Ord for PCB {
/// Compares two objects based on the size of their `pass` field.
fn cmp(&self, other: &Self) -> core::cmp::Ordering {
self.inner.exclusive_access().pass.cmp(&other.inner.exclusive_access().pass)
}
}If the difference between two pass values exceeds BigStride / 2, it indicates an overflow, and the comparison result is inverted:
fn cmp(&self, other: &Self) -> core::cmp::Ordering {
if self.0.abs_diff(other.0) <= u8::MAX / 2 {
self.0.cmp(&other.0)
} else {
other.0.cmp(&self.0)
}
}- The difference between 125 and 255 is 130, which is greater than
256 / 2 = 128. This indicates an overflow, so 125 is actually considered larger. - The difference between 129 and 255 is 126, which does not exceed 128, so a normal comparison is performed.
As shown in the figure, when Pass += stride overflows, the comparison result is inverted:
!image.png