anujmundu/reinforcement-learning-job-scheduling

Reinforcement Learning for Job Scheduling


Optimizing single-machine job scheduling to minimize makespan using Proximal Policy Optimization (PPO).

Overview

This project explores the application of Deep Reinforcement Learning (DRL) to the classic Job Scheduling Problem. Using a custom Gymnasium environment, we train a PPO agent to make dynamic scheduling decisions that minimize the total makespan (the completion time of the last job).

The learned policy is benchmarked against traditional heuristic algorithms:

  • FCFS (First-Come, First-Served)
  • SJF (Shortest Job First) - Provably optimal for minimizing mean waiting time on a single machine, though not necessarily for makespan in every setting.

Key Features

  • Custom RL Environment: A fully Gymnasium-compatible environment representing the job queue and processor.
  • State-of-the-Art RL: Utilizes Proximal Policy Optimization (PPO) from stable-baselines3.
  • Dynamic Decision Making: The agent observes the current queue state (job sizes) and time to make real-time dispatching decisions.
  • Comprehensive Benchmarking: Direct comparison vs. heuristic baselines with Gantt chart visualizations.

Results & Visualization

We evaluated all three strategies on a stochastic stream of jobs. Below are Gantt charts showing the execution order for the same set of jobs.

1. First-Come, First-Served (FCFS)

Baseline strategy: Processes jobs strictly in arrival order.

[FCFS Gantt Chart]

2. Shortest Job First (SJF)

Heuristic strategy: Prioritizes the smallest job in the queue.

[SJF Gantt Chart]

3. PPO Agent (RL)

Learned strategy: Adapts to the queue state to optimize throughput.

[PPO Gantt Chart]

Performance Summary:

  Method   Average Makespan   Description
  ------   ----------------   -----------
  SJF      Lowest             Often optimal for this specific single-machine setting.
  PPO      Competitive        Learns to approximate SJF-like behavior without hard-coded rules.
  FCFS     Highest            Inefficient; small jobs get blocked behind large ones.
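The mean-waiting-time gap that makes SJF attractive shows up in a few lines of Python (hypothetical job sizes, all released at t = 0; this is an illustration, not code from the repo):

```python
def mean_wait(order):
    """Mean waiting time on one machine when every job is released at t = 0."""
    t, waits = 0, []
    for size in order:
        waits.append(t)   # this job waits while all earlier jobs run
        t += size         # machine stays busy for `size` timesteps
    return sum(waits) / len(waits)

jobs = [5, 1, 3]               # hypothetical job sizes
fcfs = mean_wait(jobs)         # arrival order  -> 11/3
sjf = mean_wait(sorted(jobs))  # shortest first -> 5/3
```

Running the small job first means every later job waits less, which is exactly the pattern the table attributes to SJF and to the learned PPO policy.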

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/rl-job-scheduling.git
    cd rl-job-scheduling
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    # Windows
    .\venv\Scripts\activate
    # Mac/Linux
    source venv/bin/activate
  3. Install dependencies:

    pip install gymnasium stable-baselines3 matplotlib numpy shimmy

Usage

1. Train the PPO Agent

Train the reinforcement learning model from scratch.

python train_ppo.py

This will save the trained model as ppo_scheduler.zip.

2. Run Baselines

Evaluate the FCFS and SJF heuristics.

python fcfs_baseline.py
# or
python sjf_baseline.py
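Both heuristics reduce to a single dispatch rule over the ready queue, and the Gantt charts plot the resulting (job, start, end) intervals. A stdlib-only sketch of that idea (the `(arrival, size)` job format and the function name are assumptions, not the repo's actual code):

```python
def gantt_intervals(jobs, policy="fcfs"):
    """Schedule (arrival, size) jobs on one machine; return (job_id, start, end)."""
    pending = sorted(enumerate(jobs), key=lambda j: j[1][0])  # order by arrival
    t, intervals = 0, []
    while pending:
        # jobs already waiting at time t; if none, idle until the next arrival
        ready = [j for j in pending if j[1][0] <= t] or pending[:1]
        if policy == "sjf":
            ready.sort(key=lambda j: j[1][1])  # shortest available job first
        jid, (arrival, size) = ready[0]
        start = max(t, arrival)
        intervals.append((jid, start, start + size))
        t = start + size
        pending.remove((jid, (arrival, size)))
    return intervals
```

plot_gantt.py presumably consumes intervals of this shape to draw the charts shown above.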

3. Visualize Results

Generate Gantt charts to see the schedule visually.

python visualize_ppo.py

Project Structure

├── job_scheduling_env.py   # Custom Gymnasium Environment
├── train_ppo.py            # Script to train the PPO agent
├── test_random_policy.py   # Baseline: Random actions
├── fcfs_baseline.py        # Baseline: First-Come First-Served
├── sjf_baseline.py         # Baseline: Shortest Job First
├── visualize_ppo.py        # Load model & plot Gantt chart
├── plot_gantt.py           # Helper utility for plotting
├── img/                    # Validation images & Gantt charts
└── README.md               # Project Documentation

Technical Details

  • Observation Space: Includes the current time, remaining time of the current job, and a padded vector of the top $N$ job sizes in the queue.
  • Action Space: Discrete action space $[0, N-1]$ selecting which job from the queue to process next.
  • Reward Function:
    • Negative penalty per timestep (to encourage speed).
    • Large penalty for invalid actions (picking empty slots).
    • Positive reward for valid job completion.
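The spaces and reward above can be sketched as a dependency-free Python class. The queue size N, penalty, and bonus values here are illustrative assumptions, not the repo's actual constants, and the real environment presumably wraps logic like this in a gymnasium.Env subclass:

```python
class SchedulingSketch:
    """Minimal sketch of the scheduling MDP described above (not the repo's env)."""
    N = 4                     # queue capacity (assumed)
    INVALID_PENALTY = -50.0   # picking an empty slot (assumed value)
    COMPLETION_BONUS = 10.0   # finishing a job (assumed value)

    def __init__(self, jobs):
        self.queue = list(jobs)[: self.N]  # job sizes waiting in the queue
        self.time = 0

    def observe(self):
        # [current time, remaining time of running job, padded top-N job sizes];
        # jobs run to completion here, so "remaining" is always 0 at decision points
        padded = self.queue + [0] * (self.N - len(self.queue))
        return [self.time, 0] + padded

    def step(self, action):
        if action >= len(self.queue):      # empty slot -> invalid action
            return self.observe(), self.INVALID_PENALTY, False
        size = self.queue.pop(action)
        self.time += size                  # machine busy for `size` timesteps
        reward = -float(size) + self.COMPLETION_BONUS  # -1/timestep + bonus
        return self.observe(), reward, len(self.queue) == 0
```

Dispatching a short job incurs a smaller per-step penalty up front, which is the kind of signal that can nudge PPO toward the SJF-like behavior reported in the results.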

Author

Anuj Mundu, MCA Student | ML & RL Practitioner. Focused on applied machine learning and real-world optimization problems.

GitHub: https://github.com/anujmundu | LinkedIn: https://linkedin.com/in/anujmundu