Optimizing single-machine job scheduling to minimize makespan using Proximal Policy Optimization (PPO).
This project explores the application of Deep Reinforcement Learning (DRL) to the classic Job Scheduling Problem. Using a custom Gymnasium environment, we trained a PPO agent to make dynamic scheduling decisions that minimize the total makespan (the completion time of the last job).
The learned policy is benchmarked against traditional heuristic algorithms:
- FCFS (First-Come, First-Served)
- SJF (Shortest Job First): the theoretical optimum for minimizing mean waiting time on a single machine, though not necessarily for makespan in all contexts.
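The difference between the two heuristics can be sketched with a toy single-machine simulation (the job sizes below are illustrative, not the project's actual job distribution). With all jobs available at time zero, both orders finish at the same makespan, but SJF sharply reduces mean waiting time; in this project's stochastic job stream, ordering also affects makespan.

```python
# Toy comparison of FCFS vs. SJF on a single machine (all jobs arrive at t=0).

def schedule_stats(job_sizes):
    """Return (makespan, mean waiting time) for jobs run in the given order."""
    t = 0.0
    waits = []
    for size in job_sizes:
        waits.append(t)   # the job waits until the machine is free
        t += size         # then runs to completion
    return t, sum(waits) / len(waits)

jobs = [7, 2, 5, 1]                      # arrival order
print(schedule_stats(jobs))              # FCFS: (15.0, 7.5)
print(schedule_stats(sorted(jobs)))      # SJF:  (15.0, 3.0)
```

Running the largest job first forces every later job to wait for it, which is exactly the "blocking" effect FCFS suffers from.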
- Custom RL Environment: A fully `gymnasium`-compatible environment representing the job queue and processor.
- State-of-the-Art RL: Uses Proximal Policy Optimization (PPO) from `stable-baselines3`.
- Dynamic Decision Making: The agent observes the current queue state (job sizes) and time to make real-time dispatching decisions.
- Comprehensive Benchmarking: Direct comparison against heuristic baselines with Gantt chart visualizations.
We evaluated the agents on a stochastic stream of jobs. Below are the Gantt charts showing the execution order for the same set of jobs.
FCFS (baseline strategy): Processes jobs strictly in arrival order.

SJF (heuristic strategy): Prioritizes the smallest job in the queue.

PPO (learned strategy): Adapts to the queue state to optimize throughput.
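The (start, end) intervals behind such a Gantt chart reduce to a few lines of code (a simplified sketch; the repo's `plot_gantt.py` may differ):

```python
# Compute (start, end) bars for a single-machine Gantt chart of a job order.

def gantt_bars(job_sizes):
    bars, t = [], 0
    for size in job_sizes:
        bars.append((t, t + size))
        t += size
    return bars

jobs = [4, 1, 3]
print(gantt_bars(jobs))           # FCFS order: [(0, 4), (4, 5), (5, 8)]
print(gantt_bars(sorted(jobs)))   # SJF order:  [(0, 1), (1, 4), (4, 8)]
```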

Performance Summary:
| Method | Average Makespan | Description |
|---|---|---|
| SJF | Lowest | Often optimal for this specific single-machine setting. |
| PPO | Competitive | Learns to approximate SJF-like behavior without hard-coded rules. |
| FCFS | Highest | Inefficient as small jobs get blocked by large ones. |
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/rl-job-scheduling.git
  cd rl-job-scheduling
  ```

- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  # Windows
  .\venv\Scripts\activate
  # Mac/Linux
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install gymnasium stable-baselines3 matplotlib numpy shimmy
  ```
Train the reinforcement learning model from scratch:

```bash
python train_ppo.py
```

This will save the trained model as `ppo_scheduler.zip`.

Evaluate the FCFS and SJF heuristics:

```bash
python fcfs_baseline.py
# or
python sjf_baseline.py
```

Generate Gantt charts to see the schedule visually:

```bash
python visualize_ppo.py
```

```
├── job_scheduling_env.py   # Custom Gymnasium environment
├── train_ppo.py            # Script to train the PPO agent
├── test_random_policy.py   # Baseline: random actions
├── fcfs_baseline.py        # Baseline: First-Come, First-Served
├── sjf_baseline.py         # Baseline: Shortest Job First
├── visualize_ppo.py        # Load model & plot Gantt chart
├── plot_gantt.py           # Helper utility for plotting
├── img/                    # Validation images & Gantt charts
└── README.md               # Project documentation
```
- Observation Space: The current time, the remaining time of the current job, and a padded vector of the top $N$ job sizes in the queue.
- Action Space: A discrete space $[0, N-1]$ selecting which job from the queue to process next.
- Reward Function:
  - Negative penalty per timestep (to encourage speed).
  - Large penalty for invalid actions (picking empty slots).
  - Positive reward for valid job completion.
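A dependency-free sketch of an environment with this interface is shown below. The class name, the constant `N`, and the reward constants are illustrative only; the actual `job_scheduling_env.py` subclasses Gymnasium's `Env` and may differ in details. For simplicity this sketch runs each selected job to completion, so the "remaining time of the current job" field is omitted from the observation.

```python
import random

N = 5  # max number of queue slots the agent can see (illustrative)

class JobSchedulingEnv:
    """Single-machine scheduler: the agent picks which queued job runs next."""

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.queue = [rng.randint(1, 10) for _ in range(N)]  # pending job sizes
        self.time = 0
        return self._obs(), {}

    def _obs(self):
        # Current time plus the queue padded to a fixed length of N slots.
        padded = self.queue + [0] * (N - len(self.queue))
        return [self.time] + padded

    def step(self, action):
        if action >= len(self.queue):
            # Invalid action: the chosen slot is empty -> large penalty.
            return self._obs(), -10.0, False, False, {}
        size = self.queue.pop(action)
        self.time += size                 # run the selected job to completion
        reward = 1.0 - 0.1 * size         # completion bonus minus a time penalty
        done = not self.queue             # episode ends when the queue is empty
        return self._obs(), reward, done, False, {}
```

The `step` return follows Gymnasium's five-tuple convention `(obs, reward, terminated, truncated, info)`, which is what `stable-baselines3` expects (via `shimmy` where needed).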
Anuj Mundu | MCA Student | ML & RL Practitioner
Focused on applied machine learning and real-world optimization problems.
GitHub: https://github.com/anujmundu | LinkedIn: https://linkedin.com/in/anujmundu