Optimizing single-machine job scheduling to minimize makespan using Proximal Policy Optimization (PPO).
This project explores the application of Deep Reinforcement Learning (DRL) to the classic Job Scheduling Problem. Using a custom Gymnasium environment, we trained a PPO agent to make dynamic scheduling decisions that minimize the total makespan (the completion time of the last job).
The learned policy is benchmarked against traditional heuristic algorithms:
- FCFS (First-Come, First-Served)
- SJF (Shortest Job First): the theoretical optimum for minimizing mean waiting time on a single machine, though not necessarily for makespan in all contexts.
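The difference between the two heuristics can be sketched with a toy single-machine simulation (the job sizes below are illustrative, not the project's actual job distribution). With all jobs available at time zero, both orders finish at the same makespan, but SJF sharply reduces mean waiting time; in this project's stochastic job stream, ordering also affects makespan.

```python
# Toy comparison of FCFS vs. SJF on a single machine (all jobs arrive at t=0).

def schedule_stats(job_sizes):
    """Return (makespan, mean waiting time) for jobs run in the given order."""
    t = 0.0
    waits = []
    for size in job_sizes:
        waits.append(t)   # the job waits until the machine is free
        t += size         # then runs to completion
    return t, sum(waits) / len(waits)

jobs = [7, 2, 5, 1]                      # arrival order
print(schedule_stats(jobs))              # FCFS: (15.0, 7.5)
print(schedule_stats(sorted(jobs)))      # SJF:  (15.0, 3.0)
```

Running the largest job first forces every later job to wait for it, which is exactly the "blocking" effect FCFS suffers from.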
- Custom RL Environment: A fully `gymnasium`-compatible environment representing the job queue and processor.
- State-of-the-Art RL: Uses Proximal Policy Optimization (PPO) from `stable-baselines3`.
- Dynamic Decision Making: The agent observes the current queue state (job sizes) and time to make real-time dispatching decisions.
- Comprehensive Benchmarking: Direct comparison against heuristic baselines with Gantt chart visualizations.
We evaluated the agents on a stochastic stream of jobs. Below are the Gantt charts showing the execution order for the same set of jobs.
FCFS (baseline strategy): Processes jobs strictly in arrival order.

SJF (heuristic strategy): Prioritizes the smallest job in the queue.

PPO (learned strategy): Adapts to the queue state to optimize throughput.
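The (start, end) intervals behind such a Gantt chart reduce to a few lines of code (a simplified sketch; the repo's `plot_gantt.py` may differ):

```python
# Compute (start, end) bars for a single-machine Gantt chart of a job order.

def gantt_bars(job_sizes):
    bars, t = [], 0
    for size in job_sizes:
        bars.append((t, t + size))
        t += size
    return bars

jobs = [4, 1, 3]
print(gantt_bars(jobs))           # FCFS order: [(0, 4), (4, 5), (5, 8)]
print(gantt_bars(sorted(jobs)))   # SJF order:  [(0, 1), (1, 4), (4, 8)]
```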

Performance Summary:
| Method | Average Makespan | Description |
|---|---|---|
| SJF | Lowest | Often optimal for this specific single-machine setting. |
| PPO | Competitive | Learns to approximate SJF-like behavior without hard-coded rules. |
| FCFS | Highest | Inefficient as small jobs get blocked by large ones. |
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/rl-job-scheduling.git
  cd rl-job-scheduling
  ```

- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  # Windows
  .\venv\Scripts\activate
  # Mac/Linux
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install gymnasium stable-baselines3 matplotlib numpy shimmy
  ```
Train the reinforcement learning model from scratch:

```bash
python train_ppo.py
```

This will save the trained model as `ppo_scheduler.zip`.

Evaluate the FCFS and SJF heuristics:

```bash
python fcfs_baseline.py
# or
python sjf_baseline.py
```

Generate Gantt charts to see the schedule visually:

```bash
python visualize_ppo.py
```

```
├── job_scheduling_env.py   # Custom Gymnasium environment
├── train_ppo.py            # Script to train the PPO agent
├── test_random_policy.py   # Baseline: random actions
├── fcfs_baseline.py        # Baseline: First-Come, First-Served
├── sjf_baseline.py         # Baseline: Shortest Job First
├── visualize_ppo.py        # Load model & plot Gantt chart
├── plot_gantt.py           # Helper utility for plotting
├── img/                    # Validation images & Gantt charts
└── README.md               # Project documentation
```
- Observation Space: The current time, the remaining time of the current job, and a padded vector of the top $N$ job sizes in the queue.
- Action Space: A discrete space $[0, N-1]$ selecting which job from the queue to process next.
- Reward Function:
  - Negative penalty per timestep (to encourage speed).
  - Large penalty for invalid actions (picking empty slots).
  - Positive reward for valid job completion.
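A dependency-free sketch of an environment with this interface is shown below. The class name, the constant `N`, and the reward constants are illustrative only; the actual `job_scheduling_env.py` subclasses Gymnasium's `Env` and may differ in details. For simplicity this sketch runs each selected job to completion, so the "remaining time of the current job" field is omitted from the observation.

```python
import random

N = 5  # max number of queue slots the agent can see (illustrative)

class JobSchedulingEnv:
    """Single-machine scheduler: the agent picks which queued job runs next."""

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.queue = [rng.randint(1, 10) for _ in range(N)]  # pending job sizes
        self.time = 0
        return self._obs(), {}

    def _obs(self):
        # Current time plus the queue padded to a fixed length of N slots.
        padded = self.queue + [0] * (N - len(self.queue))
        return [self.time] + padded

    def step(self, action):
        if action >= len(self.queue):
            # Invalid action: the chosen slot is empty -> large penalty.
            return self._obs(), -10.0, False, False, {}
        size = self.queue.pop(action)
        self.time += size                 # run the selected job to completion
        reward = 1.0 - 0.1 * size         # completion bonus minus a time penalty
        done = not self.queue             # episode ends when the queue is empty
        return self._obs(), reward, done, False, {}
```

The `step` return follows Gymnasium's five-tuple convention `(obs, reward, terminated, truncated, info)`, which is what `stable-baselines3` expects (via `shimmy` where needed).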
Anuj Mundu | MCA Student | ML & RL Practitioner
Focused on applied machine learning and real-world optimization problems.
GitHub: https://github.com/anujmundu | LinkedIn: https://linkedin.com/in/anujmundu