This repository demonstrates three AI-specific Kubernetes scheduling approaches:
- Reinforcement learning-based scheduler (PPO, Proximal Policy Optimization)
- GPU-aware AI scheduler (NVIDIA DCGM metrics collected via Prometheus)
- Production-grade Kubernetes Scheduler Plugin (Go)

Scheduling flow:

Pod (schedulerName=ai-scheduler) → AI Scheduler → Metrics (CPU/GPU/Node) → AI Model → Node Binding
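
The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's implementation. The `NodeMetrics` type, the `heuristic_score` weights, and the simplified `gpus` field on the pod spec are all hypothetical. A weighted heuristic stands in for the AI model. The final binding step, which a real scheduler performs through the Kubernetes API, is reduced to returning the chosen node name.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

SCHEDULER_NAME = "ai-scheduler"  # matches schedulerName in the flow above


@dataclass(frozen=True)
class NodeMetrics:
    """Hypothetical per-node snapshot; in practice these values would be
    read from a metrics source such as Prometheus."""
    name: str
    cpu_util: float  # fraction in [0.0, 1.0]
    gpu_util: float  # fraction in [0.0, 1.0]
    gpu_free: int    # number of unallocated GPUs


def heuristic_score(m: NodeMetrics) -> float:
    """Stand-in for the AI model: higher score for less loaded nodes.
    The 0.4 / 0.6 weights are arbitrary illustration values."""
    return 0.4 * (1.0 - m.cpu_util) + 0.6 * (1.0 - m.gpu_util)


def pick_node(
    pod: dict,
    nodes: List[NodeMetrics],
    score: Callable[[NodeMetrics], float] = heuristic_score,
) -> Optional[str]:
    """Return the name of the node to bind the pod to, or None."""
    spec = pod.get("spec", {})
    # Step 1: only handle pods that opted in to this scheduler.
    if spec.get("schedulerName") != SCHEDULER_NAME:
        return None
    # Step 2: filter out nodes without enough free GPUs.
    # "gpus" is a simplified, hypothetical field for this sketch.
    gpus_needed = spec.get("gpus", 0)
    feasible = [n for n in nodes if n.gpu_free >= gpus_needed]
    if not feasible:
        return None
    # Step 3: score the remaining nodes and pick the best one.
    return max(feasible, key=score).name


if __name__ == "__main__":
    pod = {"spec": {"schedulerName": "ai-scheduler", "gpus": 1}}
    nodes = [
        NodeMetrics("node-a", cpu_util=0.9, gpu_util=0.8, gpu_free=1),
        NodeMetrics("node-b", cpu_util=0.2, gpu_util=0.1, gpu_free=2),
        NodeMetrics("node-c", cpu_util=0.1, gpu_util=0.0, gpu_free=0),
    ]
    print(pick_node(pod, nodes))  # prints node-b
```

Passing the scoring function as a parameter keeps the filter and select steps unchanged if the heuristic is later swapped for a learned policy, for example a PPO agent's value estimate.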

Use cases:

- GenAI / LLM inference
- Distributed ML training
- GPU cost optimization