Docs

This directory contains the documentation for PRIME-RL. It is organized into the following sections:

  • Entrypoints - Overview of the main components (orchestrator, trainer, inference) and how to run SFT, RL, and evals
  • Configs - Configuration system using TOML files, CLI arguments, and environment variables
  • Environments - Installing and using verifiers environments from the Environments Hub
  • Async Training - Understanding asynchronous off-policy training and step semantics
  • Logging - Logging with loguru, torchrun, and Weights & Biases
  • Platform Monitoring - Registering runs on the Prime Intellect platform and streaming training metrics
  • MultiRunManager - Multi-run training with the MultiRunManager object for concurrent LoRA adapters
  • Checkpointing - Saving and resuming training from checkpoints
  • Benchmarking - Performance benchmarking and throughput measurement
  • Deployment - Training deployment on single-GPU, multi-GPU, and multi-node clusters
  • Kubernetes - Deploying PRIME-RL on Kubernetes with Helm
  • Troubleshooting - Common issues and their solutions