SkyRL-Train: v0.4.0
Highlights
Tinker API Integration: SkyRL now fully implements the Tinker API, a simple training and sampling API introduced by Thinking Machines Lab. Any training script written against the Tinker API can run locally on your own GPUs using SkyRL's backends with zero code changes. See the Tinker API docs to get started.
Supported Tinker features include:
- Supervised fine-tuning (`cross_entropy` loss) and RL training (`importance_sampling` loss)
- LoRA and full-parameter fine-tuning
- Sampling with logprobs via colocated vLLM inference engines
- FSDP2 and Megatron training backends
- Lazy inference engine initialization for SFT-only workloads
- Ephemeral and persistent weight sync modes
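To make the two supported loss types concrete, here is a minimal sketch of the math behind them in plain Python. This is illustrative only, not SkyRL's or Tinker's implementation, and the function names and signatures are assumptions: `cross_entropy` is a token-weighted negative log-likelihood over target logprobs, and `importance_sampling` is a policy-gradient loss weighted by per-token ratios between the training policy and the sampling policy.

```python
import math

def cross_entropy_loss(target_logprobs, weights):
    """Token-weighted negative log-likelihood (the SFT-style loss).

    target_logprobs: log p_train(token) for each target token.
    weights: per-token loss weights (e.g. 0 to mask prompt tokens).
    """
    total_weight = sum(weights)
    return -sum(lp * w for lp, w in zip(target_logprobs, weights)) / max(total_weight, 1.0)

def importance_sampling_loss(train_logprobs, sampler_logprobs, advantages):
    """Policy-gradient loss with per-token importance ratios (the RL-style loss).

    The ratio exp(logp_train - logp_sampler) corrects for the training
    policy having drifted from the policy that generated the samples.
    """
    per_token = []
    for lp_train, lp_sampler, adv in zip(train_logprobs, sampler_logprobs, advantages):
        ratio = math.exp(lp_train - lp_sampler)
        per_token.append(-ratio * adv)
    return sum(per_token) / len(per_token)
```

With equal weights, `cross_entropy_loss` reduces to the mean negative logprob; when the training and sampling policies agree, the importance ratio is 1 and the RL loss reduces to plain REINFORCE.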
Repo Reorganization: The skyrl-tx and skyrl-train packages are being unified into a single skyrl/ folder. The existing packages remain fully functional and will be migrated to new paths shortly.
Megatron Backend for Tinker: The Megatron strategy is now fully supported for Tinker workloads, including RL training with loss_fn_outputs passthrough.
HTTP Inference Integration: A new HTTP-based inference server integration (feature-flagged) enables decoupled inference engine deployments.
Pythonic Configs: Introduced configuration dataclasses as an alternative to YAML-only configuration, with migration of tests to the new system.
Off-Policy Correction Refactor: Refactored truncated importance sampling (TIS) into a more comprehensive off-policy correction config with support for token-level and sequence-level ratio types.
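The difference between the two ratio types can be sketched in a few lines of plain Python. This is a conceptual illustration of truncated importance sampling, not SkyRL's actual config or code; the function names and the `cap` parameter are placeholders.

```python
import math

def token_level_ratios(train_logprobs, sampler_logprobs, cap=2.0):
    """One importance ratio per token, each truncated at `cap`."""
    return [min(math.exp(t - s), cap)
            for t, s in zip(train_logprobs, sampler_logprobs)]

def sequence_level_ratio(train_logprobs, sampler_logprobs, cap=2.0):
    """A single ratio for the whole sequence: the product of per-token
    ratios, i.e. exp(sum of logprob differences), truncated at `cap`."""
    ratio = math.exp(sum(train_logprobs) - sum(sampler_logprobs))
    return min(ratio, cap)
```

Token-level truncation bounds the contribution of each token independently, while sequence-level truncation bounds the whole trajectory at once; the latter is stricter for long sequences because per-token drift compounds multiplicatively.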
Harbor Integration: Upstream Harbor integration for evaluation, with Modal support and configurable rate limiting.
Documentation: Migrated documentation to fumadocs, with comprehensive Tinker API docs including quickstart, architecture, cookbook scripts, and configuration pages.
New Model Support (TX):
- DeepSeekV3 implementation with expert parallelism
- GLM-4.7 Flash support
- Qwen3 stacked weights optimization
What's Changed
- [tx] Add experimental SkyRL-train backend that supports SFT by @pcmoritz in #871
- Add sampling support for Tinker SkyRL backend by @pcmoritz in #999
- Add checkpointing support for Tinker SkyRL backend by @pcmoritz in #992
- Unify Megatron and FSDP training interfaces with forward_backward + optim_step by @pcmoritz in #901
- Implement forward-only pass and populate metrics by @tyler-griggs in #1046
- Emit loss_fn_outputs with logprobs for RL losses in forward_backward by @tyler-griggs in #1047
- [tx] Lazy inference engine initialization by @tyler-griggs in #1069
- Support colocate_all=False in Tinker backend by @tyler-griggs in #1097
- [skyrl-train] Return loss_fn_outputs for megatron backend to support tinker RL by @erictang000 in #1102
- [tx][megatron] making megatron skyrl-train worker usable as TX backend by @erictang000 in #1067
- [tx][train][merge] make the `skyrl` folder standalone by @erictang000 in #1084
- [WIP][skyrl] Create new skyrl folder combining tx + train by @erictang000 in #1068
- [skyrl-train] Add SFT support via forward_backward(loss_fn="cross_entropy") by @pcmoritz in #961
- Add set_lr() for dynamic learning rate updates from Tinker by @pcmoritz in #978
- Fix placement group creation in SkyRL-Train backend by @pcmoritz in #1010
- [skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N by @CharlieFRuan in #931
- [skyrl-train][inference] Inference Server Refactor (1/N) by @CharlieFRuan in #899
- [skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N by @CharlieFRuan in #904
- [train] Pythonic Configs 1/N - Introduce configuration dataclasses by @CharlieFRuan in #1001
- [skyrl-train] Refactor TIS to use more comprehensive off policy correction config by @erictang000 in #849
- [train][Harbor][1/N] Upstream Harbor integration by @CharlieFRuan in #923
- [Harbor] Add Modal support and bump Harbor version by @CharlieFRuan in #1022
- [Harbor] Add rate limit for trials/sec and max concurrency by @CharlieFRuan in #1074
- [tx] DeepseekV3 implementation by @pcmoritz in #889
- [tx] Add support for GLM-4.7 Flash by @pcmoritz in #1023
- [tx] Stack weights — Qwen3 by @pcmoritz in #1079
- [tx] Add EP axis to deepseek by @pcmoritz in #993
- [tx] chunked logprobs computation for memory efficiency by @pcmoritz in #902
- [skyrl-train] Add example for 235B LoRA training with Megatron on 4 H100 nodes by @erictang000 in #1000
- [train] Enable RayPrometheusStatLogger for async vLLM engine by @CharlieFRuan in #900
- [train][OpenAI] Add generator.served_model_name for /chat/completions by @CharlieFRuan in #970
- [train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages by @CharlieFRuan in #981
- [train][vllm] Add enable_log_requests and max_log_len support by @tyler-griggs in #1071
- [tx] Use WAL mode for sqlite by @pcmoritz in #1054
- Increase busy timeout for sqlite to avoid `database is locked` error by @pcmoritz in #1105
- [tx] Gracefully handle stale save_weights_for_sampler requests on engine restart by @pcmoritz in #1073
- Migrate documentation to fumadocs by @tyler-griggs in #941
- Add Tinker integration documentation by @tyler-griggs in #1050
- [agent] Add YouCom search engine by @caoshiyi in #803
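The two sqlite hardening changes above (WAL mode in #1054, a longer busy timeout in #1105) correspond to two standard PRAGMAs. This generic snippet shows the effect, using the standard-library `sqlite3` module rather than SkyRL's code; the file path is illustrative.

```python
import os
import sqlite3
import tempfile

# WAL mode requires a file-backed database; in-memory databases ignore it.
path = os.path.join(tempfile.mkdtemp(), "state.db")
conn = sqlite3.connect(path)

# WAL (write-ahead logging) lets readers proceed concurrently with a writer,
# reducing lock contention between processes sharing the database.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Wait up to 5 seconds for a lock to clear instead of failing
# immediately with "database is locked".
conn.execute("PRAGMA busy_timeout=5000")
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]

print(mode, timeout_ms)  # wal 5000
conn.close()
```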
Full Changelog: skyrl_train-v0.3.0...skyrl_train-v0.4.0