SkyRL-Train: v0.4.0
Highlights
Tinker API Integration: SkyRL now fully implements the Tinker API, a simple training and sampling API introduced by Thinking Machines Lab. Any training script written against the Tinker API can run locally on your own GPUs using SkyRL's backends with zero code changes. See the Tinker API docs to get started.
Supported Tinker features include:
- Supervised fine-tuning (`cross_entropy` loss) and RL training (`importance_sampling` loss)
- LoRA and full-parameter fine-tuning
- Sampling with logprobs via colocated vLLM inference engines
- FSDP2 and Megatron training backends
- Lazy inference engine initialization for SFT-only workloads
- Ephemeral and persistent weight sync modes
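To make the two supported loss types concrete, here is a minimal sketch of the math behind them in plain Python. This is illustrative only, not SkyRL's or Tinker's implementation, and the function names and signatures are assumptions: `cross_entropy` is a token-weighted negative log-likelihood over target logprobs, and `importance_sampling` is a policy-gradient loss weighted by per-token ratios between the training policy and the sampling policy.

```python
import math

def cross_entropy_loss(target_logprobs, weights):
    """Token-weighted negative log-likelihood (the SFT-style loss).

    target_logprobs: log p_train(token) for each target token.
    weights: per-token loss weights (e.g. 0 to mask prompt tokens).
    """
    total_weight = sum(weights)
    return -sum(lp * w for lp, w in zip(target_logprobs, weights)) / max(total_weight, 1.0)

def importance_sampling_loss(train_logprobs, sampler_logprobs, advantages):
    """Policy-gradient loss with per-token importance ratios (the RL-style loss).

    The ratio exp(logp_train - logp_sampler) corrects for the training
    policy having drifted from the policy that generated the samples.
    """
    per_token = []
    for lp_train, lp_sampler, adv in zip(train_logprobs, sampler_logprobs, advantages):
        ratio = math.exp(lp_train - lp_sampler)
        per_token.append(-ratio * adv)
    return sum(per_token) / len(per_token)
```

With equal weights, `cross_entropy_loss` reduces to the mean negative logprob; when the training and sampling policies agree, the importance ratio is 1 and the RL loss reduces to plain REINFORCE.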
Repo Reorganization: The skyrl-tx and skyrl-train packages are being unified into a single skyrl/ folder. The existing packages remain fully functional and will be migrated to new paths shortly.
Megatron Backend for Tinker: The Megatron strategy is now fully supported for Tinker workloads, including RL training with loss_fn_outputs passthrough.
HTTP Inference Integration: A new HTTP-based inference server integration (feature-flagged) enables decoupled inference engine deployments.
Pythonic Configs: Introduced configuration dataclasses as an alternative to YAML-only configuration, with migration of tests to the new system.
Off-Policy Correction Refactor: Refactored truncated importance sampling (TIS) into a more comprehensive off-policy correction config with support for token-level and sequence-level ratio types.
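The difference between the two ratio types can be sketched in a few lines of plain Python. This is a conceptual illustration of truncated importance sampling, not SkyRL's actual config or code; the function names and the `cap` parameter are placeholders.

```python
import math

def token_level_ratios(train_logprobs, sampler_logprobs, cap=2.0):
    """One importance ratio per token, each truncated at `cap`."""
    return [min(math.exp(t - s), cap)
            for t, s in zip(train_logprobs, sampler_logprobs)]

def sequence_level_ratio(train_logprobs, sampler_logprobs, cap=2.0):
    """A single ratio for the whole sequence: the product of per-token
    ratios, i.e. exp(sum of logprob differences), truncated at `cap`."""
    ratio = math.exp(sum(train_logprobs) - sum(sampler_logprobs))
    return min(ratio, cap)
```

Token-level truncation bounds the contribution of each token independently, while sequence-level truncation bounds the whole trajectory at once; the latter is stricter for long sequences because per-token drift compounds multiplicatively.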
Harbor Integration: Upstream Harbor integration for evaluation, with Modal support and configurable rate limiting.
Documentation: Migrated documentation to fumadocs, with comprehensive Tinker API docs including quickstart, architecture, cookbook scripts, and configuration pages.
New Model Support (TX):
- DeepSeekV3 implementation with expert parallelism
- GLM-4.7 Flash support
- Qwen3 stacked weights optimization
What's Changed
- [tx] Add experimental SkyRL-train backend that supports SFT by @pcmoritz in #871
- Add sampling support for Tinker SkyRL backend by @pcmoritz in #999
- Add checkpointing support for Tinker SkyRL backend by @pcmoritz in #992
- Unify Megatron and FSDP training interfaces with forward_backward + optim_step by @pcmoritz in #901
- Implement forward-only pass and populate metrics by @tyler-griggs in #1046
- Emit loss_fn_outputs with logprobs for RL losses in forward_backward by @tyler-griggs in #1047
- [tx] Lazy inference engine initialization by @tyler-griggs in #1069
- Support colocate_all=False in Tinker backend by @tyler-griggs in #1097
- [skyrl-train] Return loss_fn_outputs for megatron backend to support tinker RL by @erictang000 in #1102
- [tx][megatron] making megatron skyrl-train worker usable as TX backend by @erictang000 in #1067
- [tx][train][merge] make the `skyrl` folder standalone by @erictang000 in #1084
- [WIP][skyrl] Create new skyrl folder combining tx + train by @erictang000 in #1068
- [skyrl-train] Add SFT support via forward_backward(loss_fn="cross_entropy") by @pcmoritz in #961
- Add set_lr() for dynamic learning rate updates from Tinker by @pcmoritz in #978
- Fix placement group creation in SkyRL-Train backend by @pcmoritz in #1010
- [skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N by @CharlieFRuan in #931
- [skyrl-train][inference] Inference Server Refactor (1/N) by @CharlieFRuan in #899
- [skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N by @CharlieFRuan in #904
- [train] Pythonic Configs 1/N - Introduce configuration dataclasses by @CharlieFRuan in #1001
- [skyrl-train] Refactor TIS to use more comprehensive off policy correction config by @erictang000 in #849
- [train][Harbor][1/N] Upstream Harbor integration by @CharlieFRuan in #923
- [Harbor] Add Modal support and bump Harbor version by @CharlieFRuan in #1022
- [Harbor] Add rate limit for trials/sec and max concurrency by @CharlieFRuan in #1074
- [tx] DeepseekV3 implementation by @pcmoritz in #889
- [tx] Add support for GLM-4.7 Flash by @pcmoritz in #1023
- [tx] Stack weights — Qwen3 by @pcmoritz in #1079
- [tx] Add EP axis to deepseek by @pcmoritz in #993
- [tx] chunked logprobs computation for memory efficiency by @pcmoritz in #902
- [skyrl-train] Add example for 235B LoRA training with Megatron on 4 H100 nodes by @erictang000 in #1000
- [train] Enable RayPrometheusStatLogger for async vLLM engine by @CharlieFRuan in #900
- [train][OpenAI] Add generator.served_model_name for /chat/completions by @CharlieFRuan in #970
- [train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages by @CharlieFRuan in #981
- [train][vllm] Add enable_log_requests and max_log_len support by @tyler-griggs in #1071
- [tx] Use WAL mode for sqlite by @pcmoritz in #1054
- Increase busy timeout for sqlite to avoid `database is locked` error by @pcmoritz in #1105
- [tx] Gracefully handle stale save_weights_for_sampler requests on engine restart by @pcmoritz in #1073
- Migrate documentation to fumadocs by @tyler-griggs in #941
- Add Tinker integration documentation by @tyler-griggs in #1050
- [agent] Add YouCom search engine by @caoshiyi in #803
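The two sqlite hardening changes above (WAL mode in #1054, a longer busy timeout in #1105) correspond to two standard PRAGMAs. This generic snippet shows the effect, using the standard-library `sqlite3` module rather than SkyRL's code; the file path is illustrative.

```python
import os
import sqlite3
import tempfile

# WAL mode requires a file-backed database; in-memory databases ignore it.
path = os.path.join(tempfile.mkdtemp(), "state.db")
conn = sqlite3.connect(path)

# WAL (write-ahead logging) lets readers proceed concurrently with a writer,
# reducing lock contention between processes sharing the database.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Wait up to 5 seconds for a lock to clear instead of failing
# immediately with "database is locked".
conn.execute("PRAGMA busy_timeout=5000")
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]

print(mode, timeout_ms)  # wal 5000
conn.close()
```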
Full Changelog: skyrl_train-v0.3.0...skyrl_train-v0.4.0