Changes from all commits
572 commits
0d88bca
fix: fixing the sequence parallel related issue in mcore path (#1487)
youngeunkwon0405 Nov 14, 2025
aadbebf
fix: improve local eval config and doc (#1528)
yuki-97 Nov 17, 2025
dff9072
docs: Refactor Home Page and New About Section (#1338)
jgerh Nov 17, 2025
9605fa4
fix: Incompatible configuration between reward normalization and the …
ffrujeri Nov 18, 2025
0a8915a
feat: Support for nano-v2 (#1514)
yfw Nov 18, 2025
49f3ab2
fix: Update Penguin tests to use renamed resource server (#1540)
shashank3959 Nov 19, 2025
f543b37
fix: honor mlflow server artifact_location (#1536) (#1538)
clumsy Nov 19, 2025
8b235a0
build: Update docker file to include OSS NOTICES.txt (#1544)
chtruong814 Nov 19, 2025
da22cf0
perf: perf script change for qwen30b-a3b (#1526)
youngeunkwon0405 Nov 20, 2025
cda42b0
fix: removed sliding_window_overwrite (#1541)
ahmadki Nov 22, 2025
35e8a05
chore: add a research template project (#1278)
terrykong Nov 23, 2025
660698d
docs: remove doc pyproject toml (#1561)
lbliii Nov 24, 2025
5d83ff9
perf: [Perf script] QWEN3 30B-A3B tensor_parallel_size from 4 to 2 (#…
youngeunkwon0405 Nov 24, 2025
7457e87
feat: per-worker active/idle timeline + IFB size logging (#1534)
youngeunkwon0405 Nov 25, 2025
48a44a3
chore: Improve checkpoint loading error messages with common issue an…
ahmadki Nov 25, 2025
e254950
feat: Fp8 moe rollout (#1446)
guyueh1 Nov 27, 2025
d1710c2
fix: Fix the sequence padding for FP8 case (#1569)
guyueh1 Nov 28, 2025
43cb404
build: Use dynamic engine for generate. (#1502)
shanmugamr1992 Dec 1, 2025
92a84d5
docs: Create performance-summary.md for NeMo RL (#1560)
snowmanwwg Dec 1, 2025
ceab63e
docs: Update nvidia-sphinx-theme (#1584)
chtruong814 Dec 2, 2025
d60c621
feat: KV cache quantization support in fp8 rollout in GRPO (#1212)
sharonyu-115 Dec 2, 2025
11f4e59
fix: Use Float16Module even when defer_fp32_logits=True (#1537)
yfw Dec 3, 2025
b740a54
feat: plot vllm internal metrics to the wandb log (#1567)
youngeunkwon0405 Dec 3, 2025
b1aad0c
docs: add v0.4 news and minor touch up to front page readme (#1268)
euronymous-aithal Dec 3, 2025
3f0dfc7
feat: Add moe load balancing metrics (#1520)
yfw Dec 3, 2025
b4cb62b
feat: force on-policy ratio to 1 (#1529)
yfw Dec 4, 2025
444672b
fix: ADDING DOCS (#1595)
shanmugamr1992 Dec 4, 2025
06c7efc
refactor: Introduce BasePolicyWorker (#1585)
ashors1 Dec 4, 2025
e7c1c7b
chore: rename penguin -> nemo_gym and add the gym submodule (#1587)
terrykong Dec 5, 2025
6949de2
feat: allow uv-less execution and fingerprint the environment (#1491)
terrykong Dec 5, 2025
6537fd7
add dep for causal-conv1d
Dec 8, 2025
f500593
add conversation-based dataset
yuanhangsu1986 Dec 12, 2025
beb2501
add avlm config yaml
yuanhangsu1986 Dec 12, 2025
ce17500
import bugfix
Dec 12, 2025
4f08ca6
indentation fix
yuanhangsu1986 Dec 12, 2025
b588ec8
add GeneralConversationsJsonlDataset initializer
yuanhangsu1986 Dec 12, 2025
d4ea08a
bugfix
Dec 12, 2025
84eda79
process multimodal data
yuanhangsu1986 Dec 12, 2025
23b64db
use decord for video and audio loading
yuanhangsu1986 Dec 12, 2025
ca941a7
move the sample processing to sft_processor
yuanhangsu1986 Dec 13, 2025
cd5bc3d
video output bugfix
Dec 12, 2025
374632e
move multimodal functions to multimodal_utils.py; add video, audio se…
yuanhangsu1986 Dec 13, 2025
628712a
bugfix
Dec 13, 2025
47af67d
bugfix reported by coderabbitai
yuanhangsu1986 Jan 8, 2026
ec621d5
feat: log generation ISL/OSL histogram to wandb (#1594)
youngeunkwon0405 Dec 5, 2025
550d8e8
feat: Enable Ray dashboard for Ray state API (#1602)
pjin-nvidia Dec 5, 2025
6337574
docs: update roadmap post v0.4 (#1607)
euronymous-aithal Dec 7, 2025
dad90f0
fix: add H200 TFLOPS (#1543)
clumsy Dec 9, 2025
2dab255
fix: Set validation accuracy to mean of rewards to handle non-[0,1] r…
alexandery-nvidia Dec 11, 2025
02bf9bd
feat: LoRA SFT support for DTensorV2 path (#1556)
samodi-nv Dec 13, 2025
b1255c6
fix: swanlab logger error caused by `define_metric` (#1615)
Zeyi-Lin Dec 13, 2025
a36a058
refactor: refactor env and data processor & add nemotron super 49b re…
yuki-97 Dec 13, 2025
f6743e6
fix: Sort rollout outputs to match inputs order + gym bump (#1627)
yfw Dec 14, 2025
e60de8c
chore: update megatron dev (11/21/2025) / mbridge (11/28/2025) (#1568)
yaoyu-33 Dec 14, 2025
91d228d
docs: Add SkyRL to inspired libraries list (#1632)
snowmanwwg Dec 15, 2025
6ad57e5
fix: Set use_flashinfer_fused_rope to False (#1636)
shanmugamr1992 Dec 15, 2025
c26200a
chore: Enable LoRA Nightly Test (#1634)
RayenTian Dec 15, 2025
0fc7f84
docs: Revise news section for nemotron v3 and DAPO algorithm support …
snowmanwwg Dec 16, 2025
91421ec
chore: fix grpo functional test metric (#1643)
RayenTian Dec 16, 2025
d31a010
feat: add support from building images using vllm from private repos …
terrykong Dec 17, 2025
ba50efb
feat: Necessary changes for Gym GRPO tutorial (#1630)
bxyu-nvidia Dec 17, 2025
4dd9658
perf: Add qwen3 30b-a3b async-8-off recipe (#1642)
youngeunkwon0405 Dec 17, 2025
1fbc75d
feat: Add GPT-OSS support via mcore (#1452)
ashors1 Dec 17, 2025
72476ea
chore: Bump vllm to 0.11.2, torch to 2.9, transformers to 4.57.1 (#1563)
yfw Dec 18, 2025
be8eaca
fix: Support datasets saved with save_to_disk in ResponseDataset (#1610)
sahgerlad Dec 18, 2025
4029cfe
fix: Handle disabled validation in SFT training (#1611)
sahgerlad Dec 19, 2025
5ee9272
fix: Fix crash when using cp in dtensor path (#1663)
yfw Dec 19, 2025
c0d933b
fix: Fix Fp8 sequence padding for PP>1 case (#1579)
guyueh1 Dec 20, 2025
3cfce26
test: Perf recipe for v0.5 (#1667)
guyueh1 Dec 20, 2025
ca91716
fix: Fix fp8 after vllm v0.11.2 bump (#1660)
guyueh1 Dec 20, 2025
7efbdd3
fix: Fix crash when using activation_checkpointing (#1676)
yfw Dec 22, 2025
4580984
feat: add dapo recipe and test (#1617)
ZhiyuLi-Nvidia Dec 22, 2025
e422e47
feat: DTensorPolicyV2 GPT-OSS SFT support (#1470)
adil-a Dec 23, 2025
267e700
fix: grad norm calculation for dtensor v2 (#1693)
hemildesai Dec 24, 2025
01f8d95
feat: Add Nemotron‑3 Nano 30B A3B BF16 SFT nightly tests (FSDP2, +LoR…
RayenTian Dec 24, 2025
6d0eac6
feat: Support prefetching of specific envs (#1692)
hemildesai Dec 25, 2025
4e1895c
fix: Fix DTensor slice crash after PyTorch 2.9 bump (#1689)
zpqiu Jan 2, 2026
ad6bd9e
fix: grad norm check for automodel gpt oss nightly (#1708)
hemildesai Jan 5, 2026
a3d532b
fix: relax nanov3 nightly test metrics strict (#1712)
RayenTian Jan 5, 2026
5e8ee64
fix: on GB200 use single-thread checkpoint save to avoid Cpu OOM (#1703)
guyueh1 Jan 5, 2026
0a80425
perf: [Perf recipe] Change TP 16->32 for deepseek GB200 sync benchmar…
guyueh1 Jan 5, 2026
b2695c1
docs: Add doc for nano-v3 (#1694)
yfw Jan 5, 2026
486555a
fix: Disable cudnn sdpa backend when using activation checkpointing (…
yfw Jan 6, 2026
c42514d
fix: log metrics that can be coerced to scalars (#1723)
terrykong Jan 6, 2026
c900202
fix: use median instead of mean for logprob error for stability in ni…
terrykong Jan 7, 2026
2f8fb44
fix: gemma3 27b must now have skip_tokenizer_init=False in vllm (#1721)
terrykong Jan 7, 2026
83b9476
fix: fix several nightly tests that were flaky (#1724)
terrykong Jan 7, 2026
4115085
fix: apply offloading change from v2 to v1 (#1726)
terrykong Jan 7, 2026
0a47e76
fix: mcore generation config restored in nightly test (#1720)
terrykong Jan 8, 2026
949380c
feat: Megatron SFT LoRA (#1629)
arendu Jan 8, 2026
c905d54
build: Update aiohttp and urlib3 (#1746)
chtruong814 Jan 9, 2026
de09033
fix: patch pytorch aten.alias.default shard strategy (#1728)
RayenTian Jan 9, 2026
e59175c
feat: RL support for custom moe models in dtensor v2 (#1695)
hemildesai Jan 9, 2026
0bbe2ee
fix: split dtensorv1 vllm dependency (#1638)
yuki-97 Jan 10, 2026
137bf66
build: Resolve CVEs for gnupg and aiohttp (#1755)
chtruong814 Jan 10, 2026
78e6142
build: Bump mamba to d68d16e and causal-conv1d to 67e0a9d (#1759)
chtruong814 Jan 12, 2026
7d14c21
ci: Clean up disk space for lint check (#1768)
chtruong814 Jan 13, 2026
380e22b
docs: Adding dtensor TP debugging summary (#1767)
joyang-nv Jan 15, 2026
02e310d
docs: Update image syntax in dtensor TP accuracy guide for consistenc…
RayenTian Jan 15, 2026
28edf65
fix: fix formatting for async docs (#1783)
parthchadha Jan 15, 2026
c8a2c01
ci: Add nightly and release tests for gb200 (#1788)
chtruong814 Jan 16, 2026
3797917
feat: NeMo Gym refresh 20260113 (#1773)
bxyu-nvidia Jan 18, 2026
0600598
perf: DeepEP interface in megatron backend (#1794)
guyueh1 Jan 20, 2026
6d870b7
feat: refactor init of dtensor policy v2 (#1709)
hemildesai Jan 20, 2026
7ef7501
build: Update pyasn1 to >= 0.6.2 (#1791)
chtruong814 Jan 20, 2026
9c97e47
docs: Adding k8 guide (#1764)
vinhngx Jan 20, 2026
d1ec03a
test: Add grpo-qwen3-30ba3b-4n8g-40k config to performance test suite…
sfawzy-nv Jan 21, 2026
b5c91a2
docs: v0.5 performance results update (#1772)
guyueh1 Jan 21, 2026
deb8af1
docs: model support page (#1799)
terrykong Jan 21, 2026
57ffb0b
refactor: split train and val dataset in response dataset (#1649)
yuki-97 Jan 22, 2026
f34986e
docs: fix pytorch anchor link: PYTORCH_CUDA_ALLOC_CONF->PYTORCH_ALLOC…
terrykong Jan 22, 2026
d24b812
fix: log validation data (#1805)
parthchadha Jan 22, 2026
0b562e7
feat: Add SGLang rollout backend and tests (#1674)
RolaoDenthu Jan 22, 2026
3b16569
refactor: reuse setup data (#1808)
yuki-97 Jan 23, 2026
2633175
feat: refactor megatron init (#1646)
ashors1 Jan 23, 2026
3122477
build: Bump setuptools >= 80.10.1 and wheel >= 0.46.2 (#1822)
chtruong814 Jan 25, 2026
3dec4d9
build: Bump setuptools to 80.10.2 (#1830)
chtruong814 Jan 27, 2026
3e34e07
feat: refactor common data utilities of dtensor policy v2 (#1710)
hemildesai Jan 28, 2026
bb8fa12
feat: add FT launcher config and resiliency dependency [1/4] (#1824)
yashaswikarnati Jan 28, 2026
fd44882
fix: move ft_config.yaml outside examples/configs (#1839)
yashaswikarnati Jan 29, 2026
f0f5bc4
docs: Add notes for FP8 recipe in docs/fp8.md (#1829)
guyueh1 Jan 29, 2026
3844367
feat: Timer for the data sharding and job submission (#1802)
guyueh1 Jan 29, 2026
9386219
feat: Allow loading of more general data types (#1834)
nathan-az Jan 30, 2026
1af304e
chore: add assert for dtensor v2 cpu offload (#1817)
yuki-97 Jan 30, 2026
17ea691
build: Bump protobuf to 6.33.5 and python-multipart to 0.0.22 (#1850)
chtruong814 Jan 30, 2026
5fa4b13
feat: refactor megatron data utils (#1651)
ashors1 Jan 31, 2026
604e979
feat: support stateless group and decouple vLLM in train backend (#1842)
shuyixiong Jan 31, 2026
dc97cd5
docs: update readme post 0.5 (#1856)
euronymous-aithal Feb 1, 2026
6e4fa59
docs: fix readme post 0.5 (#1858)
euronymous-aithal Feb 2, 2026
3974004
feat: Support lora in dtensor grpo workflow by merging weight (#1797)
RayenTian Feb 2, 2026
e1106f2
chore: add nanov3 lora sft recipe to doc (#1860)
RayenTian Feb 2, 2026
8bd6a5d
ci: Allow repo to self publish docs (#1821)
chtruong814 Feb 2, 2026
9033633
fix: fix statistic of probs_ratio_clamped_min/max (#1818)
yuki-97 Feb 3, 2026
1f2826f
feat: support multiple datasets for response dataset (#1691)
yuki-97 Feb 3, 2026
759c14e
refactor: unify entrypoint for different envs (#1841)
yuki-97 Feb 3, 2026
d624f88
feat: add lora config for dpo dtensor backend (#1826)
RayenTian Feb 3, 2026
7876c84
fix: add log_plot to the logger interface (#1862)
terrykong Feb 3, 2026
2462f16
add preprocessor
yuanhangsu1986 Feb 4, 2026
f6ac015
bugfix
Feb 4, 2026
f6bb285
add working example configs for video
Feb 10, 2026
33e1c08
add unit tesets
yuanhangsu1986 Feb 10, 2026
1910ed8
refactor: split train and val dataset in preference dataset (#1763)
yuki-97 Feb 4, 2026
b3833c0
chore: add assert for tp4 batch variant accuracy issue (#1861)
yuki-97 Feb 4, 2026
1d56d3f
fix: prevent crash in rollout metric calculation when just 1 value (#…
terrykong Feb 4, 2026
a0e99c9
feat: add val_at_end for all algorithms (#1863)
terrykong Feb 4, 2026
06b7076
ci: Add secrets detector (#1854)
chtruong814 Feb 4, 2026
d97e109
feat: Add bisecting tooling for nightly test regressions (#1223)
terrykong Feb 5, 2026
9315b36
docs: add release runs to front page readme for 0.5 (#1879)
terrykong Feb 5, 2026
58cd571
fix: Remove redundant nested loop in `move_model` (#1880)
nathan-az Feb 6, 2026
294cee9
docs: Fix a step time number for deepseek (#1890)
guyueh1 Feb 6, 2026
a7ae356
feat: refactor train utilities for dtensor policy v2 (#1757)
hemildesai Feb 6, 2026
312f3c3
feat: add speculative decoding during post-training (#1785)
isomap Feb 6, 2026
29a10cc
feat: Add Nemotron‑3 Nano 30B A3B GRPO nightly tests (FSDP2, +LoRA) …
RayenTian Feb 7, 2026
345119a
ci: Fix docs publishing (#1898)
chtruong814 Feb 7, 2026
560cf3b
feat: Implement ProRLv2 recipe (#1809)
hijkzzz Feb 7, 2026
89c4ff5
feat: add way of excluding generation backends (#1855)
terrykong Feb 9, 2026
9f9047e
feat: Update mlflow to work better with env vars, manual run id, fix …
nathan-az Feb 10, 2026
91e18c3
feat: unify nemogym dataset (#1807)
yuki-97 Feb 10, 2026
f1ab10b
feat: improve dataset (#1893)
yuki-97 Feb 10, 2026
a53eb72
fix: fix enable_seq_packing and apply_temperature_scaling in DTensor …
yuki-97 Feb 10, 2026
2d9c6e1
chore: Centralize OmegaConf resolver registration (#1882)
RayenTian Feb 10, 2026
2294a23
fix: Fix DCP-to-HF conversion for model-wrapped checkpoints (#1881)
RayenTian Feb 10, 2026
f0fca1a
add support of split_validation_size
yuanhangsu1986 Feb 11, 2026
82e4d92
add configs for testing general_conversation_dataset
yuanhangsu1986 Feb 11, 2026
79561fd
change valid batch size
yuanhangsu1986 Feb 11, 2026
8df3f86
update to working config
Feb 11, 2026
ba31c0c
add interleaved multiturn test and singleturn test
yuanhangsu1986 Feb 11, 2026
aa4623e
bugfix for general_conversations_data
Feb 11, 2026
dbdaa8f
add daily-omni unit test
yuanhangsu1986 Feb 11, 2026
fe29d2e
add interleaved multiturn test and singleturn test
yuanhangsu1986 Feb 11, 2026
53148cb
fix: add missing functional test (#1883)
yuki-97 Feb 11, 2026
5428505
fix: fix and re-enable rm env functional test (#1905)
RayenTian Feb 11, 2026
8d27913
feat: start nemo gym and other environments with cached venvs (#1927)
terrykong Feb 11, 2026
25dbcc0
fix: Mxfp8 training fix sequence padding (#1884)
guyueh1 Feb 11, 2026
ca96880
fix: use seq_length instead of padded_seq_length for topk output padd…
zpqiu Feb 12, 2026
c032a1c
fix: Update sglang source (#1926)
RolaoDenthu Feb 12, 2026
8ffb6e4
chore: bump mcore and mbridge (#1902)
yfw Feb 13, 2026
d0bbda9
feat: refactor mcore train/forward utilities (#1654)
ashors1 Feb 13, 2026
3452719
docs: Document Gym + RL integration design (#1762)
ananthsub Feb 13, 2026
8803231
feat: retry rollout if generation_logprobs contains NaN (#1885)
guyueh1 Feb 13, 2026
da6c08c
feat: Support build custom flashinfer (#1886)
guyueh1 Feb 14, 2026
c784627
fix: async llm engine didnt have get_metrics() (#1943)
terrykong Feb 14, 2026
f08d0d1
feat: Mask sequences with high logprob error (#1838)
yfw Feb 14, 2026
c88ffdc
feat: ProRLv2 - add seq-mask-tis truncated importance sampling type (…
hijkzzz Feb 16, 2026
171fd51
ci: Update release-docs workflow to use FW-CI-templates v0.72.0 (#1965)
chtruong814 Feb 17, 2026
2752d38
fix: speedup minimize and minimize-check in config_cli (#1964)
hemildesai Feb 17, 2026
bc572ef
docs: update features.md to reflect v0.5 release and v0.6 roadmap (#1…
seonjinn Feb 17, 2026
4e9791b
fix: add mask seq with high logp err to nemo gym config (#1980)
cmunley1 Feb 18, 2026
8796b22
chore: upgrade wandb to 0.25+ (#1979)
Kipok Feb 18, 2026
5e6bfa9
feat: Remove do_not_average_loss (#1988)
yfw Feb 20, 2026
6ebfc25
chore: rename penguin -> nemo_gym and add the gym submodule (#1587)
terrykong Dec 5, 2025
3789fb7
feat: allow uv-less execution and fingerprint the environment (#1491)
terrykong Dec 5, 2025
66951de
add conversation-based dataset
yuanhangsu1986 Dec 12, 2025
0e2e450
add GeneralConversationsJsonlDataset initializer
yuanhangsu1986 Dec 12, 2025
15a4f08
bugfix
Dec 12, 2025
9e99c72
process multimodal data
yuanhangsu1986 Dec 12, 2025
225c715
use decord for video and audio loading
yuanhangsu1986 Dec 12, 2025
ec210e5
video output bugfix
Dec 12, 2025
fe0626e
move multimodal functions to multimodal_utils.py; add video, audio se…
yuanhangsu1986 Dec 13, 2025
e8eb181
bugfix
Dec 13, 2025
2b8c9d2
bugfix reported by coderabbitai
yuanhangsu1986 Jan 8, 2026
b0d9c34
feat: log generation ISL/OSL histogram to wandb (#1594)
youngeunkwon0405 Dec 5, 2025
be7d057
feat: LoRA SFT support for DTensorV2 path (#1556)
samodi-nv Dec 13, 2025
f9ba596
refactor: refactor env and data processor & add nemotron super 49b re…
yuki-97 Dec 13, 2025
08b6541
fix: Sort rollout outputs to match inputs order + gym bump (#1627)
yfw Dec 14, 2025
eb8555a
chore: update megatron dev (11/21/2025) / mbridge (11/28/2025) (#1568)
yaoyu-33 Dec 14, 2025
83aef80
fix: Set use_flashinfer_fused_rope to False (#1636)
shanmugamr1992 Dec 15, 2025
e837538
chore: Enable LoRA Nightly Test (#1634)
RayenTian Dec 15, 2025
84ba2f7
chore: Bump vllm to 0.11.2, torch to 2.9, transformers to 4.57.1 (#1563)
yfw Dec 18, 2025
a28ed19
fix: Handle disabled validation in SFT training (#1611)
sahgerlad Dec 19, 2025
f223ce8
feat: DTensorPolicyV2 GPT-OSS SFT support (#1470)
adil-a Dec 23, 2025
afba602
feat: Megatron SFT LoRA (#1629)
arendu Jan 8, 2026
6ee2d73
feat: RL support for custom moe models in dtensor v2 (#1695)
hemildesai Jan 9, 2026
acfa334
fix: split dtensorv1 vllm dependency (#1638)
yuki-97 Jan 10, 2026
19c4e63
feat: NeMo Gym refresh 20260113 (#1773)
bxyu-nvidia Jan 18, 2026
534422d
perf: DeepEP interface in megatron backend (#1794)
guyueh1 Jan 20, 2026
edf4412
feat: refactor init of dtensor policy v2 (#1709)
hemildesai Jan 20, 2026
f6052a3
refactor: split train and val dataset in response dataset (#1649)
yuki-97 Jan 22, 2026
356b838
feat: Add SGLang rollout backend and tests (#1674)
RolaoDenthu Jan 22, 2026
a4b45b4
refactor: reuse setup data (#1808)
yuki-97 Jan 23, 2026
e54a937
feat: refactor common data utilities of dtensor policy v2 (#1710)
hemildesai Jan 28, 2026
b3f25fc
feat: add FT launcher config and resiliency dependency [1/4] (#1824)
yashaswikarnati Jan 28, 2026
0a9c02b
fix: move ft_config.yaml outside examples/configs (#1839)
yashaswikarnati Jan 29, 2026
e0aad8f
feat: refactor megatron data utils (#1651)
ashors1 Jan 31, 2026
4a56388
feat: support stateless group and decouple vLLM in train backend (#1842)
shuyixiong Jan 31, 2026
e7d4501
feat: Support lora in dtensor grpo workflow by merging weight (#1797)
RayenTian Feb 2, 2026
c712c40
indentation bugfix
yuanhangsu1986 Feb 3, 2026
ddc6815
feat: support multiple datasets for response dataset (#1691)
yuki-97 Feb 3, 2026
efa7935
refactor: unify entrypoint for different envs (#1841)
yuki-97 Feb 3, 2026
cf15a2d
add preprocessor
yuanhangsu1986 Feb 4, 2026
66ce0da
bugfix
Feb 4, 2026
b516a7c
add working example configs for video
Feb 10, 2026
ee790aa
refactor: split train and val dataset in preference dataset (#1763)
yuki-97 Feb 4, 2026
810fd9c
feat: refactor train utilities for dtensor policy v2 (#1757)
hemildesai Feb 6, 2026
761a063
feat: Implement ProRLv2 recipe (#1809)
hijkzzz Feb 7, 2026
3a793d2
feat: add way of excluding generation backends (#1855)
terrykong Feb 9, 2026
2ce8b83
feat: unify nemogym dataset (#1807)
yuki-97 Feb 10, 2026
28630e2
add support of split_validation_size
yuanhangsu1986 Feb 11, 2026
3473c25
add daily-omni dataset unit test; add general_conversations_dataset u…
yuanhangsu1986 Feb 11, 2026
2eb770e
fix: add missing functional test (#1883)
yuki-97 Feb 11, 2026
6dd98b0
add preprocessor to setup_response_data for rl training
yuanhangsu1986 Feb 13, 2026
fab4e46
add preprocessor for preference datasets as well
yuanhangsu1986 Feb 13, 2026
ecc68e9
lint fixes
yuanhangsu1986 Feb 13, 2026
513a94f
lint fixes
yuanhangsu1986 Feb 20, 2026
a5e0341
Merge remote-tracking branch 'upstream/main' into yuanhangs_dev
yuanhangsu1986 Feb 21, 2026
5f2744d
Merge remote-tracking branch 'upstream/main' into yuanhangs_dev
Feb 21, 2026
a6f105d
merge with the yuanhangs_dev
yuanhangsu1986 Feb 21, 2026
a7295d1
update Megatron-LM to the latest commit
yuanhangsu1986 Feb 21, 2026
b355e22
docstring fix
yuanhangsu1986 Feb 21, 2026
d8267b3
move load_video_kwargs,load_audio_kwargs from global to get_multimoda…
yuanhangsu1986 Feb 22, 2026
1 change: 1 addition & 0 deletions .github/workflows/cicd-main.yml
@@ -222,6 +222,7 @@ jobs:
          build-args: |
            MAX_JOBS=4
            NEMO_RL_COMMIT=${{ github.sha }}
            SKIP_SGLANG_BUILD=1

  cicd-doc-tests:
    strategy:
Binary file not shown.
Binary file not shown.
29 changes: 29 additions & 0 deletions examples/configs/sft_avlm.yaml
@@ -0,0 +1,29 @@
defaults:
  - sft_vlm_3B.yaml

sft:
  val_batches: 2
  val_global_batch_size: 8

policy:
  max_total_sequence_length: 32768
  train_global_batch_size: 8
  dtensor_cfg:
    tensor_parallel_size: 1
  dynamic_batching:
    enabled: true
  tokenizer:
    video:
      num_frames: 16

data:
  # dataset
  train:
    dataset_name: daily-omni
    split: train
    split_validation_size: 0.05 # use 5% of the training data as validation data
    seed: 42 # seed for train/validation split when split_validation_size > 0
  validation: null
  # default settings for all datasets
  default:
    prompt_file: null
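If this config follows the repo's standard entrypoint (an assumption; exact flags may differ), it would be exercised with something like uv run python examples/run_sft.py --config examples/configs/sft_avlm.yaml, inheriting everything else from sft_vlm_3B.yaml via the defaults block above.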
10 changes: 10 additions & 0 deletions examples/run_sft.py
@@ -66,6 +66,7 @@ def setup_data(tokenizer: AutoTokenizer, data_config: DataConfig):
print("\n▶ Setting up data...")
# setup train dataset
task_data_processors = {}
task_data_preprocessors = {}
data_list = []

if isinstance(data_config["train"], dict):
@@ -85,19 +86,23 @@ def setup_data(tokenizer: AutoTokenizer, data_config: DataConfig):
            add_generation_prompt=data_config["add_generation_prompt"],
        )
        task_data_processors[data.task_name] = (data.task_spec, data_processor)
        if hasattr(data, "preprocessor") and data.preprocessor is not None:
            task_data_preprocessors[data.task_name] = data.preprocessor

    merged_data = concatenate_datasets([data.dataset for data in data_list])
    dataset = AllTaskProcessedDataset(
        merged_data,
        tokenizer,
        None,
        task_data_processors,
        task_data_preprocessors=task_data_preprocessors,
        max_seq_length=data_config["max_input_seq_length"],
    )
    print(f" ✓ Training dataset loaded with {len(dataset)} samples.")

    # setup validation dataset
    val_task_data_processors = {}
    val_task_data_preprocessors = {}
    val_data_list = []

    # validation dataset from train dataset (when train dataset's split_validation_size > 0)
@@ -107,6 +112,8 @@ def setup_data(tokenizer: AutoTokenizer, data_config: DataConfig):
        # bind task_name to task_data_processors
        task_name = data.task_name
        val_task_data_processors[task_name] = task_data_processors[task_name]
        if task_name in task_data_preprocessors:
            val_task_data_preprocessors[task_name] = task_data_preprocessors[task_name]

    # validation dataset from config
    if "validation" in data_config and data_config["validation"] is not None:
@@ -130,6 +137,8 @@ def setup_data(tokenizer: AutoTokenizer, data_config: DataConfig):
            val_data.task_spec,
            val_data_processor,
        )
        if hasattr(val_data, "preprocessor") and val_data.preprocessor is not None:
            val_task_data_preprocessors[val_data.task_name] = val_data.preprocessor

    val_dataset = None
    if len(val_data_list) > 0:
@@ -139,6 +148,7 @@ def setup_data(tokenizer: AutoTokenizer, data_config: DataConfig):
            tokenizer,
            None,
            val_task_data_processors,
            task_data_preprocessors=val_task_data_preprocessors,
            max_seq_length=data_config["max_input_seq_length"],
        )
        print(f" ✓ Validation dataset loaded with {len(val_dataset)} samples.")
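For contributors wiring up their own tasks: a preprocessor here is just a callable that rewrites one raw datum dict before the per-task data processor runs. A minimal sketch under that assumption (the function name and the "text" field below are hypothetical, not part of this PR):

from typing import Any


def strip_text_preprocessor(entry: dict[str, Any]) -> dict[str, Any]:
    # Copy so the underlying dataset row is not mutated in place.
    entry = dict(entry)
    # Hypothetical cleanup; real preprocessors such as DailyOmniDataset.format_data
    # (added below) rewrite raw fields into the chat "messages" schema.
    if "text" in entry:
        entry["text"] = entry["text"].strip()
    return entry

Registration then follows the loop above: task_data_preprocessors[data.task_name] = data.preprocessor.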
25 changes: 25 additions & 0 deletions nemo_rl/algorithms/utils.py
@@ -320,6 +320,31 @@ def get_tokenizer(
        processor.bos_token_id = tokenizer.bos_token_id
        # copy name_or_path from tokenizer to processor for logging
        processor.name_or_path = tokenizer.name_or_path
        if hasattr(processor, "feature_extractor") and "audio" in tokenizer_config:
            if "sampling_rate" in tokenizer_config["audio"] and \
                    tokenizer_config["audio"]["sampling_rate"] != processor.feature_extractor.sampling_rate:
                new_sampling_rate = tokenizer_config["audio"]["sampling_rate"]
                warnings.warn(
                    f"Overriding audio sampling rate from {processor.feature_extractor.sampling_rate} to {new_sampling_rate}"
                )
                processor.feature_extractor.sampling_rate = new_sampling_rate
        if hasattr(processor, "video_processor") and "video" in tokenizer_config:
            if "fps" in tokenizer_config["video"] and \
                    tokenizer_config["video"]["fps"] != processor.video_processor.fps:
                # override the video loading fps
                new_fps = tokenizer_config["video"]["fps"]
                warnings.warn(
                    f"Overriding video fps from {processor.video_processor.fps} to {new_fps}"
                )
                processor.video_processor.fps = new_fps
            # fps and num_frames cannot both be set; if both are given, let the processor raise later
            if "num_frames" in tokenizer_config["video"] and \
                    tokenizer_config["video"]["num_frames"] != processor.video_processor.num_frames:
                new_num_frames = tokenizer_config["video"]["num_frames"]
                warnings.warn(
                    f"Overriding video num_frames from {processor.video_processor.num_frames} to {new_num_frames}"
                )
                processor.video_processor.num_frames = new_num_frames

    return tokenizer if processor is None else processor

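The overrides above expect a nested tokenizer config. A sketch of the shape implied by the diff, with illustrative values only (16 kHz audio, 16 video frames):

tokenizer_config = {
    "audio": {"sampling_rate": 16000},  # overrides processor.feature_extractor.sampling_rate
    "video": {"num_frames": 16},        # overrides processor.video_processor.num_frames
    # "video": {"fps": 2},              # fps and num_frames cannot both be set
}

This mirrors the tokenizer.video.num_frames entry in examples/configs/sft_avlm.yaml above.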
19 changes: 19 additions & 0 deletions nemo_rl/data/datasets/processed_dataset.py
@@ -23,6 +23,7 @@
    DatumSpec,
    TaskDataProcessFnCallable,
    TaskDataSpec,
    TaskDataPreProcessFnCallable,
)

TokenizerType = Union[PreTrainedTokenizerBase, AutoProcessor]
@@ -52,13 +53,17 @@ def __init__(
            dict[str, tuple[TaskDataSpec, TaskDataProcessFnCallable]]
            | TaskDataProcessFnCallable
        ),
        task_data_preprocessors: Optional[
            Union[dict[str, TaskDataPreProcessFnCallable], TaskDataPreProcessFnCallable]
        ] = None,
        max_seq_length: Optional[int] = None,
    ):
        self.dataset = dataset
        self.tokenizer = tokenizer
        # TODO @yukih: will be removed once eval datasets are adapted
        self.default_task_data_spec = default_task_data_spec
        self.task_data_processors = task_data_processors
        self.task_data_preprocessors = task_data_preprocessors
        self.max_seq_length = max_seq_length
        self._bos_checked = False
@@ -95,6 +100,20 @@ def __getitem__(self, idx: int) -> DatumSpec:
"""Return a single prompt."""
entry = self.dataset[idx]

# preprocessing
task_data_preprocessor = None
if self.task_data_preprocessors:
if isinstance(self.task_data_preprocessors, dict):
task_name = entry["task_name"]
if task_name in self.task_data_preprocessors:
task_data_preprocessor = self.task_data_preprocessors[task_name]
else:
task_data_preprocessor = self.task_data_preprocessors

if task_data_preprocessor is not None:
entry = task_data_preprocessor(entry)

# processing
if isinstance(self.task_data_processors, dict):
task_name = entry["task_name"]

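Usage sketch for the new hook (objects elided; names assume the setup in examples/run_sft.py above): task_data_preprocessors may be a dict dispatched on entry["task_name"], or a single callable applied to every entry.

dataset = AllTaskProcessedDataset(
    merged_data,
    tokenizer,
    None,
    task_data_processors,
    task_data_preprocessors={"daily-omni": daily_omni_dataset.preprocessor},
    max_seq_length=4096,  # illustrative value
)

Entries whose task_name has no registered preprocessor pass through unchanged.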
3 changes: 2 additions & 1 deletion nemo_rl/data/datasets/raw_dataset.py
@@ -15,7 +15,7 @@
from datasets import Dataset

from nemo_rl.data import PreferenceDatasetConfig, ResponseDatasetConfig
from nemo_rl.data.interfaces import TaskDataProcessFnCallable, TaskDataSpec
from nemo_rl.data.interfaces import TaskDataProcessFnCallable, TaskDataPreProcessFnCallable, TaskDataSpec
from nemo_rl.data.processors import PROCESSOR_REGISTRY


@@ -27,6 +27,7 @@ class RawDataset:
    val_dataset: Dataset | None
    processor: TaskDataProcessFnCallable
    task_spec: TaskDataSpec
    preprocessor: TaskDataPreProcessFnCallable | None = None

    def split_train_validation(self, test_size: float, seed: int):
        if test_size > 0:
6 changes: 6 additions & 0 deletions nemo_rl/data/datasets/response_datasets/__init__.py
@@ -32,13 +32,17 @@
)
from nemo_rl.data.datasets.response_datasets.refcoco import RefCOCODataset
from nemo_rl.data.datasets.response_datasets.response_dataset import ResponseDataset
from nemo_rl.data.datasets.response_datasets.daily_omni import DailyOmniDataset
from nemo_rl.data.datasets.response_datasets.general_conversations_dataset import GeneralConversationsJsonlDataset
from nemo_rl.data.datasets.response_datasets.squad import SquadDataset
from nemo_rl.data.datasets.response_datasets.tulu3 import Tulu3SftMixtureDataset

DATASET_REGISTRY = {
    # built-in datasets
    "AIME2024": AIME2024Dataset,
    "clevr-cogent": CLEVRCoGenTDataset,
    "daily-omni": DailyOmniDataset,
    "general-conversation-jsonl": GeneralConversationsJsonlDataset,
    "DAPOMath17K": DAPOMath17KDataset,
    "DAPOMathAIME2024": DAPOMathAIME2024Dataset,
    "DeepScaler": DeepScalerDataset,
@@ -84,6 +88,8 @@ def load_response_dataset(data_config: ResponseDatasetConfig):
__all__ = [
    "AIME2024Dataset",
    "CLEVRCoGenTDataset",
    "DailyOmniDataset",
    "GeneralConversationsJsonlDataset",
    "DAPOMath17KDataset",
    "DAPOMathAIME2024Dataset",
    "DeepScalerDataset",
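For reference, a config's dataset_name resolves through the registry above. A sketch of the lookup (config values illustrative; the remaining keys are handled by load_response_dataset):

data_config = {
    "dataset_name": "daily-omni",
    "split": "train",
    "split_validation_size": 0.05,
    "seed": 42,
}
dataset_cls = DATASET_REGISTRY[data_config["dataset_name"]]  # -> DailyOmniDataset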
123 changes: 123 additions & 0 deletions nemo_rl/data/datasets/response_datasets/daily_omni.py
@@ -0,0 +1,123 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import tarfile
from typing import Any

from huggingface_hub import snapshot_download

from nemo_rl.data.datasets.raw_dataset import RawDataset
from nemo_rl.data.datasets.utils import (
    load_dataset_from_path,
    get_huggingface_cache_path,
)


class DailyOmniDataset(RawDataset):
    """Simple wrapper around the Daily-Omni audio-visual QA dataset.

    Args:
        split: Split name for the dataset; only "train" is supported.
        split_validation_size: Fraction of the training data to hold out for validation.
        seed: Seed for the train/validation split.
    """

    task_name = "daily-omni"

    def __init__(self, split: str = "train", split_validation_size: float = 0, seed: int = 42, **kwargs):
        # Only the "train" split is published for Daily-Omni.
        SPLIT_TO_HF_NAME = {
            "train": "liarliar/Daily-Omni",
        }
        if split not in SPLIT_TO_HF_NAME:
            raise ValueError(f"Invalid split: {split}. Please use 'train'.")

        self.hf_cache_dir = get_huggingface_cache_path(SPLIT_TO_HF_NAME[split])
        if not self.hf_cache_dir:
            # download the dataset
            self.hf_cache_dir = snapshot_download(repo_id=SPLIT_TO_HF_NAME[split], repo_type="dataset")
            if not self.hf_cache_dir:
                raise ValueError("Cannot download DailyOmniDataset.")

        json_file = os.path.join(self.hf_cache_dir, "qa.json")

        if not os.path.isfile(json_file):
            raise ValueError(f"{json_file} cannot be found.")

        files_folder = os.path.join(self.hf_cache_dir, "Videos")
        if not os.path.isdir(files_folder):
            # prepare the dataset
            # TODO: move untar/unzip helpers to utils?
            archive_filename = os.path.join(self.hf_cache_dir, "Videos.tar")
            if not os.path.isfile(archive_filename):
                raise ValueError(f"{archive_filename} cannot be found.")
            try:
                with tarfile.open(archive_filename, "r:*") as tar:
                    # Extract all contents next to the archive
                    tar.extractall(path=self.hf_cache_dir)
                if os.path.isdir(files_folder):
                    print(f"Successfully extracted '{archive_filename}' to '{files_folder}'")
                else:
                    raise ValueError(f"Cannot find the extracted folder {files_folder}. Extraction failed.")
            except tarfile.ReadError:
                raise tarfile.ReadError("Could not read the tar file. It might be corrupted or not a tar file.")
            except Exception as e:
                raise Exception(f"An unexpected error occurred: {e}")

        self.dataset = load_dataset_from_path(json_file)

        # tag every sample with the task name so per-task processors can dispatch on it
        self.dataset = self.dataset.add_column(
            "task_name", [self.task_name] * len(self.dataset)
        )

        self.preprocessor = self.format_data

        # `self.val_dataset` is used (not None) only when this dataset serves both training and validation
        self.val_dataset = None
        self.split_train_validation(split_validation_size, seed)

    @classmethod
    def get_prompt(cls, data: dict[str, Any]) -> str:
        # WARNING: a given model may prefer a different prompt format
        prompt = data["Question"] + "\n" + "\n".join(data["Choice"])
        candidate_answers = [chr(ord("A") + idx) for idx in range(len(data["Choice"]))]
        candidate_answers_all_but_last = ",".join(candidate_answers[:-1])
        prompt += "\n" + (
            f"Your replies must contain only a single letter "
            f"(either {candidate_answers_all_but_last} or {candidate_answers[-1]})."
        )
        return prompt

    def format_data(self, data: dict[str, Any]) -> dict[str, Any]:
        user_content = [
            {
                "type": "video",
                "video": os.path.join(
                    self.hf_cache_dir,
                    "Videos",
                    data["video_id"],
                    data["video_id"] + "_video.mp4",
                ),
            },
            {
                "type": "text",
                "text": self.get_prompt(data),
            },
        ]
        return {
            "messages": [
                {"role": "user", "content": user_content},
                {"role": "assistant", "content": data["Answer"]},
            ],
            "task_name": self.task_name,
        }
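A quick end-to-end sketch of the class above (the sample index is illustrative; the dataset is downloaded on first construction):

ds = DailyOmniDataset(split="train", split_validation_size=0.05, seed=42)
sample = ds.preprocessor(ds.dataset[0])
# sample -> {"messages": [{"role": "user", "content": [...]},
#                         {"role": "assistant", "content": ...}],
#            "task_name": "daily-omni"}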