Release Tunix v0.1.6 — Agentic RL & VLM · google/tunix

Highlights

Agentic RL training is now supported. See https://github.com/google/tunix/tree/main/examples/agentic/gemma_grpo_demo_nb.py
VLM (vision-language model) training is now supported. See https://github.com/google/tunix/blob/main/examples/sft/vlm_training.py

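# Example: configure and launch agentic GRPO training.
# The uppercase constants, rl_cluster, reward_fns, MyAgentClass, MyEnv,
# chat_parser, and train_dataset are placeholders that the caller defines.
from tunix import AgenticGRPOConfig
from tunix import AgenticGRPOLearner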

agentic_grpo_config = AgenticGRPOConfig(
    num_generations=NUM_GENERATIONS,
    num_iterations=NUM_ITERATIONS,
    max_response_length=MAX_RESPONSE_LENGTH,
    beta=BETA,
    epsilon=EPSILON,
    system_prompt=SWE_SYSTEM_PROMPT,
    max_concurrency=1,
    epsilon_high=0.28,
    off_policy_steps=0,
)

agentic_grpo_learner = AgenticGRPOLearner(
    rl_cluster=rl_cluster,
    reward_fns=reward_fns,
    agent_class=MyAgentClass,
    agent_kwargs={},
    env_class=MyEnv,
    env_kwargs={"max_steps": MAX_STEPS},
    algo_config=agentic_grpo_config,
    chat_parser=chat_parser,
)

agentic_grpo_learner.train(train_dataset=train_dataset)
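
The config above sets num_generations, epsilon, and epsilon_high. As a rough illustration of what such parameters usually control in GRPO-style training, the sketch below computes group-relative advantages for one prompt's sampled responses and a PPO-style clipped term with asymmetric bounds. It is a minimal pure-Python sketch under assumptions, not Tunix's implementation: the function names are hypothetical, the use of the population standard deviation and the 1e-6 stabilizer are assumptions, and reading epsilon as the lower clip width and epsilon_high as the upper clip width is an assumption based on common usage.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled responses.

    Hypothetical helper: assumes population std and a small stabilizer.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_term(logp_new, logp_old, advantage, epsilon, epsilon_high):
    """PPO-style clipped surrogate for one token with asymmetric bounds.

    Assumes the ratio is clipped to [1 - epsilon, 1 + epsilon_high].
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon_high)
    # Take the pessimistic (smaller) of the unclipped and clipped objectives.
    return min(ratio * advantage, clipped * advantage)


# One group of 4 sampled responses for a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_advantages(rewards)

# A token whose probability rose by 1.5x under the new policy.
term = clipped_term(math.log(1.5), 0.0, 1.0, epsilon=0.2, epsilon_high=0.28)
```

With a positive advantage, a ratio above 1 + epsilon_high stops contributing extra objective, so a larger epsilon_high leaves more room for raising the probability of well-rewarded tokens than a symmetric clip would.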

What's Changed

Developing for v0.1.6 now. by @wang2yn84 in #785
Fix the vllm server mode not finish issue. by @wang2yn84 in #784
[Tunix] Update Dockerfile and deepscaler trainer script to separate trainer model and ref model. by @copybara-service[bot] in #725
Add Tunix RL GRPO examples for Gemma3. by @copybara-service[bot] in #788
[Tunix] change model implementation to be pytree compatible. by @copybara-service[bot] in #782
Fix TPU nightly regression workflow to use vLLM container and add new tests. by @copybara-service[bot] in #754
[Tunix] Update sharding configuration for attention weights. by @copybara-service[bot] in #759
[Tunix] Add gcsfs to TPU nightly regression dependencies. by @copybara-service[bot] in #790
Adding back test_logprobs_extraction_with_missing_token. by @wang2yn84 in #789
feat:add device indexes for sglang jax by @pathfinder-pf in #786
Fix the rendering issue in Example gallery document. by @rajasekharporeddy in #799
[Tunix] Remove the version pin for SGLang. by @copybara-service[bot] in #798
[Fixes 794] fix transformers=4.57.1 to solve issue42369 in transformers and use c… by @aolemila in #795
Refactor gemma3 modelConfig to explicitly include all models by @copybara-service[bot] in #792
[Tunix] Fix nightly regression: remove unnecessary --root-dir argument from TPU nightly regression script. Fix the MATH500 eval script. by @copybara-service[bot] in #796
use naming utils in tunix cli by @copybara-service[bot] in #736
[Tunix] Remove GitHub Actions replacement in copybara. Relying on a more generic google3 replacement rule by @copybara-service[bot] in #803
reduce safetensor loading time by @keshavb96 in #760
[Tunix] Remove env_utils.fs_open from safetensors_loader. fsspec object doesn't have fileno. 3P test is broken: https://github.com/google/tunix/actions/runs/19689186862/job/56403241781?pr=744 by @copybara-service[bot] in #804
[Tunix] Pass HF_TOKEN to TPU nightly regression tests. by @copybara-service[bot] in #805
[Tunix] Follow up of cl/836961494. It was out of sync with github PR. by @copybara-service[bot] in #807
[Tunix] Pin the vLLM TPU Docker image to a specific nightly build version for the TPU tests. by @copybara-service[bot] in #808
[Tunix] Update tunix nightly regression workflow schedule. Change the cron schedule from 2 AM UTC to 10 AM UTC. by @copybara-service[bot] in #806
Centralize Flax sharding setup in env_utils by @copybara-service[bot] in #797
Fix gemma3 grpo shell scripts by @sizhit2 in #791
[Tunix] Fix GRPO script. by @copybara-service[bot] in #811
rename all model configs to use "p" instead of "_" for float values by @copybara-service[bot] in #740
[Tunix] Move model alignment tests from CPU to TPU run dev workflow. by @copybara-service[bot] in #818
handle the situation when lora_config is not provided by @Hanjun-Dai in #813
checkpoint_options->checkpointing_options in cli/config.py by @Hanjun-Dai in #814
[Tunix] Remove EOS token appending to the prompt in vLLM and SGLang sampler. by @copybara-service[bot] in #827
Fix bos duplication by @Hanjun-Dai in #822
remove extra flax sharding check by @copybara-service[bot] in #817
Remove irrelevant text in GRPO example by @copybara-service[bot] in #823
[TUNIX] Switch to absl.logging in the tunix util file for scripts. by @copybara-service[bot] in #831
renaming Transformer to Gemma for gemma model by @copybara-service[bot] in #819
Expand model tests and fix gemma from_params parsing by @copybara-service[bot] in #828
add missing refactoring to model test by @copybara-service[bot] in #835
allow user to config project name and run name in wandb by @Hanjun-Dai in #836
fix the issue when eager mode jax is triggered in undesired places by @Hanjun-Dai in #837
make TFDS download flag a configurable option by @copybara-service[bot] in #763
Fix llama RL verl script by @copybara-service[bot] in #839
Fix ref model compute_logps input sharding issue by @copybara-service[bot] in #846
Improves the GRPO script to be more configurable. by @wang2yn84 in #840
remove unused fn by @copybara-service[bot] in #847
Add support for Dr. GRPO by @copybara-service[bot] in #681
[Tunix] Update parallel sizes to use ROLLOUT_MESH in grpo_demo. by @copybara-service[bot] in #851
Fix typo and citation formatting by @selamw1 in #865
[Bug] Fix/sglang jax support pathways by @aolemila in #860
[Tunix] Add number of batches argument and reduce nightly regression run time. by @copybara-service[bot] in #866
Add AgenticRLLearner base class. by @copybara-service[bot] in #829
Add XM launch for tunix cli by @copybara-service[bot] in #848
update OSS readme by @copybara-service[bot] in #863
use env_utils in config_test by @copybara-service[bot] in #872
check integer type by @copybara-service[bot] in #877
Add smoke shell scripts to nightly run by @copybara-service[bot] in #855
use np instead of jnp to compute rewards in agentic framework by @copybara-service[bot] in #881
[Tunix] Support pre-resharding pytrees with different meshes. by @copybara-service[bot] in #882
Fix the ValueError while loading the Gemma model in logit_distillation.ipynb by @rajasekharporeddy in #870
change qwen3_30b more specific to qwen3_30b_a3b by @copybara-service[bot] in #880
add qwen4b model config which uses tie embedding by @Hanjun-Dai in #858
Add codewiki link by @copybara-service[bot] in #886
[Script] merge grpo_demo_sglang_jax_rollout.py into grpo_demo_llama3_qwen2.py by @aolemila in #868
Adding support for gemma-X-, llama-X- naming similar to HF by @copybara-service[bot] in #876
enforce rollout tokens to be in RAM by @copybara-service[bot] in #889
Adding Automodel interface to Tunix by @copybara-service[bot] in #862
allow users to import reward module/fn outside tunix folder by @Hanjun-Dai in #852
Fix breaking config test by @copybara-service[bot] in #901
use np instead of jnp for reward fn and GRPO group adv by @copybara-service[bot] in #891
change qwen3_4b_2507 model config added to match HF model ids by @copybara-service[bot] in #897
[Tunix] Remove EOS token appending to the prompt in vLLM and SGLang sampler. by @copybara-service[bot] in #900
use model_path instead of model_id for gcs in the cli by @copybara-service[bot] in #904
change from mock.patch to mock.patch.object. by @copybara-service[bot] in #896
[Tunix] Make chat_parser optional in AgenticRLLearner. by @copybara-service[bot] in #911
Model creation smoke test by @copybara-service[bot] in #873
fix llama run script model name by @copybara-service[bot] in #909
[Tunix] Use jnp.concatenate instead of np.concatenate for merging micro-batches. by @copybara-service[bot] in #912
[Tunix] Fix sharding for act_btf in Tunix models. by @copybara-service[bot] in #914
Add Colab and Kaggle badges to the example notebooks by @rajasekharporeddy in #893
Doc ci check by @ev-br in #888
[Tunix] Remove the duplicate tests move rules. by @copybara-service[bot] in #922
[Tunix] Moves the smoke_tests folder to top level. by @copybara-service[bot] in #923
Support custom MaxText (vLLM) models in sampler and rollout. by @NicoGrande in #841
Fix for failing nightly regression test by @copybara-service[bot] in #918
fix perf by @pathfinder-pf in #915
update gemma-3 models ids to match HF by @copybara-service[bot] in #916
Validation tests for model id to exist on HF by @copybara-service[bot] in #917
allow users to specify data module outside of tunix; refactor a bit by @Hanjun-Dai in #853
add train step time metric. by @copybara-service[bot] in #924
[Tunix] Force install numpy 2.3.5 for vllm by @copybara-service[bot] in #931
Add an option to cache NNX traversals in PEFT trainer. by @copybara-service[bot] in #928
[Tunix] Fix sharding for act_btf in Tunix models. by @copybara-service[bot] in #934
Fix the dataset post initialization by @wang2yn84 in #936
add metric first_micro_batch_rollout_time in fully disaggregated mode. by @copybara-service[bot] in #925
remove mesh cm from example by @copybara-service[bot] in #908
Enable padding for attention qkv biases. by @copybara-service[bot] in #941
checkpoint opt state by @copybara-service[bot] in #945
Introduce prompt queue to support off-policy by @copybara-service[bot] in #946
remove type indirection by @copybara-service[bot] in #950
Adding dataclass to naming by @copybara-service[bot] in #948
Remove _obs_cache by @copybara-service[bot] in #952
remove type indirection by @copybara-service[bot] in #951
enable proper report for learning rate during training by @Hanjun-Dai in #940
Code update by @copybara-service[bot] in #942
add llama 3.2 1b/3b-instruct model config by @Hanjun-Dai in #935
use taskgroup for batch of async tasks by @copybara-service[bot] in #954
Add Tunix CLI setup instructions to g3docs. by @copybara-service[bot] in #957
[Internal] Adding naming conventions documentation by @copybara-service[bot] in #949
Fix broken example script references in CLI README by @SarveshMahalingam in #938
Fix the CLI so that the proper model config is overwritten. by @wang2yn84 in #960
[internal] extend XM launch to peft by @copybara-service[bot] in #959
adding missing test case variant by @copybara-service[bot] in #956
fix safetensor publish by @copybara-service[bot] in #968
Add gcsfs. by @copybara-service[bot] in #969
make explicit copy to create actor model by @copybara-service[bot] in #973
extend support for gemma1.1 in automodel by @copybara-service[bot] in #970
[Tunix] replaces a print statement with logging.vlog and other minor nits. by @copybara-service[bot] in #974
Apply transpose rule to safetensor saver function by @wang2yn84 in #971
[Tunix] Updates the split_by_mesh_axis access pattern. by @copybara-service[bot] in #975
[Tunix] Fix the logic to handle multiple intermediate mesh. by @copybara-service[bot] in #976
Fix Gemma 3 model loading by @selamw1 in #982
Add overlong reward shaping for DAPO. by @copybara-service[bot] in #947
Internal Doc by @copybara-service[bot] in #984
Internal change by @copybara-service[bot] in #985
Feat/add lora for sglangjax by @aolemila in #826
Add a multi-turn RL example notebook. by @copybara-service[bot] in #988
Fix Gemma 3 safetensor loading by @copybara-service[bot] in #987
minor resharding updates. by @copybara-service[bot] in #989
Disable Lora in base_config.yaml by @copybara-service[bot] in #983
fix math util parsing by @copybara-service[bot] in #994
Add support for interleaved layer mappings and enhanced key mapping regex. by @copybara-service[bot] in #815
Unify to HF as the single model_id by @copybara-service[bot] in #981
Tunix Documentation V2 for OSS by @copybara-service[bot] in #990
Improve HBM usage tracking by avoiding double counting by @copybara-service[bot] in #1003
Add perf_metrics to tunix cli -- grpo by @copybara-service[bot] in #1001
enable Sphinx to fix slug for every header by @copybara-service[bot] in #1007
[Tunix] Update Copybara to keep a TOC placeholder in g3doc. by @copybara-service[bot] in #1011
Allow single str input by @copybara-service[bot] in #1010
[Tunix] Fix the copybara file transformation issue. by @copybara-service[bot] in #1012
[Distillation] Enhanced Logit Strategy with Top-K and Metrics by @gagika in #991
Add qwen3 vllm/sglang weight mapping support by @wang2yn84 in #1009
[Tunix] Fix Qwen2 vLLM to JAX embedding mapping. by @copybara-service[bot] in #1019
Adding new sharding and performance args for vLLM by @NicoGrande in #1006
[Tunix] Add _put_prompts_to_queue to handle dataset iteration and partial batches. by @copybara-service[bot] in #1018
[Tunix] Adds a check to ensure that arrays are still live before accessing their shards. by @copybara-service[bot] in #1025
Explicitly handle single string inputs in samplers. by @copybara-service[bot] in #1030
reduce log by @copybara-service[bot] in #1022
retry on hf download and list to prevent gateway error that cause presubmit timeout by @copybara-service[bot] in #1014
Add Gemini Code Assist style guide for PR reviews by @copybara-service[bot] in #1024
move scrubber after transformation for it to locate code in github by @copybara-service[bot] in #1016
[Tunix] Fix Qwen2/3 vLLM to JAX lm_head mapping. by @copybara-service[bot] in #1020
bug: fix metric refer_inference_time. by @copybara-service[bot] in #1037
feat: add multi-rollout engine interfaces. by @copybara-service[bot] in #1039
properly set LoRA alpha by @copybara-service[bot] in #1044
Clean up and update the deepscaler training script with sglang configurations. by @copybara-service[bot] in #1045
[Tunix] Use single shared loop for producer. by @copybara-service[bot] in #1042
[Tunix] Add trajectory data logging to disk for further analysis and visualization. by @copybara-service[bot] in #980
[Tunix] Add convert_messages_to_string to handle numpy array content in messages. by @copybara-service[bot] in #1047
internal by @copybara-service[bot] in #1046
Cast logits to float32 in sample_top_p. by @copybara-service[bot] in #1051
increase episode timeout by @copybara-service[bot] in #1052
VLM Training (1): Add vision to Gemma 3 by @copybara-service[bot] in #986
update README file in tunix by @copybara-service[bot] in #1054
Add timeline visualization of perf metrics by @copybara-service[bot] in #1035
Sort perfetto trace based on uuid so the sequence follows the timeline by @copybara-service[bot] in #1050
Use the rollout engine construct mesh. by @wang2yn84 in #1060
Fix spacing in algorithm diagram. by @copybara-service[bot] in #1065
Add logging for perf metric mode by @copybara-service[bot] in #1057
Skip softmax and sorting of probabilities when top_p == 1.0 and top_k is None. by @copybara-service[bot] in #1056
Refactor tunix Gemma3-4b SFT script to use new config structure. by @copybara-service[bot] in #1059
Log the computed score in GSM8K reward function by @copybara-service[bot] in #1058
[Tunix] Add Qwen3 32B model configuration. by @copybara-service[bot] in #1068
fix gcs paths in deepscaler notebook by @copybara-service[bot] in #1074
Fix wrong parent when nesting perf metrics by @copybara-service[bot] in #1066
fix optimizer CP restore by @copybara-service[bot] in #1070
[Tunix] Add log_level config to SglangJaxSampler. by @copybara-service[bot] in #1069
[Tunix] Add trajectory logging to agentic GRPO learner. by @copybara-service[bot] in #1071
remove max_steps from trajectory_collect_engine by @copybara-service[bot] in #1076
use absl logging across the repo by @copybara-service[bot] in #1077
feat: log rollout and train time at micro batch level. by @copybara-service[bot] in #1038
Add all steps to global span by @copybara-service[bot] in #1067
[Tunix] Remove upper bound on JAX version in tunix prod dependencies. by @copybara-service[bot] in #1079
Refactor GRPO rollout to simplify grouping and avoid deepcopy by @copybara-service[bot] in #1075
[Tunix] Pad the number of heads for projection bias. by @copybara-service[bot] in #1086
Remove max_open_buckets from GroupQueueManager by @copybara-service[bot] in #1083
chore: Migrate gsutil usage to gcloud storage by @gurusai-voleti in #1082
[Feat] add log_level in SglangJaxConfig and update default page_size by @aolemila in #1090
fix potential race condition on dictionary update by @copybara-service[bot] in #1091
update doc string and error message by @copybara-service[bot] in #1093
Supports padding kwargs for samplers. by @wang2yn84 in #1095
[Tunix]: Skip the already trained data on job resume. by @copybara-service[bot] in #1088
disable perf metrics by default in the cli. by @copybara-service[bot] in #1084
minor update by @copybara-service[bot] in #1092
Added a GPU demo for PEFT with QLoRA on Llama 3_1 by @katjasrz in #1105
Use Exception instead of BaseException in Tunix by @wang2yn84 in #1108
fix loss mask for agentic learner by @copybara-service[bot] in #1100
Set the max worker number in asyncio loop. by @wang2yn84 in #1111
add comment clarifying micro batch has to be 1 by @copybara-service[bot] in #1094
Refactor the vllm sampler config with InitVar by @wang2yn84 in #1110
forbidden_tokens in sampler call accepts token IDs instead of strings. by @copybara-service[bot] in #1114
simplify trajectory result processing by @copybara-service[bot] in #1109
fix group_id and pair_idx in traj by @copybara-service[bot] in #1116
Add Colab and Kaggle badges to qlora_llama3_gpu example tutorial by @rajasekharporeddy in #1122
[Tunix] Improve GCS CSV writing. by @copybara-service[bot] in #1120
use max_response_len for deepscaler by @copybara-service[bot] in #1124
fix offpolicy step by @copybara-service[bot] in #1128
Update links in Tunix OSS README. by @copybara-service[bot] in #1121
[Tunix] Engine kwargs overwrite predefined config keys. by @copybara-service[bot] in #1129
[Tunix] Minor fix on rewards by @copybara-service[bot] in #1131
[Tunix] Handles None logits from vLLM. by @copybara-service[bot] in #1126
Change duplicate function registration to log a warning instead of raising an error. by @copybara-service[bot] in #1119
Fix auto-assignment of github issues and pull requests to the eng. by @rajasekharporeddy in #1089
Match changes in README by @copybara-service[bot] in #1113
[Tunix] Fixes a bug in math_rewards.py where multiple rewards could be added for a single sample. by @copybara-service[bot] in #1142
Adding fix to logical axis cm for RL. by @NicoGrande in #1143
Fix eos issues in sglang / vllm samplers during on-policy rollout by @yixinw in #1148
allow passing rollout configs from sglang/vllm through cli by @yixinw in #1140
speed up agentic rl by @copybara-service[bot] in #1136
Minor Doc Fixes: Correct a typo and add a hyperlink. by @rajasekharporeddy in #800
[Tunix] Another minor fix for extracting ground truth. We should always evaluate against a ground truth without boxed formatting. by @copybara-service[bot] in #1154
remat MLP block by @copybara-service[bot] in #1130
fix group_id by @copybara-service[bot] in #1157
ensure seed is set for VLLM sampling params by @copybara-service[bot] in #1162
fix deepscaler notebook by @copybara-service[bot] in #1161
Move ifrt based reshard out of experimental. Leaving intermediate resharding and sidechannel resharding in experimental. by @copybara-service[bot] in #1146
[Tunix] Add vLLM sampler for math eval. by @copybara-service[bot] in #1152
speedup trainer for RL by @copybara-service[bot] in #1141
some BE work by @copybara-service[bot] in #1168
[Tunix] Update BaseAgent to accept observations with a "prompts" key rather than "question". by @copybara-service[bot] in #1164
expert parallelism config in base rollout by @khatwanimohit in #1099
fix metric logging step by @copybara-service[bot] in #1173
Add DeepSWE train script. by @copybara-service[bot] in #1134
Update the qlora_llama3_gpu.ipynb notebook by @rajasekharporeddy in #1160
[Tunix] improve trajectory logging to support numpy array and scalar types. by @copybara-service[bot] in #1163
[Tunix] Initialize policy version from global steps. by @copybara-service[bot] in #1182
make trace writing a configurable option by @copybara-service[bot] in #1085
remove unnecessary rollout round by @copybara-service[bot] in #1174
fix expert_parallel_size to not pass through to vLLM args by @khatwanimohit in #1181
measure global step time, prompt len and clip ratio by @copybara-service[bot] in #1183
remove tflops measurement by @copybara-service[bot] in #1171
change defaults for Dropout and BatchNorm by @copybara-service[bot] in #1184
[Resolved 1149] fix oom due to missing closing loop by @aolemila in #1151
[Tunix] Add support for aligning 1D KV biases in sglang_jax. by @copybara-service[bot] in #1175
Enable pr from user's fork to auto assign issues correctly. by @copybara-service[bot] in #1188
[Tunix] Reduce log spam from type mismatch warnings. by @copybara-service[bot] in #1187
split metric prefix by @copybara-service[bot] in #1185
Add support for vllm sampler kwargs. by @NicoGrande in #1169
add pg_clipfrac to grpo_learner by @andytwigg in #1203
fix flax==0.12.4 in tpu-tests.yml temporarily by @aolemila in #1206
[Tunix Perf] New timeline and span definitions by @copybara-service[bot] in #1147
Refactor BackendMappingMixin to use explicit BACKEND_PACKAGE_PATH by @copybara-service[bot] in #1198
Supported mixed precision training in qwen2 by @copybara-service[bot] in #1199
[Tunix Perf] New perf tracer by @copybara-service[bot] in #1172
force loss computation to be in fp32 by @copybara-service[bot] in #1210
Add perfetto and logging export by @copybara-service[bot] in #1155
[Tunix] Make returning logprobs from vLLM sampler configurable. by @copybara-service[bot] in #1212
make perf engine (v1 or v2) and export function selectable. by @copybara-service[bot] in #1191
Add image processor by @copybara-service[bot] in #1064
Allow sampler to take in images by @copybara-service[bot] in #1103
Add VLM SFT example by @copybara-service[bot] in #1104
add docs for perf tracing by @copybara-service[bot] in #1209
Fix typos in README by @copybara-service[bot] in #1215
Add RLOO advantage estimator to Tunix. by @copybara-service[bot] in #1211
[Tunix] Remove utils.time_measure from the training loop. by @copybara-service[bot] in #1118
Fix typo in documentation. by @copybara-service[bot] in #1178
add entropy loss, grad_norm to metrics by @copybara-service[bot] in #1216
Add trajectory status to track limit. by @copybara-service[bot] in #1005
add qwen3 grpo example script with simplemath rewards by @andytwigg in #1217
add support for qwen3-base variants by @andytwigg in #1204
add simple_math reward_fn by @andytwigg in #1214
Update issue auto-assignment script to async and clean up logic. by @copybara-service[bot] in #1220
Enable tied embedding for Qwen3-0.6B and Qwen3-1.7B. by @copybara-service[bot] in #1219
[Tunix] Switch to safetensor API based loader when Pathways is enabled. by @copybara-service[bot] in #1226
[Tunix] Update DeepScaler training notebook with vLLM optimizations. by @copybara-service[bot] in #1231
use unified trace_dir for the trace writer by @copybara-service[bot] in #1218
[Tunix] Add a script to run Pathways on GKE. by @copybara-service[bot] in #1229
create NOOP trace writer by @copybara-service[bot] in #1232
Update Qwen3 JAX to HF mappings for vLLM. by @copybara-service[bot] in #1234
[Tunix] Fix the get_per_token_logps signature in vllm_rollout.py. by @copybara-service[bot] in #1236
chore: move agentic rl learner out of experimental. by @copybara-service[bot] in #1230
fix: add lora flag validation in RLCluster init. by @copybara-service[bot] in #1239
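
Two entries in the list above add advantage estimators: Dr. GRPO (#681) and RLOO (#1211). For readers unfamiliar with them, the sketch below shows the commonly published definitions of their per-response advantages for one group of rewards. It is an illustrative pure-Python sketch, not the code from those PRs: the function names are hypothetical, and it covers only the advantage computation (Dr. GRPO as usually described also changes how the loss is normalized over response length, which is not shown).

```python
def rloo_advantages(rewards):
    """Leave-one-out baseline: each reward minus the mean of the others."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def dr_grpo_advantages(rewards):
    """Mean-centered rewards without dividing by the group std."""
    mu = sum(rewards) / len(rewards)
    return [r - mu for r in rewards]


# One group of 4 sampled responses for a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
rloo = rloo_advantages(rewards)
dr = dr_grpo_advantages(rewards)
```

For a group of n responses the two differ only by a constant factor: the leave-one-out advantage equals n / (n - 1) times the mean-centered reward.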

New Contributors

@keshavb96 made their first contribution in #760
@selamw1 made their first contribution in #865
@NicoGrande made their first contribution in #841
@SarveshMahalingam made their first contribution in #938
@gagika made their first contribution in #991
@gurusai-voleti made their first contribution in #1082
@katjasrz made their first contribution in #1105
@yixinw made their first contribution in #1148
@khatwanimohit made their first contribution in #1099
@andytwigg made their first contribution in #1203

Full Changelog: v0.1.5...v0.1.6
