Support Custom MaxText model (with vLLM engine) in RL rollouts. #2778
Description
This PR finishes the work started by @gagika in #2767. Credits to @gagika for helping with this feature!
This PR adds the changes required in `train_rl.py`, as well as in other modules related to the Tunix integration, so that the additional configuration needed for the MaxText-on-vLLM flow can be passed through to Tunix. More specifically, this PR adds `vllm_additional_config` and `vllm_hf_config_path` as new arguments whose values are pipelined to Tunix for RL.
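As a rough sketch of the plumbing (not the actual diff — `RolloutConfig` and `build_rollout_config` are hypothetical stand-ins for the Tunix-side types and helpers), the two new arguments might flow through like this:

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RolloutConfig:
    """Hypothetical stand-in for the rollout config consumed by Tunix."""
    engine: str = "vllm"
    # Extra engine kwargs forwarded verbatim to the vLLM engine.
    additional_config: dict = field(default_factory=dict)
    # Path to an HF-style config describing the custom MaxText model.
    hf_config_path: Optional[str] = None


def build_rollout_config(config) -> RolloutConfig:
    """Pipelines the two new MaxText arguments into the Tunix rollout config.

    `config` is assumed to expose the arguments added by this PR:
    `vllm_additional_config` (a dict, or a JSON string from the CLI) and
    `vllm_hf_config_path` (a filesystem path, possibly empty).
    """
    extra = config.vllm_additional_config
    if isinstance(extra, str):
        # Allow the additional config to be supplied as JSON on the CLI.
        extra = json.loads(extra) if extra else {}
    return RolloutConfig(
        additional_config=extra or {},
        hf_config_path=config.vllm_hf_config_path or None,
    )
```

The point is only that both values ride along unchanged until Tunix hands them to the vLLM engine.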
Additionally, this PR makes some small modifications to `tunix_adapter.py` so that no-ops can be used as the mappings when running RL with a MaxText model on vLLM.
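The `tunix_adapter.py` change follows from the fact that a custom MaxText model served by vLLM needs no HF-style weight or name remapping, so identity functions can stand in for the usual mappings. A minimal sketch of that pattern (the hook names here are assumptions, not the adapter's real API):

```python
def _noop_mapping(params):
    """Identity mapping: returns `params` unchanged.

    Applicable when the rollout model is a custom MaxText model on vLLM,
    where no renaming or resharding of weights is needed.
    """
    return params


# Hypothetical wiring: fall back to no-ops instead of requiring a real
# mapping table for the custom-model path.
to_engine_mapping = _noop_mapping
from_engine_mapping = _noop_mapping
```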
Tests

- Gemma3-4B:
  - Local (v6e-4 VM):
  - Output: logs
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.