`docs/tutorials/grpo.md` (16 additions, 2 deletions)

[…] And we use vLLM as the library for efficient model inference and generation.

In this tutorial we use a single-host TPU VM such as `v6e-8/v5p-8`. Let's get started!

## Create virtual environment and install MaxText dependencies

If you have already completed the [MaxText installation](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/guides/install_maxtext.md), you can skip to the next section for the vLLM and tpu-inference installations. Otherwise, please install MaxText using the following commands before proceeding.
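
The exact commands are elided from this diff; below is a minimal sketch, assuming the layout the original text recommended (virtual environment outside the `maxtext` directory). The venv path and the `setup.sh` entry point are assumptions, so defer to the linked install guide.

```bash
# Sketch only; the linked install guide is authoritative.
# Create the virtual environment outside the maxtext directory (path is an assumption).
python3 -m venv ~/maxtext-venv
source ~/maxtext-venv/bin/activate

# Clone MaxText and install its dependencies.
git clone https://github.com/AI-Hypercomputer/maxtext.git
cd maxtext
bash setup.sh  # assumption: MaxText's dependency setup script
```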

[…]

## Build and Upload MaxText Docker Image with Tunix, vLLM, and tpu-inference dependencies

Before building the Docker image, authenticate to [Google Artifact Registry](https://docs.cloud.google.com/artifact-registry/docs/docker/authentication#gcloud-helper) so that you have permission to push your images, along with other required access.

```bash
# Authenticate your user account for gcloud CLI access
gcloud auth login

# Configure application default credentials for Docker and other tools
gcloud auth application-default login

# Configure Docker credentials and test your access
gcloud auth configure-docker
docker run hello-world
```

You can install the required dependencies using either of the following two options:

### Option 1: Installing stable releases of tunix and vllm-tpu

Run the following bash script to create a docker image with all the dependencies of MaxText, Tunix, vLLM, and tpu-inference installed.

In addition to the MaxText dependencies, it primarily installs `vllm-tpu`, i.e. [vllm](https://github.com/vllm-project/vllm) plus [tpu-inference](https://github.com/vllm-project/tpu-inference), thereby providing TPU inference for vLLM with unified JAX and PyTorch support.
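
The stable build invocation itself is elided between hunks of this diff; reconstructed from the command strings visible elsewhere in it, it is presumably the following. Treat this as a sketch rather than the verbatim script text.

```bash
# Build the dependency image with post-training dependencies (Tunix, vLLM, tpu-inference)
bash dependencies/scripts/docker_build_dependency_image.sh MODE=post-training
```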

[…]

You can also use `bash dependencies/scripts/docker_build_dependency_image.sh MODE=post-training-experimental` to try out new features via experimental dependencies, such as the improved pathwaysutils resharding API.

### Option 2: Install from locally git-cloned repositories

You can also locally git clone [tunix](https://github.com/google/tunix), [tpu-inference](https://github.com/vllm-project/tpu-inference), and [vllm](https://github.com/vllm-project/vllm.git), and then use the following command to build a docker image from them:

```
bash dependencies/scripts/docker_build_dependency_image.sh MODE=post-training PO…
```
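
For the clone step itself, a sketch follows; placing the repositories alongside (not inside) your `maxtext` checkout is an assumption, since the diff does not show the intended directory layout.

```bash
# Clone the three repositories next to the maxtext checkout (layout is an assumption).
git clone https://github.com/google/tunix.git
git clone https://github.com/vllm-project/tpu-inference.git
git clone https://github.com/vllm-project/vllm.git
```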

[…]

Please create a Pathways-ready GKE cluster as described [here](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster); you can then submit the `train_rl.py` script via [XPK](https://github.com/AI-Hypercomputer/xpk).
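
A hypothetical XPK submission is sketched below. The workload name, cluster, Docker image, TPU type, and script path are all placeholders, and the exact flags should be checked against the XPK documentation for your version.

```bash
# Hypothetical invocation; every value below is a placeholder.
xpk workload create \
  --workload=grpo-llama3-70b \
  --cluster=my-pathways-cluster \
  --docker-image=us-docker.pkg.dev/my-project/my-repo/maxtext-post-training:latest \
  --tpu-type=v5p-128 \
  --num-slices=1 \
  --command="python3 src/MaxText/train_rl.py ..."  # script arguments omitted
```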

The overview of the demo script `~/maxtext/src/MaxText/examples/grpo_llama3_1_70b_demo_pw.py` is as follows:

1. We load a policy model and a reference model. Both are copies of `Llama3.1-70b-Instruct`.
2. Evaluate the policy model's performance on the GSM8K math reasoning benchmark.
3. Train the policy model using GRPO, with potentially different meshes for the trainer and the rollout depending on the parameters `TRAINER_DEVICES_FRACTION` and `SAMPLER_DEVICES_FRACTION`. If we set both of these to `1.0`, the entire (same) mesh is used for both trainer and rollout. If we set, say, `TRAINER_DEVICES_FRACTION=0.5` and `SAMPLER_DEVICES_FRACTION=0.5`, the first half of the devices is used for the trainer and the second half for the rollout (see the sketch after this list).
4. Evaluate the policy model's performance on the GSM8K math reasoning benchmark after post-training with GRPO.
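
To make step 3 concrete, here is a minimal conceptual sketch of how the fractional settings partition one host's chips; MaxText does this internally in Python/JAX, and the values below are illustrative only.

```bash
# Conceptual sketch only; MaxText's real device assignment lives in Python/JAX.
TOTAL=8                       # chips on a v6e-8/v5p-8 host
TRAINER_DEVICES_FRACTION=0.5
SAMPLER_DEVICES_FRACTION=0.5

# Compute device counts (bash lacks float arithmetic, so delegate to python3).
NUM_TRAINER=$(python3 -c "print(int($TOTAL * $TRAINER_DEVICES_FRACTION))")
NUM_SAMPLER=$(python3 -c "print(int($TOTAL * $SAMPLER_DEVICES_FRACTION))")

# With 0.5/0.5 the trainer gets devices 0..3 and the sampler gets 4..7;
# with 1.0/1.0 both ranges cover the entire (shared) mesh.
echo "trainer devices: 0..$((NUM_TRAINER - 1))"
echo "sampler devices: $((TOTAL - NUM_SAMPLER))..$((TOTAL - 1))"
```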

`docs/tutorials/sft_on_multi_host.md` (11 additions, 1 deletion)

[…]

```
cd maxtext
```

### 1.2. Build MaxText Docker image

Before building the Docker image, authenticate to [Google Artifact Registry](https://docs.cloud.google.com/artifact-registry/docs/docker/authentication#gcloud-helper) so that you have permission to push your images, along with other required access.

```bash
# Authenticate your user account for gcloud CLI access
gcloud auth login

# Configure application default credentials for Docker and other tools
gcloud auth application-default login

# Configure Docker credentials and test your access
gcloud auth configure-docker
docker run hello-world
```

Then run the following command to create a local Docker image named `maxtext_base_image`.
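
That command is elided from this diff; judging from the build script used in the GRPO tutorial above, it is presumably along the following lines (the exact invocation and any flags are assumptions).

```bash
# Assumed invocation; expected to produce a local image tagged maxtext_base_image.
bash dependencies/scripts/docker_build_dependency_image.sh
```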