Commit 8629e8b

More UXR fixes
XPK quick guides fix flags clarification amend fix
1 parent 2226c7d commit 8629e8b

4 files changed: +36 −12 lines


docs/install_maxtext.md (27 additions, 3 deletions)

@@ -122,7 +122,7 @@ seed-env \
 --output-dir=generated_gpu_artifacts
 ```
 
-## 4. Update Project Files
+## Step 4: Update Project Files
 
 After generating the new requirements, you need to update the files in the MaxText repository.
 
@@ -133,7 +133,7 @@ After generating the new requirements, you need to update the files in the MaxTe
 2. **Update `extra_deps_from_github.txt` (if necessary):**
 Currently, MaxText uses a few dependencies, such as `mlperf-logging` and `google-jetstream`, that are installed directly from GitHub source. These are defined in `base_requirements/requirements.txt`, and the `seed-env` tool will carry them over to the generated requirements files.
 
-## 5. Verify the New Dependencies
+## Step 5: Verify the New Dependencies
 
 Finally, test that the new dependencies install correctly and that MaxText runs as expected.
 
@@ -155,4 +155,28 @@ uv pip install -e .[tpu] --resolution=lowest
 install_maxtext_github_deps
 ```
 
-3. **Run tests:** Run MaxText tests to ensure there are no regressions.
+3. **Run tests:** Run MaxText tests to ensure there are no regressions.
+
+## Appendix: Install XPK for MaxText Multi-host Workloads
+
+> **_NOTE:_** XPK is only required for multi-host TPU configurations (e.g., v5p-128, v6e-256). For single-host training, XPK is not needed and you can run MaxText directly on your TPU VM.
+
+XPK (Accelerated Processing Kit) is a tool designed to simplify the orchestration and management of workloads on Google Kubernetes Engine (GKE) clusters with TPU or GPU accelerators. In MaxText, we use XPK to submit both pre-training and post-training jobs on multi-host TPU configurations.
+
+For your convenience, we provide a minimal installation path below:
+```bash
+# Directly install xpk using pip
+pip install xpk
+
+# Install kubectl
+sudo apt-get update
+sudo apt-get install snapd
+sudo snap install kubectl --classic
+
+# Install gke-gcloud-auth-plugin
+echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
+curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
+sudo apt update && sudo apt-get install google-cloud-sdk-gke-gcloud-auth-plugin
+```
+
+For detailed setup instructions and advanced features, please refer to the [official XPK documentation](https://github.com/AI-Hypercomputer/xpk).
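Once XPK is installed, multi-host jobs are submitted with `xpk workload create`. The sketch below assembles (and prints, rather than executes) such a command; the cluster name, workload name, and training command are hypothetical placeholders, and the exact flag set should be verified against the XPK documentation linked above.

```shell
# Hypothetical cluster and workload settings; replace with your own.
CLUSTER=my-maxtext-cluster
TPU_TYPE=v5p-128
TRAIN_CMD='python3 -m MaxText.train MaxText/configs/base.yml run_name=demo'

# Assemble an XPK workload submission command (printed here, not executed).
XPK_CMD="xpk workload create --workload maxtext-demo --cluster ${CLUSTER} --tpu-type=${TPU_TYPE} --num-slices=1 --command \"${TRAIN_CMD}\""
echo "${XPK_CMD}"
```

Running the printed command on a machine with `gcloud` credentials schedules the training job on the named GKE cluster.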

docs/tutorials/posttraining/rl.md (7 additions, 7 deletions)

@@ -117,7 +117,7 @@ Run the following command for GRPO:
 python3 -m src.MaxText.rl.train_rl src/MaxText/configs/rl.yml \
 model_name=${MODEL} \
 tokenizer_path=${TOKENIZER} \
-load_parameters_path=${MAXTEXT_CKPT_PATH} \
+load_parameters_path=${MAXTEXT_CKPT_PATH}/0/items \
 run_name=${RUN_NAME} \
 base_output_directory=${BASE_OUTPUT_DIRECTORY} \
 hf_access_token=${HF_TOKEN}
@@ -136,12 +136,12 @@ Run the following command for GSPO:
 
 ```
 python3 -m src.MaxText.rl.train_rl src/MaxText/configs/rl.yml \
-model_name=llama3.1-8b \
-tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
-load_parameters_path=gs://path/to/checkpoint/0/items \
-run_name=$WORKLOAD \
-base_output_directory=$OUTPUT_PATH \
-hf_access_token=$HF_TOKEN \
+model_name=${MODEL} \
+tokenizer_path=${TOKENIZER} \
+load_parameters_path=${MAXTEXT_CKPT_PATH}/0/items \
+run_name=${RUN_NAME} \
+base_output_directory=${BASE_OUTPUT_DIRECTORY} \
+hf_access_token=${HF_TOKEN} \
 loss_algo=gspo-token
 ```
 
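The GRPO/GSPO commands in this diff reference several shell variables. A minimal sketch of setting them before running either command follows; the model name and tokenizer path come from the removed GSPO lines above, while the bucket paths, run name, and token are hypothetical placeholders to substitute with your own values.

```shell
# Example values for the variables used by the GRPO/GSPO training commands.
# The GCS paths, run name, and HF_TOKEN below are placeholders.
export MODEL=llama3.1-8b
export TOKENIZER=meta-llama/Llama-3.1-8B-Instruct
export MAXTEXT_CKPT_PATH=gs://my-bucket/llama3.1-8b/checkpoints
export RUN_NAME=rl-demo-run
export BASE_OUTPUT_DIRECTORY=gs://my-bucket/rl-output
export HF_TOKEN=hf_your_token_here  # your Hugging Face access token
```

Note that `load_parameters_path` appends `/0/items` to `MAXTEXT_CKPT_PATH`, so the variable should point at the checkpoint directory itself, not the `items` subdirectory.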
docs/tutorials/posttraining/rl_on_multi_host.md (1 addition, 1 deletion)

@@ -93,7 +93,7 @@ You can install the required dependencies using either of the following two opti
 ### Option 1: Installing stable releases of tunix and vllm-tpu
 Run the following bash script to create a docker image with all the dependencies of MaxText, Tunix, vLLM and tpu-inference installed.
 
-In addition to MaxText dependencies, primarily, it installs `vllm-tpu` which is [vllm](https://github.com/vllm-project/vllm) and [tpu-inference](https://github.com/vllm-project/tpu-inference) and thereby providing TPU inference for vLLM, with unified JAX and PyTorch support.
+In addition to the MaxText dependencies, it primarily installs `vllm-tpu`, which combines [vllm](https://github.com/vllm-project/vllm) and [tpu-inference](https://github.com/vllm-project/tpu-inference) to provide TPU inference for vLLM with unified JAX and PyTorch support. This build process takes approximately 10 to 15 minutes.
 
 ```
 bash dependencies/scripts/docker_build_dependency_image.sh MODE=post-training

docs/tutorials/posttraining/sft_on_multi_host.md (1 addition, 1 deletion)

@@ -43,7 +43,7 @@ gcloud auth application-default login
 gcloud auth configure-docker
 docker run hello-world
 ```
-Then run the following command to create a local Docker image named `maxtext_base_image`.
+Then run the following command to create a local Docker image named `maxtext_base_image`. This build process takes approximately 10 to 15 minutes.
 ```bash
 bash dependencies/scripts/docker_build_dependency_image.sh MODE=post-training
 ```
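After the dependency-image build finishes, it can be useful to confirm that `maxtext_base_image` actually exists locally. A minimal, hypothetical check (not part of the documented workflow) that degrades gracefully when Docker is unavailable:

```shell
# Hypothetical post-build check: confirm the local maxtext_base_image exists.
if command -v docker >/dev/null 2>&1; then
  if [ -n "$(docker images -q maxtext_base_image 2>/dev/null)" ]; then
    CHECK_MSG="maxtext_base_image is present"
  else
    CHECK_MSG="maxtext_base_image not found; re-run the build script"
  fi
else
  CHECK_MSG="docker is not installed on this machine"
fi
echo "${CHECK_MSG}"
```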
