The recipe serves the Llama-4-Maverick-17B-128E model using the JetStream MaxText Engine on a `v6e-32` multihost slice of TPU v6e (Trillium).
To start inference, the recipe launches the JetStream MaxText Engine, which performs the following steps:
1. Downloads the full Llama-4-Maverick-17B-128E model checkpoints from [Hugging Face](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E).
2. Converts the model checkpoints from Hugging Face format to JAX Orbax format.
3. Starts the JetStream MaxText Engine server.
4. Once the server is up, inference is ready to respond to requests and run benchmarks.
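As a rough sketch of what the checkpoint download in step 1 amounts to, the gated repository can be fetched with the Hugging Face CLI; the local target directory here is an assumption, and access requires an accepted license plus an auth token:

```shell
# Assumes huggingface_hub is installed (pip install huggingface_hub)
# and your token has access to the gated meta-llama repository.
huggingface-cli login   # paste a token with access to meta-llama repos
huggingface-cli download meta-llama/Llama-4-Maverick-17B-128E \
  --local-dir /tmp/llama4-maverick   # hypothetical local path
```

In the recipe itself this download runs inside the launched job rather than on your workstation, so the commands above are for orientation only.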
The recipe uses a Helm chart to run the above steps.
--dry-run=client -o yaml | kubectl apply -f -
```
2. Convert the checkpoint from Hugging Face to Orbax
This job converts the checkpoint from Hugging Face format to JAX Orbax format and unscans it for performant serving. The unscanned checkpoint is then stored in the mounted GCS bucket so that it can be used by the TPU node pool to bring up the JetStream server in the next step.
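To make "unscanning" concrete: a scanned checkpoint stacks each per-layer weight along a leading layer axis (convenient for `jax.lax.scan` during training), while serving wants each decoder layer addressable as its own tensor. The toy sketch below illustrates that split with NumPy; the parameter names and shapes are made up, and the real conversion operates on JAX pytrees via Orbax, not on this hypothetical dict layout:

```python
import numpy as np

def unscan_layer_weights(stacked, name="w"):
    """Split a scanned (layer-stacked) parameter of shape
    [num_layers, ...] into one entry per decoder layer.

    Illustrative only: real MaxText/Orbax checkpoints are JAX
    pytrees, and the key naming scheme here is hypothetical.
    """
    num_layers = stacked.shape[0]
    return {f"layers_{i}/{name}": stacked[i] for i in range(num_layers)}

# A toy "scanned" checkpoint: 4 layers, each an 8x8 weight matrix.
scanned = np.arange(4 * 8 * 8, dtype=np.float32).reshape(4, 8, 8)
unscanned = unscan_layer_weights(scanned)

print(sorted(unscanned))               # layers_0/w ... layers_3/w
print(unscanned["layers_2/w"].shape)   # (8, 8)
```

The per-layer layout trades a slightly larger key space for direct lookups at serving time, which is why the recipe stores the unscanned form in GCS for the TPU node pool to load.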