JAX-vLLM Offloading k8s (GKE) #1797
Merged
yhtang
reviewed
Nov 26, 2025
.github/gke-workflow/jax-vllm-offloading/transfer/deployment/rollout.yml
.github/gke-workflow/jax-vllm-offloading/transfer/deployment/trainer.yml
Member, Author
https://github.com/NVIDIA/JAX-Toolbox/actions/runs/20374445860
Steboss
previously approved these changes
Jan 13, 2026
Contributor
Steboss
left a comment
Amazing job. I can feel the pain in testing some parts. Well done 🚀
.github/gke-workflow/jax-vllm-offloading/transfer/deployment/gateway-pod.yml
.github/gke-workflow/jax-vllm-offloading/transfer/deployment/rollout.yml
.github/gke-workflow/jax-vllm-offloading/transfer/deployment/trainer.yml
Steboss
previously approved these changes
Jan 14, 2026
Contributor
Steboss
left a comment
🚀
Steboss
approved these changes
Jan 14, 2026
JAX-vLLM offloading transfer and GRPO examples on Kubernetes
Benchmark results
Transfer
Columns: create_bridge, load_model, transfer
Rows: deploy.sh, deploy.sh, deploy.sh, jobset.yml, jobset.yml, example-transfer-multinode.sh, example-transfer-multinode.sh
GRPO
Columns: handshake, rollout, training
Rows: example-grpo-multinode.sh, jobset.yaml, example-grpo-multinode.sh
n.b. meta-llama/Llama-3.1-8B-Instruct
CI workflow
The CI workflow added in this PR builds amd64 and arm64 images, which are then used to run the transfer and GRPO k8s recipe workloads on GKE.
The workloads are created on the cluster with the xpk toolkit, which creates a JobSet resource. This is done using the xpk-gke composite action that is already used for NCCL and MaxText workloads on GKE.
For the transfer recipe, meta-llama/Llama-3.1-8B-Instruct and meta-llama/Llama-3.1-70B-Instruct are run. For the GRPO recipe, only meta-llama/Llama-3.1-8B-Instruct runs due to memory constraints on the particular GPU in the GKE cluster.
Example run
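As a rough illustration of the recipe and model combinations described above, the set of CI workloads can be pictured as a small matrix. This is only a sketch: the names RECIPE_MODELS and workload_matrix are hypothetical and do not come from the workflow files in this PR, which may organise the matrix differently.

```python
# Illustrative only: RECIPE_MODELS and workload_matrix are hypothetical names,
# not taken from the PR's workflow files.
RECIPE_MODELS = {
    "transfer": [
        "meta-llama/Llama-3.1-8B-Instruct",
        "meta-llama/Llama-3.1-70B-Instruct",
    ],
    # GRPO runs only the 8B model because of GPU memory constraints.
    "grpo": ["meta-llama/Llama-3.1-8B-Instruct"],
}


def workload_matrix(recipe_models):
    """Flatten the recipe -> models mapping into one entry per CI workload."""
    return [
        {"recipe": recipe, "model": model}
        for recipe, models in recipe_models.items()
        for model in models
    ]


if __name__ == "__main__":
    for entry in workload_matrix(RECIPE_MODELS):
        print(f"{entry['recipe']}: {entry['model']}")
```

Under these assumptions the matrix has three entries: two for the transfer recipe and one for GRPO.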
Appendix: Detailed results
Transfer
Single-node 2:2 (a3-megagpu-8g - H100)
Single-node 4:4 (a3-megagpu-8g - H100)
2-node 8:8 (a3-megagpu-8g - H100)
As at e0a1b67
2-node 8:8 JobSet with TCPXO plugin enabled (a3-megagpu-8g - H100)
JobSet as at fd9c38f
2-node 8:8 JobSet with TCPXO plugin no debug (a3-megagpu-8g - H100)
JobSet as at fd9c38f
2-node 8:8 slurm (viking-prod - H100)
2-node 8:8 slurm no debug (eos - H100)
GRPO
2-node 8:8 JobSet with TCPXO plugin enabled (a3-megagpu-8g - H100)
Single-node 4:4 slurm (eos - H100)