Add Qwen3-Next tests to MaxText_MoE DAG #1194
RissyRan
left a comment
LGTM! Once the test is green, I will click the approval. Thanks!
    "cluster": XpkClusters.TPU_V5P_8_CLUSTER,
    "time_out_in_min": 90,
  },
  "qwen3-next-80b": {
Can you add yourself as the owner for qwen3-next-80b?
We can do something like the maxtext_end_to_end DAG, where the owner is different across tests:
ml-auto-solutions/dags/multipod/maxtext_end_to_end.py
Lines 61 to 71 in 26acd7b
Sure, I have added myself as the owner of the qwen3-next test. For the other tests, I let the owner default to you.
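A minimal sketch of how a per-test owner with a default could be resolved. It is an illustration only, not the code in this PR: `DEFAULT_OWNER`, `resolve_owner`, the `"owner"` key, and both owner values are hypothetical placeholders.

```python
# Hypothetical names throughout; this is not the code in the PR.
DEFAULT_OWNER = "default_owner"  # placeholder for the DAG's default test owner

# Per-test configuration. Only tests that need a different owner set "owner".
test_models = {
    "some-existing-model": {"time_out_in_min": 90},
    "qwen3-next-80b": {"time_out_in_min": 90, "owner": "qwen3_next_owner"},
}


def resolve_owner(test_config: dict, default: str = DEFAULT_OWNER) -> str:
    """Return the test's own owner if it sets one, else the default owner."""
    return test_config.get("owner", default)


owners = {name: resolve_owner(cfg) for name, cfg in test_models.items()}
```

With this shape, adding a new test without an `"owner"` entry keeps the existing default, so only the tests that change hands need to be touched.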
    time_out_in_min=90,
    test_name="maxtext_qwen3_next_80b_test",
    run_model_cmds=(
        f"export HF_TOKEN={HF_TOKEN}; export BASE_OUTPUT_PATH=$GCS_OUTPUT; bash tests/end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/2_test_qwen3_next_80b_a3b.sh",
To confirm: is this test only checking for runtime errors? Do you also plan to add a test for logits, now or in the future?
The test is running this script: https://github.com/AI-Hypercomputer/maxtext/blob/040c71b73616d768b141da07292fb0417164846c/tests/end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/2_test_qwen3_next_80b_a3b.sh
It does:
- Forward pass logit check
- train workload
- finetuning workload
- decoding workload
It should cover logit comparison, runtime errors, config checks, train/decoding support, etc. Pretty much all end-to-end model checks.
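For reference, the chained command passed as `run_model_cmds` can be assembled step by step, as in the sketch below. This is an assumption-laden illustration rather than the PR's code: `build_run_model_cmds` is a hypothetical helper, and the token value in the usage line is a dummy.

```python
import shlex


def build_run_model_cmds(script_path: str, hf_token: str) -> tuple[str, ...]:
    """Return a one-element tuple holding the chained shell command.

    Hypothetical helper: it mirrors the shape of the command in the diff.
    """
    steps = [
        # Quote the token so unusual characters cannot break the command.
        f"export HF_TOKEN={shlex.quote(hf_token)}",
        # $GCS_OUTPUT is left unexpanded here; the shell resolves it at run time.
        "export BASE_OUTPUT_PATH=$GCS_OUTPUT",
        f"bash {shlex.quote(script_path)}",
    ]
    return ("; ".join(steps),)


cmds = build_run_model_cmds(
    "tests/end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/2_test_qwen3_next_80b_a3b.sh",
    "dummy-token",
)
```

Because the steps are joined with `;`, the script runs even if an earlier `export` fails; joining with `&&` instead would stop at the first failure.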
parambole
left a comment
LGTM. I have left a few comments. PTAL
Add qwen3-next to MaxText_moe DAG Run pylinter add separate test owner logic
acaac29 to e029f69
Description
Onboard qwen3-next tests to XLML DAG.
The script that the DAG will run for the tests:
Tests
Made a dummy DAG to run the tests in local XLML. Will remove it once the tests are verified to pass.
The output of the DAG:
Note: Caching is added to the model in this PR: AI-Hypercomputer/maxtext#2971. It has not been merged into main, so the decoding test will fail in this DAG. Once that PR merges, the command will work.
Checklist
Before submitting this PR, please make sure (put X in square brackets):