Update unquantized_fused_moe_method.py #849
Open
Purpose
Change how unquantized MoE weights are assigned when using AITER on ROCm so that reloading weights is safer. This fixes the random output seen after wake-up and weight reloading in reinforcement learning.
I have been adapting vLLM for use with ROLL (https://github.com/alibaba/ROLL/) on ROCm platforms. After a vLLM instance sleeps, wakes up, and reloads model weights from the train actors, it generates random characters. I found that AITER shuffles the original weights into a layout better suited to computation, but the shuffled result is assigned as a new tensor, which causes problems once CUDA graphs have been captured and the weights are reloaded after wake-up. This pull request fixes the issue for general vLLM usage with AITER on ROCm platforms in reinforcement learning.
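The difference between the two assignment styles can be sketched with a toy model. This is not the code changed in this PR: plain Python lists stand in for tensors, and `shuffle`, `ToyLayer`, `assign_by_rebinding`, and `assign_in_place` are hypothetical names used only for illustration. It shows only that rebinding leaves any earlier holder of the buffer pointing at the old object, while an in-place update keeps the same object.

```python
def shuffle(values):
    # Hypothetical stand-in for a layout shuffle; it returns a NEW list.
    return values[::-1]


class ToyLayer:
    """Toy layer whose `weight` is a plain list instead of a tensor."""

    def __init__(self, values):
        self.weight = list(values)


def assign_by_rebinding(layer):
    # The attribute now points at a different object.
    layer.weight = shuffle(layer.weight)


def assign_in_place(layer):
    # Same object as before; only its contents change.
    layer.weight[:] = shuffle(layer.weight)


# Case 1: something else (think: a captured graph) holds the original buffer.
rebound = ToyLayer([1, 2, 3])
held_before_rebind = rebound.weight
assign_by_rebinding(rebound)
rebind_keeps_buffer = held_before_rebind is rebound.weight

# Case 2: the same setup, but the shuffled values are written in place.
updated = ToyLayer([1, 2, 3])
held_before_update = updated.weight
assign_in_place(updated)
in_place_keeps_buffer = held_before_update is updated.weight

print(rebind_keeps_buffer, held_before_rebind)    # holder still sees old data
print(in_place_keeps_buffer, held_before_update)  # holder sees shuffled data
```

In PyTorch terms, the analogous distinction is roughly between assigning a freshly created tensor to the layer attribute and writing into the existing tensor (for example with `copy_`); which form the fix uses should be checked against the diff itself.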
Test Plan
Set VLLM_ROCM_USE_AITER=1 and VLLM_ROCM_USE_AITER_MOE=1, then test Qwen3-30B-A3B (or another MoE model) after wake-up and weight reloading in reinforcement learning.
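A minimal sketch of enabling the two flags from Python, assuming they are set in the process environment before the vLLM engine is created (the rest of the reproduction, including the sleep, wake-up, and reload steps driven by ROLL, is not shown here):

```python
import os

# The two flags named in the test plan; both are passed as strings.
AITER_FLAGS = {
    "VLLM_ROCM_USE_AITER": "1",
    "VLLM_ROCM_USE_AITER_MOE": "1",
}

# Assumption: setting these before constructing the engine is sufficient.
os.environ.update(AITER_FLAGS)
```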
Test Result
With the previous assignment, the output is random characters. With the new assignment, the output is normal.