Conversation


@mxinO mxinO commented Oct 30, 2025

What does this PR do?

Type of change: Bug fix

Overview:
Fix and improve vLLM PTQ.

  1. Add Ray support, so calibration can run on multiple nodes.
  2. Fix a MoE typo and improve weight folding for large MoE layers.
  3. Add the SharedFusedMoE layer.
  4. Support vLLM > 0.11 (not yet released).
  5. Add an OS environment variable to specify quant configs (see the sketch below).
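
For illustration, a minimal sketch of how a quant config could be selected via an environment variable. The variable name `MODELOPT_QUANT_CFG` and the helper below are hypothetical, not the names used in this PR:

```python
import os

import modelopt.torch.quantization as mtq

# Hypothetical env var name; the PR defines its own.
QUANT_CFG_ENV = "MODELOPT_QUANT_CFG"

def resolve_quant_cfg(default: str = "FP8_DEFAULT_CFG"):
    """Look up an mtq config by name (e.g. FP8_DEFAULT_CFG, NVFP4_DEFAULT_CFG)."""
    cfg_name = os.environ.get(QUANT_CFG_ENV, default)
    if not hasattr(mtq, cfg_name):
        raise ValueError(f"Unknown quantization config: {cfg_name}")
    return getattr(mtq, cfg_name)
```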

Usage

Testing

Tested with the latest vLLM.

Additional Information

vLLM > 0.11.0 changed the low-level API significantly. Some of these changes should be removed once vLLM <= 0.11.0 is no longer supported.
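
In practice this means version-gated code paths, roughly like the following sketch (not the PR's actual code):

```python
import vllm
from packaging.version import Version

if Version(vllm.__version__) > Version("0.11.0"):
    # New low-level API introduced after 0.11.0.
    ...
else:
    # Legacy path; remove once vLLM <= 0.11.0 is no longer supported.
    ...
```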

@mxinO mxinO self-assigned this Oct 30, 2025

copy-pr-bot bot commented Oct 30, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mxinO mxinO changed the title from [Draft] Fix/Improve vllm PTQ to Fix/Improve vllm PTQ, and support latest vllm on Nov 4, 2025
@mxinO mxinO changed the title from Fix/Improve vllm PTQ, and support latest vllm to Fix/Improve vllm PTQ and Support multi-node with ray on Nov 4, 2025
@mxinO mxinO marked this pull request as ready for review November 4, 2025 06:07
@mxinO mxinO requested review from a team as code owners November 4, 2025 06:07
Contributor

@mxinO does this maintain support for non-Ray + vLLM?

Comment on lines +186 to +206
model.load_state_dict(current_state_dict)
torch.distributed.barrier()

if amax_file_path is None:
    # Sync amax across TP can be done here if needed
    pass
    # for name, buffer in model.named_buffers():
    #     if name.endswith("_amax"):
    #         print("syncing amax across TP for", name)
    #         torch.distributed.all_reduce(
    #             buffer, op=torch.distributed.ReduceOp.MAX, group=get_tp_group().device_group
    #         )
    # torch.distributed.barrier()

if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    mtq.print_quant_summary(model)

mtq.fold_weight(model)
for name, module in model.named_modules():
    if name.endswith("weight_quantizer"):
        assert not module.is_enabled, f"quantizer {name} is still enabled"
Contributor

Do we need to do this under disable_compilation context?
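
For reference, a minimal sketch of what that would look like, assuming `disable_compilation(model)` is the context manager already used earlier in this script (whether folding actually needs it is the open question):

```python
with disable_compilation(model):
    mtq.fold_weight(model)
    for name, module in model.named_modules():
        if name.endswith("weight_quantizer"):
            assert not module.is_enabled, f"quantizer {name} is still enabled"
```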
