revert awq dtype change for vllm inference limitation #1613
Merged
WeiweiZhang1 merged 5 commits into main on Mar 26, 2026
Conversation
Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
Contributor
Pull request overview
This PR aims to improve robustness of vLLM AWQ inference across different CUDA devices by ensuring the exported model’s dtype metadata aligns with vLLM’s AWQ kernel limitations.
Changes:
- Force `torch.float16` dtype metadata during AWQ export to improve vLLM compatibility.
- Extend AutoRound export dtype selection to prefer FP16 when the packing format is AWQ.
- Update the vLLM AWQ integration test to pass an explicit `dtype` argument to `LLM(...)` (a sketch follows this list).
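To illustrate the last point, here is a minimal sketch of what the test-side change could look like; the model path, prompt, and helper name are placeholders, and the actual structure of `test/test_cuda/integrations/test_vllm.py` may differ.

```python
# Minimal sketch (not the actual test): load an AWQ checkpoint with an explicit
# dtype so vLLM does not fall back to a dtype its AWQ kernels cannot serve.
from vllm import LLM, SamplingParams

def run_awq_smoke_test(model_path: str) -> str:
    # "float16" matches the dtype metadata the AWQ export path now writes.
    llm = LLM(model=model_path, dtype="float16", quantization="awq")
    params = SamplingParams(temperature=0.0, max_tokens=32)
    outputs = llm.generate(["What is the capital of France?"], params)
    return outputs[0].outputs[0].text
```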
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `test/test_cuda/integrations/test_vllm.py` | Adjusts vLLM initialization for the AWQ integration test. |
| `auto_round/export/export_to_awq/export.py` | Forces AWQ exports to write FP16 dtype metadata via `save_model(..., dtype=...)`. |
| `auto_round/export/export_to_autoround/export.py` | Selects FP16 dtype metadata when the packing format indicates AWQ (sketched below). |
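The dtype-selection change in `auto_round/export/export_to_autoround/export.py` can be pictured with the following sketch; the function name and fallback dtype are illustrative assumptions, not the repository's actual code.

```python
import torch

def select_export_dtype(packing_format: str, requested_dtype: torch.dtype | None = None) -> torch.dtype:
    # Illustrative only: when the packing format is AWQ, prefer FP16 metadata,
    # since vLLM's AWQ CUDA kernels are effectively float16-only on devices
    # such as the A100. Otherwise, honor the caller's requested dtype.
    if "awq" in packing_format.lower():
        return torch.float16
    return requested_dtype if requested_dtype is not None else torch.bfloat16
```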
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
xin3he approved these changes on Mar 25, 2026
Contributor
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
wenhuach21 reviewed on Mar 25, 2026
Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
Contributor
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
Contributor
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
Description
The behavior of vLLM AWQ inference varies across devices: on the A100, the CUDA kernel restriction to the float16 data type still applies, so this PR reverts the data type change to ensure robust inference.
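As a rough way to picture the constraint, a check like the following could confirm that an exported AWQ checkpoint advertises float16 before it is handed to vLLM; the helper name and the reliance on the `torch_dtype` field of `config.json` are assumptions about the exported checkpoint layout, not code from this PR.

```python
import json
from pathlib import Path

def exported_dtype_is_fp16(checkpoint_dir: str) -> bool:
    # Hypothetical helper: AWQ checkpoints exported for vLLM should advertise
    # float16 in config.json, matching the dtype the AWQ kernels expect.
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    return config.get("torch_dtype") == "float16"
```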
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting