feat(collector): nvfp4 MoE and GEMM support (Blackwell SM100+) #534

Draft
ilyasher wants to merge 2 commits into dev-isherstyuk-add-collector-args from dev-isherstyuk-vllm-nvfp4

ilyasher commented Mar 6, 2026

  • collect_gemm: add nvfp4 GEMM via CompressedTensorsConfig (nvfp4-pack-quantized, group_size=16); quantize weights with scaled_fp4_quant; require n, k divisible by 16
  • collect_moe: add nvfp4 MoE via flashinfer_cutlass_fused_moe (fused_experts does not support nvfp4); quantize per-expert weights with swizzle_blockscale; require hidden_size and local_inter_size divisible by 16; EP > 1 not supported
  • collect_moe: use float32 topk_weights for nvfp4 (kernel requirement); use power_law_topk for routing generation
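The shape constraints listed above can be sketched as a small pre-flight check. This is only an illustration under stated assumptions: the helper names `check_nvfp4_gemm_shape` and `check_nvfp4_moe_shape` and the `ep_size` parameter are hypothetical and are not taken from this PR; only the group size of 16, the divisibility requirements, and the EP > 1 restriction come from the description.

```python
# Hypothetical helpers illustrating the constraints described in this PR.
# None of these names are taken from the collector code itself.

NVFP4_GROUP_SIZE = 16  # group_size stated in the PR description


def _require_multiple(name: str, value: int) -> None:
    """Raise ValueError unless value is a positive multiple of the group size."""
    if value <= 0 or value % NVFP4_GROUP_SIZE != 0:
        raise ValueError(
            f"nvfp4 requires {name} divisible by {NVFP4_GROUP_SIZE}, got {value}"
        )


def check_nvfp4_gemm_shape(n: int, k: int) -> None:
    """GEMM case: both n and k must be divisible by 16."""
    _require_multiple("n", n)
    _require_multiple("k", k)


def check_nvfp4_moe_shape(
    hidden_size: int, local_inter_size: int, ep_size: int = 1
) -> None:
    """MoE case: hidden_size and local_inter_size divisible by 16, no EP > 1."""
    if ep_size > 1:
        raise ValueError("nvfp4 MoE does not support EP > 1")
    _require_multiple("hidden_size", hidden_size)
    _require_multiple("local_inter_size", local_inter_size)
```

A collector could call such a check before building quantized weights and skip the unsupported configurations instead of failing inside the kernel; whether the PR skips or raises is not stated here.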

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx


copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions bot added the feat label Mar 6, 2026
- collect_gemm: add nvfp4 GEMM via CompressedTensorsConfig
  (nvfp4-pack-quantized, group_size=16); quantize weights with
  scaled_fp4_quant; require n, k divisible by 16
- collect_moe: add nvfp4 MoE via flashinfer_cutlass_fused_moe (fused_experts
  does not support nvfp4); quantize per-expert weights with swizzle_blockscale;
  require hidden_size and local_inter_size divisible by 16; EP > 1 not supported
- collect_moe: use float32 topk_weights for nvfp4 (kernel requirement);
  use power_law_topk for routing generation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ilyasher force-pushed the dev-isherstyuk-vllm-nvfp4 branch from e3d0271 to 17061bb on March 6, 2026 21:01