feature request: add trtllm_fp8_block_scale_routed_moe API #2381

@yzh119

Description

For FP4 block-scaled MoE, we have both trtllm_fp4_block_scale_moe and trtllm_fp4_block_scale_routed_moe:

  • trtllm_fp4_block_scale_routed_moe skips the router (top-k, etc.).
  • trtllm_fp4_block_scale_moe applies the router in the kernel.

For FP8, however, we only have trtllm_fp8_block_scale_moe; trtllm_fp8_block_scale_routed_moe is missing. We should implement this API, add it to the documentation index, and add relevant unit tests.
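
To make the distinction concrete, here is a minimal pure-Python sketch of the two call shapes. It is not the FlashInfer API: the function names (topk_route, moe_routed, moe_with_router), the softmax-then-renormalize routing, and the scalar "experts" are all assumptions for illustration, and no FP8 quantization is modeled. It only shows that a routed variant takes the expert ids and weights from the caller, while the non-routed variant computes them from the router logits first, which is also the kind of equivalence a unit test could check.

```python
import math

def topk_route(logits, top_k):
    """Hypothetical router: softmax over experts, keep top_k, renormalize.

    The routing method used inside the real kernels may differ.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort by descending probability; break ties by lower expert index.
    ids = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:top_k]
    norm = sum(probs[i] for i in ids)
    return ids, [probs[i] / norm for i in ids]

def moe_routed(x, expert_ids, expert_weights, experts):
    """'Routed' shape: the caller supplies the routing decision."""
    return sum(w * experts[e](x) for e, w in zip(expert_ids, expert_weights))

def moe_with_router(x, logits, top_k, experts):
    """Non-routed shape: routing is computed internally from the logits."""
    ids, weights = topk_route(logits, top_k)
    return moe_routed(x, ids, weights, experts)

# Toy experts: scalar multipliers standing in for per-expert FFNs.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]

ids, weights = topk_route(logits, top_k=2)
out_routed = moe_routed(1.5, ids, weights, experts)
out_fused = moe_with_router(1.5, logits, 2, experts)
assert out_routed == out_fused
```

A test for the requested FP8 API could follow the same pattern: compute the routing outside the kernel, pass it to the routed entry point, and compare against the existing entry point given the same logits.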

@claude please draft a PR.

Metadata

  • Type: No type
  • Projects status: In Progress
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests