For FP4 block-scaled MoE, we have both trtllm_fp4_block_scale_moe and trtllm_fp4_block_scale_routed_moe:
trtllm_fp4_block_scale_routed_moe skips the router (top-k, etc.).
trtllm_fp4_block_scale_moe applies the router in the kernel.
But for FP8, we only have trtllm_fp8_block_scale_moe; trtllm_fp8_block_scale_routed_moe is missing. We should implement this API, add it to the documentation index, and add relevant unit tests.
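
To make the expected relationship between the two variants concrete, here is a rough pure-Python sketch of the property a unit test could check: running the router separately and feeding its result to the routed variant should match the variant that routes internally. Everything in it is a hypothetical stand-in (`route_top_k`, `moe`, `moe_routed`, scalar "experts"). It does not reflect the real flashinfer signatures, and the softmax + top-k + renormalize routing is only an assumption about what the router does.

```python
import math

def route_top_k(logits, k):
    """Hypothetical reference router: softmax over expert logits, keep top-k, renormalize."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k most probable experts, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return top, [probs[i] / norm for i in top]

def moe_routed(x, expert_ids, expert_weights, experts):
    """Stand-in for a *_routed_moe call: the caller supplies ids and weights, no routing here."""
    return sum(w * experts[e](x) for e, w in zip(expert_ids, expert_weights))

def moe(x, logits, k, experts):
    """Stand-in for the non-routed call: routing happens inside, then the experts run."""
    ids, weights = route_top_k(logits, k)
    return moe_routed(x, ids, weights, experts)

# Toy experts: each one just scales its input (assumption, for illustration only).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]

ids, weights = route_top_k(logits, k=2)
assert abs(moe(0.5, logits, 2, experts) - moe_routed(0.5, ids, weights, experts)) < 1e-12
```

A real test for trtllm_fp8_block_scale_routed_moe would presumably compare against trtllm_fp8_block_scale_moe in the same way, with a tolerance suited to FP8.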
@claude please draft a PR.