
Commit fb610ae

bnellnm and hmellor authored
[Docs] Add moe kernel features doc (#25297)
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: bnellnm <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
1 parent 2f652e6 commit fb610ae

File tree

2 files changed: +121 −24 lines

docs/design/fused_moe_modular_kernel.md

Lines changed: 2 additions & 24 deletions
@@ -242,30 +242,8 @@ Example: `python3 -m tests.kernels.moe.modular_kernel_tools.profile_modular_kern
 
 ## FusedMoEPrepareAndFinalize Implementations
 
-The following table lists the `FusedMoEPrepareAndFinalize` implementations at the time of writing,
-
-| Implementation | Type | Comments |
-| :--- | :--- | :--- |
-| DeepEPHTPrepareAndFinalize | Contiguous / Non-Batched | Uses the DeepEP High-Throughput all2all kernels. |
-| DeepEPLLPrepareAndFinalize | Batched | Uses the DeepEP Low-Latency all2all kernels. |
-| PplxPrepareAndFinalize | Batched | Uses the Perplexity all2all kernels. |
-| FlashInferCutlassMoEPrepareAndFinalize | Contiguous | |
-| MoEPrepareAndFinalizeNoEP | Contiguous | This implementation is used when there is no EP. i.e. no all2all kernels are invoked. |
-| BatchedPrepareAndFinalize | Batched | A reference prepare/finalize class that reorganizes the tokens into expert batched format, i.e. E x max_num_tokens x K. (Doesn’t use any all2all kernels. This is primarily used in unit testing) |
+See [Fused MoE Kernel features](./moe_kernel_features.md#fused-moe-modular-all2all-backends) for a list of all the available modular prepare and finalize subclasses.
 
 ## FusedMoEPermuteExpertsUnpermute
 
-The following table lists the `FusedMoEPermuteExpertsUnpermute` implementations at the time of writing,
-
-| Implementation | Type | Comment |
-| :--- | :--- | :--- |
-| BatchedDeepGemmExperts | Batched | Uses the DeepGemm’s Masked Grouped Gemm kernels for the fused_moe operation. |
-| BatchedTritonExperts | Batched | Uses a Triton Kernel for the Batched matmuls. |
-| BatchedTritonOrDeepGemmExperts | Batched | Chooses either the `BatchedDeepGemmExperts` or `BatchedTritonExperts` based on environment settings. |
-| DeepGemmExperts | Contiguous / Non-Batched | Uses DeepGemm’s Grouped Gemm kernels for fused_moe operation. |
-| TritonExperts | Contiguous / Non-Batched | Uses a Triton Kernel for fused_moe matmuls. |
-| TritonOrDeepGemmExperts | Contiguous / Non-Batched | Chooses either the `DeepGemmExperts` or `TritonExperts` based on fused_moe inputs. |
-| CutlassExpertsFP8 | Supports both Batched and Contiguous formats | Uses Cutlass Grouped Gemm implementations for the fp8 matmuls. |
-| CutlassExpertsFP4 | Supports both Batched and Contiguous formats | Uses Cutlass Grouped Gemm implementations for the fp4 matmuls. |
-| FlashInferExperts | Contiguous | Uses fused_moe operation from FlashInfer |
-| NaiveBatchedExperts | Batched | Reference Batched Experts implementation. Primarily used in unit tests. |
+See [Fused MoE Kernel features](./moe_kernel_features.md#fused-moe-experts-kernels) for a list of all the available modular experts.
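
To make the relationship between the two component types in this diff concrete, here is a minimal, self-contained sketch of how a prepare/finalize object and an experts object compose into one modular MoE kernel. This is an illustration only: the class names (`PrepareAndFinalize`, `Experts`, `ModularKernel`) and the method signatures are simplified stand-ins for this sketch, not the exact interfaces of the vLLM classes named in the tables above; see the linked `moe_kernel_features.md` doc for the real subclasses.

```python
# Minimal sketch (not the actual vLLM API) of the modular MoE kernel split:
# a prepare/finalize component handles dispatch/quantization and the final
# combine (this is where any EP all2all lives), while an experts component
# permutes tokens to experts, runs the grouped matmuls, and unpermutes.
from abc import ABC, abstractmethod

import torch


class PrepareAndFinalize(ABC):
    """Dispatch/quantize inputs and combine outputs for the expert stage."""

    @abstractmethod
    def prepare(self, hidden_states: torch.Tensor,
                topk_ids: torch.Tensor) -> torch.Tensor:
        ...

    @abstractmethod
    def finalize(self, expert_output: torch.Tensor,
                 topk_weights: torch.Tensor) -> torch.Tensor:
        ...


class Experts(ABC):
    """Permute tokens to experts, run the grouped GEMMs, unpermute."""

    @abstractmethod
    def apply(self, prepared: torch.Tensor, topk_ids: torch.Tensor,
              w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
        ...


class ModularKernel:
    """Glue object: any prepare/finalize can be paired with any experts
    implementation, as long as both use the same activation format
    (contiguous or batched)."""

    def __init__(self, prepare_finalize: PrepareAndFinalize, experts: Experts):
        self.prepare_finalize = prepare_finalize
        self.experts = experts

    def forward(self, hidden_states, topk_weights, topk_ids, w1, w2):
        # 1) dispatch/quantize, 2) expert matmuls, 3) combine back to the
        # original token order weighted by the router outputs.
        prepared = self.prepare_finalize.prepare(hidden_states, topk_ids)
        expert_out = self.experts.apply(prepared, topk_ids, w1, w2)
        return self.prepare_finalize.finalize(expert_out, topk_weights)
```

The point of the split, as reflected in the "Type" column of the removed tables, is that the communication strategy (contiguous vs. batched all2all backends) can vary independently of the grouped-GEMM backend, provided both sides agree on the activation format.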
