Commit 1b83168

Refactor Fused Moe Module (#1309)
1 parent ece1689 commit 1b83168

25 files changed: +2732 -2071 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ compile_commands.json
 
 # Generated files
 csrc/generated/
+csrc/nv_internal/tensorrt_llm/cutlass_instantiations/
 docs/generated/
 flashinfer/_build_meta.py
 flashinfer/data/

README.md

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ python -m flashinfer.aot
 # Build AOT wheel
 python -m build --no-isolation --wheel
 # Install AOT wheel
-python -m pip install dist/flashinfer-*.whl
+python -m pip install dist/flashinfer_*.whl
 ```
 
 For more details, refer to the [Install from Source documentation](https://docs.flashinfer.ai/installation.html#install-from-source).
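
Note on the README change: the commit does not say why the glob changed. One plausible reading is that the built wheel's filename starts with `flashinfer_` rather than `flashinfer-`, since wheel filenames replace `-` in the distribution name with `_`. The sketch below checks both patterns against a hypothetical wheel filename. The filename (including its name and version) is an assumption for illustration, not taken from this commit.

```python
from fnmatch import fnmatchcase

# Hypothetical wheel filename (an assumption, not from the commit): the wheel
# format escapes "-" in the distribution name to "_", so a multi-word project
# name would produce a file beginning with "flashinfer_".
wheel = "flashinfer_python-0.0.0-py3-none-any.whl"

# The old and new glob patterns from the README diff.
old_pattern = "flashinfer-*.whl"
new_pattern = "flashinfer_*.whl"

# fnmatchcase applies shell-style matching without OS-specific case folding.
old_ok = fnmatchcase(wheel, old_pattern)
new_ok = fnmatchcase(wheel, new_pattern)

print(f"old pattern matches: {old_ok}")  # False
print(f"new pattern matches: {new_ok}")  # True
```

Under that assumption, the old pattern would leave `pip install` with no file to expand to, while the new one picks up the wheel in `dist/`.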

csrc/nv_internal/cpp/kernels/quantization.cu

Lines changed: 2 additions & 0 deletions
@@ -363,5 +363,7 @@ template void invokeBatchedFP4Quantization<__nv_fp8_e4m3, 32>(
 int32_t* SFOuput, bool useUE8M0, int multiProcessorCount, cudaStream_t stream);
 #endif
 
+////////////////////////////////////////////////////////////////////////////////////////////////////
+
 } // namespace kernels
 } // namespace tensorrt_llm
