
feat: Add Windows support with triton-windows and PyTorch fallback #237

Open

tori29umai0123 wants to merge 2 commits into Tencent:main from tori29umai0123:feature/windows-support

Conversation

@tori29umai0123

Summary

Add Windows support for AngelSlim with FP8 Triton kernels using triton-windows.

Changes

New Files

  • angelslim/compressor/_platform.py - Platform detection and automatic backend selection
  • angelslim/compressor/quant/core/quant_func_torch.py - PyTorch fallback for weight quantization
  • angelslim/compressor/diffusion/kernels/python/quantizers/fp8_per_block_torch.py - PyTorch fallback for FP8 per-block quantization (see the sketch after this list)
  • angelslim/compressor/diffusion/kernels/python/quantizers/fp8_per_token_group_torch.py - PyTorch fallback for FP8 per-token-group quantization
  • angelslim/compressor/diffusion/kernels/python/gemm/fp8_gemm_torch.py - PyTorch fallback for FP8 GEMM
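
The per-block fallback can be pictured as follows. This is a minimal sketch of the idea only; the function name, block size, scale layout, and return convention are assumptions, not the actual API of fp8_per_block_torch.py.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def fp8_per_block_quant_torch(x: torch.Tensor, block_size: int = 128):
    """Hypothetical pure-PyTorch per-block FP8 quantizer (no Triton required)."""
    assert x.dim() == 2, "expects a 2D tensor"
    m, n = x.shape

    # Pad both dimensions up to a multiple of block_size.
    pad_m = (block_size - m % block_size) % block_size
    pad_n = (block_size - n % block_size) % block_size
    x_pad = torch.nn.functional.pad(x, (0, pad_n, 0, pad_m))

    # View as (row_blocks, col_blocks, block_size, block_size).
    mb, nb = x_pad.shape[0] // block_size, x_pad.shape[1] // block_size
    blocks = x_pad.view(mb, block_size, nb, block_size).permute(0, 2, 1, 3)

    # One scale per block: amax / FP8 max, guarded against all-zero blocks.
    scales = blocks.abs().amax(dim=(-2, -1)).clamp(min=1e-12) / FP8_E4M3_MAX

    # Quantize, cast to FP8, and restore the original (unpadded) layout.
    q = (blocks / scales[..., None, None]).to(torch.float8_e4m3fn)
    q = q.permute(0, 2, 1, 3).reshape(x_pad.shape)[:m, :n]
    return q, scales
```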

Modified Files

  • requirements/requirements.txt - Platform-specific triton dependency (triton for Linux, triton-windows for Windows)
  • setup.py - Version string includes CUDA and PyTorch version (e.g., 0.0.0_dev+cu128.torch2.10)
  • angelslim/compressor/diffusion/cache/taylorcache_helper.py - Conditional torch.compile (disabled on Windows; see the sketch after this list)
  • angelslim/compressor/quant/core/quant_func.py - Lazy Triton import with automatic backend selection
  • angelslim/compressor/diffusion/quant/quant_func.py - Remove CUDA requirement for per-token-group quantization
  • angelslim/compressor/diffusion/kernels/python/quantizers/__init__.py - Conditional imports based on backend
  • angelslim/compressor/diffusion/kernels/python/gemm/__init__.py - Conditional imports based on backend
  • README.md - Windows build instructions
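
The conditional torch.compile change can be sketched as below; the helper name and the exact defaults are illustrative assumptions, not the code in taylorcache_helper.py.

```python
import os
import platform

import torch


def maybe_compile(fn):
    """Apply torch.compile only where it is expected to work.

    Hypothetical helper: ANGELSLIM_TORCH_COMPILE=0|1 overrides the default;
    otherwise compilation is skipped on Windows, where the Inductor backend
    may not have a working Triton toolchain.
    """
    flag = os.environ.get("ANGELSLIM_TORCH_COMPILE")
    if flag is not None:
        use_compile = flag == "1"
    else:
        use_compile = platform.system() != "Windows"
    return torch.compile(fn) if use_compile else fn


@maybe_compile
def taylor_step(x: torch.Tensor) -> torch.Tensor:
    # Placeholder computation standing in for the cached Taylor update.
    return x + 0.5 * x * x
```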

Features

  • Automatic Backend Selection: Automatically detects whether Triton is available and functional; falls back to PyTorch if not (see the sketch after this list)
  • Environment Variables:
    • ANGELSLIM_BACKEND=pytorch|triton - Force backend selection
    • ANGELSLIM_TORCH_COMPILE=0|1 - Enable/disable torch.compile
  • Cross-Platform: Works on both Linux (with triton) and Windows (with triton-windows or PyTorch fallback)
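
A sketch of how the automatic selection might be wired up; select_backend and its probing strategy are assumptions, not necessarily what angelslim/compressor/_platform.py exposes.

```python
import os


def select_backend() -> str:
    """Return "triton" or "pytorch" (hypothetical helper).

    An explicit ANGELSLIM_BACKEND setting wins; otherwise Triton is used
    only if it actually imports (triton-windows also installs itself as
    the importable `triton` module).
    """
    forced = os.environ.get("ANGELSLIM_BACKEND", "").lower()
    if forced in ("triton", "pytorch"):
        return forced
    try:
        import triton  # noqa: F401
        import triton.language as tl  # noqa: F401
        return "triton"
    except Exception:
        return "pytorch"


BACKEND = select_backend()  # e.g. consulted by the conditional __init__ imports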

Test

python -c "import torch; from angelslim.compressor.diffusion.kernels.python.quantizers import
fp8_per_block_quant_triton; from angelslim.compressor.diffusion.kernels.python.gemm import fp8_gemm_triton_block;
a,b=torch.randn(128,256,device='cuda'),torch.randn(512,256,device='cuda'); aq,a_s=fp8_per_block_quant_triton(a);
bq,b_s=fp8_per_block_quant_triton(b); c=fp8_gemm_triton_block(aq,a_s,bq,b_s); print(f'FP8 GEMM OK: {c.shape},
{c.dtype}')"

Expected output:
FP8 GEMM OK: torch.Size([128, 512]), torch.bfloat16

Tested Environment

- Windows 11 + NVIDIA RTX A6000
- CUDA 12.8 + PyTorch 2.10.0 + triton-windows

tori29umai0123 and others added 2 commits February 2, 2026 19:09
When native_fp8_support=False and quant_type="fp8-per-block", 3D tensors
(e.g., [batch, seq_len, hidden] from attention layers like txt_attn_q/k/v)
were passed directly to fp8_per_block_quant which expects 2D input.

Changes:
- Reshape 3D input to 2D before calling fp8_per_block_quant
- Store original shape and restore output to 3D after GEMM operation

This fixes assertion errors when using FP8 quantization with transformer
attention layers that produce 3D tensors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
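
A sketch of the reshape pattern this commit describes; the wrapper name is hypothetical, and quant / gemm stand in for fp8_per_block_quant and the FP8 GEMM kernel.

```python
import torch


def fp8_linear_any_rank(x: torch.Tensor, wq, w_scale, quant, gemm):
    """Flatten [batch, seq_len, hidden] to 2D for per-block FP8 quantization,
    run the FP8 GEMM, then restore the leading dimensions (hypothetical sketch)."""
    orig_shape = x.shape
    if x.dim() == 3:                        # e.g. txt_attn_q/k/v activations
        x = x.reshape(-1, orig_shape[-1])   # [batch * seq_len, hidden]

    xq, x_scale = quant(x)                  # per-block quantization expects 2D input
    out = gemm(xq, x_scale, wq, w_scale)    # [batch * seq_len, out_features]

    if len(orig_shape) == 3:                # restore [batch, seq_len, out_features]
        out = out.reshape(orig_shape[0], orig_shape[1], -1)
    return out
```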

For more detailed installation instructions, please refer to the [Installation Documentation](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/installation.html).

#### Windows Installation (with FP8 Triton Support)
Collaborator


Please mv to docs/source/getting_started/installation.md
