
Commit 6a38342

noemotiovon, TianHao324, and Tcc0403 authored
feat(NPU): add UB Manager for auto tiling strategy management (#987)
### Background and Motivation

When developing Ascend NPU operators, we frequently encounter compilation failures caused by UB (Unified Buffer) overflow. During compilation, the UB usage of each Triton kernel is checked, and if it exceeds the capacity, an error is raised: `MLIRCompilationError: ub overflow`.

To work around this, developers usually have to:

- Manually adjust block sizes
- Tune separately for different NPU models and input sizes
- Repeatedly trade off performance against UB safety

This process is tedious and error-prone, and it is difficult to cover every scenario.

------

### Solution

We implemented a **UB Manager** that provides:

- **Automatic UB capacity detection**: the capacity is read from device properties or environment variables
- **Best-practice-based tiling strategies**: safe block sizes are computed dynamically from the UB capacity and the operator parameters
- **Unified strategy registration system**: supports both fixed and conditional strategies, making it easy to extend
- **Integration with GEGLU and ROPE**: both kernels now handle UB constraints automatically and avoid overflow

------

### Core Features

#### Automatic Capacity Detection

- Supports three sources: environment variables, device properties, and per-model default values
- Supports multiple Ascend models, such as the Ascend 910B1 and 910B4

#### Dynamic Strategy Computation

- **GEGLU**: computes a safe block size from `n_cols` and `dtype_size`
- **ROPE**: computes `BLOCK_Q` and `BLOCK_K` from `pad_n_q_head`, `pad_n_kv_head`, and `pad_hd`
- Uses an **80% safety margin** to balance performance and safety

#### Easy Extensibility

- Simple interface: adding a strategy for a new kernel only requires registering a function
- Supports parameterized strategies that adapt to different input sizes

------

### Implementation Details

- Added `ub_manager.py`: the core UB management class
- Updated `geglu.py` and `rope.py`: integrated UB-aware implementations

------

### Testing

Verified on Ascend NPU 910B4:

- GEGLU forward and backward pass tests
- ROPE forward and backward pass tests
- Works correctly across different input sizes and data types

- Hardware Type: Ascend910B4
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

---------

Co-authored-by: TianHao324 <[email protected]>
Co-authored-by: Tcc0403 <[email protected]>
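The strategy flow described in this commit (detect the UB capacity, apply the 80% safety margin, then let a registered per-kernel strategy choose block sizes) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code in `ub_manager.py`. The environment variable name, the 192 KB fallback capacity, the per-element byte costs for GEGLU and ROPE, and the helpers `register_strategy`, `compute_tiling`, `geglu_strategy`, and `rope_strategy` are all hypothetical. The device-property and per-model lookups are omitted.

```python
import os
from typing import Callable, Dict

# Hypothetical names and values: the commit names neither the environment
# variable nor any per-model default capacity.
UB_ENV_VAR = "HYPOTHETICAL_UB_CAPACITY_BYTES"
FALLBACK_UB_BYTES = 192 * 1024  # placeholder, not a verified hardware spec
SAFETY_MARGIN = 0.8             # the 80% safety margin from the commit


def detect_ub_capacity() -> int:
    """Return the UB capacity in bytes: environment override, else a fallback.

    A fuller version would query device properties and per-model defaults
    before falling back; that lookup is intentionally left out here.
    """
    value = os.environ.get(UB_ENV_VAR)
    return int(value) if value is not None else FALLBACK_UB_BYTES


def next_power_of_2(x: int) -> int:
    """Smallest power of two >= x (x >= 1)."""
    return 1 << (x - 1).bit_length()


def prev_power_of_2(x: int) -> int:
    """Largest power of two <= x (x >= 1)."""
    return 1 << (x.bit_length() - 1)


# Registry mapping a kernel name to its (hypothetical) tiling strategy.
_STRATEGIES: Dict[str, Callable[..., dict]] = {}


def register_strategy(kernel: str):
    """Decorator that registers a strategy function under a kernel name."""
    def wrap(fn: Callable[..., dict]) -> Callable[..., dict]:
        _STRATEGIES[kernel] = fn
        return fn
    return wrap


def compute_tiling(kernel: str, **params) -> dict:
    """Apply the safety margin to the capacity and run the kernel's strategy."""
    budget = int(detect_ub_capacity() * SAFETY_MARGIN)
    return _STRATEGIES[kernel](budget=budget, **params)


@register_strategy("geglu")
def geglu_strategy(budget: int, n_cols: int, dtype_size: int) -> dict:
    # Assumption: per element, the gate, up, and output rows are held in the
    # input dtype, plus one float32 intermediate.
    bytes_per_elem = 3 * dtype_size + 4
    max_block = max(1, budget // bytes_per_elem)
    # Never exceed the padded row length or the UB budget.
    block = min(next_power_of_2(n_cols), prev_power_of_2(max_block))
    return {"BLOCK_SIZE": block}


@register_strategy("rope")
def rope_strategy(budget: int, pad_n_q_head: int, pad_n_kv_head: int,
                  pad_hd: int, dtype_size: int) -> dict:
    # Assumption: UB holds a BLOCK_Q x pad_hd tile of q, a BLOCK_K x pad_hd
    # tile of k, and one cos row and one sin row of length pad_hd.
    def usage(bq: int, bk: int) -> int:
        return (bq + bk + 2) * pad_hd * dtype_size

    block_q, block_k = pad_n_q_head, pad_n_kv_head
    # Halve the larger block until the tiles fit (or both reach 1).
    while usage(block_q, block_k) > budget and (block_q > 1 or block_k > 1):
        if block_q >= block_k and block_q > 1:
            block_q //= 2
        else:
            block_k //= 2
    return {"BLOCK_Q": block_q, "BLOCK_K": block_k}
```

Halving whichever block is larger keeps the q and k tiles roughly balanced, and because the padded head counts are assumed to be powers of two, the results stay powers of two as well.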
1 parent 6b3d761 commit 6a38342


5 files changed (+1384, -0 lines)
