Commit 6a38342
feat(NPU): add UB Manager for auto tiling strategy management (#987)
### Background and Motivation
When developing Ascend NPU operators, we frequently encounter
compilation failures caused by UB (Unified Buffer) overflow. During
compilation, the UB usage of each Triton kernel is checked, and if it
exceeds the available capacity, an error is raised:
`MLIRCompilationError: ub overflow`.
To address this issue, developers usually have to:
- Manually adjust block sizes
- Tune separately for different NPU models and input sizes
- Repeatedly trade off between performance and UB safety
This process is tedious and error-prone, and manual tuning rarely
covers all scenarios.
------
### Solution
We implemented a **UB Manager** that provides:
- **Automatic UB capacity detection**: retrieved from environment
variables, device properties, or model default values
- **Best-practice-based tiling strategies**: dynamically compute safe
block sizes based on UB capacity and operator parameters
- **Unified strategy registration system**: supports both fixed and
conditional strategies, making it easy to extend
- **Integration with GEGLU and ROPE**: automatically handles UB
constraints and prevents overflow
------
### Core Features
#### Automatic Capacity Detection
- Supports three sources: environment variables, device properties, and
model default values
- Supports multiple Ascend models such as Ascend 910B1 / 910B4
#### Dynamic Strategy Computation
- **GEGLU**: computes a safe block size based on `n_cols` and
`dtype_size`
- **ROPE**: computes `BLOCK_Q` and `BLOCK_K` based on `pad_n_q_head`,
`pad_n_kv_head`, and `pad_hd`
- Uses an **80% safety margin** to balance performance and safety
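
As a rough illustration of the GEGLU-style computation, the sketch below derives a block size from `n_cols` and `dtype_size`. It assumes the 80% safety margin means that only 80% of the UB capacity is budgeted, and that a fixed number of block-sized buffers (`live_buffers`, a hypothetical parameter) is resident at once. The actual formula in this commit may differ.

```python
def largest_power_of_two_at_most(n: int) -> int:
    """Largest power of two that is <= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n.bit_length() - 1)


def safe_block_size(
    ub_capacity_bytes: int,
    n_cols: int,
    dtype_size: int,
    live_buffers: int = 4,       # assumed number of block-sized buffers held in UB
    safety_margin: float = 0.80,  # assumed to mean "use at most 80% of UB"
) -> int:
    """Pick a power-of-two block size whose buffers fit in the UB budget."""
    if n_cols < 1:
        raise ValueError("n_cols must be >= 1")

    budget_bytes = int(ub_capacity_bytes * safety_margin)
    max_elems = budget_bytes // (dtype_size * live_buffers)
    if max_elems < 1:
        raise ValueError("UB budget is too small for even one element per buffer")

    # Triton block sizes used with tl.arange must be powers of two.
    block = largest_power_of_two_at_most(max_elems)

    # No need for a block larger than the next power of two >= n_cols.
    cap = 1 << (n_cols - 1).bit_length()
    return min(block, cap)
```

With a hypothetical 64 KiB capacity and 2-byte elements, a narrow row (`n_cols=1000`) is covered by a single 1024-wide block, while a very wide row is limited by the UB budget instead.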
#### Easy Extensibility
- Simple interface: adding a new kernel strategy only requires
registering a function
- Supports parameterized strategies to adapt to different input sizes
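
One way such a registration interface might look is a decorator-based registry, sketched below. The names `register_strategy`, `get_tiling`, and the two toy kernels are hypothetical and are not the API added in `ub_manager.py`. The example shows one fixed strategy and one conditional strategy that depends on input parameters.

```python
from typing import Callable, Dict

# Maps a kernel name to a function that returns its tiling parameters.
_STRATEGIES: Dict[str, Callable[..., dict]] = {}


def register_strategy(kernel_name: str):
    """Decorator that registers a tiling strategy under kernel_name."""
    def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
        _STRATEGIES[kernel_name] = fn
        return fn
    return decorator


def get_tiling(kernel_name: str, ub_capacity_bytes: int, **params) -> dict:
    """Look up the registered strategy and evaluate it for the given inputs."""
    if kernel_name not in _STRATEGIES:
        raise KeyError(f"no tiling strategy registered for {kernel_name!r}")
    return _STRATEGIES[kernel_name](ub_capacity_bytes, **params)


@register_strategy("toy_fixed")
def _toy_fixed(ub_capacity_bytes: int, **_unused) -> dict:
    # Fixed strategy: always returns the same block size.
    return {"BLOCK_SIZE": 1024}


@register_strategy("toy_rowwise")
def _toy_rowwise(ub_capacity_bytes: int, *, n_cols: int, dtype_size: int) -> dict:
    # Conditional strategy: take the whole row if it fits in 80% of UB,
    # otherwise fall back to as many elements as the budget allows.
    budget_bytes = int(ub_capacity_bytes * 0.80)
    if n_cols * dtype_size <= budget_bytes:
        return {"BLOCK_SIZE": n_cols}
    return {"BLOCK_SIZE": budget_bytes // dtype_size}
```

Under this design, supporting a new kernel means writing one function and decorating it; callers only need the kernel name and its parameters.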
------
### Implementation Details
- Added `ub_manager.py`: the core UB management class
- Updated `geglu.py` and `rope.py`: integrated UB-aware implementations
------
### Testing
Verified on Ascend NPU 910B4:
- GEGLU forward and backward pass tests
- ROPE forward and backward pass tests
- Works correctly across different input sizes and data types
- Hardware Type: <Ascend910B4>
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
---------
Co-authored-by: TianHao324 <[email protected]>
Co-authored-by: Tcc0403 <[email protected]>
1 parent 6b3d761, commit 6a38342
File tree: 5 files changed (+1384, -0)
- src/liger_kernel/ops/backends/_ascend
  - ops