Commit 1524251
fix(svdquant): run SVD on GPU to maintain utilization (NVIDIA#633)
## What does this PR do?
**Type of change:** Bug fix
**Overview:** Fix SVD quantization running on CPU instead of GPU, which
caused jobs to be killed by the internal job scheduler due to low GPU
utilization during long-running SVD operations.
## Changes
- Keep tensor on GPU during SVD computation by explicitly specifying
device
- Add `full_matrices=False` (reduced SVD) for faster computation, since
only the first `lowrank` singular vectors are needed
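The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper name `lowrank_svd`, the float32 cast, and the way the factors are split are all assumptions made for the example. Only `torch.linalg.svd` and its `full_matrices` argument are real PyTorch API.

```python
import torch


def lowrank_svd(weight: torch.Tensor, lowrank: int, device: torch.device):
    """Hypothetical helper: return (us, vt) such that us @ vt approximates weight."""
    # Keep the tensor on the requested device (the GPU when available)
    # instead of moving it to the CPU before decomposing it.
    w = weight.to(device=device, dtype=torch.float32)
    # Reduced SVD: U is (m, k) and Vh is (k, n) with k = min(m, n),
    # rather than the full (m, m) and (n, n) matrices.
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    # Keep only the leading `lowrank` singular triplets.
    us = u[:, :lowrank] * s[:lowrank]
    vt = vh[:lowrank]
    return us, vt


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)
# A 64x32 matrix of exact rank 8, so a rank-8 truncation reconstructs it.
weight = torch.randn(64, 8) @ torch.randn(8, 32)
us, vt = lowrank_svd(weight, lowrank=8, device=device)
print(us.shape, vt.shape, us.device)
```

With `full_matrices=True` (the default), the same call would also build the full 64x64 `U`, most of which is discarded when only `lowrank` columns are kept.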
## Motivation
The original implementation ran SVD on CPU, causing:
1. Jobs killed by internal scheduler due to low GPU utilization during
SVD phase
2. Potential performance degradation from CPU-GPU data transfer overhead
## Usage
No API changes. Existing svdquant usage remains the same.
## Testing
- Tested locally with Wan2.2 SVD quantization
- Verified job no longer killed due to low GPU utilization
## Before your PR is "*Ready for review*"
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update
[Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**:
No
Signed-off-by: Taekyung Heo <[email protected]>

1 parent 11728b7 commit 1524251
1 file changed, with 11 additions and 4 deletions across two hunks (original lines 1029-1035 and 1039-1047; the line contents are not shown).