You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SW-235047] use w8a8 path for per_channel for performance regression fixing (#1629)
https://jira.habana-labs.com/browse/SW-235047
## Essential Elements of an Effective PR Description Checklist
- [X] The purpose of the PR, such as "Fix some issue (link existing
issues this PR will resolve)".
- [ ] The test plan, such as providing test command.
- [ ] The test results, such as pasting the results comparison before
and after, or e2e results
## Purpose
Previous, any HPU fp8 linear will go `hpu_ops.apply_fp8_linear_hpu`,
which is not necessary for per_channel scaling since W8A8 also supports
HPU; which introduced a performance regression for WOQ model.
This PR is to skip per_channel scaling fp8 support in
`hpu_ops.apply_fp8_linear_hpu`
## Test Plan
## Test Result
<!--- pyml disable-next-line no-emphasis-as-heading -->
Signed-off-by: Chendi.Xue <[email protected]>
0 commit comments