Commit 8c3095d
Don't Decompose Hardswish (pytorch#12360)
Summary:
While investigating MV3, I noticed that hardswish was being decomposed into many small ops, which I had not realized was happening. This was a problem because the decomposed ops injected unnecessary transposes and were not being quantized. After some investigation, it looks like keeping hardswish undecomposed greatly improves our MV3 performance for both quantized and FP32 models. Benchmarks are in the table below.
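For context, a minimal sketch of what "decomposed into many small ops" means, using the standard definition hardswish(x) = x * relu6(x + 3) / 6. It is plain Python on scalars rather than the actual exported graph, and the function names are illustrative; the exact op sequence the exporter emits is not shown in this commit.

```python
import math


def hardswish_decomposed(x: float) -> float:
    """Hardswish written as four separate elementwise steps.

    Each step stands in for one small op in a decomposed graph
    (assumed sequence: add, clamp, mul, div).
    """
    shifted = x + 3.0                       # add
    clamped = min(max(shifted, 0.0), 6.0)   # clamp to [0, 6], i.e. relu6
    scaled = x * clamped                    # mul
    return scaled / 6.0                     # div


def hardswish_fused(x: float) -> float:
    """The same function evaluated as a single piecewise op."""
    if x <= -3.0:
        return 0.0
    if x >= 3.0:
        return x
    return x * (x + 3.0) / 6.0


if __name__ == "__main__":
    # Both forms agree numerically; only the number of ops differs.
    for value in (-4.0, -1.0, 0.0, 1.0, 5.0):
        assert math.isclose(
            hardswish_decomposed(value), hardswish_fused(value), abs_tol=1e-12
        )
        print(value, hardswish_fused(value))
```

Both versions compute the same values. The difference is that the decomposed form exposes four ops to the backend instead of one, which is where extra transposes and missed quantization can creep in.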
| | Before (hardswish decomposed) | After (hardswish preserved) | Latency Reduction |
|----------------|------------------------|------------------------|-------------------|
| Macbook (FP32) | [13.3685](https://www.internalfb.com/phabricator/paste/view/P1859573931) | [8.451](https://www.internalfb.com/phabricator/paste/view/P1859573328) | 36% |
| Macbook (QS8) | [16.0361](https://www.internalfb.com/phabricator/paste/view/P1859609658) | [4.914](https://www.internalfb.com/phabricator/paste/view/P1859610252) |69% |
| S24 (FP32) | [56.885](https://www.internalfb.com/intern/paste/P1859603500) | [41.9638](https://www.internalfb.com/intern/paste/P1859603738) |26% |
| S24 (QS8) | [56.1718](https://www.internalfb.com/intern/paste/P1859615896) | [40.2096](https://www.internalfb.com/intern/paste/P1859615683/) | 28% |
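The Latency Reduction column is the relative drop from the before value to the after value. A small sketch of that arithmetic, using the latencies from the table; the helper name is illustrative, and the table does not state the latency unit, so none is assumed here.

```python
def latency_reduction_pct(before: float, after: float) -> float:
    """Relative latency reduction, as a percentage of the 'before' value."""
    return (before - after) / before * 100.0


# (before, after) latencies copied from the benchmark table.
BENCHMARKS = {
    "Macbook (FP32)": (13.3685, 8.451),
    "Macbook (QS8)": (16.0361, 4.914),
    "S24 (FP32)": (56.885, 41.9638),
    "S24 (QS8)": (56.1718, 40.2096),
}

if __name__ == "__main__":
    for name, (before, after) in BENCHMARKS.items():
        print(f"{name}: {latency_reduction_pct(before, after):.1f}%")
```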
Reviewed By: cccclai
Differential Revision: D777651291
Parent 378f062
1 file changed: +3 -0 lines changed (three lines added as new lines 402-404).