Don't Decompose Hardswish (pytorch#12360)

mcr229 · facebook-github-bot · commit 8c3095d0069c · 2025-07-10T13:50:26.000-07:00
Summary: While investigating MV3, I noticed that hardswish was being decomposed into many small ops. This was a problem because the decomposition injected unnecessary transposes, and the resulting ops weren't being quantized. I hadn't realized this decomposition was happening. After some investigation, it looks like keeping hardswish intact can greatly improve our MV3 performance for both quantized and FP32 models. Some benchmarks:

| | Before (Hardswish Decomposed) | After (Hardswish Preserved) | Latency Reduction |
|----------------|------------------------|------------------------|-------------------|
| Macbook (FP32) | [13.3685](https://www.internalfb.com/phabricator/paste/view/P1859573931) | [8.451](https://www.internalfb.com/phabricator/paste/view/P1859573328) | 36% |
| Macbook (QS8) | [16.0361](https://www.internalfb.com/phabricator/paste/view/P1859609658) | [4.914](https://www.internalfb.com/phabricator/paste/view/P1859610252) | 69% |
| S24 (FP32) | [56.885](https://www.internalfb.com/intern/paste/P1859603500) | [41.9638](https://www.internalfb.com/intern/paste/P1859603738) | 26% |
| S24 (QS8) | [56.1718](https://www.internalfb.com/intern/paste/P1859615896) | [40.2096](https://www.internalfb.com/intern/paste/P1859615683/) | 40% |

Reviewed By: cccclai

Differential Revision: D77765129
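For context, hardswish is defined as `x * relu6(x + 3) / 6`. The sketch below is plain Python, not ExecuTorch code: it contrasts the single-op form with an illustrative chain of primitive steps (add, clamp, div, mul). The exact ops and ordering the real decomposition produces are an assumption here, and the function names are hypothetical. The point is only that one logical op becomes several, each of which a backend would then have to partition and quantize separately.

```python
import math


def relu6(x: float) -> float:
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hardswish(x: float) -> float:
    """Hardswish as a single op: x * relu6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def hardswish_decomposed(x: float) -> float:
    """The same function as a chain of primitive steps.

    The step order is illustrative only; it is not taken from the
    actual decomposition table.
    """
    shifted = x + 3.0                      # add
    clamped = min(max(shifted, 0.0), 6.0)  # clamp
    scaled = clamped / 6.0                 # div
    return x * scaled                      # mul


samples = [-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0]
# Both forms agree up to floating-point rounding on every sample.
all_match = all(
    math.isclose(hardswish(x), hardswish_decomposed(x), abs_tol=1e-12)
    for x in samples
)
```

Both forms compute the same values, so preserving the single op changes how the graph is partitioned and lowered, not what the model computes.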
diff --git a/backends/xnnpack/partition/config/generic_node_configs.py b/backends/xnnpack/partition/config/generic_node_configs.py
@@ -399,6 +399,9 @@ class HardswishConfig(GenericNodePartitionerConfig):
 
     def supported_precision_types(self) -> List[ConfigPrecisionType]:
         return [ConfigPrecisionType.FP32]
+    
+    def get_original_aten(self) -> Optional[torch._ops.OpOverload]:
+        return torch.ops.aten.hardswish.default
 
 
 class LeakyReLUConfig(GenericNodePartitionerConfig):