Commit cc66aa4
Run decompositions before the quantizer
Summary:
In the current flow, decompositions run in `to_edge()`, long after the quantization process is done. This creates a lot of issues, since we cannot quantize any operations contained in the large operators that the graph tracer can give (e.g. aten.scaled_dot_product_attention, aten.rnn_<tanh, relu>.input, and a few others).
Any models using those will see many fp32 operators in the final graph. Running the decomps earlier solves the problem, but we need to retain a couple operators that we do rely on in the quantizer, namely `aten.linear`, `aten.conv1d` and `aten.conv2d`.
Differential Revision: D664614061 parent a2619e1 commit cc66aa4
1 file changed
+16
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
31 | 32 | | |
32 | 33 | | |
33 | 34 | | |
| |||
59 | 60 | | |
60 | 61 | | |
61 | 62 | | |
62 | | - | |
63 | | - | |
64 | | - | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
65 | 76 | | |
66 | | - | |
| 77 | + | |
67 | 78 | | |
68 | 79 | | |
69 | 80 | | |
70 | | - | |
| 81 | + | |
71 | 82 | | |
72 | 83 | | |
73 | 84 | | |
| |||
0 commit comments