Commit a11afab
Run decompositions before the quantizer (#7111)
Summary:
In the current flow, decompositions run in `to_edge()`, long after quantization is done. As a result, we cannot quantize any of the operations contained in the large composite operators that the graph tracer can produce (e.g. `aten.scaled_dot_product_attention`, `aten.rnn_<tanh, relu>.input`, and a few others).
Any model using those operators ends up with many fp32 operators in the final graph. Running the decompositions earlier solves the problem, but we need to preserve a few operators that the quantizer relies on, such as `aten.linear`, `aten.conv1d` and `aten.conv2d`.
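The effect of the ordering can be illustrated with a toy model. This is only a sketch and not the code in this commit: operators are plain strings, and `DECOMP_TABLE`, `QUANTIZER_PATTERNS`, `decompose` and `quantize` are hypothetical names invented for the example. The decompositions listed are simplified stand-ins, not the real ATen decompositions.

```python
# Toy model only: op names are plain strings, not real torch operators,
# and the decompositions below are simplified for illustration.
DECOMP_TABLE = {
    "aten.scaled_dot_product_attention": ["aten.bmm", "aten.softmax", "aten.bmm"],
    "aten.linear": ["aten.t", "aten.addmm"],
    "aten.conv2d": ["aten.convolution"],
}

# Ops the hypothetical quantizer knows how to annotate.
QUANTIZER_PATTERNS = {
    "aten.linear", "aten.conv1d", "aten.conv2d", "aten.bmm", "aten.softmax",
}


def decompose(graph, table, preserve=frozenset()):
    """Expand every op found in `table`, except those listed in `preserve`."""
    out = []
    for op in graph:
        if op in table and op not in preserve:
            out.extend(table[op])
        else:
            out.append(op)
    return out


def quantize(graph, patterns):
    """Tag each op as int8 if the quantizer recognizes it, else fp32."""
    return [(op, "int8" if op in patterns else "fp32") for op in graph]


def count_fp32(annotated):
    return sum(1 for _, dtype in annotated if dtype == "fp32")


graph = ["aten.linear", "aten.scaled_dot_product_attention", "aten.conv2d"]

# Old flow: quantize the undecomposed graph; the attention op stays fp32.
old = quantize(graph, QUANTIZER_PATTERNS)

# New flow: decompose first, but preserve the ops the quantizer matches on.
preserve = {"aten.linear", "aten.conv1d", "aten.conv2d"}
new = quantize(decompose(graph, DECOMP_TABLE, preserve), QUANTIZER_PATTERNS)

# Decomposing everything would break the linear/conv patterns instead.
naive = quantize(decompose(graph, DECOMP_TABLE), QUANTIZER_PATTERNS)

print(count_fp32(old), count_fp32(new), count_fp32(naive))
```

In this toy setup, preserving the three operators is what keeps the quantizer's existing patterns intact while still exposing the pieces of the attention operator to quantization.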
Reviewed By: zonglinpeng
Differential Revision: D66461406
1 parent 2d499b3 commit a11afab
1 file changed, +22 -4 lines changed