- temporarily force `use_neural_accelerators = 1` in `ccv_nnc_conv_mps.m`;
- run `./mpsdnn.tests "mfa conv3d"` from `test/int/nnc`;
- revert the force after validation so production code uses `ccv_nnc_mfa_has_neural_accelerators(context)`.
- `NAInt8Attention` backward `dS` fallback note:
  - Earlier exploration suggested `dS -> half` might be a fallback worth keeping in mind, but on the current shipped `D=128` fixed-quant setup it is not a win.
  - Rechecked on `4096 x 4096 x 128` with the current selector:
    - fixed-quant `dS`: forward median `4.0495 ms`, backward median `21.8308 ms`, ratio `5.3910x`
    - `dS -> half`: forward median `4.0552 ms`, backward median `23.0083 ms`, ratio `5.6737x`
  - Takeaway:
    - on the current `NAInt8Attention` backward path, `dS -> half` regresses relative to fixed-quant `dS`
    - do not treat it as the preferred fallback without reworking the kernel again
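
The reported ratios above appear to be backward median divided by forward median; the arithmetic reproduces them to four decimals. A minimal sketch of that check, using only the medians quoted in the note (the helper names `backward_forward_ratio` and `relative_slowdown` are illustrative and not part of the codebase):

```python
# Medians copied from the note (4096 x 4096 x 128 probe), in milliseconds.
MEDIANS_MS = {
    "fixed-quant dS": {"forward": 4.0495, "backward": 21.8308},
    "dS -> half": {"forward": 4.0552, "backward": 23.0083},
}


def backward_forward_ratio(forward_ms: float, backward_ms: float) -> float:
    """Backward median divided by forward median (assumed meaning of 'ratio')."""
    return backward_ms / forward_ms


def relative_slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """Fractional slowdown of candidate versus baseline (0.05 means 5% slower)."""
    return candidate_ms / baseline_ms - 1.0


if __name__ == "__main__":
    for name, m in MEDIANS_MS.items():
        ratio = backward_forward_ratio(m["forward"], m["backward"])
        print(f"{name}: ratio {ratio:.4f}x")

    # Compare absolute backward times directly, independent of the forward median.
    slowdown = relative_slowdown(
        MEDIANS_MS["fixed-quant dS"]["backward"],
        MEDIANS_MS["dS -> half"]["backward"],
    )
    print(f"dS -> half backward slowdown: {slowdown:.2%}")
```

Comparing the two backward medians directly puts `dS -> half` at roughly 5% slower, while the forward medians differ by well under 1%.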
- Trust the backward absolute times more than any single reported ratio; forward medians on the probe can move enough to make one-off ratios look too optimistic.
- Reliable current probe numbers are in this range:
  - `4096 x 4096 x 128`: backward median about `21-23 ms`, typically around `5.2x-5.6x`
  - `8192 x 8192 x 128`: backward median about `82-87 ms`, typically around `5.2x-5.4x`
- Wider key/value traversal (`blockC=96`) can benchmark slightly faster in the probe but is not accuracy-safe on the real gradient test surface; keep `blockC=64` in production.
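
To illustrate why the backward absolute times are more trustworthy than a single ratio, the sketch below holds one backward median fixed and moves the forward median by a small fraction. The 3% jitter is a hypothetical value chosen for illustration, not a measured variance, and `ratio_spread` is an illustrative helper name.

```python
def ratio_spread(backward_ms: float, forward_ms: float, forward_jitter: float):
    """Return (low, high) backward/forward ratios when only the forward
    median moves by +/- forward_jitter (a fraction, e.g. 0.03 for 3%)."""
    low = backward_ms / (forward_ms * (1.0 + forward_jitter))
    high = backward_ms / (forward_ms * (1.0 - forward_jitter))
    return low, high


if __name__ == "__main__":
    # Medians from the 4096 x 4096 x 128 fixed-quant dS run quoted in these notes.
    backward_ms, forward_ms = 21.8308, 4.0495
    # Hypothetical forward-median jitter; the notes do not state a figure.
    low, high = ratio_spread(backward_ms, forward_ms, forward_jitter=0.03)
    print(f"ratio range with unchanged backward time: {low:.2f}x to {high:.2f}x")
```

Under that assumption the ratio swings by about 0.3x with no change at all in backward time, which is why a one-off ratio can look better or worse than the kernel really is.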