add quantized fast_hadamard_transform_28N #5184
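For context on the title: a hedged sketch, not this PR's kernel. The unnormalized fast Walsh-Hadamard transform over power-of-two lengths can be written as a butterfly loop like the one below; a "28N" variant would presumably compose such a butterfly with a fixed 28x28 Hadamard matrix so that lengths of the form 28·2^k are supported (that composition is an assumption here, not taken from this page).

```python
# Hedged reference sketch (not the quantized kernel added by this PR):
# an unnormalized fast Walsh-Hadamard transform for power-of-two lengths.
def fast_hadamard_transform(vec):
    """Return the FWHT of vec; len(vec) must be a power of two."""
    data = list(vec)
    h = 1
    while h < len(data):
        # Butterfly stage: combine pairs that are h apart.
        for start in range(0, len(data), h * 2):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data
```

Applying the transform twice scales the input by the length N, since the Hadamard matrix satisfies H·H = N·I.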
Conversation
update preproc
Differential Revision: D62366679 Pull Request resolved: #5172
Differential Revision: D62366751 Pull Request resolved: #5177
Differential Revision: D61134262 Pull Request resolved: #4721
Differential Revision: D62266927 Pull Request resolved: #5133
Differential Revision: D62167345 Pull Request resolved: #5061
Differential Revision: D62151658 Pull Request resolved: #5122
Differential Revision: [D60943029](https://our.internmc.facebook.com/intern/diff/D60943029/) [ghstack-poisoned]
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5184
This pull request was exported from Phabricator. Differential Revision: D60943029
Differential Revision: D59984028 Pull Request resolved: #4332
…#5072)
* Qualcomm AI Engine Direct - optimization and mutable buffer fix. Summary:
  - Add a pass to convert linear to conv2d: we found an accuracy drop in llama3 caused by the QNN Linear op; converting linear to conv2d fixes it.
  - Work around the mutable-buffer issue for the index_put op: a pass replaces the input of index_put. This workaround causes a performance regression.
  - Insert a copy op for int64 inputs to convert int64 to int32 in the i64toi32 pass.
  - Support QNN RMS Norm and use native rms_norm in llama_transformer.
  - Add a pass to compose RMS norm.
* Use transform to replace rms_norm
* Temporarily remove test-llama-runner-qnn-linux
---------
Co-authored-by: Sheng Feng Wu <[email protected]>
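The linear-to-conv2d pass described above relies on a simple algebraic equivalence: a Linear layer on an (N, C_in) input computes the same values as a 1x1 Conv2d applied to that data reshaped to (N, C_in, 1, 1). A hedged NumPy sketch of the equivalence (illustrative only; not the actual QNN pass):

```python
import numpy as np

def linear(x, weight, bias):
    # x: (N, C_in), weight: (C_out, C_in), bias: (C_out,)
    return x @ weight.T + bias

def conv2d_1x1(x, weight, bias):
    # x: (N, C_in, H, W), weight: (C_out, C_in, 1, 1)
    n, c_in, h, w = x.shape
    c_out = weight.shape[0]
    k = weight.reshape(c_out, c_in)          # a 1x1 kernel is just a matrix
    flat = x.reshape(n, c_in, h * w)         # (N, C_in, H*W)
    out = np.einsum("oc,ncp->nop", k, flat)  # pointwise matmul per pixel
    return out.reshape(n, c_out, h, w) + bias.reshape(1, c_out, 1, 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w = rng.standard_normal((4, 8))
b = rng.standard_normal(4)
y_lin = linear(x, w, b)
y_conv = conv2d_1x1(x.reshape(2, 8, 1, 1), w.reshape(4, 8, 1, 1), b)
assert np.allclose(y_lin, y_conv.reshape(2, 4))
```

Rewriting Linear as a 1x1 convolution lets the backend route the computation through a conv kernel whose quantized behavior is better tuned, which is presumably why it avoids the accuracy drop mentioned above.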
Differential Revision: D62206605 Pull Request resolved: #5191
We added API in Java. Need to register in JNI as well Pull Request resolved: #5201
Differential Revision: D62394222 Pull Request resolved: #5180
Co-authored-by: Guang Yang <[email protected]>
Differential Revision: D62458651 Pull Request resolved: #5205
realpath works differently on macOS. Change-Id: I17e114cd289692aa6de8a5b4e6f29fc1734aca08
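The commit above is about `realpath` portability: the BSD userland on macOS may lack GNU `realpath` or its flags. A hedged sketch of one common workaround (not the actual change in this commit) is to delegate to `python3`, which is available on both platforms:

```shell
# Hypothetical helper, assuming python3 is on PATH: resolve a path
# (symlinks and ".." components) the same way on Linux and macOS.
portable_realpath() {
  python3 -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' "$1"
}

portable_realpath /usr/../usr
```

Python's `os.path.realpath` resolves symlinks and normalizes the path, matching GNU `realpath`'s default behavior for existing paths.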
Differential Revision: D62458604 Pull Request resolved: #5231
…haders + massive code cleanup Differential Revision: D62444923 Pull Request resolved: #5223
Summary: We specify a model dir, not model path. It's easier to update test spec Pull Request resolved: #5250 Reviewed By: huydhn Differential Revision: D62473641 Pulled By: kirklandsign fbshipit-source-id: 40864831de9960fe29b101683ef7182e2f56fe7b
Summary: Pull Request resolved: #5235 Reviewed By: cmodi-meta, shoumikhin Differential Revision: D62468267 Pulled By: kirklandsign fbshipit-source-id: d64f28cb7c6c97853bbb557af63c1f6937b3626d
Summary: Pull Request resolved: #5241
- Add default system prompt
- Set temperature to 0
- Load model directly upon click
Reviewed By: cmodi-meta, kirklandsign Differential Revision: D62472502 fbshipit-source-id: 8ecc88ee4474afa50658e93955c49ff0f3eef745
Summary: Currently, when a non-core ATen operator shows up in the exported graph, `to_edge()` fails and the only option is to disable the IR validity check by setting `_check_ir_validity=False`. However, that is unsafe; we should still run the rest of the checks. This PR lets users bypass the core ATen ops check by passing a list of non-core ATen ops into `to_edge()`. Note that:
* This is different from `ops_set_to_not_decompose` in `to_edge_transform_and_lower`: the ops in `_core_aten_ops_exception_list` are not intended to be kept, but more likely show up because of missing decompositions or a missing core ATen tag in `native_functions.yaml`. For this reason, we combine the two lists (`ops_set_to_not_decompose` and `_core_aten_ops_exception_list`) and pass them to the verifier.
* The error log now encourages people to use `_core_aten_ops_exception_list` instead of `_check_ir_validity=False`.
Pull Request resolved: #5237 Test Plan: Added unit test Reviewed By: tarun292 Differential Revision: D62469015 Pulled By: larryliu0820 fbshipit-source-id: 1abb1b4fbbfdf3eb5e64e82e2035c7f93cf5b153
Summary: Pull Request resolved: #5253 Reviewed By: shoumikhin Differential Revision: D62474497 Pulled By: kirklandsign fbshipit-source-id: 408cd0340dce706b758097bfd6f9606bfe506460
Summary: Pull Request resolved: #5125 This PR adds the option to export the model with spin quant on gpu. Reviewed By: mergennachin Differential Revision: D62042861 fbshipit-source-id: 74274fcb3408e5f6b23e0c924272385090da03d2
Summary: The upload should not be all or nothing ([example flow](https://github.com/pytorch/executorch/actions/runs/10783442883)). It should upload exported models to S3 if there is at least one artifact. Pull Request resolved: #5232 Test Plan: - Android: https://github.com/pytorch/executorch/actions/runs/10800212616 - iOS: https://github.com/pytorch/executorch/actions/runs/10799346884 Reviewed By: huydhn Differential Revision: D62459630 Pulled By: guangy10 fbshipit-source-id: cbf6c1c9e030089096d126b91ec10a936030e15b
Summary: Add execution scripts and runner for 8 OSS models Pull Request resolved: #5217 Reviewed By: kirklandsign Differential Revision: D62479707 Pulled By: cccclai fbshipit-source-id: 81310dbb6b785ec59329110ebacb8208102e8597
Summary: Pull Request resolved: #5142 Intermediate debugging in delegates doesn't work without also doing intermediate latency profiling in delegates. This diff fixes that issue, which is currently blocking modai and the HTP side of the work. Reviewed By: Jack-Khuu Differential Revision: D60947913 fbshipit-source-id: 78cb252dc4f0088c2af3a27f467f8cb6182cc785
Summary: Pull Request resolved: #5242 No immediate need for this, but it is extremely simple to implement so why not support it? ghstack-source-id: 241919004 exported-using-ghexport Reviewed By: kimishpatel Differential Revision: D62151659 fbshipit-source-id: 7cb5850981ad0666a304e7917d407847037ffa2d
Summary: Pull Request resolved: #5243 If we happen to be running without a delegate, directly implementing linear is much more efficient than permute_copy_out (materialize a transpose) followed by matmul. ghstack-source-id: 241918986 exported-using-ghexport Reviewed By: kimishpatel Differential Revision: D62154007 fbshipit-source-id: 7b764cf9de616729541f081a51384ba8e18e72f5
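The claim above is that a direct linear implementation avoids allocating a transposed copy of the weight. A hedged NumPy illustration of the two strategies (names are illustrative; this is not the ExecuTorch op):

```python
import numpy as np

def linear_direct(x, weight):
    # out = x @ weight.T, computed row-by-row from `weight` itself;
    # no transposed copy of the weight is ever materialized.
    n, k = x.shape
    m = weight.shape[0]
    out = np.empty((n, m), dtype=x.dtype)
    for i in range(m):
        out[:, i] = x @ weight[i]
    return out

def permute_then_matmul(x, weight):
    # The less efficient route: materialize the transpose, then matmul.
    wt = np.ascontiguousarray(weight.T)
    return x @ wt

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 5))
w = rng.standard_normal((7, 5))
assert np.allclose(linear_direct(x, w), permute_then_matmul(x, w))
```

Both produce identical results; the direct form simply skips the O(m·k) copy that materializing the transpose requires, which matters most for large weight matrices in the undelegated path.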
Summary: Pull Request resolved: #5260 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D62484498 fbshipit-source-id: 345fcd365d25beb1e2ae713cca9bea36e1db04d2
Summary:
## Motivation
Short term: TorchAO int4 quantization yields a float zero point, but CoreML does not yet support that well, so we need CoreML int4 quantization for now. Intermediate term: before torch implements all CoreML-supported quantizations (e.g. palettization, sparsification, joint compression...), it will be useful to have a way to use and experiment with those CoreML quantizations.
## Solution
In the CoreML preprocess, we add the CoreML quantization config as a compile spec. Pull Request resolved: #5228 Reviewed By: kirklandsign Differential Revision: D62468184 Pulled By: cccclai fbshipit-source-id: 9f4987d19a01eaf5e2814c9ff8089324174644f8
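For readers unfamiliar with the zero-point distinction mentioned above, here is a hedged NumPy sketch of a 4-bit affine quantization round-trip with an *integer* zero point in [0, 15] — the form a backend with plain int4 support can consume (illustrative only; not the CoreML compile-spec API or TorchAO's scheme):

```python
import numpy as np

def quantize_int4(x):
    # Affine quantization: q = round(x / scale) + zero_point, clipped to [0, 15].
    qmin, qmax = 0, 15
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))   # forced to an integer
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_int4(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-1.0, 1.0, 16).astype(np.float32)
q, s, zp = quantize_int4(x)
x_hat = dequantize_int4(q, s, zp)
assert q.min() >= 0 and q.max() <= 15
```

A float zero point (as TorchAO int4 can produce) allows the dequantized grid to align exactly with the data's minimum, at the cost of requiring the backend to handle a non-integer offset — which, per the summary above, CoreML does not yet do well.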
Summary: Pull Request resolved: #5266 Reviewed By: kirklandsign Differential Revision: D62501925 fbshipit-source-id: 790ca389887bb3921fe13d92dbc61e804cfe0c19
re-exporting this stack because it's been bitten by a tooling issue
Pull Request resolved: pytorch/executorch#5184 ghstack-source-id: 241722289 @exported-using-ghexport Differential Revision: [D60943029](https://our.internmc.facebook.com/intern/diff/D60943029/)
Stack from ghstack (oldest at bottom):
Differential Revision: D60943029