add quantized fast_hadamard_transform_28N #5184
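For context on the title: a hedged sketch, not this PR's kernel. The unnormalized fast Walsh-Hadamard transform over power-of-two lengths can be written as a butterfly loop like the one below; a "28N" variant would presumably compose such a butterfly with a fixed 28x28 Hadamard matrix so that lengths of the form 28·2^k are supported (that composition is an assumption here, not taken from this page).

```python
# Hedged reference sketch (not the quantized kernel added by this PR):
# an unnormalized fast Walsh-Hadamard transform for power-of-two lengths.
def fast_hadamard_transform(vec):
    """Return the FWHT of vec; len(vec) must be a power of two."""
    data = list(vec)
    h = 1
    while h < len(data):
        # Butterfly stage: combine pairs that are h apart.
        for start in range(0, len(data), h * 2):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data
```

Applying the transform twice scales the input by the length N, since the Hadamard matrix satisfies H·H = N·I.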
Conversation
update preproc
Differential Revision: D62366679 Pull Request resolved: #5172
Differential Revision: D62366751 Pull Request resolved: #5177
Differential Revision: D61134262 Pull Request resolved: #4721
Differential Revision: D62266927 Pull Request resolved: #5133
Differential Revision: D62167345 Pull Request resolved: #5061
Differential Revision: D62151658 Pull Request resolved: #5122
Differential Revision: [D60943029](https://our.internmc.facebook.com/intern/diff/D60943029/) [ghstack-poisoned]
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5184
This pull request was exported from Phabricator. Differential Revision: D60943029
Differential Revision: D59984028 Pull Request resolved: #4332
…#5072)
* Qualcomm AI Engine Direct - optimization and mutable buffer fix. Summary:
  - Add a pass to convert linear to conv2d: we found an accuracy drop in llama3 caused by the QNN Linear op; converting linear to conv2d fixes it.
  - Work around the mutable-buffer issue for the index_put op: a pass replaces the input of index_put. This workaround causes a performance regression.
  - Insert a copy op for int64 inputs to convert int64 to int32 in the i64toi32 pass.
  - Support QNN RMS Norm and use native rms_norm in llama_transformer.
  - Add a pass to compose RMS norm.
* Use transform to replace rms_norm
* Temporarily remove test-llama-runner-qnn-linux
---------
Co-authored-by: Sheng Feng Wu <[email protected]>
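The linear-to-conv2d pass described above relies on a simple algebraic equivalence: a Linear layer on an (N, C_in) input computes the same values as a 1x1 Conv2d applied to that data reshaped to (N, C_in, 1, 1). A hedged NumPy sketch of the equivalence (illustrative only; not the actual QNN pass):

```python
import numpy as np

def linear(x, weight, bias):
    # x: (N, C_in), weight: (C_out, C_in), bias: (C_out,)
    return x @ weight.T + bias

def conv2d_1x1(x, weight, bias):
    # x: (N, C_in, H, W), weight: (C_out, C_in, 1, 1)
    n, c_in, h, w = x.shape
    c_out = weight.shape[0]
    k = weight.reshape(c_out, c_in)          # a 1x1 kernel is just a matrix
    flat = x.reshape(n, c_in, h * w)         # (N, C_in, H*W)
    out = np.einsum("oc,ncp->nop", k, flat)  # pointwise matmul per pixel
    return out.reshape(n, c_out, h, w) + bias.reshape(1, c_out, 1, 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w = rng.standard_normal((4, 8))
b = rng.standard_normal(4)
y_lin = linear(x, w, b)
y_conv = conv2d_1x1(x.reshape(2, 8, 1, 1), w.reshape(4, 8, 1, 1), b)
assert np.allclose(y_lin, y_conv.reshape(2, 4))
```

Rewriting Linear as a 1x1 convolution lets the backend route the computation through a conv kernel whose quantized behavior is better tuned, which is presumably why it avoids the accuracy drop mentioned above.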
Differential Revision: D62206605 Pull Request resolved: #5191
We added API in Java. Need to register in JNI as well Pull Request resolved: #5201
Differential Revision: D62394222 Pull Request resolved: #5180
Co-authored-by: Guang Yang <[email protected]>
Differential Revision: D62458651 Pull Request resolved: #5205
realpath works differently on macOS. Change-Id: I17e114cd289692aa6de8a5b4e6f29fc1734aca08
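The commit above is about `realpath` portability: the BSD userland on macOS may lack GNU `realpath` or its flags. A hedged sketch of one common workaround (not the actual change in this commit) is to delegate to `python3`, which is available on both platforms:

```shell
# Hypothetical helper, assuming python3 is on PATH: resolve a path
# (symlinks and ".." components) the same way on Linux and macOS.
portable_realpath() {
  python3 -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' "$1"
}

portable_realpath /usr/../usr
```

Python's `os.path.realpath` resolves symlinks and normalizes the path, matching GNU `realpath`'s default behavior for existing paths.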
Differential Revision: D62458604 Pull Request resolved: #5231
…haders + massive code cleanup Differential Revision: D62444923 Pull Request resolved: #5223
Summary: We specify a model dir, not model path. It's easier to update test spec Pull Request resolved: #5250 Reviewed By: huydhn Differential Revision: D62473641 Pulled By: kirklandsign fbshipit-source-id: 40864831de9960fe29b101683ef7182e2f56fe7b
Summary: Pull Request resolved: #5235 Reviewed By: cmodi-meta, shoumikhin Differential Revision: D62468267 Pulled By: kirklandsign fbshipit-source-id: d64f28cb7c6c97853bbb557af63c1f6937b3626d
Summary: Pull Request resolved: #5241
- Add default system prompt
- Set temperature to 0
- Load model directly upon click
Reviewed By: cmodi-meta, kirklandsign Differential Revision: D62472502 fbshipit-source-id: 8ecc88ee4474afa50658e93955c49ff0f3eef745
Summary: Currently, when a non-core ATen operator shows up in the exported graph, `to_edge()` fails and the only option is to disable the IR validity check by setting `_check_ir_validity=False`. However, that is unsafe; we should still run the rest of the checks. This PR lets users bypass the core ATen ops check by passing a list of non-core ATen ops into `to_edge()`. Note that:
* This is different from `ops_set_to_not_decompose` in `to_edge_transform_and_lower`: the ops in `_core_aten_ops_exception_list` are not intended to be kept, but more likely show up because of missing decompositions or a missing core ATen tag in `native_functions.yaml`. For this reason, we combine the two lists (`ops_set_to_not_decompose` and `_core_aten_ops_exception_list`) and pass them to the verifier.
* The error log now encourages people to use `_core_aten_ops_exception_list` instead of `_check_ir_validity=False`.
Pull Request resolved: #5237 Test Plan: Added unit test Reviewed By: tarun292 Differential Revision: D62469015 Pulled By: larryliu0820 fbshipit-source-id: 1abb1b4fbbfdf3eb5e64e82e2035c7f93cf5b153
Summary: Pull Request resolved: #5253 Reviewed By: shoumikhin Differential Revision: D62474497 Pulled By: kirklandsign fbshipit-source-id: 408cd0340dce706b758097bfd6f9606bfe506460
Summary: Pull Request resolved: #5125 This PR adds the option to export the model with spin quant on gpu. Reviewed By: mergennachin Differential Revision: D62042861 fbshipit-source-id: 74274fcb3408e5f6b23e0c924272385090da03d2
Summary: The upload should not be all or nothing ([example flow](https://github.com/pytorch/executorch/actions/runs/10783442883)). It should upload exported models to S3 if there is at least one artifact. Pull Request resolved: #5232 Test Plan: - Android: https://github.com/pytorch/executorch/actions/runs/10800212616 - iOS: https://github.com/pytorch/executorch/actions/runs/10799346884 Reviewed By: huydhn Differential Revision: D62459630 Pulled By: guangy10 fbshipit-source-id: cbf6c1c9e030089096d126b91ec10a936030e15b
Summary: Add execution scripts and runner for 8 OSS models Pull Request resolved: #5217 Reviewed By: kirklandsign Differential Revision: D62479707 Pulled By: cccclai fbshipit-source-id: 81310dbb6b785ec59329110ebacb8208102e8597
Summary: Pull Request resolved: #5142 Intermediate debugging in delegates doesn't work without also doing intermediate latency profiling in delegates. This diff fixes that issue, which is currently blocking modai and the HTP side of the work. Reviewed By: Jack-Khuu Differential Revision: D60947913 fbshipit-source-id: 78cb252dc4f0088c2af3a27f467f8cb6182cc785
Summary: Pull Request resolved: #5242 No immediate need for this, but it is extremely simple to implement so why not support it? ghstack-source-id: 241919004 exported-using-ghexport Reviewed By: kimishpatel Differential Revision: D62151659 fbshipit-source-id: 7cb5850981ad0666a304e7917d407847037ffa2d
Summary: Pull Request resolved: #5243 If we happen to be running without a delegate, directly implementing linear is much more efficient than permute_copy_out (materialize a transpose) followed by matmul. ghstack-source-id: 241918986 exported-using-ghexport Reviewed By: kimishpatel Differential Revision: D62154007 fbshipit-source-id: 7b764cf9de616729541f081a51384ba8e18e72f5
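The claim above is that a direct linear implementation avoids allocating a transposed copy of the weight. A hedged NumPy illustration of the two strategies (names are illustrative; this is not the ExecuTorch op):

```python
import numpy as np

def linear_direct(x, weight):
    # out = x @ weight.T, computed row-by-row from `weight` itself;
    # no transposed copy of the weight is ever materialized.
    n, k = x.shape
    m = weight.shape[0]
    out = np.empty((n, m), dtype=x.dtype)
    for i in range(m):
        out[:, i] = x @ weight[i]
    return out

def permute_then_matmul(x, weight):
    # The less efficient route: materialize the transpose, then matmul.
    wt = np.ascontiguousarray(weight.T)
    return x @ wt

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 5))
w = rng.standard_normal((7, 5))
assert np.allclose(linear_direct(x, w), permute_then_matmul(x, w))
```

Both produce identical results; the direct form simply skips the O(m·k) copy that materializing the transpose requires, which matters most for large weight matrices in the undelegated path.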
Summary: Pull Request resolved: #5260 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D62484498 fbshipit-source-id: 345fcd365d25beb1e2ae713cca9bea36e1db04d2
Summary:
## Motivation
Short term: TorchAO int4 quantization yields a float zero point, but CoreML does not yet support that well, so we need CoreML int4 quantization for now. Intermediate term: before torch implements all CoreML-supported quantizations (e.g. palettization, sparsification, joint compression...), it will be useful to have a way to use and experiment with those CoreML quantizations.
## Solution
In the CoreML preprocess, we add the CoreML quantization config as a compile spec. Pull Request resolved: #5228 Reviewed By: kirklandsign Differential Revision: D62468184 Pulled By: cccclai fbshipit-source-id: 9f4987d19a01eaf5e2814c9ff8089324174644f8
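For readers unfamiliar with the zero-point distinction mentioned above, here is a hedged NumPy sketch of a 4-bit affine quantization round-trip with an *integer* zero point in [0, 15] — the form a backend with plain int4 support can consume (illustrative only; not the CoreML compile-spec API or TorchAO's scheme):

```python
import numpy as np

def quantize_int4(x):
    # Affine quantization: q = round(x / scale) + zero_point, clipped to [0, 15].
    qmin, qmax = 0, 15
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))   # forced to an integer
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_int4(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-1.0, 1.0, 16).astype(np.float32)
q, s, zp = quantize_int4(x)
x_hat = dequantize_int4(q, s, zp)
assert q.min() >= 0 and q.max() <= 15
```

A float zero point (as TorchAO int4 can produce) allows the dequantized grid to align exactly with the data's minimum, at the cost of requiring the backend to handle a non-integer offset — which, per the summary above, CoreML does not yet do well.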
Summary: Pull Request resolved: #5266 Reviewed By: kirklandsign Differential Revision: D62501925 fbshipit-source-id: 790ca389887bb3921fe13d92dbc61e804cfe0c19
re-exporting this stack because it's been bitten by a tooling issue
Pull Request resolved: pytorch/executorch#5184 ghstack-source-id: 241722289 @exported-using-ghexport Differential Revision: [D60943029](https://our.internmc.facebook.com/intern/diff/D60943029/)
Stack from ghstack (oldest at bottom):
Differential Revision: D60943029