@swolchok swolchok commented Sep 9, 2024

Stack from ghstack (oldest at bottom):

Demonstrate that we can compute a quantized fast Hadamard transform (FHT) using integer math only, except for adjusting the scale of the result. (Not sure there is a reason to actually commit this: do we have a use case for quantized FHT on CPU?)
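A minimal C++ sketch of the idea, assuming per-tensor affine int8 quantization where `x[i] = scale * (q[i] - zero_point)`. The names `fwht_int32`, `QuantizedFht`, and `quantized_fht` are hypothetical and are not the kernel in this PR. Because the Hadamard transform is linear, the butterflies can run on the integers `q[i] - zero_point` with int32 accumulation, and the orthonormal `1/sqrt(n)` factor is folded into the output scale, which is the only floating-point step.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: in-place, unnormalized fast Walsh-Hadamard transform
// (natural/Hadamard ordering) using only integer adds and subtracts.
void fwht_int32(std::vector<int32_t>& v) {
  const std::size_t n = v.size();
  assert(n > 0 && (n & (n - 1)) == 0 && "length must be a power of two");
  for (std::size_t h = 1; h < n; h *= 2) {
    for (std::size_t i = 0; i < n; i += 2 * h) {
      for (std::size_t j = i; j < i + h; ++j) {
        const int32_t a = v[j];
        const int32_t b = v[j + h];
        v[j] = a + b;      // butterfly: sum
        v[j + h] = a - b;  // butterfly: difference
      }
    }
  }
}

// Hypothetical result type: integer values plus the scale needed to
// dequantize them (real_value = scale * values[i]).
struct QuantizedFht {
  std::vector<int32_t> values;
  float scale;
};

// Assumes per-tensor affine int8 quantization: x[i] = scale * (q[i] - zero_point).
QuantizedFht quantized_fht(const std::vector<int8_t>& q, float scale,
                           int32_t zero_point) {
  std::vector<int32_t> acc(q.size());
  for (std::size_t i = 0; i < q.size(); ++i) {
    // Removing the zero point is exact in integer arithmetic.
    acc[i] = static_cast<int32_t>(q[i]) - zero_point;
  }
  fwht_int32(acc);  // integer-only transform
  // The only floating-point step: fold the orthonormal 1/sqrt(n) into the scale.
  const float out_scale = scale / std::sqrt(static_cast<float>(q.size()));
  return {acc, out_scale};
}
```

For example, `q = {1, 2, 3, 4}` with scale 0.5 and zero point 0 gives integer values `{10, -2, -4, 0}` and scale 0.25, i.e. real outputs `{2.5, -0.5, -1, 0}`, which match the orthonormal FHT of `{0.5, 1, 1.5, 2}`. Each stage at most doubles the magnitude and `|q - zero_point| <= 255`, so the int32 accumulators stay exact as long as `255 * n` fits in int32. Requantizing the int32 output back to int8 is left out of this sketch.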

Differential Revision: D60866280

NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on Phabricator!

lucylq and others added 8 commits September 9, 2024 10:29
Differential Revision: D62366679

Pull Request resolved: #5172
Differential Revision: D62366751

Pull Request resolved: #5177
Differential Revision: D61134262

Pull Request resolved: #4721
Differential Revision: D62266927

Pull Request resolved: #5133
Differential Revision: D62167345

Pull Request resolved: #5061
Differential Revision: D62151658

Pull Request resolved: #5122
pytorch-bot bot commented Sep 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5183

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Sep 9, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D60866280

Gasoonjia and others added 4 commits September 9, 2024 14:13
Differential Revision: D59984028

Pull Request resolved: #4332
…#5072)

* Qualcomm AI Engine Direct - Optimization and fix mutable buffer issue

Summary:
- Add a pass to convert linear to conv2d:
We found an accuracy drop caused by the QNN Linear op in llama3; the linear-to-conv2d conversion pass fixes it.
- Work around the mutable buffer issue for the index_put op:
We add a pass to replace the input of the index_put op.
This workaround causes a performance regression.
- Insert a copy op for int64 inputs to convert them to int32 in the i64toi32 pass
- Support QNN RMS Norm and use native rms norm in llama_transformer
- Add a pass to compose rms norm

* Use transform to replace rms_norm

* temporarily remove test-llama-runner-qnn-linux

---------

Co-authored-by: Sheng Feng Wu <[email protected]>
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D60866280

@swolchok swolchok changed the base branch from gh/swolchok/37/base to gh/swolchok/36/head September 9, 2024 22:30
LeeOHzzZ and others added 5 commits September 9, 2024 15:37
Differential Revision: D62206605

Pull Request resolved: #5191
We added the API in Java; it needs to be registered in JNI as well.

Pull Request resolved: #5201
Differential Revision: D62394222

Pull Request resolved: #5180
guangy10 and others added 22 commits September 10, 2024 13:31
Differential Revision: D62458651

Pull Request resolved: #5205
realpath works differently on macOS

Change-Id: I17e114cd289692aa6de8a5b4e6f29fc1734aca08
Differential Revision: D62458604

Pull Request resolved: #5231
…haders + massive code cleanup

Differential Revision: D62444923

Pull Request resolved: #5223
Summary:
We specify a model directory rather than a model path, which makes the test spec easier to update.

Pull Request resolved: #5250

Reviewed By: huydhn

Differential Revision: D62473641

Pulled By: kirklandsign

fbshipit-source-id: 40864831de9960fe29b101683ef7182e2f56fe7b
Summary: Pull Request resolved: #5235

Reviewed By: cmodi-meta, shoumikhin

Differential Revision: D62468267

Pulled By: kirklandsign

fbshipit-source-id: d64f28cb7c6c97853bbb557af63c1f6937b3626d
Summary:
Pull Request resolved: #5241

- Add default system prompt
- Set temperature to 0
- Load model directly upon click

Reviewed By: cmodi-meta, kirklandsign

Differential Revision: D62472502

fbshipit-source-id: 8ecc88ee4474afa50658e93955c49ff0f3eef745
Summary:
Currently, when a non-core ATen operator shows up in the exported graph, `to_edge()` fails, and the only option is to disable the IR validity check by setting `_check_ir_validity=False`. However, this is unsafe; we should still run the rest of the checks.

This PR allows users to bypass the core ATen ops check by passing a list of non-core ATen ops to `to_edge()`.

Note that:

* This is different from `ops_set_to_not_decompose` in `to_edge_transform_and_lower`: the ops in `_core_aten_ops_exception_list` are not intended to be kept, but are more likely showing up because of missing decompositions or a missing core ATen tag in `native_functions.yaml`. For this reason, we combine the two lists (`ops_set_to_not_decompose` and `_core_aten_ops_exception_list`) and pass them to the verifier.
* I updated the error log to encourage people to use `_core_aten_ops_exception_list` instead of using `_check_ir_validity=False`.

Pull Request resolved: #5237

Test Plan: Added unit test

Reviewed By: tarun292

Differential Revision: D62469015

Pulled By: larryliu0820

fbshipit-source-id: 1abb1b4fbbfdf3eb5e64e82e2035c7f93cf5b153
Summary: Pull Request resolved: #5253

Reviewed By: shoumikhin

Differential Revision: D62474497

Pulled By: kirklandsign

fbshipit-source-id: 408cd0340dce706b758097bfd6f9606bfe506460
Summary:
Pull Request resolved: #5125

This PR adds the option to export the model with SpinQuant on GPU.

Reviewed By: mergennachin

Differential Revision: D62042861

fbshipit-source-id: 74274fcb3408e5f6b23e0c924272385090da03d2
Summary:
The upload should not be all or nothing ([example flow](https://github.com/pytorch/executorch/actions/runs/10783442883)). It should upload exported models to S3 if there is at least one artifact.

Pull Request resolved: #5232

Test Plan:
- Android: https://github.com/pytorch/executorch/actions/runs/10800212616
- iOS: https://github.com/pytorch/executorch/actions/runs/10799346884

Reviewed By: huydhn

Differential Revision: D62459630

Pulled By: guangy10

fbshipit-source-id: cbf6c1c9e030089096d126b91ec10a936030e15b
Summary:
Add execution scripts and runner for 8 OSS models

Pull Request resolved: #5217

Reviewed By: kirklandsign

Differential Revision: D62479707

Pulled By: cccclai

fbshipit-source-id: 81310dbb6b785ec59329110ebacb8208102e8597
Summary:
Pull Request resolved: #5142

Intermediate debugging in delegates doesn't work unless intermediate latency profiling in delegates is also enabled. This diff fixes that issue, which is currently blocking work on the modai and HTP side.

Reviewed By: Jack-Khuu

Differential Revision: D60947913

fbshipit-source-id: 78cb252dc4f0088c2af3a27f467f8cb6182cc785
Summary:
Pull Request resolved: #5242

No immediate need for this, but it is extremely simple to implement so why not support it?
ghstack-source-id: 241919004
exported-using-ghexport

Reviewed By: kimishpatel

Differential Revision: D62151659

fbshipit-source-id: 7cb5850981ad0666a304e7917d407847037ffa2d
Summary:
Pull Request resolved: #5243

If we happen to be running without a delegate, directly implementing linear is much more efficient than permute_copy_out (which materializes a transpose) followed by matmul.
ghstack-source-id: 241918986
exported-using-ghexport

Reviewed By: kimishpatel

Differential Revision: D62154007

fbshipit-source-id: 7b764cf9de616729541f081a51384ba8e18e72f5
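A minimal C++ sketch of what directly implementing linear can look like, assuming float32 row-major tensors and PyTorch's linear weight layout `[out_features, in_features]` (so `y = x * W^T + b`). `linear_direct` is a hypothetical name, not the ExecuTorch kernel from this commit. Each output element is a dot product of an input row with a weight row, so both reads are contiguous and no transposed copy of `W` is ever materialized, unlike permute_copy_out followed by matmul.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of linear without materializing W^T.
// x:    [M, K] row-major input
// w:    [N, K] row-major weight (PyTorch linear layout: [out, in])
// bias: [N], or empty for no bias
// returns y: [M, N] row-major, y[m][n] = dot(x[m], w[n]) + bias[n]
std::vector<float> linear_direct(const std::vector<float>& x,
                                 const std::vector<float>& w,
                                 const std::vector<float>& bias,
                                 std::size_t M, std::size_t K, std::size_t N) {
  assert(x.size() == M * K && w.size() == N * K);
  assert(bias.empty() || bias.size() == N);
  std::vector<float> y(M * N);
  for (std::size_t m = 0; m < M; ++m) {
    const float* xrow = x.data() + m * K;
    for (std::size_t n = 0; n < N; ++n) {
      // Row n of W is column n of W^T, and it is contiguous in memory.
      const float* wrow = w.data() + n * K;
      float acc = bias.empty() ? 0.0f : bias[n];
      for (std::size_t k = 0; k < K; ++k) {
        acc += xrow[k] * wrow[k];
      }
      y[m * N + n] = acc;
    }
  }
  return y;
}
```

A production kernel would tile and vectorize these loops, but the memory-access argument is the same: the weight is consumed in its stored layout.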
Summary:
Pull Request resolved: #5260
overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: executorch

Differential Revision: D62484498

fbshipit-source-id: 345fcd365d25beb1e2ae713cca9bea36e1db04d2
Summary:
## Motivation
Short term: TorchAO int4 quantization yields a float zero point, but CoreML does not have good support for it yet, so we need CoreML int4 quantization for now.

Intermediate term: until torch implements all CoreML-supported quantizations (e.g. palettization, sparsification, joint compression...), it would be great to have a way to use and experiment with those CoreML quantizations.

## Solution
In CoreML preprocess, we add the CoreML quantization config as a compile spec.

Pull Request resolved: #5228

Reviewed By: kirklandsign

Differential Revision: D62468184

Pulled By: cccclai

fbshipit-source-id: 9f4987d19a01eaf5e2814c9ff8089324174644f8
…5261)

Summary:
Pull Request resolved: #5261

.

Reviewed By: kirklandsign

Differential Revision: D62486240

fbshipit-source-id: 1c89db9ed2b31d85ffa68320348f00bc297686f8
Summary: Pull Request resolved: #5266

Reviewed By: kirklandsign

Differential Revision: D62501925

fbshipit-source-id: 790ca389887bb3921fe13d92dbc61e804cfe0c19
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D60866280

@swolchok

re-exporting this stack because it's been bitten by a tooling issue

@swolchok swolchok closed this Sep 11, 2024
kedarnath03 pushed a commit to kedarnath03/executorch that referenced this pull request Jun 25, 2025
Pull Request resolved: pytorch/executorch#5183

Demonstrate that we can calculate a quantized fast hadamard transform with integer math only, except for adjusting the scale of the result. (Not sure if there is a reason to actually commit this -- do we have a use case for quantized FHT on CPU?)

Differential Revision: [D60866280](https://our.internmc.facebook.com/intern/diff/D60866280/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D60866280/)!
ghstack-source-id: 241722286

Labels

CLA Signed, fb-exported
