[XNNPACK Quantizer] Select between TConvs and Convs #11863
base: gh/mcr229/32/orig
Conversation
…uple outputs (#11647)

### Summary
This PR fixes `channels_last_tagged_reshape_pass.py` to properly handle tuple outputs with mixed memory formats. Previously, the pass only checked and converted the first element of a tuple output, which could leave the remaining elements in incorrect memory formats. The fix matters for models that return multiple outputs with different memory-format requirements, such as a mix of convolution outputs (which should be in NHWC format) and linear outputs (which should be in contiguous format).

### Test plan
Added a new test class `ThreeOutputsModel` with three outputs that have different memory-format requirements, and verified its outputs evaluate correctly for both NCHW and NHWC inputs. Also added a simpler two-input class `ConvAddConvOutput` that operates on different inputs and returns two outputs with different dim orders.
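For illustration, a minimal sketch of the kind of two-output model the fix targets. The class name matches the test described above, but the layer sizes and input shapes here are assumptions:

```python
import torch

# Sketch of a ConvAddConvOutput-style case: two independent inputs yield
# two outputs with different dim orders (a conv output the pass tags as
# NHWC, and a plain add output that stays contiguous/NCHW).
class ConvAddConvOutput(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x, y):
        conv_out = self.conv(x)  # should be tagged NHWC by the pass
        add_out = y + 1.0        # stays in contiguous (NCHW) format
        return conv_out, add_out

model = ConvAddConvOutput().eval()
outs = model(torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16))
```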
Differential Revision: D76737404 Pull Request resolved: #11727
Differential Revision: D76469624 Pull Request resolved: #11577
### Summary
Fixed a linter error.

### Test plan
CI

Co-authored-by: Guang Yang <[email protected]>
#11745)

### Summary
Running `install_dev.py` for `optimum-executorch` force-overrides the installed `executorch` and torch deps with the nightlies pinned in `optimum-executorch`. In ExecuTorch CI, including the benchmark jobs, we always want to run the optimum-executorch models against ExecuTorch built from source so that issues and regressions are caught.

### Test plan
Verified the installed deps in the CI and benchmark jobs.

Co-authored-by: Guang Yang <[email protected]>
### Summary
1. Update the MediaTek backend documentation for the decoupled buffer allocator.
2. Follow the backend template.
3. Remove unnecessary instructions.

Fixes #8532

@pytorchbot label "partner: mediatek"
Differential Revision: D76745314 Pull Request resolved: #11739
As titled, this API supports multi-turn conversation by passing a `start_pos` argument to `generate_from_pos`. This pull request introduces text generation from a specific starting position in the KV cache and adds error handling for the case where `max_new_tokens` is not positive. The changes primarily extend the `TextLLMRunner` class and its associated methods while maintaining backward compatibility.

### New Feature: Text Generation from a Specific Starting Position
* **Added `generate_from_pos` method**: New method in `TextLLMRunner` that allows text generation starting from a specified position in the KV cache, including updates to the method signature, logic, and error handling (`extension/llm/runner/text_llm_runner.cpp`, `extension/llm/runner/text_llm_runner.h`).
* **Updated documentation**: Enhanced the method documentation in `TextLLMRunner` to describe the new functionality, including the `start_pos` parameter and the expected behavior (`extension/llm/runner/text_llm_runner.h`).

### Error Handling Improvements
* **Validation for `max_new_tokens`**: Added a check that `max_new_tokens` is positive; if it is not, an `InvalidArgument` error is returned, preventing invalid configurations during text generation (`extension/llm/runner/text_llm_runner.cpp`).
* **Unit test for negative `max_new_tokens`**: New test case (`GenerateFromPosErrorsWithNegativeMaxNewTokens`) verifying that `generate_from_pos` correctly rejects a negative `max_new_tokens` (`extension/llm/runner/test/test_text_llm_runner.cpp`).
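The runner itself is C++; purely as an illustration of the control flow described above, here is a hedged Python sketch. Only the names `generate_from_pos`, `start_pos`, and `max_new_tokens` come from the description; the body is hypothetical:

```python
def generate_from_pos(prompt: str, start_pos: int, max_new_tokens: int) -> str:
    # Mirrors the new validation: a non-positive max_new_tokens is rejected
    # (the C++ implementation returns an InvalidArgument error instead of raising).
    if max_new_tokens <= 0:
        raise ValueError(f"max_new_tokens must be positive, got {max_new_tokens}")
    # Decoding would resume from `start_pos` in the KV cache, so a second
    # turn can continue a conversation without re-prefilling earlier turns.
    raise NotImplementedError("sketch only")
```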
…1724) Arm backend: Added decomposition for MaxPool2D operator with dilation > 0 Signed-off-by: Elena Zhelezina <[email protected]>
- Adds support for per-channel quantization in TosaQuantizer and TosaBackend
- Enables per-channel quantization for MobileNetV2 test cases

cc @digantdesai @freddan80 @per @zingo

---------

Signed-off-by: Oscar Andersson <[email protected]>
The introduction of the decomposition for linalg vector norm revealed a bug: when `dim` is None, all dimensions should be reduced. Signed-off-by: Elena Zhelezina <[email protected]>
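For reference, the expected semantics of the fixed case (plain PyTorch, nothing Arm-specific assumed):

```python
import torch

# With dim=None, torch.linalg.vector_norm reduces over ALL dimensions
# and returns a scalar -- equivalent to flattening first.
x = torch.randn(2, 3, 4)
full = torch.linalg.vector_norm(x)            # dim defaults to None
flat = torch.linalg.vector_norm(x.flatten())
assert torch.allclose(full, flat)
```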
Differential Revision: D76746854 Pull Request resolved: #11751
Differential Revision: D76791781 Pull Request resolved: #11750
### Summary
This PR uses `xnn_define_binary` and `xnn_define_unary` to define XNNPack ops, instead of separately calling the individual definitions. Further changes:
1. Removes individual node definitions for unary and binary ops.
2. Creates a wrapper macro that generates function defs for individual ops, calling `xnn_define_binary` and `xnn_define_unary` inside.

Fixes #11584

### Test plan
```
## Build steps
cmake -DEXECUTORCH_BUILD_XNNPACK=ON ..
cmake --build cmake-out -j9

Tests ran:
./test/run_oss_cpp_tests.sh
.
.
.
100% tests passed, 0 tests failed out of 86
```
…1546)

### Summary
This PR consists of 4 encoder-only models. The following stats are based on SM8750.

1. Albert (16a16w)
   - Accuracy: ~22% (NOTE: nn.Module accuracy is around 24%, so the similarity between QNN and nn.Module is around 92%)
   - Speed: 11 ms/inf
   - Script: `python examples/qualcomm/oss_scripts/albert.py -b build-android -s $DEVICE -m SM8750 --dataset ../wikipedia-sentences/wikisent2.txt`
2. Bert (16a8w)
   - Accuracy: ~60%
   - Speed: 9 ms/inf
   - Script: `python examples/qualcomm/oss_scripts/bert.py -b build-android -s $DEVICE -m SM8750 --dataset ../wikipedia-sentences/wikisent2.txt`
3. Distilbert (16a8w)
   - Accuracy: ~59%
   - Speed: 8 ms/inf
   - Script: `python examples/qualcomm/oss_scripts/distilbert.py -b build-android -s $DEVICE -m SM8750 --dataset ../wikipedia-sentences/wikisent2.txt`
4. Eurobert (16a16w)
   - Accuracy: ~54%
   - Speed: 40 ms/inf
   - Script: `python examples/qualcomm/oss_scripts/eurobert.py -b build-android -s $DEVICE -m SM8750 --dataset ../wikipedia-sentences/wikisent2.txt`

### Test plan
- E2E scripts under `test_qnn_delegate.py`
- Example script: `python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleOssScript.test_{BERT_MODEL} --model SM8750 -s $DEVICE --build_folder build-android/ -r ./ -a ./test --sentence_dataset ../wikipedia-sentences/wikisent2.txt`
- Mainline CI

Author: @haowhsu-quic, @chunit-quic, @winskuo-quic
Differential Revision: D76781331 Pull Request resolved: #11759
#11596)

### Summary
Refactor the XNNPACK tester to split reusable base components out from the XNNPACK-specific parts. The base classes are relocated to `backends/test/harness`. The tester structure is largely unchanged, except that stage names are replaced with an enum.

Arm tests currently import XNNPACK's tester directly. Ideally they would have their own stage implementations, but that is left as a follow-up to minimize changes in the initial refactor.

### Test plan
CI
… fbsource sleef (#11261)" (#11765)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11657 by @swolchok
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/swolchok/458/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/swolchok/458/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/swolchok/458/orig
@diff-train-skip-merge

Co-authored-by: Scott Wolchok <[email protected]>
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11369 by @ahmtox
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/ahmtox/11/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/ahmtox/11/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/ahmtox/11/orig
@diff-train-skip-merge

Co-authored-by: morelos <[email protected]>
Creates the `dequantize_per_tensor` and `dequantize_per_token` logic shaders and implementations, which are linked with the testing framework. Differential Revision: [D76267107](https://our.internmc.facebook.com/intern/diff/D76267107/) [ghstack-poisoned]
Creates the `choose_qparams` per_tensor and per_token logic shaders and implementations, which are linked with the testing framework. Differential Revision: [D76436933](https://our.internmc.facebook.com/intern/diff/D76436933/) [ghstack-poisoned]
Differential Revision: D76842266 Pull Request resolved: #11764
Differential Revision: D76483572 Pull Request resolved: #11592
…hapes Differential Revision: D76530379 Pull Request resolved: #11611
…11778)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11757 by @cccclai
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/cccclai/28/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/cccclai/28/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/cccclai/28/orig
@diff-train-skip-merge

Co-authored-by: Chen Lai <[email protected]>
Differential Revision: D76781745 Pull Request resolved: #11746
)

- Constant placeholders with the same values but different data types, such as int32 and fp32, shouldn't be fused into a single placeholder; otherwise some operators end up with operands of mismatched dtypes.
- Fix the bug by adding a dtype check so that only constants with matching dtypes and equal values are fused.

Signed-off-by: Yufeng Shi <[email protected]>
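A hedged sketch of the check being described; the helper name `can_fuse` is hypothetical, not the Arm backend's actual code:

```python
import torch

# Dtype-aware constant fusing: two constants may only be merged when both
# their values AND their dtypes match.
def can_fuse(a: torch.Tensor, b: torch.Tensor) -> bool:
    if a.dtype != b.dtype:  # the fix: int32 vs fp32 must not fuse
        return False
    return a.shape == b.shape and torch.equal(a, b)

c_fp32 = torch.ones(4, dtype=torch.float32)
c_int32 = torch.ones(4, dtype=torch.int32)
assert not can_fuse(c_fp32, c_int32)     # same values, different dtypes
assert can_fuse(c_fp32, c_fp32.clone())  # same values, same dtype
```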
Differential Revision: D76954785 Pull Request resolved: #11824
# Summary
Provides methods and a script to fetch all ExecuTorch benchmark data from the HUD API into two datasets, private and public. The script will:
- fetch all data from the HUD API for the input time range (in UTC)
- clean out records and tables with only FAILURE_REPORT due to job-level failures
- get all private table metrics, generate `table_name`, and find the intersecting public table metrics
- generate private and public table groups
- output the data

Output types:
- run with excel-sheet export
- run with csv export
- run with dataframe-format print
- run with json-format print

See more guidance in README.md. The data is similar to the excel sheet generated manually in #10982. The result should be the same as the HUD per-model data table:
<img width="1480" alt="image" src="https://github.com/user-attachments/assets/7c6cc12e-50c5-4ce2-ac87-5cac650486e3" />

## Helper methods
common.py provides helper methods to convert csv and excel sheets back to the `{"groupInfo": {}, "df": pd.DataFrame}` format.

# Run with
```bash
python3 .ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py \
  --startTime "2025-04-29T09:48:57" \
  --endTime "2025-05-13T22:00:00" \
  --outputType "excel" \
  --models "mv3"

python3 .ci/scripts/benchmark_tooling/analyze_benchmark_stability.py \
  --primary-file private.xlsx \
  --reference-file public.xlsx
```

Generated excel files:
[private.xlsx](https://github.com/user-attachments/files/20844977/private.xlsx)
[public.xlsx](https://github.com/user-attachments/files/20844978/public.xlsx)

For instance, the result for mv3 xnnpack_q8 on a Samsung Galaxy S22 Ultra with Android 14:
```
Latency Stability Analysis: table10 (Primary)
================================================================================
Model: mv3(xnnpack_q8)
Device: Samsung Galaxy S22 Ultra 5G (private)(Android 14)

Dataset Overview:
- Number of samples: 88
- Date range: 2025-04-29 09:48:57+00:00 to 2025-05-13 21:08:36+00:00

Central Tendency Metrics:
- Mean latency: 2.91 ms
- Median latency (P50): 2.54 ms
- Mean trimmed latency: 2.41 ms
- Median trimmed latency: 2.15 ms

Dispersion Metrics:
- Standard deviation: 1.14 ms
- Coefficient of variation (CV): 39.08%
- Interquartile range (IQR): 0.82 ms
- Trimmed standard deviation: 0.76 ms
- Trimmed coefficient of variation: 31.60%

Percentile Metrics:
- P50 (median): 2.54 ms
- P90: 3.88 ms
- P95: 4.60 ms
- P99: 5.91 ms

Inter-Jitter Metrics (variability between runs):
- Max/Min ratio: 5.6103
- P99/P50 ratio: 2.3319
- Mean rolling std (window=5): 0.79 ms

Intra-Jitter Metrics (variability within runs):
- Mean trimming effect ratio: 15.37%
- Max trimming effect ratio: 38.83%

Stability Assessment:
- Overall stability score: 0.0/100
- Overall stability rating: Poor

Interpretation: The benchmark shows poor stability (score: 0.0/100) with significant variation between runs (CV: 39.08%). Performance is unpredictable and may lead to inconsistent user experience. The significant difference between raw and trimmed means suggests considerable intra-run jitter (15.4%) with occasional outliers within benchmark runs. The max/min ratio of 5.61 indicates substantial performance differences between the best and worst runs. The P99/P50 ratio of 2.33 suggests occasional latency spikes that could affect tail-latency-sensitive applications.
```

---------

Signed-off-by: Yang Wang <[email protected]>
…diate outputs Differential Revision: D76831086 Pull Request resolved: #11855
…ups==1 (#11774) This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: #11730 by @mcr229 ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/mcr229/31/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/mcr229/31/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/mcr229/31/orig @diff-train-skip-merge --------- Co-authored-by: Max Ren <[email protected]> Co-authored-by: Gregory Comer <[email protected]>
Fixes some bugs with how enum fields are used.
Update documentation to use the new `export_llm` instead of the old `export_llama`.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11863

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b7572d0 with merge base 0c12dcd.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
### Summary
This PR adds support for the tanh operator in ExecuTorch via XNNPACK, enabling optimized execution of `torch.tanh` on the XNNPACK backend. The implementation includes updates to operator configuration, serialization, and runtime handling. The tanh operator is now registered in the XNNPACK partition config and mapped to XNNPACK's `xnn_create_tanh_operator` API in the compiler.

### Test plan
Added a new test class `TestTanh`: a simple torch model with a tanh op. The test asserts that the XNNPACK delegate is invoked for the tanh op instead of the default torch implementation.
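As a sketch, a module of the kind `TestTanh` exercises; the export call is standard `torch.export`, while the lowering step is only referenced in a comment since the exact tester harness is ExecuTorch-internal:

```python
import torch

# A minimal tanh-only module, the shape of model the new test delegates.
class Tanh(torch.nn.Module):
    def forward(self, x):
        return torch.tanh(x)

example_inputs = (torch.randn(1, 3, 8, 8),)
exported = torch.export.export(Tanh().eval(), example_inputs)
# From here the XNNPACK flow would partition aten.tanh to the delegate
# (e.g. via to_edge_transform_and_lower with XnnpackPartitioner).
```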
…ups==1

Pull Request resolved: #11730

Supports quantized transposed convs with groups == 1. Previously there was some support for quantized transposed convolutions, but only when the channel axis is 1 and groups is 1. The quantizer didn't support this because it only allows quantizing along dim 0, which is generally the output channels. However, for transposed convs the weights have the dimensions:
```
[in_channels, out_channels/groups, h, w]
```
Since we want to keep quantization along the output channels, we now need to quantize along axis = 1. The reason we require groups to be 1 is that XNNPACK takes filters of the dimension:
```
[out_channels, H, W, in_channels/groups]
```
Since we quantize along the output channels, PyTorch gives us out_channels/groups scales, but XNNPACK expects out_channels scales. Realistically, supporting this would need affine quantization with a scale per group and out_channel; for now, we simply enforce the constraint groups == 1.

ghstack-source-id: 291033630
@exported-using-ghexport

Differential Revision: [D76631781](https://our.internmc.facebook.com/intern/diff/D76631781/)
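A minimal sketch of the axis choice explained above; the scale math is illustrative, not the quantizer's actual code:

```python
import torch

# ConvTranspose2d weights are [in_channels, out_channels/groups, h, w],
# so per-output-channel quantization must reduce over every dim except 1.
deconv = torch.nn.ConvTranspose2d(8, 16, kernel_size=3, groups=1)
w = deconv.weight.detach()  # shape [8, 16, 3, 3]
ch_axis = 1                 # output channels for transposed convs

reduce_dims = [d for d in range(w.dim()) if d != ch_axis]
scales = w.abs().amax(dim=reduce_dims) / 127.0  # one scale per out channel
assert scales.shape == (16,)  # out_channels scales when groups == 1
```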
…groups ==1

Pull Request resolved: #11731

Supports dynamically quantized deconvolutions. There is some refactoring of the previous diff, but in general we just remove the constraint in the dynamism check that the convolution isn't transposed. For the same reasons as before, this only supports channel_axis = 1 and groups = 1.

ghstack-source-id: 291033632
@exported-using-ghexport

Differential Revision: [D76638904](https://our.internmc.facebook.com/intern/diff/D76638904/)
Pull Request resolved: #11732

Allows selecting between transposed convs and regular convs. Previously we grouped all conv targets together (transposed and regular convs); now we enable better per-operator selection.

ghstack-source-id: 291033631

Differential Revision: [D76641838](https://our.internmc.facebook.com/intern/diff/D76641838/)
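A hedged sketch of what per-operator selection enables at the quantizer level. This assumes the torch.ao `XNNPACKQuantizer.set_operator_type` entry point; the ExecuTorch-side API in this PR may differ:

```python
import torch
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# With transposed and regular convs no longer grouped under one target,
# they can be configured independently.
quantizer = XNNPACKQuantizer()
config = get_symmetric_quantization_config(is_per_channel=True)
quantizer.set_operator_type(torch.ops.aten.conv2d.default, config)
# conv_transpose2d can now be left unset (or given its own config),
# since it is selectable separately from regular convs.
```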
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `stale`.
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11732 by @mcr229
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/mcr229/33/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/mcr229/33/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/mcr229/32/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/mcr229/33/orig
@diff-train-skip-merge