dtype selective build from model API in OSS #11760

BujSet · 2025-06-17T18:28:16Z

Motivation

Given a specific model, we want to produce a binary that includes only the minimal operators and dtypes needed to run the model. This requires parsing the model to determine which kernels it launches, the operators used in those kernels, and the dtypes of the tensors in the kernels. After parsing, a header file must be generated, and the portable_kernels lib can be rebuilt to include only the operators and dtypes specified in the generated header.

Summary

This change completes this E2E process. A user can now specify the model they wish to optimize their binary for via the command line argument -DEXECUTORCH_SELECT_OPS_FROM_MODEL="<file path to model pte>". When specified, the pte is parsed to produce a YAML file called selected_operators.yaml, which describes the model's operators and dtypes. From this YAML, a header file called selected_op_variants.h is generated that selects the described operators and dtypes. When the command line argument -DEXECUTORCH_DTYPE_SELECTIVE_BUILD=ON is specified, the header file is linked to the portable_kernels lib when it's rebuilt. Only the model API is supported with dtype selective build; using other methods such as list or dict will result in a build error.
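As a rough illustration of how the two flags combine, the sketch below assembles a CMake configure command line. Only the two `-D` options come from this PR; the helper name `configure_args`, the default build directory, and the example model path are hypothetical, and a real build takes other options that are not shown here.

```python
from typing import List, Optional


def configure_args(
    model_pte: Optional[str],
    dtype_selective: bool,
    build_dir: str = "cmake-out",  # assumed build directory, for illustration only
) -> List[str]:
    """Assemble a cmake configure command for a model-driven selective build."""
    if dtype_selective and not model_pte:
        # Mirrors the rule in the summary: dtype selective build is only
        # supported together with the model API.
        raise ValueError("dtype selective build requires a model .pte path")

    args = ["cmake", "-B", build_dir]
    if model_pte:
        args.append(f"-DEXECUTORCH_SELECT_OPS_FROM_MODEL={model_pte}")
    if dtype_selective:
        args.append("-DEXECUTORCH_DTYPE_SELECTIVE_BUILD=ON")
    return args


if __name__ == "__main__":
    # "add_mul.pte" is a placeholder path, not a file shipped with the PR.
    print(" ".join(configure_args("add_mul.pte", dtype_selective=True)))
```

The early `ValueError` is a design choice of this sketch: it surfaces the unsupported combination before CMake runs, whereas the PR itself reports it as a build error.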

Results

An example usage of this flow is included in examples/selective_build/test_selective_build.sh:test_cmake_select_ops_in_model. When run as bash examples/selective_build/test_selective_build.sh cmake, the cmake-out/examples/selective_build/selective_build_test binary is built. After stripping the binary, the following sizes were measured for each model:

Working models

Model	Default Binary Size (KB)	Dtype Selected Binary Size (KB)
add	359	275
mul	335	263
add_mul	367	287
linear	347	291
softmax	251	251
resnet18	643	515
resnet50	643	515
mobilebert	707	415
lstm	643	459
dl3	607	539
edsr	543	371
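To make the savings easier to compare across models, the following sketch computes the absolute and relative reduction for each row of the table above. The numbers are copied from the table; the names `SIZES_KB` and `reduction_percent` are made up for this example.

```python
# (default KB, dtype-selected KB), copied from the "Working models" table.
SIZES_KB = {
    "add": (359, 275),
    "mul": (335, 263),
    "add_mul": (367, 287),
    "linear": (347, 291),
    "softmax": (251, 251),
    "resnet18": (643, 515),
    "resnet50": (643, 515),
    "mobilebert": (707, 415),
    "lstm": (643, 459),
    "dl3": (607, 539),
    "edsr": (543, 371),
}


def reduction_percent(default_kb: int, selected_kb: int) -> float:
    """Relative size reduction of the dtype-selected binary, in percent."""
    return 100.0 * (default_kb - selected_kb) / default_kb


if __name__ == "__main__":
    for model, (default_kb, selected_kb) in SIZES_KB.items():
        saved = default_kb - selected_kb
        pct = reduction_percent(default_kb, selected_kb)
        print(f"{model:<12}{saved:>5} KB saved ({pct:.1f}%)")
```

On these figures, mobilebert shows the largest relative reduction (about 41%), while softmax is unchanged.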

Models that crash when run

Model	Default Binary Size (KB)	Dtype Selected Binary Size (KB)
emformer_transcribe	863	555
vit	907	687
mv2	631	495
mv3	843	535
llama	1.1M	827
qwen2_5	1.1M	827

Notes

Although there is a noted reduction in the binary size, it seems that the pte file parsing functionality from gen_oplist.py is incomplete. Please see the discussion on PR #11582 and on issue #11762 for more details.

pytorch-bot · 2025-06-17T18:28:20Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11760

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 12 New Failures, 1 Unrelated Failure

As of commit eb3fc1e with merge base fcc7f3b:

NEW FAILURES - The following jobs have failed:

Build Linux Wheels / pytorch/executorch / build-manywheel-py3_10-cpu (gh)
ModuleNotFoundError: No module named 'torchvision'
Build Linux Wheels / pytorch/executorch / build-manywheel-py3_11-cpu (gh)
ModuleNotFoundError: No module named 'torchvision'
Build Linux Wheels / pytorch/executorch / build-manywheel-py3_12-cpu (gh)
ModuleNotFoundError: No module named 'torchvision'
Build Linux Wheels / pytorch/executorch / upload / upload-manywheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x86_64
Build Linux Wheels / pytorch/executorch / upload / upload-manywheel-py3_11-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.11_cpu_x86_64
Build Linux Wheels / pytorch/executorch / upload / upload-manywheel-py3_12-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.12_cpu_x86_64
Build macOS Wheels / pytorch/executorch / build-wheel-py3_10-cpu (gh)
ModuleNotFoundError: No module named 'torchvision'
Build macOS Wheels / pytorch/executorch / build-wheel-py3_11-cpu (gh)
ModuleNotFoundError: No module named 'torchvision'
Build macOS Wheels / pytorch/executorch / build-wheel-py3_12-cpu (gh)
ModuleNotFoundError: No module named 'torchvision'
Build macOS Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_
Build macOS Wheels / pytorch/executorch / upload / upload-wheel-py3_11-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.11_cpu_
Build macOS Wheels / pytorch/executorch / upload / upload-wheel-py3_12-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.12_cpu_

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-moshi-linux / linux-job (gh) (trunk failure)
test_exported_decoder_xnnpack

This comment was automatically generated by Dr. CI and updates every 15 minutes.

BujSet · 2025-06-17T18:52:27Z

@pytorchbot label "release notes: none"

jathu · 2025-06-18T14:24:20Z

tools/cmake/Codegen.cmake

+      endif()
+    endif()
+
+    if(GEN_KERNEL_LIBS)


This is unnecessary I think — aren't we already in an if(GEN_KERNEL_LIBS) block from line 241?

BujSet · 2025-06-18

The issue is that if portable_kernels is listed in GEN_KERNEL_LIBS, we remove it from the list (line 243). I thought multiple libraries might be passed in here, so this check ensures that any others that are specified still get linked. I'm not entirely sure how likely this use case is, though.
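The list handling discussed in this thread can be modelled in a few lines. This is only a sketch of the logic as described in the comment, not a transcription of Codegen.cmake: `split_kernel_libs` and the library name `other_kernels` are hypothetical.

```python
from typing import List, Tuple


def split_kernel_libs(gen_kernel_libs: List[str]) -> Tuple[bool, List[str]]:
    """Separate portable_kernels from any other requested kernel libs.

    Returns (portable_requested, other_libs). A non-empty other_libs is the
    case the second if(GEN_KERNEL_LIBS) check is meant to cover.
    """
    others = [lib for lib in gen_kernel_libs if lib != "portable_kernels"]
    portable_requested = len(others) != len(gen_kernel_libs)
    return portable_requested, others


if __name__ == "__main__":
    # "other_kernels" is a placeholder name for illustration.
    portable, remaining = split_kernel_libs(["portable_kernels", "other_kernels"])
    if portable:
        print("rebuild portable_kernels with the selected dtypes")
    if remaining:
        print("still link:", ", ".join(remaining))
```

If only portable_kernels is ever passed, the remaining list is always empty and the inner check is redundant, which is the reviewer's point.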

The entire flow works now, from reading in a model's pte file to rebuilding the executorch binary with selected ops and dtypes in OSS. Tests on add and add_mul match expectations. MV2 and MV3 suffer from unrelated issues (i.e. the parser is unable to get all the needed info from the pte file when building the YAML).


BujSet · 2025-07-23T17:21:45Z

Tested with OpenAI's whisper-tiny model. Dtype selection helps bring the binary size down to 523 KB, but it still crashes because of the same issues mentioned above.

cc @lucylq

facebook-github-bot added the CLA Signed label Jun 17, 2025

BujSet changed the title ~~Dtype selective build for cmake~~ dtype selective build from model API in OSS Jun 17, 2025

pytorch-bot bot added the release notes: none label Jun 17, 2025

BujSet self-assigned this Jun 17, 2025

BujSet added ciflow/trunk ciflow/binaries labels Jun 17, 2025

BujSet mentioned this pull request Jun 17, 2025

Dtype not selected from exported pte via gen_oplist.py #11762

Closed

BujSet force-pushed the dtype_selective_build_for_cmake branch 5 times, most recently from 4e2792b to 7d823f2 Compare June 17, 2025 23:32

BujSet marked this pull request as ready for review June 18, 2025 02:41

BujSet requested review from JacobSzwejbka, kirklandsign, larryliu0820 and lucylq as code owners June 18, 2025 02:41

jathu reviewed Jun 18, 2025

BujSet force-pushed the dtype_selective_build_for_cmake branch from f1c6439 to ed3abb7 Compare June 18, 2025 17:22

BujSet added 9 commits June 18, 2025 12:38

Testing flow to generate selected_op_variants.h in cmake build process

2c0fe62

Now able to generate selected_op_variants.h from yaml spec

f64809e

Something with add_mull test seems to be broken

8b6dfd8

Add_mul example does not use dtype during cmake build, need to add support for this

00531b1

Not seeing any benefit to using DTYPE selection in add_mul model

ffb72f8

Still testing

5ecaeb9

Making test flow a little easier

19b9a14

Something broke in test selective build, partially undoing changes to find out what

73cc298

Testing, mv2.pte, seems like yaml does not include all the needed dtypes

194484b

BujSet added 12 commits June 18, 2025 12:38

Able to now persist the needed header file

986ae36

Cleanup to include error handling and debug code removal

d4e3fd8

Revert back to orig state on this

3ce1c79

Undoing change to unused file

7c90969

Error handling seems incorrect, testing a fix

7ce2fe7

Testing new syntax for testing cmake flag ON-OFF

917e7e8

Maybe cmake if condition can't handle complex expansions?

7c7a876

Fixing if logic condition

e4c4908

Rearranging if logic

733eb9e

Linting

eee7f17

Addressing comments on PR

eb3fc1e

BujSet force-pushed the dtype_selective_build_for_cmake branch from ed3abb7 to eb3fc1e Compare June 18, 2025 19:39

jathu approved these changes Jun 18, 2025

BujSet merged commit daebcde into pytorch:main Jun 18, 2025
354 of 378 checks passed

BujSet deleted the dtype_selective_build_for_cmake branch June 19, 2025 00:01

This was referenced Jun 26, 2025

Enable selective build model API in OSS #11434

Closed

Add dtype selective build for OSS #9983

Closed

BujSet mentioned this pull request Jul 1, 2025

Cross Compile Executorch for Arm Cortex M h/w (Raspberry PI Pico 2) and infer a simple model #11913

Closed
