dtype selective build from model API in OSS #11760
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11760
Note: Links to docs will display an error until the docs builds have been completed.
❌ 12 New Failures, 1 Unrelated Failure
As of commit eb3fc1e with merge base fcc7f3b:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: none"
4e2792b to 7d823f2
endif()
endif()

if(GEN_KERNEL_LIBS)
This is unnecessary I think — aren't we already in an if(GEN_KERNEL_LIBS) block from line 241?
The issue is that if portable_kernels is listed in GEN_KERNEL_LIBS, we remove it from the list (line 243). I thought multiple options might be passed in here, so this check ensures that if others are specified, they still get linked. I'm not entirely sure whether this use case is probable, though.
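The list handling described in this reply can be modelled in a few lines of Python. This is only an illustration of the logic, not the CMake code from the PR: `split_kernel_libs` and the library name `other_kernels` are hypothetical and exist only for this sketch.

```python
def split_kernel_libs(gen_kernel_libs, rebuilt_lib="portable_kernels"):
    """Separate the library that gets rebuilt from the ones that are only linked.

    Returns (needs_rebuild, remaining_libs). This mirrors the idea of removing
    portable_kernels from the list and then re-checking whether anything is left.
    """
    needs_rebuild = rebuilt_lib in gen_kernel_libs
    remaining = [lib for lib in gen_kernel_libs if lib != rebuilt_lib]
    return needs_rebuild, remaining


# "other_kernels" is a made-up name standing in for any additional library.
needs_rebuild, remaining = split_kernel_libs(["portable_kernels", "other_kernels"])

# Analogue of the second if(GEN_KERNEL_LIBS) check: the list may be empty
# after the removal, in which case there is nothing else to link.
if remaining:
    print("still link:", remaining)
```

If the list only ever contains `portable_kernels`, `remaining` is empty and the second check never fires, which is the case the reviewer is asking about.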
f1c6439 to ed3abb7
The entire flow works now, from reading in a model's pte file to rebuilding the executorch binary with selected ops and dtypes in OSS. Tests on add and add_mul match expectations. MV2 and MV3 suffer from unrelated issues (i.e. the parser is unable to get all the needed info from the pte file when building the YAML).
ed3abb7 to eb3fc1e
Given a specific model, we want to produce a binary that only includes the minimal operators and dtypes needed to run the model. This requires parsing the model to determine what kernels it launches, the operators used in those kernels, and the dtypes of the tensors in the kernels. After parsing, a header file must be generated, and the portable_kernels lib can be rebuilt to only include the operators and dtypes specified in the generated header.

This change completes this E2E process. A user can now specify the model they wish to optimize their binary for via the command line argument `-DEXECUTORCH_SELECT_OPS_FROM_MODEL="<file path to model pte>"`. When specified, the pte is parsed to produce a YAML file called `selected_operators.yaml` which describes the model's operators and dtypes. From this YAML, a header file called `selected_op_variants.h` is generated that selects the described operators and dtypes. When the command line argument `-DEXECUTORCH_DTYPE_SELECTIVE_BUILD=ON` is specified, the header file is linked to the `portable_kernels` lib when it's rebuilt. Only the model API is supported with dtype selective build, and using other methods such as `list` or `dict` will result in a build error.

An example usage of this flow is included in `examples/selective_build/test_selective_build.sh:test_cmake_select_ops_in_model`. When run as `bash examples/selective_build/test_selective_build.sh cmake`, the `cmake-out/examples/selective_build/selective_build_test` binary is built.

After stripping the binary, the following binary sizes were seen for the following models:

| Model | Default Binary Size (KB) | Dtype Selected Binary Size (KB) |
| ------- | :---: | :---: |
| add | 359 | 275 |
| mul | 335 | 263 |
| add_mul | 367 | 287 |
| linear | 347 | 291 |
| softmax | 251 | 251 |
| resnet18 | 643 | 515 |
| resnet50 | 643 | 515 |
| mobilebert | 707 | 415 |
| lstm | 643 | 459 |
| dl3 | 607 | 539 |
| edsr | 543 | 371 |

| Model | Default Binary Size (KB) | Dtype Selected Binary Size (KB) |
| ------- | :---: | :---: |
| emformer_transcribe | 863 | 555 |
| vit | 907 | 687 |
| mv2 | 631 | 495 |
| mv3 | 843 | 535 |
| llama | 1.1M | 827 |
| qwen2_5 | 1.1M | 827 |

Although there is a noted reduction in the binary size, it seems that the pte file parsing functionality from `gen_oplist.py` is incomplete. Please see the discussion on [PR details.
Tested with OpenAI's whisper-tiny model. Dtype selection helps bring the binary size down to 523 KB, but it still crashes because of the same issues mentioned above. cc @lucylq
Motivation
Given a specific model, we want to produce a binary that only includes the minimal operators and dtypes needed to run the model. This requires parsing the model to determine what kernels it launches, the operators used in those kernels, and the dtypes of the tensors in the kernels. After parsing, a header file must be generated, and the portable_kernels lib can be rebuilt to only include the operators and dtypes specified in the generated header.
Summary
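This change completes this E2E process. A user can now specify the model they wish to optimize their binary for via the command line argument `-DEXECUTORCH_SELECT_OPS_FROM_MODEL="<file path to model pte>"`. When specified, the pte is parsed to produce a YAML file called `selected_operators.yaml` which describes the model's operators and dtypes. From this YAML, a header file called `selected_op_variants.h` is generated that selects the described operators and dtypes. When the command line argument `-DEXECUTORCH_DTYPE_SELECTIVE_BUILD=ON` is specified, the header file is linked to the `portable_kernels` lib when it's rebuilt. Only the model API is supported with dtype selective build, and using other methods such as `list` or `dict` will result in a build error.
Results
An example usage of this flow is included in `examples/selective_build/test_selective_build.sh:test_cmake_select_ops_in_model`. When run as `bash examples/selective_build/test_selective_build.sh cmake`, the `cmake-out/examples/selective_build/selective_build_test` binary is built. After stripping the binary, the following binary sizes were seen for the following models:
Working models

| Model | Default Binary Size (KB) | Dtype Selected Binary Size (KB) |
| ------- | :---: | :---: |
| add | 359 | 275 |
| mul | 335 | 263 |
| add_mul | 367 | 287 |
| linear | 347 | 291 |
| softmax | 251 | 251 |
| resnet18 | 643 | 515 |
| resnet50 | 643 | 515 |
| mobilebert | 707 | 415 |
| lstm | 643 | 459 |
| dl3 | 607 | 539 |
| edsr | 543 | 371 |

Models that crash when run

| Model | Default Binary Size (KB) | Dtype Selected Binary Size (KB) |
| ------- | :---: | :---: |
| emformer_transcribe | 863 | 555 |
| vit | 907 | 687 |
| mv2 | 631 | 495 |
| mv3 | 843 | 535 |
| llama | 1.1M | 827 |
| qwen2_5 | 1.1M | 827 |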
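To put the sizes reported for this PR on a common scale, the relative reduction can be computed per model. The sketch below copies the reported KB figures verbatim; `percent_reduction` is a helper introduced only for this illustration, and llama and qwen2_5 are left out because their default size is reported as "1.1M" rather than as an exact KB value.

```python
# (default KB, dtype-selected KB) as reported in this PR.
SIZES_KB = {
    "add": (359, 275),
    "mul": (335, 263),
    "add_mul": (367, 287),
    "linear": (347, 291),
    "softmax": (251, 251),
    "resnet18": (643, 515),
    "resnet50": (643, 515),
    "mobilebert": (707, 415),
    "lstm": (643, 459),
    "dl3": (607, 539),
    "edsr": (543, 371),
    "emformer_transcribe": (863, 555),
    "vit": (907, 687),
    "mv2": (631, 495),
    "mv3": (843, 535),
}


def percent_reduction(default_kb, selected_kb):
    """Size saved by dtype selection, as a percentage of the default size."""
    return 100.0 * (default_kb - selected_kb) / default_kb


for model, (default_kb, selected_kb) in SIZES_KB.items():
    saved = default_kb - selected_kb
    pct = percent_reduction(default_kb, selected_kb)
    print(f"{model:20s} {saved:4d} KB saved ({pct:.1f}%)")
```

On these figures the savings range from none (softmax) to roughly 41% (mobilebert).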
Notes
Although there is a noted reduction in the binary size, it seems that the pte file parsing functionality from `gen_oplist.py` is incomplete. Please see the discussion on PR #11582 and on issue #11762 for more details.
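For reference, the two CMake options named in the Summary can be assembled into a configure command as sketched below. Only `-DEXECUTORCH_SELECT_OPS_FROM_MODEL` and `-DEXECUTORCH_DTYPE_SELECTIVE_BUILD=ON` come from this PR; the helper `selective_build_cmake_args`, the model file name `add.pte`, and the default source and build directories are assumptions made for illustration, and the actual invocation in `test_selective_build.sh` may pass additional options.

```python
import shlex


def selective_build_cmake_args(model_pte, dtype_selective=True,
                               source_dir="examples/selective_build",
                               build_dir="cmake-out/examples/selective_build"):
    """Build a cmake configure command for a model-driven selective build.

    source_dir and build_dir are assumed defaults, not values taken from the PR.
    """
    args = [
        "cmake",
        "-S", source_dir,  # standard CMake option: source directory
        "-B", build_dir,   # standard CMake option: build directory
        f"-DEXECUTORCH_SELECT_OPS_FROM_MODEL={model_pte}",
    ]
    if dtype_selective:
        # Per the PR description, dtype selection is only supported together
        # with the model API, so it is added alongside the model flag.
        args.append("-DEXECUTORCH_DTYPE_SELECTIVE_BUILD=ON")
    return args


# "add.pte" is a placeholder path for an exported model.
cmd = selective_build_cmake_args("add.pte")
print(shlex.join(cmd))
```

The command is only printed here; passing `cmd` to `subprocess.run` would execute it from a checkout where those directories exist.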