[ET-VK] Enable automatic dtype conversion when copying to/from staging buffer #14222

SS-JIA · 2025-09-11T17:40:25Z

Stack from ghstack (oldest at bottom):

[ET-VK] Add 'half' variants to some Llama operators + enable llama vulkan export with force_fp16 flag #14223
-> [ET-VK] Enable automatic dtype conversion when copying to/from staging buffer #14222

Context

During export, Vulkan sometimes converts certain tensor dtypes. The most common case is that int64 and float64 tensors are represented internally as int32 and float32 tensors. This reduces the number of dtype variants that need to be generated for each shader, and it avoids relying on 64-bit types, which are not guaranteed to be supported.

However, this raises an issue if an int64 or float64 tensor is marked as an input/output tensor of the model. The source/destination ETensor will have a different dtype than the internal representation, meaning that the input/output bytes will be interpreted incorrectly.
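To make the failure mode concrete, here is a minimal sketch in plain Python (not ExecuTorch code). It uses only the standard `struct` module and assumes a little-endian layout; it shows what happens when bytes written as int64 are read back as int32 without any conversion:

```python
import struct

# Three int64 values, laid out as a client might supply them
# (little-endian is an assumption of this sketch).
values = [1, 2, 3]
raw = struct.pack("<3q", *values)  # 3 elements * 8 bytes = 24 bytes

# Reading the same 24 bytes as int32 yields six elements: each int64
# splits into a low word followed by a high word.
reinterpreted = list(struct.unpack("<6i", raw))
print(reinterpreted)  # [1, 0, 2, 0, 3, 0]

# A consumer expecting three int32 elements would only see the first 12 bytes.
first_three = reinterpreted[:3]
print(first_three)  # [1, 0, 2]
```

The values are not merely narrowed; the element boundaries themselves shift, so every element after the first is wrong.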

Changes

This diff fixes this behaviour by introducing the concept of a "staging dtype". This allows the staging buffer of a tensor to have a different dtype than the underlying GPU buffer or texture. When copying to/from the GPU resource, the data can then be converted to or from the dtype expected by the client code.
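The idea can be sketched in a few lines of Python. This is an illustration only, not the implementation in this diff: `FMT` and `convert_buffer` are hypothetical names, the standard `struct` module stands in for the real copy path, and little-endian layout is assumed.

```python
import struct

# Hypothetical mapping from dtype names to struct format codes.
FMT = {"int64": "q", "int32": "i", "float64": "d", "float32": "f"}


def convert_buffer(data: bytes, src_dtype: str, dst_dtype: str) -> bytes:
    """Convert a packed buffer element by element from src_dtype to dst_dtype.

    Only int<->int and float<->float pairs are handled in this sketch.
    Narrowing an integer that does not fit raises struct.error.
    """
    src, dst = FMT[src_dtype], FMT[dst_dtype]
    count = len(data) // struct.calcsize("<" + src)
    elems = struct.unpack(f"<{count}{src}", data)
    return struct.pack(f"<{count}{dst}", *elems)


# Client side holds int64 (the staging dtype); the GPU resource holds int32.
client_input = struct.pack("<3q", 1, 2, 3)
gpu_side = convert_buffer(client_input, "int64", "int32")

# Copying back out widens the elements to the dtype the client expects.
client_output = convert_buffer(gpu_side, "int32", "int64")
print(struct.unpack("<3q", client_output))  # (1, 2, 3)
```

Converting at the copy boundary keeps the client-facing dtype and the internal dtype independent, which is the property the staging dtype is meant to provide.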

As a bonus, this diff also adds an optional setting that forces fp32 tensors to be represented internally as fp16. This lets models run half-precision inference without incurring the cost of dtype conversion ops being inserted into the graph, and without manually converting inputs/outputs to half type.
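As a rough illustration of what an fp32 staging dtype over an fp16 internal representation implies numerically, the sketch below round-trips a few values through half precision using Python's standard `struct` module. The helper names are hypothetical, little-endian layout is assumed, and values outside the fp16 range are not handled.

```python
import struct


def fp32_to_fp16_bytes(data: bytes) -> bytes:
    """Narrow a packed fp32 buffer to fp16 (hypothetical helper)."""
    count = len(data) // 4
    return struct.pack(f"<{count}e", *struct.unpack(f"<{count}f", data))


def fp16_to_fp32_bytes(data: bytes) -> bytes:
    """Widen a packed fp16 buffer back to fp32 (hypothetical helper)."""
    count = len(data) // 2
    return struct.pack(f"<{count}f", *struct.unpack(f"<{count}e", data))


# The client supplies and receives fp32; the internal copy is fp16.
client_input = struct.pack("<3f", 1.0, 0.5, 0.1)
internal = fp32_to_fp16_bytes(client_input)   # half the size of the input
client_output = fp16_to_fp32_bytes(internal)

out = struct.unpack("<3f", client_output)
# 1.0 and 0.5 are exactly representable in fp16; 0.1 is only approximated.
print(len(client_input), len(internal), out)
```

The client never sees a half-typed buffer, but results carry fp16 precision, so values such as 0.1 come back slightly perturbed.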

Differential Revision: D82234180


pytorch-bot · 2025-09-11T17:40:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14222

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Pending, 6 Unrelated Failures

As of commit fe8a683 with merge base b2ae2b4:

NEW FAILURE - The following job has failed:

Build Presets / windows (pybind) / build (gh)

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Build Windows Wheels / pytorch/executorch / build-wheel-py3_10-cpu (gh) (trunk failure)
RuntimeError: Failed to install QNN SDK. Please check the logs above.
Build Windows Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu (gh) (trunk failure)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x64
pull / test-binary-size-linux-gcc / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-moshi-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-openvino-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-setup-linux-gcc / linux-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-09-11T17:40:42Z

This pull request was exported from Phabricator. Differential Revision: D82234180

github-actions · 2025-09-11T17:41:20Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


facebook-github-bot · 2025-09-11T18:13:09Z

This pull request was exported from Phabricator. Differential Revision: D82234180


facebook-github-bot · 2025-09-11T20:20:03Z

This pull request was exported from Phabricator. Differential Revision: D82234180

meta-cla bot added the CLA Signed label Sep 11, 2025

SS-JIA mentioned this pull request Sep 11, 2025

[ET-VK] Add 'half' variants to some Llama operators + enable llama vulkan export with force_fp16 flag #14223

Merged

facebook-github-bot added the fb-exported label Sep 11, 2025

manuelcandales approved these changes Sep 11, 2025


facebook-github-bot merged commit 2ca7304 into gh/SS-JIA/327/base Sep 12, 2025
115 of 125 checks passed

facebook-github-bot deleted the gh/SS-JIA/327/head branch September 12, 2025 02:42

facebook-github-bot temporarily deployed to cherry-pick-bot September 12, 2025 02:42 — with GitHub Actions Inactive

pytorchbot mentioned this pull request Sep 12, 2025

[ET-VK] Enable automatic dtype conversion when copying to/from staging buffer #14244

Merged


4 participants
