Delete opt_mul_scalar_out #12145
Conversation
Stack from ghstack (oldest at bottom):
The handwritten optimized code is similar to what we should be getting from the optimized portable op, as follows.

Handwritten optimized code:
- If the input type matches the output type, perform a vectorized loop.
- Otherwise, generate specific mixed-dtype kernels, which aren't vectorized.

Optimized portable op:
- If the input type matches the output type, perform a vectorized loop (dtype_specialized_elementwise_fn_impl in elementwise_util.h).
- Otherwise, generate one specific kernel per compute type. Those kernels use non-inlined function calls to do loads and stores, trading off performance for a significant size reduction (apply_elementwise_fn_generic_impl in elementwise_util.h).

Both cases in the portable op variant also use parallel_for. (A sketch of the two paths is shown below.)

I attempted to do a performance test, but I found that `torch.mul(some_tensor, 2.0)` is exported as a call to mul.Tensor, *not* mul.Scalar. 41e7ffa added the ability to pass our tests if we do emit mul.Scalar for this, but the follow-up diff to make that happen seems not to have landed. So I think another reason to delete this is that (if I understand correctly) it's not used; therefore we don't have specific knowledge that we need it to exist and can't just use the optimized portable op.

ghstack-source-id: 9a25f5a
ghstack-comment-id: 3025417542
Pull-Request-resolved: #12145
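For concreteness, here is a minimal C++ sketch of the two strategies described above. It is an illustration under assumptions, not the actual ExecuTorch code: the function names (`mul_scalar_same_dtype`, `mul_scalar_mixed_dtype`) and the load/store callback types are hypothetical and only loosely mirror dtype_specialized_elementwise_fn_impl and apply_elementwise_fn_generic_impl.

```cpp
#include <cstddef>

// Illustrative sketch only: names and signatures are hypothetical and do not
// match the actual code in elementwise_util.h.

// Same-dtype fast path: a plain typed loop that the compiler can
// auto-vectorize (analogous in spirit to dtype_specialized_elementwise_fn_impl).
template <typename T>
void mul_scalar_same_dtype(const T* in, T scalar, T* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i] * scalar;
  }
}

// Mixed-dtype path: one kernel per compute type; element loads and stores go
// through non-inlined function pointers, which blocks vectorization but keeps
// code size small (analogous in spirit to apply_elementwise_fn_generic_impl).
using LoadFn = double (*)(const void* data, std::size_t index);
using StoreFn = void (*)(double value, void* data, std::size_t index);

void mul_scalar_mixed_dtype(
    const void* in, double scalar, void* out, std::size_t n,
    LoadFn load, StoreFn store) {
  for (std::size_t i = 0; i < n; ++i) {
    store(load(in, i) * scalar, out, i);
  }
}
```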
kimishpatel left a comment
I trust that the summary holds true. I think the question of multi-threading remains, in that it is not always beneficial to multithread. Our portable implementation should account for that; otherwise, for some cases, I won't be surprised if we see a perf regression.
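To make the threading concern concrete, a size threshold like the sketch below would keep small tensors on the calling thread. This is a hedged illustration: the parallel_for stand-in, the kMinElementsForParallel constant, and the grain size are assumptions for the example, not the portable kernels' actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for a parallel_for primitive (the real one used by the portable
// ops may differ): splits [begin, end) into grain_size chunks, one thread each.
void parallel_for(std::size_t begin, std::size_t end, std::size_t grain_size,
                  const std::function<void(std::size_t, std::size_t)>& f) {
  std::vector<std::thread> workers;
  for (std::size_t start = begin; start < end; start += grain_size) {
    workers.emplace_back(f, start, std::min(start + grain_size, end));
  }
  for (auto& t : workers) {
    t.join();
  }
}

// Assumed threshold: below this element count, thread launch/synchronization
// overhead likely outweighs any speedup, so stay single-threaded.
constexpr std::size_t kMinElementsForParallel = 1 << 15;

void mul_scalar_maybe_parallel(const float* in, float s, float* out,
                               std::size_t n) {
  if (n < kMinElementsForParallel) {
    for (std::size_t i = 0; i < n; ++i) {
      out[i] = in[i] * s;  // small input: plain sequential loop
    }
    return;
  }
  parallel_for(0, n, /*grain_size=*/4096,
               [&](std::size_t begin, std::size_t end) {
                 for (std::size_t i = begin; i < end; ++i) {
                   out[i] = in[i] * s;
                 }
               });
}
```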