Conversation


@cad-rlc cad-rlc commented Jul 15, 2025

Summary

Release notes: backends

  1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
  2. Added an optimized softmax kernel without iDMA to vision/third_party.

Test plan

Add -DEXECUTORCH_VISION_OPT=ON to the Cadence CMake build to enable the vision optimized kernels. The rest of the steps remain the same.
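As a sketch, the configure step might look like the following; the toolchain file path and build directory are illustrative assumptions, and only -DEXECUTORCH_VISION_OPT=ON comes from this PR:

```shell
# Illustrative only: the toolchain path and build dir are assumptions;
# -DEXECUTORCH_VISION_OPT=ON is the flag this PR adds.
cmake -S . -B cmake-out \
  -DCMAKE_TOOLCHAIN_FILE=backends/cadence/cadence.cmake \
  -DEXECUTORCH_VISION_OPT=ON
cmake --build cmake-out
```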


pytorch-bot bot commented Jul 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12480

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 26d8591 with merge base 5348ea9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

Hi @cad-rlc!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Author

cad-rlc commented Jul 15, 2025

@pytorchbot label "release notes: backends"


pytorch-bot bot commented Jul 15, 2025

Didn't find following labels among repository labels: release notes: backends

- func: cadence::quantized_relu.per_tensor_out(Tensor X, int X_zero_point, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: cadence::impl::vision::quantized_relu_per_tensor_out

@mcremon-meta mcremon-meta Jul 15, 2025


I think using

kernel_name: cadence::impl::reference::quantized_relu_per_tensor_out

should work (I picked relu at random; this applies to all the other ops too).

@meta-cla meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jul 16, 2025
@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.

zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Jul 25, 2025
…ftmax kernel (no iDMA) (pytorch#12480)

Summary:
#### Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.

Pull Request resolved: pytorch#12480

Test Plan: Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. Rest of the steps remain the same.

Differential Revision: D78911716

Pulled By: zonglinpeng

@zonglinpeng zonglinpeng left a comment


This PR is good but requires some additional coding to fix bugs, improve the infra, and make it compatible with Meta-internal build systems. Please address the following issues in this PR:

  1. apply changes on top of #12864 to avoid merge conflicts.
  2. convert the namespace to MV130
  3. include the missing API library(s)

Please also make sure to move the third-party nnlib to an independent GitHub repo, as suggested in the comment. This can be done in a follow-up PR.

/* Q54 <- Q24*Q30 */
w = IVP_MULN_2X32(xin_i, invln2_Q30);
exp = IVP_PACKVNRN_2X64W(w, 54);
Contributor


IVP_PACKVNRN_2X64W is not included in this change

Author


Hi,

IVP_PACKVNRN_2X64W is an ISA instruction on Cadence Vision 130. It should be available in xt_ivpn.h which has been included in the common.h header file. Could you please elaborate on what issues were observed at your end?

Thanks

Contributor


@cad-rlc We use buck internally and the xt linker fails to resolve the symbol: vsoftmaxf.c:132: undefined reference to 'IVP_PACKVNRN_2X64W'. I wonder

  1. if anything from xtensa/tie should be picked up, since <xtensa/tie/xt_ivpn.h> is one of them.
  2. whether any xtensa/tie headers should be included in this PR, if yes to 1.
  3. whether any compiler flags need to be set specifically for your vision kernel, if no to 1, such as disabling the COMPILER_XTENSA flag.

namespace cadence {
namespace impl {
namespace vision {
namespace native {
Contributor


Please change all the namespaces to

namespace cadence {
namespace impl {
namespace MV130 {
namespace native {

#include <executorch/kernels/portable/cpu/scalar_utils.h>
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
Contributor


Please use a consistent namespace:

namespace cadence {
namespace impl {
namespace MV130 {
namespace native {

@@ -0,0 +1,83 @@
/* ------------------------------------------------------------------------ */
Contributor


It's not scalable or maintainable for the third-party code to have subdirectories here, which makes it suitable to be a standalone GitHub repo like HiFi (https://github.com/foss-xtensa/nnlib-hifi4) and G3 (https://github.com/foss-xtensa/nnlib-FusionG3/). Please make the nnlib its own repo before we add many more kernels/APIs.

@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.

@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 4 times, most recently from 5b3b5f6 to ae50637 Compare August 19, 2025 06:51
@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 2 times, most recently from 589baa3 to 925e988 Compare August 19, 2025 13:24
@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 5 times, most recently from bc2417f to 9716a40 Compare August 21, 2025 09:31
Author

cad-rlc commented Aug 21, 2025

@zonglinpeng @mcremon-meta Latest changes are now available. Please review.


@mcremon-meta mcremon-meta left a comment


Overall comment: we shouldn't be duplicating that many files IMO; virtually all of this is not vision specific, and we should be able to import/compile/link from the reference/portable ops. Let's talk about that in our sync.

constexpr float min_val = std::numeric_limits<T>::min();
constexpr float max_val = std::numeric_limits<T>::max();
float tmp = roundf(x * scale + zero_point);
return std::max(std::min(tmp, max_val), min_val);
Contributor


if the kernels don't use anything specific to vision chips, we don't need that file at all and we can just import from the reference kernels


namespace cadence {
namespace impl {
namespace vision {
Contributor


Is there compatibility between V110 and V130 that we can rely on? If yes, we can use a vision namespace for ops that will be compatible with both. If not, let's make them V130 since that's probably the accurate definition.

@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch from 9716a40 to 92ca856 Compare August 28, 2025 09:30
Author

cad-rlc commented Sep 3, 2025

overall comment is that we shouldn't be duplicating that many files IMO, virtually all of this is not vision specific and we should be able to import/compile/link from the reference/portable ops. Let's talk about that in our sync

Hi @mcremon-meta,

We understand that duplicating files may not be the cleanest way of setting up the vision directory, but we chose this approach to allow us to make incremental changes to the now-existing files, rather than having to add new files for optimized/vision-specific implementations. This provides us with two benefits:
1) It allows us to use the reference code as a fallback within the vision namespace, even after the optimized versions have been integrated.
2) It prevents unexpected failures and crashes, especially while building and running the quantized models.
We hope that this is okay with you.

Thanks

@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.

@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 3 times, most recently from 9849c92 to b859639 Compare September 12, 2025 07:30
@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 2 times, most recently from 26f901d to 13de990 Compare September 16, 2025 08:56
@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.

zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Sep 18, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespace (pytorch#12480)

Summary:
#### Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.

Pull Request resolved: pytorch#12480

Test Plan:
Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. Rest of the steps remain the same.

Rollback Plan:

Differential Revision: D82685201

Pulled By: zonglinpeng
zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Sep 18, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespaces, fix compatibility build errors (pytorch#14398)

Summary:
Pull Request resolved: pytorch#14398

#### Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.

Pull Request resolved: pytorch#12480

Test Plan:
Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. Rest of the steps remain the same.

Rollback Plan:

Differential Revision: D82685201

Pulled By: zonglinpeng
zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Sep 19, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespaces, fix compatibility build errors (pytorch#14398)

Summary:
Pull Request resolved: pytorch#14398

#### Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.

Pull Request resolved: pytorch#12480

Test Plan:
Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. Rest of the steps remain the same.

buck test -j 4 'fbcode//mode/opt' fbcode//on_device_ai/Assistant/Jarvis/nightly:test_mv130_nightly -- test_aten__softmax_out
Buck UI: https://www.internalfb.com/buck2/367cba5f-fffe-4b96-a4b0-cc9383b3d595
Test UI: https://www.internalfb.com/intern/testinfra/testrun/9007199367513270

Reviewed By: hsharma35

Differential Revision: D82685201

Pulled By: zonglinpeng
facebook-github-bot pushed a commit that referenced this pull request Sep 20, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespace (#12480)

Differential Revision: D82685201

Pull Request resolved: #14398
@cad-rlc cad-rlc closed this Sep 22, 2025
@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch from 26d8591 to 5ae40f1 Compare September 22, 2025 11:29
@huydhn huydhn temporarily deployed to upload-benchmark-results September 22, 2025 13:11 — with GitHub Actions Inactive
@huydhn huydhn temporarily deployed to upload-benchmark-results September 22, 2025 14:14 — with GitHub Actions Inactive
StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespace (pytorch#12480)

Differential Revision: D82685201

Pull Request resolved: pytorch#14398
@zonglinpeng
Contributor

merged in #14398
