Conversation


@cad-rlc cad-rlc commented Jul 15, 2025

Summary

Release notes: backends

  1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
  2. Added an optimized softmax kernel without iDMA to vision/third_party.

Test plan

Add -DEXECUTORCH_VISION_OPT=ON to the Cadence CMake build to enable the vision optimized kernels. The rest of the steps remain the same.
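As a sketch, the configure step might look like the following; the toolchain file path and build directory are illustrative assumptions, and only -DEXECUTORCH_VISION_OPT=ON comes from this PR:

```shell
# Illustrative only: the toolchain path and build dir are assumptions;
# -DEXECUTORCH_VISION_OPT=ON is the flag this PR adds.
cmake -S . -B cmake-out \
  -DCMAKE_TOOLCHAIN_FILE=backends/cadence/cadence.cmake \
  -DEXECUTORCH_VISION_OPT=ON
cmake --build cmake-out
```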


pytorch-bot bot commented Jul 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12480

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 26d8591 with merge base 5348ea9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

Hi @cad-rlc!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Author

cad-rlc commented Jul 15, 2025

@pytorchbot label "release notes: backends"


pytorch-bot bot commented Jul 15, 2025

Didn't find following labels among repository labels: release notes: backends

- func: cadence::quantized_relu.per_tensor_out(Tensor X, int X_zero_point, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: cadence::impl::vision::quantized_relu_per_tensor_out

@mcremon-meta mcremon-meta Jul 15, 2025


I think using

kernel_name: cadence::impl::reference::quantized_relu_per_tensor_out

should work (I picked relu at random; this applies to all the other ops too).

@meta-cla meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jul 16, 2025
@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.

zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Jul 25, 2025
…ftmax kernel (no iDMA) (pytorch#12480)

Summary:
#### Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.

Pull Request resolved: pytorch#12480

Test Plan: Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. Rest of the steps remain the same.

Differential Revision: D78911716

Pulled By: zonglinpeng

@zonglinpeng zonglinpeng left a comment


This PR is good but requires some additional coding to fix bugs, improve the infra, and make it compatible with Meta-internal build systems. Please address the following issues in this PR:

  1. apply changes on top of #12864 to avoid merge conflicts.
  2. convert the namespace to MV130
  3. include the missing API library(s)

Please also make sure to move the third-party nnlib to an independent GitHub repo, as suggested in the comment. This can be done in a follow-up PR.

/* Q54 <- Q24*Q30 */
w = IVP_MULN_2X32(xin_i, invln2_Q30);
exp = IVP_PACKVNRN_2X64W(w, 54);
Contributor


IVP_PACKVNRN_2X64W is not included in this change

Author


Hi,

IVP_PACKVNRN_2X64W is an ISA instruction on Cadence Vision 130. It should be available in xt_ivpn.h which has been included in the common.h header file. Could you please elaborate on what issues were observed at your end?

Thanks

Contributor


@cad-rlc We use buck internally and the xt linker fails to resolve the symbol: vsoftmaxf.c:132: undefined reference to 'IVP_PACKVNRN_2X64W'. I wonder

  1. if anything from xtensa/tie should be picked up, since <xtensa/tie/xt_ivpn.h> is one of them.
  2. whether any xtensa/tie headers should be included in this PR, if yes to 1.
  3. whether any compiler flags need to be set specifically for your vision kernel, if no to 1, such as disabling the COMPILER_XTENSA flag.

namespace cadence {
namespace impl {
namespace vision {
namespace native {
Contributor


Please change all the namespaces to

namespace cadence {
namespace impl {
namespace MV130 {
namespace native {

#include <executorch/kernels/portable/cpu/scalar_utils.h>
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
Contributor


Please use a consistent namespace:

namespace cadence {
namespace impl {
namespace MV130 {
namespace native {

@@ -0,0 +1,83 @@
/* ------------------------------------------------------------------------ */
Contributor


It's not scalable or maintainable for the third-party code to have subdirectories here, which makes it suitable to be a standalone GitHub repo like HiFi (https://github.com/foss-xtensa/nnlib-hifi4) and G3 (https://github.com/foss-xtensa/nnlib-FusionG3/). Please make the nnlib its own repo before we add many more kernels/APIs.

@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.

@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 4 times, most recently from 5b3b5f6 to ae50637 Compare August 19, 2025 06:51
@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 2 times, most recently from 589baa3 to 925e988 Compare August 19, 2025 13:24
@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 5 times, most recently from bc2417f to 9716a40 Compare August 21, 2025 09:31
Author

cad-rlc commented Aug 21, 2025

@zonglinpeng @mcremon-meta Latest changes are now available. Please review.


@mcremon-meta mcremon-meta left a comment


Overall comment: we shouldn't be duplicating that many files IMO; virtually all of this is not vision specific, and we should be able to import/compile/link from the reference/portable ops. Let's talk about that in our sync.

constexpr float min_val = std::numeric_limits<T>::min();
constexpr float max_val = std::numeric_limits<T>::max();
float tmp = roundf(x * scale + zero_point);
return std::max(std::min(tmp, max_val), min_val);
Contributor


if the kernels don't use anything specific to vision chips, we don't need that file at all and we can just import from the reference kernels


namespace cadence {
namespace impl {
namespace vision {
Contributor


Is there compatibility between V110 and V130 that we can rely on? If yes, we can use a vision namespace for ops that will be compatible with both. If not, let's make them V130 since that's probably the accurate definition.

@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch from 9716a40 to 92ca856 Compare August 28, 2025 09:30
Author

cad-rlc commented Sep 3, 2025

overall comment is that we shouldn't be duplicating that many files IMO, virtually all of this is not vision specific and we should be able to import/compile/link from the reference/portable ops. Let's talk about that in our sync

Hi @mcremon-meta,

We understand that duplicating files may not be the cleanest way of setting up the vision directory, but we chose this approach to allow us to make incremental changes to the now-existing files, rather than having to add new files for optimized/vision-specific implementations. This provides us with two benefits:
1) It allows us to use the reference code as a fallback within the vision namespace, even after the optimized versions have been integrated.
2) It prevents unexpected failures and crashes, especially while building and running the quantized models.
We hope that this is okay with you.

Thanks

@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.

@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 3 times, most recently from 9849c92 to b859639 Compare September 12, 2025 07:30
@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch 2 times, most recently from 26f901d to 13de990 Compare September 16, 2025 08:56
@facebook-github-bot
Contributor

@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.

zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Sep 18, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespace (pytorch#12480)

Summary:
#### Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.

Pull Request resolved: pytorch#12480

Test Plan:
Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. Rest of the steps remain the same.

Rollback Plan:

Differential Revision: D82685201

Pulled By: zonglinpeng
zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Sep 18, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespaces, fix compatibility build errors (pytorch#14398)

Summary:
Pull Request resolved: pytorch#14398

#### Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.

Pull Request resolved: pytorch#12480

Test Plan:
Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. Rest of the steps remain the same.

Rollback Plan:

Differential Revision: D82685201

Pulled By: zonglinpeng
zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request Sep 19, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespaces, fix compatibility build errors (pytorch#14398)

Summary:
Pull Request resolved: pytorch#14398

#### Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.

Pull Request resolved: pytorch#12480

Test Plan:
Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. Rest of the steps remain the same.

buck test -j 4 'fbcode//mode/opt' fbcode//on_device_ai/Assistant/Jarvis/nightly:test_mv130_nightly -- test_aten__softmax_out
Buck UI: https://www.internalfb.com/buck2/367cba5f-fffe-4b96-a4b0-cc9383b3d595
Test UI: https://www.internalfb.com/intern/testinfra/testrun/9007199367513270

Reviewed By: hsharma35

Differential Revision: D82685201

Pulled By: zonglinpeng
facebook-github-bot pushed a commit that referenced this pull request Sep 20, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespace (#12480)

Differential Revision: D82685201

Pull Request resolved: #14398
@cad-rlc cad-rlc closed this Sep 22, 2025
@cad-rlc cad-rlc force-pushed the integrating-vision-kernels branch from 26d8591 to 5ae40f1 Compare September 22, 2025 11:29
@huydhn huydhn temporarily deployed to upload-benchmark-results September 22, 2025 13:11 — with GitHub Actions Inactive
@huydhn huydhn temporarily deployed to upload-benchmark-results September 22, 2025 14:14 — with GitHub Actions Inactive
StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
…ftmax kernel (no iDMA), fix new op dependencies, update namespace (pytorch#12480)

Differential Revision: D82685201

Pull Request resolved: pytorch#14398
@zonglinpeng
Contributor

merged in #14398
