Add initial backends/cadence/vision module scaffold with optimized softmax kernel (no iDMA) #12480
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12480
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 26d8591 with merge base 5348ea9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cad-rlc! Thank you for your pull request and welcome to our community.
Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
This PR needs a release notes: label.
@pytorchbot label "release notes: backends"
Didn't find the following labels among repository labels: release notes: backends
- func: cadence::quantized_relu.per_tensor_out(Tensor X, int X_zero_point, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: cadence::impl::vision::quantized_relu_per_tensor_out
I think using
kernel_name: cadence::impl::reference::quantized_relu_per_tensor_out
should work (I picked relu at random; the same applies to all the other ops).
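For concreteness, the suggested entry would look like this (only kernel_name changes; the rest is copied from the hunk above):

- func: cadence::quantized_relu.per_tensor_out(Tensor X, int X_zero_point, int out_zero_point, int out_multiplier, int out_shift, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: cadence::impl::reference::quantized_relu_per_tensor_out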
@zonglinpeng has imported this pull request. If you are a Meta employee, you can view this in D78911716.
…ftmax kernel (no iDMA) (pytorch#12480)
Summary:
Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.
Pull Request resolved: pytorch#12480
Test Plan: Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. The rest of the steps remain the same.
Differential Revision: D78911716
Pulled By: zonglinpeng
This PR is good but requires some additional coding to fix bugs, improve the infra, and make it compatible with Meta-internal build systems. Please address the following issues in this PR:
- Apply changes on top of #12864 to avoid merge conflicts.
- Convert the namespace to MV130.
- Include the missing API library(s).
Please also make sure to move the third-party nnlib to an independent GitHub repo, as suggested in the comments. This can be done in a follow-up PR.
 * components. */
/* Q54 <- Q24*Q30 */
w = IVP_MULN_2X32(xin_i, invln2_Q30);
exp = IVP_PACKVNRN_2X64W(w, 54);
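For readers unfamiliar with the Q-format comment above, here is a scalar C++ sketch of the same step (illustrative only; the variable names mirror the hunk, and the Q24 input format is taken from the comment):

#include <cstdint>

// Scalar sketch of the vector step quoted above, under the stated Q formats.
// Multiplying a Q24 value by a Q30 constant yields a Q54 product in 64 bits;
// IVP_PACKVNRN_2X64W(w, 54) then rounds/narrows away all 54 fractional bits,
// leaving the integer part -- here, round(x / ln2) for the range reduction.
int32_t pack_q54_to_int(int32_t xin_q24, int32_t invln2_q30) {
  int64_t w_q54 = static_cast<int64_t>(xin_q24) * invln2_q30;         // Q24 * Q30 = Q54
  return static_cast<int32_t>((w_q54 + (int64_t(1) << 53)) >> 54);    // round to nearest
}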
IVP_PACKVNRN_2X64W is not included in this change
Hi,
IVP_PACKVNRN_2X64W is an ISA instruction on Cadence Vision 130. It should be available in xt_ivpn.h, which is included in the common.h header file. Could you please elaborate on what issues were observed at your end?
Thanks
@cad-rlc We use buck internally, and the xt linker is flagging IVP_PACKVNRN_2X64W as vsoftmaxf.c:132: undefined reference to `IVP_PACKVNRN_2X64W'. I wonder:
1. Should anything from xtensa/tie be picked up, given that <xtensa/tie/xt_ivpn.h> is one of those headers?
2. If yes to 1, should any xtensa/tie headers be included in this PR?
3. If no to 1, are there any compiler flags I need to pass specifically for your vision kernel, such as disabling the COMPILER_XTENSA flag?
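For concreteness, a minimal sketch of the include pattern under discussion (the flag name COMPILER_XTENSA comes from the comment above; whether common.h actually gates the header this way is an assumption):

// Sketch: gate the Vision ISA intrinsics header on the Xtensa toolchain so
// host builds that lack the ISA (e.g. the buck build above) never reference
// IVP_* symbols such as IVP_PACKVNRN_2X64W.
#if defined(COMPILER_XTENSA)
#include <xtensa/tie/xt_ivpn.h>  // provides the IVP_* Vision intrinsics
#endif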
namespace cadence {
namespace impl {
namespace vision {
namespace native {
Please change all the namespaces to:
namespace cadence {
namespace impl {
namespace MV130 {
namespace native {
#include <executorch/kernels/portable/cpu/scalar_utils.h>
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
Please use the consistent namespace:
namespace cadence {
namespace impl {
namespace MV130 {
namespace native {
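In other words, each kernel file would follow this layout (a sketch; the op name below is a placeholder, not an actual ExecuTorch declaration):

namespace cadence {
namespace impl {
namespace MV130 {
namespace native {

// Example op implementation living in the MV130 namespace.
void example_op_out();

} // namespace native
} // namespace MV130
} // namespace impl
} // namespace cadence

// Call sites would then use the fully qualified name:
// cadence::impl::MV130::native::example_op_out();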
@@ -0,0 +1,83 @@
/* ------------------------------------------------------------------------ */
It's not scalable or maintainable when the third-party code has subdirectories, which makes it better suited to a standalone GitHub repo like HiFi (https://github.com/foss-xtensa/nnlib-hifi4) and G3 (https://github.com/foss-xtensa/nnlib-FusionG3/). Please make the nnlib a repo before we add many more kernels/APIs.
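If the nnlib were split out, it could be pulled in as a git submodule (a sketch; the repo URL and path below are hypothetical, and whether the HiFi nnlib is consumed exactly this way in this tree is an assumption):

git submodule add https://github.com/foss-xtensa/nnlib-vision.git backends/cadence/vision/third-party/nnlib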
Force-pushed 5b3b5f6 to ae50637
Force-pushed 589baa3 to 925e988
Force-pushed bc2417f to 9716a40
@zonglinpeng @mcremon-meta Latest changes are now available. Please review.
mcremon-meta left a comment:
Overall comment: we shouldn't be duplicating that many files, IMO. Virtually all of this is not vision-specific, and we should be able to import/compile/link from the reference/portable ops. Let's talk about that in our sync.
constexpr float min_val = std::numeric_limits<T>::min();
constexpr float max_val = std::numeric_limits<T>::max();
float tmp = roundf(x * scale + zero_point);
return std::max(std::min(tmp, max_val), min_val);
If the kernels don't use anything specific to vision chips, we don't need that file at all and we can just import from the reference kernels.
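For context, here is a self-contained sketch of the helper quoted above (the template signature and function name are assumptions; the body mirrors the hunk):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Sketch of the quantize helper quoted in the hunk above. For an integral
// quantized type T (e.g. int8_t, uint8_t), numeric_limits<T>::min()/max()
// give the clamp range of the quantized type; the reference kernels carry an
// equivalent helper, which is why importing it avoids this duplication.
template <typename T>
T quantize(float x, float scale, int32_t zero_point) {
  constexpr float min_val = std::numeric_limits<T>::min();
  constexpr float max_val = std::numeric_limits<T>::max();
  float tmp = std::roundf(x * scale + zero_point);
  return static_cast<T>(std::max(std::min(tmp, max_val), min_val));
}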
namespace cadence {
namespace impl {
namespace vision {
Is there compatibility between V110 and V130 that we can use? If yes, we can use a vision namespace for ops that will be compatible with both. If not, let's make them V130, since that's probably the accurate definition.
Force-pushed 9716a40 to 92ca856
Hi @mcremon-meta, we understand that duplicating files may not be the cleanest way of setting up the vision directory, but we chose to proceed with this approach so we can make incremental changes to the now-existing files, rather than having to add new files for optimized/vision-specific implementations. This provides us with two benefits:
Thanks
Force-pushed 9849c92 to b859639
Force-pushed 26f901d to 13de990
…ftmax kernel (no iDMA), fix new op dependencies, update namespaces, fix compatibility build errors (pytorch#14398)
Summary:
Pull Request resolved: pytorch#14398
Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.
Pull Request resolved: pytorch#12480
Test Plan: Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. The rest of the steps remain the same.
buck test -j 4 'fbcode//mode/opt' fbcode//on_device_ai/Assistant/Jarvis/nightly:test_mv130_nightly -- test_aten__softmax_out
Buck UI: https://www.internalfb.com/buck2/367cba5f-fffe-4b96-a4b0-cc9383b3d595
Test UI: https://www.internalfb.com/intern/testinfra/testrun/9007199367513270
Reviewed By: hsharma35
Differential Revision: D82685201
Pulled By: zonglinpeng
Force-pushed 26d8591 to 5ae40f1
…ftmax kernel (no iDMA), fix new op dependencies, update namespace (pytorch#12480)
Differential Revision: D82685201
Pull Request resolved: pytorch#14398
Merged in #14398.
Summary
Release notes: backends
1. Created backends/cadence/vision module scaffold to which we plan to add optimized kernels for Vision 130.
2. Added an optimized softmax kernel without iDMA to vision/third_party.
Test plan
Add -DEXECUTORCH_VISION_OPT=ON to enable vision optimized kernels while building the cadence cmake. The rest of the steps remain the same.
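For reference, a sketch of what the configure step might look like (the toolchain file and build directory are illustrative of a typical cadence CMake flow, not exact commands from this PR):

cmake -DEXECUTORCH_VISION_OPT=ON \
      -DCMAKE_TOOLCHAIN_FILE=<path/to/xtensa-toolchain.cmake> \
      -B cmake-out .
cmake --build cmake-out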