[ExecuTorch] FFHT: ARM NEON port #5289
Patch the code generator so that it can generate NEON code, and leave it configured to do that, since we already have the checked-in generated AVX and SSE code. Generated code size was a potential issue, so I also patched the generator to 1) reuse generated code for previous smaller sizes where applicable and 2) choose the smallest code that isn't more than 10% slower than the very fastest code.
Differential Revision: [D60194970](https://our.internmc.facebook.com/intern/diff/D60194970/)
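The selection rule in point 2 can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the description, not the generator's actual code: the `Candidate` type, its fields, and `pick_candidate` are hypothetical names, and the sizes and timings in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one generated variant of a given transform size."""
    name: str
    code_bytes: int   # size of the generated code
    seconds: float    # measured running time (lower is better)


def pick_candidate(candidates, slowdown_tolerance=0.10):
    """Return the smallest candidate that is at most 10% slower than the fastest."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    fastest = min(c.seconds for c in candidates)
    limit = fastest * (1.0 + slowdown_tolerance)
    # Keep only variants within the allowed slowdown, then prefer the
    # smallest code; ties on size are broken by speed.
    eligible = [c for c in candidates if c.seconds <= limit]
    return min(eligible, key=lambda c: (c.code_bytes, c.seconds))
```

For example, given a 4096-byte variant at 1.00 s, a 1024-byte variant at 1.08 s, and a 256-byte variant at 1.50 s, the rule picks the 1024-byte one: the smallest variant is outside the 10% window, and the fastest is larger than it needs to be.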
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5289
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit bbf5ef4 with merge base 6328d41.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D60194970
This pull request has been merged in eedc38a.
Pull Request resolved: pytorch/executorch#5289
Patch the code generator so that it can generate NEON code, and leave it configured to do that, since we already have the checked-in generated AVX and SSE code. Generated code size was a potential issue, so I also patched the generator to 1) reuse generated code for previous smaller sizes where applicable and 2) choose the smallest code that isn't more than 10% slower than the very fastest code.
ghstack-source-id: 242230777
@exported-using-ghexport
Differential Revision: [D60194970](https://our.internmc.facebook.com/intern/diff/D60194970/)
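As a rough picture of what reusing code for smaller sizes can look like, here is a Python sketch in which the transform of size 2^k calls the transform of size 2^(k-1) on each half instead of inlining it. It relies only on the standard recursive structure of the Hadamard transform. How the real generator shares code is not described in this PR, so `make_fwht` and `build_transforms` are hypothetical names, and the real output is C with SIMD intrinsics rather than Python.

```python
def make_fwht(log_n, smaller=None):
    """Build an in-place, unnormalized Hadamard transform of size 2**log_n.

    `smaller` is the already-built transform of size 2**(log_n - 1); it is
    called on each half rather than duplicated (the assumed form of reuse).
    """
    half = (1 << log_n) // 2

    def fwht(buf, offset=0):
        if log_n == 0:
            return  # a size-1 transform is the identity
        # Transform both halves with the smaller routine.
        smaller(buf, offset)
        smaller(buf, offset + half)
        # Combine the halves with one butterfly pass.
        for i in range(offset, offset + half):
            a, b = buf[i], buf[i + half]
            buf[i] = a + b
            buf[i + half] = a - b

    return fwht


def build_transforms(max_log_n):
    """Build transforms for sizes 2**0 .. 2**max_log_n, each reusing the previous."""
    transforms = {}
    prev = None
    for log_n in range(max_log_n + 1):
        prev = make_fwht(log_n, prev)
        transforms[log_n] = prev
    return transforms
```

Calling `build_transforms(2)[2]` on the list `[1, 2, 3, 4]` transforms it in place to `[10, -2, -4, 0]`. Sharing the smaller routine this way trades some call overhead for less duplicated code, which is presumably the point of the size-versus-speed rule in the description.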