Skip to content

Conversation

@ykhrustalev
Copy link
Contributor

Summary

The sleef.h header contains

#if (defined(__MINGW32__) || defined(__MINGW64__) || defined(__CYGWIN__) || defined(_MSC_VER)) && !defined(SLEEF_STATIC_LIBS)
#ifdef SLEEF_IMPORT_IS_EXPORT
#define SLEEF_IMPORT __declspec(dllexport)
#else // #ifdef SLEEF_IMPORT_IS_EXPORT
#define SLEEF_IMPORT __declspec(dllimport)
#if (defined(_MSC_VER))
#pragma comment(lib,"sleef.lib")
#endif // #if (defined(_MSC_VER))
#endif // #ifdef SLEEF_IMPORT_IS_EXPORT
#else // #if (defined(__MINGW32__) || defined(__MINGW64__) || defined(__CYGWIN__) || defined(_MSC_VER)) && !defined(SLEEF_STATIC_LIBS)
#define SLEEF_IMPORT
#endif // #if (defined(__MINGW32__) || defined(__MINGW64__) || defined(__CYGWIN__) || defined(_MSC_VER)) && !defined(SLEEF_STATIC_LIBS)

#pragma comment(lib,"sleef.lib")

which forces the sleef.lib to be linked for the static windows lib.
The 2 particular optimized kernels are not using any sleef functions and could be used as a subset that is statically compiled.

Test plan

Removing the header makes the static link to pass without a need to have a sleef lib.

@pytorch-bot
Copy link

pytorch-bot bot commented Aug 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13079

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Pending

As of commit 74c887e with merge base 27fa18e (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 2, 2025
@github-actions
Copy link

github-actions bot commented Aug 2, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@digantdesai
Copy link
Contributor

The 2 particular optimized kernels are not using any sleef functions and could be used as a subset that is statically compiled.

Good. yeah we use at::native::vectorized_gelu now for gelu for instance.
If the CI is clean perhaps we can merge. Do you mind fixing the Buck target file as well?

@facebook-github-bot
Copy link
Contributor

@digantdesai has imported this pull request. If you are a Meta employee, you can view this in D79648326.

@digantdesai
Copy link
Contributor

Running CI now

@ykhrustalev
Copy link
Contributor Author

The 2 particular optimized kernels are not using any sleef functions and could be used as a subset that is statically compiled.

Good. yeah we use at::native::vectorized_gelu now for gelu for instance. If the CI is clean perhaps we can merge. Do you mind fixing the Buck target file as well?

@digantdesai the linking comes from the "header" file, not from the buck or cmake. Can you please elaborate the Buck change you mean?

@digantdesai
Copy link
Contributor

NVM we may not need any update to the BUCK file.

@kimishpatel
Copy link
Contributor

wait but gelu.h from aten native is all header only impl which pulls in other headers like Vectorized.h. So inclusion of sleef should likely happen there, which does include sleef under these conditions, https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cpu/vec/vec128/vec128_float_neon.h#L10-L12. So if you are avoiding use of sleef tahts fine (as long as it doesnt break other builds), but it will cost perf on windows

@ykhrustalev
Copy link
Contributor Author

wait but gelu.h from aten native is all header only impl which pulls in other headers like Vectorized.h. So inclusion of sleef should likely happen there, which does include sleef under these conditions, https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cpu/vec/vec128/vec128_float_neon.h#L10-L12. So if you are avoiding use of sleef tahts fine (as long as it doesnt break other builds), but it will cost perf on windows

That's fine, we are using only a subset of kernels for that case, and the 2 kernels in question are not using Sleef in their implementation. Other sleef kernels are skipped via compilation flags.

@digantdesai
Copy link
Contributor

Once the CI is green (after rebase) I can merge this. @kimishpatel - OK with this?

@kimishpatel
Copy link
Contributor

yeah ok with me

@facebook-github-bot
Copy link
Contributor

@digantdesai has imported this pull request. If you are a Meta employee, you can view this in D79648326.

@digantdesai digantdesai merged commit 2debbfc into pytorch:main Aug 20, 2025
102 of 105 checks passed
@ykhrustalev ykhrustalev deleted the ykhrustalev/cross-compile-sleef branch August 20, 2025 21:19
agrima1304 pushed a commit to agrima1304/executorch that referenced this pull request Aug 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants