@jaykang10
Contributor

@jaykang10 jaykang10 commented Feb 3, 2026

Added Arm64 feature detections in STL

FEAT_DotProd (PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE)
FEAT_I8MM (PF_ARM_V82_I8MM_INSTRUCTIONS_AVAILABLE)
FEAT_SHA3 (PF_ARM_SHA3_INSTRUCTIONS_AVAILABLE)
FEAT_SVE (PF_ARM_SVE_INSTRUCTIONS_AVAILABLE)
FEAT_SVE2 (PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE)
FEAT_SVE2p1 (PF_ARM_SVE2_1_INSTRUCTIONS_AVAILABLE)
FEAT_SVE_SHA3 (PF_ARM_SVE_SHA3_INSTRUCTIONS_AVAILABLE)
FEAT_SVE_AES (PF_ARM_SVE_AES_INSTRUCTIONS_AVAILABLE)
FEAT_SVE_BitPerm (PF_ARM_SVE_BITPERM_INSTRUCTIONS_AVAILABLE)
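
For readers unfamiliar with the API, here is a minimal sketch of how one of the listed `PF_ARM_*` constants might be queried through `IsProcessorFeaturePresent`. The `sketch` namespace and the helper names `has_processor_feature` and `has_sve` are illustrative assumptions, not names from this PR. The non-Windows fallback and the `#if defined(...)` guard (for SDKs that predate the constant) are also assumptions added so the snippet compiles anywhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

namespace sketch {
    // Hypothetical wrapper (not from the PR): true only if the OS reports the feature.
    inline bool has_processor_feature(unsigned long feature_id) noexcept {
#ifdef _WIN32
        // IsProcessorFeaturePresent returns nonzero when the feature is reported as present.
        return IsProcessorFeaturePresent(feature_id) != 0;
#else
        // Assumption for portability: off Windows, conservatively report "absent".
        (void) feature_id;
        return false;
#endif
    }

    // Hypothetical convenience helper for FEAT_SVE.
    inline bool has_sve() noexcept {
#if defined(_WIN32) && defined(PF_ARM_SVE_INSTRUCTIONS_AVAILABLE)
        return has_processor_feature(PF_ARM_SVE_INSTRUCTIONS_AVAILABLE);
#else
        // The constant is unavailable in this SDK or on this platform.
        return false;
#endif
    }
} // namespace sketch
```

A caller would branch on `sketch::has_sve()` once and fall back to a baseline code path when it returns `false`.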

@jaykang10 jaykang10 requested a review from a team as a code owner February 3, 2026 17:28
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Feb 3, 2026
@jaykang10
Contributor Author

@microsoft-github-policy-service agree company="Arm"

@jaykang10 jaykang10 force-pushed the arm64-feature-detection branch from e1bc408 to 22ec516 Compare February 3, 2026 17:39
@frederick-vs-ja
Contributor

I think we should add them together with certain improvement of vectorization.

@jaykang10
Contributor Author

> I think we should add them together with certain improvement of vectorization.

Yep, I agree with it.
@hazzlim I think we don’t need vcruntime support for Arm64 feature detection, and IsProcessorFeaturePresent should be sufficient.
Could you please add explicit feature detection for each case where the algorithms are vectorized for a specific feature?
I think this PR could be used as a reference.
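
The request above, an explicit feature check at each vectorized algorithm, can be sketched roughly as follows. Everything in the `sketch` namespace is hypothetical. `count_feature_specific` is a stand-in that simply reuses the scalar loop so the example stays portable, where a real kernel would use intrinsics. The `use_feature` callback plays the role of a per-feature helper such as the PR's `_Use_FEAT_*` functions.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {
    enum class path { scalar, feature_specific };

    struct count_result {
        std::size_t count; // number of matching bytes
        path taken;        // which implementation produced the count
    };

    // Portable baseline; always available.
    inline std::size_t count_scalar(const std::uint8_t* data, std::size_t n, std::uint8_t value) noexcept {
        std::size_t hits = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (data[i] == value) {
                ++hits;
            }
        }
        return hits;
    }

    // Stand-in for a feature-specific kernel (assumption: a real one would use intrinsics).
    inline std::size_t count_feature_specific(
        const std::uint8_t* data, std::size_t n, std::uint8_t value) noexcept {
        return count_scalar(data, n, value);
    }

    // Checks the feature explicitly at the call site, then picks the implementation.
    inline count_result count_dispatch(
        const std::uint8_t* data, std::size_t n, std::uint8_t value, bool (*use_feature)()) {
        if (use_feature != nullptr && use_feature()) {
            return {count_feature_specific(data, n, value), path::feature_specific};
        }
        return {count_scalar(data, n, value), path::scalar};
    }
} // namespace sketch
```

Passing the check as a callback keeps the sketch testable without the hardware. Production code would more likely call the helper directly.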

@hazzlim
Contributor

hazzlim commented Feb 4, 2026

> > I think we should add them together with certain improvement of vectorization.
>
> Yep, I agree with it. @hazzlim I think we don’t need vcruntime support for Arm64 feature detection, and IsProcessorFeaturePresent should be sufficient. Could you please add explicit feature detection for each case where the algorithms are vectorized for a specific feature? I think this PR could be used as a reference.

Seems reasonable, I will take this approach.

From the initial discussion on the STL Discord, it sounded like the use of IsProcessorFeaturePresent was considered a suitable temporary workaround until runtime ISA detection was added to vcruntime. @StephanTLavavej are you happy with this being the "final-form"?

@jaykang10
Copy link
Contributor Author

jaykang10 commented Feb 4, 2026

> IsProcessorFeaturePresent

IsProcessorFeaturePresent is supported by the Windows OS. As you know, SVE requires OS support because its stack manipulation differs from NEON, even if the CPU itself supports the feature. From my perspective, using IsProcessorFeaturePresent would be safer.

@AlexGuteniev
Contributor

I'm wondering why the crypto features (FEAT_SHA3, FEAT_SVE_SHA3, FEAT_SVE_AES) would be needed.
The algorithms are general purpose, and the STL currently doesn't provide any crypto features, as far as I know (it has random and hash, but both are general purpose, not crypto purpose).
Do the crypto features include some general-purpose instructions?

@hazzlim
Contributor

hazzlim commented Feb 4, 2026

I believe that this list comes from the set of features proposed by my team to add to the vcruntime (not necessarily only for use by the STL). We were advised that there would be quite a high latency between this being added and being available for use, so we were keen to cover a range of features.

I don't think the crypto features are going to be used in the STL.

@StephanTLavavej
Member

> @StephanTLavavej are you happy with this being the "final-form"?

If you're happy with the perf then I think this is perfectly maintainable, thanks.

@StephanTLavavej StephanTLavavej added performance Must go faster ARM64 Related to the ARM64 architecture labels Feb 4, 2026
@AlexGuteniev
Contributor

I would like the __isa_available solution from vcruntime more, as it will allow testing with features disabled.

Member

When creating a new PR, please ensure that your main is up to date, and that your feature branch is properly synced. (Ideally, clean commits rebased on top of main.) Your branch was relative to main as of 2025-10-10 which was pretty old for such a fast-moving file as vector_algorithms.cpp. In this case it made no difference, just something to remember for the future!

Comment on lines +89 to +91
bool _Use_FEAT_AES() noexcept {
    return IsProcessorFeaturePresent(PF_ARM_SVE_AES_INSTRUCTIONS_AVAILABLE);
}
Member

As mentioned in the discussion, the cryptography stuff is going to be unused by the STL, but I am fine with merging all of this for now, then we can garbage-collect the unused functions when we have the full suite of STL vectorized algorithms for ARM64 completed. These are helpers in an unnamed namespace, so they can be removed without fear of breaking the import lib's compatibility.

@StephanTLavavej
Member

> I would like the __isa_available solution from vcruntime more, as it will allow testing with features disabled.

It should be possible to add a "Vulcan nerve pinch" function, present in the import lib only when building the STL's tests, that allows us to simulate the features being disabled. (We had a worse version of this for ConcRT ages ago, that one was bad because it was exposed to users.) This can be retrofitted in the future.
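
One way such a test-only switch could look, as a rough sketch rather than the STL's actual design: a flag that the detection helper consults before asking the OS. All names here (`disable_features_for_testing`, `os_reports_feature`, `use_feature`) are hypothetical. `os_reports_feature` is a stand-in that always returns `true` so the effect of the override is observable without the hardware; in real code it would be the `IsProcessorFeaturePresent` query.

```cpp
#include <atomic>

namespace sketch {
    // Test-only override flag, kept in a function-local static.
    inline std::atomic<bool>& features_forced_off() noexcept {
        static std::atomic<bool> flag{false};
        return flag;
    }

    // Hypothetical "nerve pinch": tests call this to simulate missing CPU features.
    inline void disable_features_for_testing(bool disabled) noexcept {
        features_forced_off().store(disabled, std::memory_order_relaxed);
    }

    // Stand-in for the OS query (assumption: the feature is present).
    inline bool os_reports_feature() noexcept {
        return true;
    }

    // Detection helper: the override wins over whatever the OS reports.
    inline bool use_feature() noexcept {
        if (features_forced_off().load(std::memory_order_relaxed)) {
            return false;
        }
        return os_reports_feature();
    }
} // namespace sketch
```

A test would call `sketch::disable_features_for_testing(true)`, run the algorithm to exercise the fallback path, then restore the flag with `false`.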

@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Feb 4, 2026
@hazzlim
Contributor

hazzlim commented Feb 5, 2026

Nit: imperative present tense for commit message + slight typo fix:

- Added Arm64 feature detections in STL
+ Add Arm64 feature detection in STL

@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Feb 9, 2026
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed, otherwise no action is required.

@StephanTLavavej StephanTLavavej merged commit 6081666 into microsoft:main Feb 11, 2026
45 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Feb 11, 2026
@StephanTLavavej
Member

Thanks for adding this feature detection and congratulations on your first microsoft/STL commit! 🦾 6️⃣ 4️⃣
