Conversation

Member
@ggerganov ggerganov commented Apr 4, 2025

ref ggml-org/llama.cpp#12732, #1176

  • The ARM_NEON + MSVC checks are no longer needed because we don't support ARM + MSVC (ggml : fix arm build llama.cpp#10890). ARM + Clang is the recommended alternative.
  • The ggml_fp16_internal_t type no longer has any usage, so it can be removed.
  • The CUDA check in ggml-impl.h is needed to fix the build with old CUDA compilers (v11). These compilers don't support __fp16 (see Avoid using __fp16 on ARM with old nvcc, fixes #10555 llama.cpp#10616), so we don't include Arm headers such as arm_neon.h, to avoid the compiler seeing this type. Note that the CPU code, which is not compiled with nvcc, can still include Arm headers and use __fp16.
  • A similar check is done for MUSA, per MUSA: support ARM64 and enable dp4a .etc llama.cpp#11843.
  • The Arm headers included in ggml-cpu-impl.h are redundant because they are already included by ggml-impl.h, so they are removed to deduplicate the code.
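The selection logic described above can be sketched roughly as follows. This is a simplified illustration, not the exact ggml-impl.h source; in particular the CUDA/MUSA version-check macros (__CUDACC_VER_MAJOR__, __MUSACC__) are assumptions here:

```c
#include <stdint.h>

// Rough sketch of the fp16 storage-type selection after this PR.
// Macro names for the CUDA/MUSA compiler checks are illustrative.
#if defined(__ARM_NEON) && \
    !(defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11) && \
    !defined(__MUSACC__)
    // Arm with a capable compiler: arm_neon.h is safe to include,
    // and the native __fp16 type is available.
    #include <arm_neon.h>
    typedef __fp16 ggml_fp16_t;
#else
    // Old nvcc, MUSA, or non-Arm: store the raw IEEE-754 half bits
    // and convert in software.
    typedef uint16_t ggml_fp16_t;
#endif
```

Either way the storage type is 16 bits wide, so the two branches are layout-compatible.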

@ggerganov ggerganov requested review from cmdr2 and slaren and removed request for slaren April 4, 2025 09:11
@ggerganov ggerganov changed the title ggml : simlpify Arm fp16 CPU logic ggml : simplify Arm fp16 CPU logic Apr 4, 2025
src/ggml-impl.h Outdated
// 16-bit float
// on Arm, we use __fp16
// on x86, we use uint16_t
#if defined(__ARM_NEON)
Collaborator
@cmdr2 cmdr2 Apr 4, 2025
Shouldn't there be a corresponding && !defined(_MSC_VER) here, since we're only removing the CUDA check?

Member Author

I think if MSVC supports __fp16, we don't need this check. But maybe the check was added because it does not support it. If that's the case, we'll have to bring it back and fall back to the reference implementation.
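For context, the "reference implementation" fallback converts the raw IEEE-754 half bits to a float in software. A minimal sketch of that conversion follows; the helper name fp16_bits_to_fp32 is hypothetical, not ggml's actual API:

```c
#include <stdint.h>
#include <string.h>

// Software fp16 -> fp32 conversion on the raw bit pattern, used when the
// compiler has no native __fp16. Hypothetical helper, not ggml's API.
static float fp16_bits_to_fp32(uint16_t h) {
    const uint32_t sign = (uint32_t)(h >> 15) << 31;
    int32_t  exp  = (h >> 10) & 0x1F;   // 5-bit exponent
    uint32_t mant = h & 0x3FF;          // 10-bit mantissa
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                              // signed zero
        } else {
            // Subnormal half: renormalize into a normal float.
            exp = 1;
            while ((mant & 0x400) == 0) { mant <<= 1; exp--; }
            mant &= 0x3FF;
            bits = sign | (uint32_t)(exp + 112) << 23 | mant << 13;
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | mant << 13;       // inf / NaN
    } else {
        bits = sign | (uint32_t)(exp + 112) << 23 | mant << 13; // normal
    }

    float f;
    memcpy(&f, &bits, sizeof f);                      // bit-exact reinterpret
    return f;
}
```

The exponent bias adjustment (exp + 112) rebases the 5-bit half exponent (bias 15) onto the 8-bit float exponent (bias 127).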

Collaborator
@cmdr2 cmdr2 Apr 4, 2025

> I think if MSVC supports __fp16, we don't need this check.

Yeah, true. The original PR said: "can't use native __fp16 type as it's not supported by MSVC" (ggml-org/llama.cpp#3007).

Maybe it has changed since that PR, but I can't find anything about this on the internet.

Member Author

Aha, this explains it.

But on the other hand, we have a MSVC+Arm CI job in llama.cpp that appears to be passing with this change:

https://github.com/ggml-org/llama.cpp/actions/runs/14262095042/job/39975786169

So maybe it was indeed fixed?

Collaborator

Sounds good. This could be merged in, and if there are any reports of a regression on MSVC + Arm NEON, it's easy to fix it back.

Member Author

> MSVC cannot be used to build for Arm, the CPU backend does not allow it.

This is because of the issue with the Arm inline assembly that is not supported by the MSVC compiler, correct?

Member

Support for Arm MSVC was removed when fixing the CPU feature check with GGML_NATIVE in cmake, since it would require writing a specific implementation for it. You could still build with MSVC without the inline assembly kernels before that, it would just be slower. Since Clang is always available and shipped with Visual Studio, there was no reason to spend time on that.
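As an illustration of the Clang route described here, a Windows-on-Arm configure step with the ClangCL toolset that ships with Visual Studio might look like this (the exact invocation is an assumption, not taken from this PR):

```shell
# Configure with the Clang toolset bundled with Visual Studio instead of
# MSVC's cl.exe (illustrative invocation):
cmake -B build -G "Visual Studio 17 2022" -A ARM64 -T ClangCL
cmake --build build --config Release
```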

Member Author

Got it. So I think in that case, this PR should be good to merge?

Member

I don't know, I don't have the full context of why this code exists in this way.

Member Author

I did some additional digging and found the reason the CUDA and MUSA checks are needed. They do seem to be necessary, so I added comments in the code and updated the OP to make things a bit clearer. This PR now just removes the ggml_fp16_internal_t type, which is now redundant, and cleans up some macros and include logic.

Let me know if you see any red flags.

Member Author
ggerganov commented Apr 4, 2025

@max-krasnyansky Could you check if this branch builds on an Arm + MSVC machine (I think you mentioned you have one)? Not sure if MSVC supports __fp16 - if it does, we can simplify this code here. Thanks.

Edit: nvm, no longer needed.

@ggerganov ggerganov merged commit 70e85f6 into master Apr 7, 2025
11 checks passed
@ggerganov ggerganov deleted the gg/arm-simplify branch April 7, 2025 09:25
cmdr2 added a commit to cmdr2/ggml that referenced this pull request Apr 10, 2025