Conversation

Member
@ggerganov ggerganov commented Apr 4, 2025

ref ggml-org/llama.cpp#12732, #1176

  • The ARM_NEON + MSVC checks are no longer needed because we don't support ARM + MSVC (ggml : fix arm build llama.cpp#10890). ARM + Clang is the recommended alternative.
  • The ggml_fp16_internal_t type no longer has any usage, so it can be removed.
  • The CUDA check in ggml-impl.h is needed to fix the build with old CUDA compilers (v11). These compilers don't support __fp16 (see Avoid using __fp16 on ARM with old nvcc, fixes #10555 llama.cpp#10616), so we don't include Arm headers such as arm_neon.h, to avoid the compiler seeing this type. Note that the CPU code, which is not compiled with nvcc, can still include Arm headers and use __fp16.
  • A similar check is done for MUSA, per MUSA: support ARM64 and enable dp4a .etc llama.cpp#11843.
  • The Arm headers included in ggml-cpu-impl.h are redundant because they are already included by ggml-impl.h, so they are removed to deduplicate the code.
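The selection logic described above can be sketched roughly as follows. This is a simplified illustration, not the exact ggml-impl.h source; in particular the CUDA/MUSA version-check macros (__CUDACC_VER_MAJOR__, __MUSACC__) are assumptions here:

```c
#include <stdint.h>

// Rough sketch of the fp16 storage-type selection after this PR.
// Macro names for the CUDA/MUSA compiler checks are illustrative.
#if defined(__ARM_NEON) && \
    !(defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11) && \
    !defined(__MUSACC__)
    // Arm with a capable compiler: arm_neon.h is safe to include,
    // and the native __fp16 type is available.
    #include <arm_neon.h>
    typedef __fp16 ggml_fp16_t;
#else
    // Old nvcc, MUSA, or non-Arm: store the raw IEEE-754 half bits
    // and convert in software.
    typedef uint16_t ggml_fp16_t;
#endif
```

Either way the storage type is 16 bits wide, so the two branches are layout-compatible.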

@ggerganov ggerganov requested review from cmdr2 and slaren and removed request for slaren April 4, 2025 09:11
@ggerganov ggerganov changed the title ggml : simlpify Arm fp16 CPU logic ggml : simplify Arm fp16 CPU logic Apr 4, 2025
src/ggml-impl.h Outdated
// 16-bit float
// on Arm, we use __fp16
// on x86, we use uint16_t
#if defined(__ARM_NEON)
Collaborator
@cmdr2 cmdr2 Apr 4, 2025
Shouldn't there be a corresponding && !defined(_MSC_VER) here, since we're only removing the CUDA check?

Member Author

I think if MSVC supports __fp16, we don't need this check. But maybe the check was added because it does not support it. If that's the case, we'll have to bring it back and fall back to the reference implementation.
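For context, the "reference implementation" fallback converts the raw IEEE-754 half bits to a float in software. A minimal sketch of that conversion follows; the helper name fp16_bits_to_fp32 is hypothetical, not ggml's actual API:

```c
#include <stdint.h>
#include <string.h>

// Software fp16 -> fp32 conversion on the raw bit pattern, used when the
// compiler has no native __fp16. Hypothetical helper, not ggml's API.
static float fp16_bits_to_fp32(uint16_t h) {
    const uint32_t sign = (uint32_t)(h >> 15) << 31;
    int32_t  exp  = (h >> 10) & 0x1F;   // 5-bit exponent
    uint32_t mant = h & 0x3FF;          // 10-bit mantissa
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                              // signed zero
        } else {
            // Subnormal half: renormalize into a normal float.
            exp = 1;
            while ((mant & 0x400) == 0) { mant <<= 1; exp--; }
            mant &= 0x3FF;
            bits = sign | (uint32_t)(exp + 112) << 23 | mant << 13;
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | mant << 13;       // inf / NaN
    } else {
        bits = sign | (uint32_t)(exp + 112) << 23 | mant << 13; // normal
    }

    float f;
    memcpy(&f, &bits, sizeof f);                      // bit-exact reinterpret
    return f;
}
```

The exponent bias adjustment (exp + 112) rebases the 5-bit half exponent (bias 15) onto the 8-bit float exponent (bias 127).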

Collaborator
@cmdr2 cmdr2 Apr 4, 2025

> I think if MSVC supports __fp16, we don't need this check.

Yeah, true. The original PR said: "can't use native __fp16 type as it's not supported by MSVC" (ggml-org/llama.cpp#3007).

Maybe it has changed since that PR, but I can't find anything about this on the internet.

Member Author

Aha, this explains it.

But on the other hand, we have a MSVC+Arm CI job in llama.cpp that appears to be passing with this change:

https://github.com/ggml-org/llama.cpp/actions/runs/14262095042/job/39975786169

So maybe it was indeed fixed?

Collaborator

Sounds good. This could be merged in, and if there are any reports of a regression on MSVC + Arm NEON, it's easy to fix it back.

Member Author

> MSVC cannot be used to build for Arm, the CPU backend does not allow it.

This is because of the issue with the Arm inline assembly that is not supported by the MSVC compiler, correct?

Member

Support for Arm MSVC was removed when fixing the CPU feature check with GGML_NATIVE in cmake, since it would require writing a specific implementation for it. You could still build with MSVC without the inline assembly kernels before that, it would just be slower. Since Clang is always available and shipped with Visual Studio, there was no reason to spend time on that.
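As an illustration of the Clang route described here, a Windows-on-Arm configure step with the ClangCL toolset that ships with Visual Studio might look like this (the exact invocation is an assumption, not taken from this PR):

```shell
# Configure with the Clang toolset bundled with Visual Studio instead of
# MSVC's cl.exe (illustrative invocation):
cmake -B build -G "Visual Studio 17 2022" -A ARM64 -T ClangCL
cmake --build build --config Release
```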

Member Author

Got it. So I think in that case, this PR should be good to merge?

Member

I don't know, I don't have the full context of why this code exists in this way.

Member Author

I did some additional digging and found the reason the CUDA and MUSA checks are needed. They do seem to be necessary, so I added comments in the code and updated the OP to make things a bit clearer. This PR now just removes the ggml_fp16_internal_t type, which is now redundant, and cleans up some macros and include logic.

Let me know if you see any red flags.

Member Author
ggerganov commented Apr 4, 2025

@max-krasnyansky Could you check if this branch builds on an Arm + MSVC machine (I think you mentioned you have one)? Not sure if MSVC supports __fp16 - if it does, we can simplify this code here. Thanks.

Edit: nvm, no longer needed.

@ggerganov ggerganov merged commit 70e85f6 into master Apr 7, 2025
11 checks passed
@ggerganov ggerganov deleted the gg/arm-simplify branch April 7, 2025 09:25
cmdr2 added a commit to cmdr2/ggml that referenced this pull request Apr 10, 2025