ggml : simplify Arm fp16 CPU logic #1177
Conversation
src/ggml-impl.h (outdated):

    // 16-bit float
    // on Arm, we use __fp16
    // on x86, we use uint16_t
    #if defined(__ARM_NEON)
Shouldn't there be a corresponding && !defined(_MSC_VER) here, since we're only removing the CUDA check?
I think if MSVC supports __fp16, we don't need this check. But maybe it was added because it does not support it. If that's the case, we'll have to bring it back and fall back to the reference implementation.
> I think if MSVC supports __fp16, we don't need this check.
Yeah, true. The original PR said: "can't use native __fp16 type as it's not supported by MSVC" (ggml-org/llama.cpp#3007).
Maybe it has changed since that PR, but I can't find anything about this on the internet.
Aha, this explains it.
But on the other hand, we have a MSVC+Arm CI job in llama.cpp that appears to be passing with this change:
https://github.com/ggml-org/llama.cpp/actions/runs/14262095042/job/39975786169
So maybe it was indeed fixed?
Sounds good. This could be merged in, and if there are any reports of regression on MSVC+Arm Neon, then it's easy to fix it back.
> MSVC cannot be used to build for Arm; the CPU backend does not allow it.

This is because of the issue with Arm inline assembly not being supported by the MSVC compiler, correct?
Support for Arm MSVC was removed when fixing the CPU feature check with GGML_NATIVE in cmake, since it would have required writing a specific implementation for it. Before that, you could still build with MSVC without the inline assembly kernels; it would just be slower. Since Clang is always available and ships with Visual Studio, there was no reason to spend time on that.
Got it. So I think in that case, this PR should be good to merge?
I don't know; I don't have the full context of why this code exists in this form.
I did some additional digging and found the need for the CUDA and MUSA checks. They do seem to be necessary, so I added comments in the code and updated the OP to make things a bit clearer. This PR now just removes the ggml_fp16_internal_t type, which is now redundant, and cleans up some macros and include logic.
Let me know if you see any red flags.
@max-krasnyansky Could you check if this branch builds on an Arm + MSVC machine (I think you mentioned you have one)? Not sure if MSVC supports …
Edit: nvm, no longer needed.
ref ggml-org/llama.cpp#12732, #1176
- ggml_fp16_internal_t seems to no longer have any usage, so it can be removed.
- The CUDA and MUSA checks in ggml-impl.h are needed in order to fix the build with old CUDA compilers (v11). These compilers don't support __fp16 (see "Avoid using __fp16 on ARM with old nvcc, fixes #10555", llama.cpp#10616), so we don't include Arm headers such as arm_neon.h, to avoid the compiler seeing this type. Note that the CPU code, which is not compiled with nvcc, can still include Arm headers and use __fp16.
- The Arm includes in ggml-cpu-impl.h are redundant because they are already included by ggml-impl.h, so they are removed to deduplicate the code.