-
Notifications
You must be signed in to change notification settings - Fork 1.6k
disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1 #5431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1 #5431
Conversation
| TARGET_FLAGS = -march=rv64imafdcv_zba_zbb_zfh -mabi=lp64d | ||
| endif | ||
|
|
||
| ifeq ($(TARGET), RISCV64_ZVL256B) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anyone see any issue with removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you want to remove these entirely, just revert them to their state before the addition of the HFLOAT16 PR (i.e. rv64imafdcv), unless we can be absolutely certain that the getarch utility and everything downstream of c_check still works on any affected RISCV64 platform (I'm especially worried about the -mabi=lp64d part) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've dropped the commit that removes these lines entirely. The remaining commit just removes the half float flags.
driver/others/dynamic_riscv64.c
Outdated
| } else { | ||
| #if defined(BUILD_HFLOAT16) | ||
| return NULL; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should probably put a #else here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine as is, no? If BUILD_HFLOAT16 isn't defined, we just get the existing pre-your-changes behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Aren't we dropping out of the entire "vector extension supported" case with this change, when "only" the fp16 shgemm kernel is not supported by the detected hardware ? (Though I don't have an easy solution for how to keep using the others regardless, and switch just shgemm to the generic C implementation. Perhaps "rvv+fp16" would need to be a separate TARGET even)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the case where DYNAMIC_ARCH=1, yes. The approach this patch takes is if we've built with DYNAMIC_ARCH=1 and BUILD_HFLOAT16=1 the vector kernels will only be used if we can detect Zfh, Zvfh and V at runtime, which seems reasonable as this is how they've been compiled. Not ideal perhaps, but better than the current code on develop where a DYNAMIC_ARCH=1 build will happily execute the vector kernels on machines without half float support and probably crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree for shgemm but not sure why you'd expect the vector kernels in general to crash if Z(v)fh isn't available ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but these would crash only when actually executing SHGEMM - while your change makes them inexplicably slow for any other BLAS operation if only the build host happened to have these extensions. This might create a problem with third-party packagers (like Linux distributions) depending on how likely they are to active the BUILD_HFLOAT16 option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently all the kernels in RISCV64_ZVL128B and RISCV64_ZVL256B are compiled with Zfh, Zvfh and V but at runtime we only check for V before executing them. In general I don't think it's a good idea to execute code compiled with optional extensions like Zfh and Zvfh without first checking to see whether the extensions are supported at runtime. Even if the other full float kernels aren't explicitly using half float instructions, by compiling them with Zfh and Zvfh, we give the compiler permission to use the half float instructions in those kernels, should it find some clever reason to do so.
This might create a problem with third-party packagers (like Linux distributions) depending on how likely they are to active the BUILD_HFLOAT16 option
I think it's very unlikely that Linux distributions will be able to activate BUILD_HFLOAT16=1 in its current form for riscv64 any time soon as Zvfh and Zfh are not mandatory in any of the existing RVA profiles, even RVA23U64. They could only safely build with BUILD_HFLOAT16=1 if runtime detection of zfh and zvfh were performed.
I'm happy to explore another solution, e.g., introducing new targets as you suggested above.
Incidentally, what is done on X86? How are bf16 and fp16 handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On x86_64 (and power), bf16 capability is tied to TARGET, and fp16 is as yet unimplemented - it made its debut in the riscv64 PR.
Perhaps what could be done here is issue a warning about lack of Zfh (and consequentially SHGEMM) support before activating the RISCV64_ZVL128B or RISCV64_ZVL256B kernel regardless ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think I'll just add a FALLBACK_VERBOSE type message (as a followup commit) so that the user has a chance to find out why performance is lower than expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds like a good idea.
Apologies for not replying to your previous comment. I got a little distracted.
The compiler options that enable 16 bit floating point instructions should not be enabled by default when building the RISCV64_ZVL128B and RISCV64_ZVL256B targets. The zfh and zvfh extensions are not part of the 'V' extension and are not required by any of the RVA profiles. There's no guarantee that kernels built with zfh and zvfh will work correctly on fully compliant RVA23U64 devices. To fix the issue we only build the RISCV64_ZVL128B and RISCV64_ZVL256B kernels with the half float flags if BUILD_HFLOAT16=1. We also update the RISC-V dynamic detection code to disable the RISCV64_ZVL128B and RISCV64_ZVL256B kernels at runtime if we've built with DYNAMIC_ARCH=1 and BUILD_HFLOAT16=1 and are running on a device that does not support both Zfh and Zvfh. Fixes: OpenMathLib#5428
05c8654 to
ce79fe1
Compare
|
Does this also need to be done for BUILD_BFLOAT16 since it now has vector support for SBGEMM? |
By this do you mean update the dynamic_riscv64.c code to check for the bfloat16 extensions? I'd say we do need to do that, particularly as the BF16 extensions are not mandatory in RVA23. |
|
The runtime check for extension support in the DYNAMIC_ARCH case ? I guess so, if we cannot be sure that these are going to be present in any relevant -zvl128b or -zvl256b cpu. |
| if (ret == 0) { | ||
| if (!(pairs[0].value & RISCV_HWPROBE_IMA_V)) | ||
| #if defined(BUILD_HFLOAT16) | ||
| vector_mask = (RISCV_HWPROBE_IMA_V | RISCV_HWPROBE_EXT_ZFH | RISCV_HWPROBE_EXT_ZVFH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to check for these also?
#define RISCV_HWPROBE_EXT_ZFHMIN (1 << 28)
#define RISCV_HWPROBE_EXT_ZVFHMIN (1 << 31)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these extensions being used?
They don't appear to be enabled in the compiler flags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__riscv_vle16_v_f16m1 (and other variations) are listed under the Zvfhmin extension. I would think the scalar form is also being used in situations like this _Float16 B0 = B[bi+0];
The ones that you currently have are for the mult and madd instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I looked it up. Minimum extension is a subset of the regular extension. So what we and you have seem correct.
The compiler options that enable 16 bit floating point instructions
should not be enabled by default when building the RISCV64_ZVL128B
and RISCV64_ZVL256B targets. The zfh and zvfh extensions are not part
of the 'V' extension and are not required by any of the RVA profiles.
There's no guarantee that kernels built with zfh and zvfh will work
correctly on fully compliant RVA23U64 devices.
To fix the issue we only build the RISCV64_ZVL128B and RISCV64_ZVL256B
kernels with the half float flags if BUILD_HFLOAT16=1. We also update
the RISC-V dynamic detection code to disable the RISCV64_ZVL128B and
RISCV64_ZVL256B kernels at runtime if we've built with DYNAMIC_ARCH=1
and BUILD_HFLOAT16=1 and are running on a device that does not support
both Zfh and Zvfh.
Fixes: #5428