@markdryan
Contributor

@markdryan markdryan commented Aug 28, 2025

The compiler options that enable 16 bit floating point instructions
should not be enabled by default when building the RISCV64_ZVL128B
and RISCV64_ZVL256B targets. The zfh and zvfh extensions are not part
of the 'V' extension and are not required by any of the RVA profiles.
There's no guarantee that kernels built with zfh and zvfh will work
correctly on fully compliant RVA23U64 devices.

To fix the issue we only build the RISCV64_ZVL128B and RISCV64_ZVL256B
kernels with the half float flags if BUILD_HFLOAT16=1. We also update
the RISC-V dynamic detection code to disable the RISCV64_ZVL128B and
RISCV64_ZVL256B kernels at runtime if we've built with DYNAMIC_ARCH=1
and BUILD_HFLOAT16=1 and are running on a device that does not support
both Zfh and Zvfh.

Fixes: #5428
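
As a rough illustration of the runtime gate described above, here is a minimal C sketch. It assumes the hwprobe result has already been read into a 64-bit mask. The helper names and the `GATE_` constants are hypothetical, and the bit positions are assumed to mirror Linux's `<asm/hwprobe.h>`, so check them against your kernel headers. In the actual build the choice is made at compile time via BUILD_HFLOAT16; a function argument stands in for it here so that both cases can be exercised.

```c
#include <stdint.h>

/* Assumed to match the Linux hwprobe bit layout; verify before relying on them. */
#define GATE_IMA_V    (1ULL << 2)
#define GATE_EXT_ZFH  (1ULL << 27)
#define GATE_EXT_ZVFH (1ULL << 30)

/* Hypothetical helper: the set of bits the vector kernels need at runtime. */
uint64_t required_vector_mask(int built_with_hfloat16)
{
    uint64_t mask = GATE_IMA_V;

    /* Kernels compiled with the half float flags may contain Zfh/Zvfh code. */
    if (built_with_hfloat16)
        mask |= GATE_EXT_ZFH | GATE_EXT_ZVFH;
    return mask;
}

/* Returns 1 only if every required bit is present in the probed mask. */
int vector_kernels_usable(uint64_t probed_bits, int built_with_hfloat16)
{
    uint64_t mask = required_vector_mask(built_with_hfloat16);

    return (probed_bits & mask) == mask;
}
```

The important detail is the comparison against the whole mask: a device with V and Zfh but no Zvfh is rejected when the kernels were built with the half float flags.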

TARGET_FLAGS = -march=rv64imafdcv_zba_zbb_zfh -mabi=lp64d
endif

ifeq ($(TARGET), RISCV64_ZVL256B)
Contributor Author

Does anyone see any issue with removing this?

Collaborator

I don't think you want to remove these entirely; just revert them to their state before the addition of the HFLOAT16 PR (i.e. rv64imafdcv), unless we can be absolutely certain that the getarch utility and everything downstream of c_check still work on any affected RISCV64 platform. I'm especially worried about the -mabi=lp64d part.

Contributor Author

I was removing these to fix #5424, but perhaps this is not the way to do this. Perhaps it's best to remove 05c8654 from this PR and fix the issue separately.

Contributor Author

@markdryan markdryan Aug 28, 2025

I've dropped the commit that removes these lines entirely. The remaining commit just removes the half float flags.

} else {
#if defined(BUILD_HFLOAT16)
return NULL;
#endif
Contributor Author

@markdryan markdryan Aug 28, 2025

I think I should probably put a #else here.

Contributor

It's fine as is, no? If BUILD_HFLOAT16 isn't defined, we just get the existing pre-your-changes behaviour

Collaborator

Hmm. Aren't we dropping out of the entire "vector extension supported" case with this change, when "only" the fp16 shgemm kernel is not supported by the detected hardware? (Though I don't have an easy solution for how to keep using the others regardless, and switch just shgemm to the generic C implementation. Perhaps "rvv+fp16" would even need to be a separate TARGET.)
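
For illustration only, the per-kernel fallback floated above could look something like the following C sketch. Nothing here is existing OpenBLAS code: `kernel_plan`, `plan_kernels` and the kernel family names are hypothetical. The sketch also assumes that the non-fp16 kernels would be compiled without the half float flags, since otherwise running them on hardware without Zfh and Zvfh would not be safe.

```c
/* Hypothetical plan: which kernel family serves each routine. */
typedef struct {
    const char *sgemm;   /* single precision GEMM */
    const char *shgemm;  /* half precision GEMM */
} kernel_plan;

/* Choose kernels per routine instead of all-or-nothing. */
kernel_plan plan_kernels(int has_v, int has_fp16)
{
    kernel_plan plan = { "generic", "generic" };

    if (has_v) {
        plan.sgemm = "rvv";
        /* Only the fp16 routine depends on Zfh and Zvfh. */
        if (has_fp16)
            plan.shgemm = "rvv_fp16";
    }
    return plan;
}
```

With this shape, a device that has V but lacks the fp16 extensions keeps the vector sgemm and only shgemm drops to the generic implementation.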

Contributor Author

For the case where DYNAMIC_ARCH=1, yes. The approach this patch takes is if we've built with DYNAMIC_ARCH=1 and BUILD_HFLOAT16=1 the vector kernels will only be used if we can detect Zfh, Zvfh and V at runtime, which seems reasonable as this is how they've been compiled. Not ideal perhaps, but better than the current code on develop where a DYNAMIC_ARCH=1 build will happily execute the vector kernels on machines without half float support and probably crash.

Collaborator

Agreed for shgemm, but I'm not sure why you'd expect the vector kernels in general to crash if Z(v)fh isn't available?

Collaborator

But these would crash only when actually executing SHGEMM, while your change makes them inexplicably slow for any other BLAS operation if only the build host happened to have these extensions. This might create a problem with third-party packagers (like Linux distributions) depending on how likely they are to activate the BUILD_HFLOAT16 option.

Contributor Author

Currently all the kernels in RISCV64_ZVL128B and RISCV64_ZVL256B are compiled with Zfh, Zvfh and V but at runtime we only check for V before executing them. In general I don't think it's a good idea to execute code compiled with optional extensions like Zfh and Zvfh without first checking to see whether the extensions are supported at runtime. Even if the other full float kernels aren't explicitly using half float instructions, by compiling them with Zfh and Zvfh, we give the compiler permission to use the half float instructions in those kernels, should it find some clever reason to do so.

> This might create a problem with third-party packagers (like Linux distributions) depending on how likely they are to activate the BUILD_HFLOAT16 option

I think it's very unlikely that Linux distributions will be able to activate BUILD_HFLOAT16=1 in its current form for riscv64 any time soon as Zvfh and Zfh are not mandatory in any of the existing RVA profiles, even RVA23U64. They could only safely build with BUILD_HFLOAT16=1 if runtime detection of zfh and zvfh were performed.

I'm happy to explore another solution, e.g., introducing new targets as you suggested above.

Incidentally, what is done on X86? How are bf16 and fp16 handled?

Collaborator

On x86_64 (and power), bf16 capability is tied to TARGET, and fp16 is as yet unimplemented - it made its debut in the riscv64 PR.
Perhaps what could be done here is to issue a warning about the lack of Zfh (and consequently SHGEMM) support before activating the RISCV64_ZVL128B or RISCV64_ZVL256B kernel regardless?

Collaborator

Actually I think I'll just add a FALLBACK_VERBOSE type message (as a followup commit) so that the user has a chance to find out why performance is lower than expected
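
A diagnostic of this kind might be built along the following lines. This is only a sketch: the function name, the `MSG_` constants and the message wording are all hypothetical, and the bit positions are assumed to mirror Linux's `<asm/hwprobe.h>`. It formats the notice into a caller-supplied buffer so that the caller decides where to print it.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed hwprobe bit positions; verify against your kernel headers. */
#define MSG_EXT_ZFH  (1ULL << 27)
#define MSG_EXT_ZVFH (1ULL << 30)

/* Writes a notice naming the missing fp16 extensions into buf.
 * Returns the number of missing extensions (0 leaves buf empty). */
int format_fp16_fallback_notice(uint64_t probed, char *buf, size_t len)
{
    int no_zfh  = !(probed & MSG_EXT_ZFH);
    int no_zvfh = !(probed & MSG_EXT_ZVFH);

    if (!no_zfh && !no_zvfh) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }
    snprintf(buf, len,
             "RVV kernels disabled: missing%s%s; falling back to generic kernels",
             no_zfh ? " Zfh" : "", no_zvfh ? " Zvfh" : "");
    return no_zfh + no_zvfh;
}
```

Naming the missing extension in the message gives the user something concrete to search for when performance is lower than expected.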

Contributor Author

Yes, that sounds like a good idea.

Apologies for not replying to your previous comment. I got a little distracted.

@markdryan markdryan force-pushed the markdryan/riscv-hf16-fix branch 2 times, most recently from 05c8654 to ce79fe1 on August 28, 2025 13:41
@markdryan markdryan changed the title from "Fix hfloat and fortran compiler detection issues with the RISC-V makefiles" to "Fix hfloat16 issues with the RISC-V makefiles and dynamic kernel selection" Aug 28, 2025
@markdryan markdryan changed the title from "Fix hfloat16 issues with the RISC-V makefiles and dynamic kernel selection" to "disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1" Aug 28, 2025
@martin-frbg martin-frbg added this to the 0.3.31 milestone Sep 9, 2025
@martin-frbg martin-frbg merged commit 5ab1437 into OpenMathLib:develop Sep 9, 2025
83 of 88 checks passed
@ChipKerchner
Contributor

ChipKerchner commented Sep 23, 2025

Does this also need to be done for BUILD_BFLOAT16 since it now has vector support for SBGEMM?

@markdryan
Contributor Author

> Does this also need to be done for BUILD_BFLOAT16 since it now has vector support for SBGEMM?

By this do you mean update the dynamic_riscv64.c code to check for the bfloat16 extensions? I'd say we do need to do that, particularly as the BF16 extensions are not mandatory in RVA23.
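
One way to extend the check to more than one optional feature is to build the required mask from a small table, sketched below in C. The type and function names are hypothetical, and the bit values for the BF16 extensions are deliberately left to the caller, because this sketch does not assume what they are in the hwprobe interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one optional feature. */
typedef struct {
    int      built_in;  /* nonzero if the kernels were compiled with it */
    uint64_t bits;      /* hwprobe bits it needs at runtime (caller supplied) */
} feature_req;

/* Start from the base requirement (e.g. V) and add each built-in feature. */
uint64_t combine_required_bits(uint64_t base, const feature_req *reqs, size_t n)
{
    uint64_t mask = base;

    for (size_t i = 0; i < n; i++)
        if (reqs[i].built_in)
            mask |= reqs[i].bits;
    return mask;
}

/* Returns 1 only if every bit in mask is set in probed. */
int all_bits_present(uint64_t probed, uint64_t mask)
{
    return (probed & mask) == mask;
}
```

A build with both BUILD_HFLOAT16 and BUILD_BFLOAT16 would then mark both table entries as built in, and the vector kernels would be enabled only when the combined mask is satisfied.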

@martin-frbg
Collaborator

The runtime check for extension support in the DYNAMIC_ARCH case? I guess so, if we cannot be sure that these are going to be present in any relevant -zvl128b or -zvl256b cpu.
That is unless we accept the risk that SBGEMM calls will just SIGILL on underequipped hardware - I still think any other BLAS calls should be fine, and I'm still not sure if I like the tradeoff of falling back to generic kernels for everything just because the optimized ones were built with either flavor of half-precision support - which the user may know to avoid calling. There is the risk of the compiler using half-precision tricks in optimizing regular precision code if we tell it they're available, but I assume it is largely theoretical?

if (ret == 0) {
if (!(pairs[0].value & RISCV_HWPROBE_IMA_V))
#if defined(BUILD_HFLOAT16)
vector_mask = (RISCV_HWPROBE_IMA_V | RISCV_HWPROBE_EXT_ZFH | RISCV_HWPROBE_EXT_ZVFH);
Contributor

Don't you need to check for these also?

#define RISCV_HWPROBE_EXT_ZFHMIN (1 << 28)
#define RISCV_HWPROBE_EXT_ZVFHMIN (1 << 31)

Contributor Author

Are these extensions being used?

They don't appear to be enabled in the compiler flags

Contributor

@ChipKerchner ChipKerchner Sep 23, 2025

__riscv_vle16_v_f16m1 (and other variations) are listed under the Zvfhmin extension. I would think the scalar form is also being used in situations like this _Float16 B0 = B[bi+0];

The ones that you currently have are for the mult and madd instructions.

Contributor

Ok, I looked it up. Each minimum extension (Zfhmin, Zvfhmin) is a subset of the corresponding regular extension (Zfh, Zvfh). So what we have and what you have both seem correct.
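
To make the subset relationship concrete, here is a small C sketch. The `SUB_` names and both helpers are hypothetical. The Zfhmin and Zvfhmin bit positions are the ones quoted earlier in this thread, while the Zfh and Zvfh positions are assumed to mirror Linux's `<asm/hwprobe.h>`. `1ULL` is used so that shifting by 31 stays well defined.

```c
#include <stdint.h>

#define SUB_EXT_ZFH     (1ULL << 27)  /* assumed */
#define SUB_EXT_ZFHMIN  (1ULL << 28)  /* as quoted in this thread */
#define SUB_EXT_ZVFH    (1ULL << 30)  /* assumed */
#define SUB_EXT_ZVFHMIN (1ULL << 31)  /* as quoted in this thread */

/* Code that needs only Zvfhmin instructions (fp16 vector loads, stores
 * and conversions) is satisfied by either the min or the full extension. */
int has_zvfhmin_ops(uint64_t bits)
{
    return (bits & (SUB_EXT_ZVFH | SUB_EXT_ZVFHMIN)) != 0;
}

/* Code that needs fp16 vector arithmetic requires the full extension. */
int has_zvfh_ops(uint64_t bits)
{
    return (bits & SUB_EXT_ZVFH) != 0;
}
```

Because the kernels here use fp16 arithmetic as well as loads, requiring the full Zfh and Zvfh bits covers the Zfhmin and Zvfhmin instructions too.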

Development

Successfully merging this pull request may close these issues.

RISC-V RISCV64_ZVL128B and RISCV64_ZVL256B targets are broken

4 participants