[Draft] Accelerate `Half` with FP16 ISA by anthonycanino · Pull Request #122649 · dotnet/runtime

anthonycanino · 2025-12-18T20:45:27Z

Draft PR for in-progress work to accelerate System.Half with FP16 ISA.

Current work done:

Add a TYP_HALF to the .NET runtime, which is treated like a TYP_SIMDXX, but with some notable differences. Namely, a TYP_HALF is passed around via the xmm registers, and while it will pass a varTypeIsStruct test, it must be treated as a primitive in other places.
Accelerate System.Half operations with the TYP_HALF and some FP16 intrinsics. Not every possible function has been accelerated yet.

For discussion:

I have currently worked around some checks to make TYP_HALF behave like a struct and a primitive. It's very ad-hoc at the moment.
Much of the work to transform the named System.Half intrinsics into a sequence of intrinsic nodes is done in importcall.cpp and might want to be moved up into some of the gtNewSimdXX nodes.

anthonycanino · 2025-12-18T20:50:41Z

@tannergooding @jakobbotsch please take a look when you get a chance.

src/coreclr/jit/codegencommon.cpp

anthonycanino · 2026-01-06T12:47:42Z

@dotnet/intel @tannergooding may I get some high level feedback on the structure of the PR?

src/coreclr/jit/codegenxarch.cpp

tannergooding · 2026-01-06T16:34:34Z

src/coreclr/jit/compiler.cpp

+    if (!compOpportunisticallyDependsOn(InstructionSet_AVX10v1))
+    {
+        return false;
+    }


We need this last, not first, otherwise code gets tagged as benefiting from using AVX10v1 unnecessarily
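A minimal sketch of the ordering being asked for, assuming a hypothetical `CompilerSketch` type and helper name (neither is from the PR). It only models the bookkeeping: querying the ISA also records a dependency on it, so the query should run after the cheap structural checks.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's Compiler; it only models the bookkeeping.
struct CompilerSketch
{
    bool hwSupportsAvx10v1 = true;  // assumption: what the target machine reports
    bool recordedAvx10v1   = false; // set once the method is tagged as depending on the ISA

    // Models compOpportunisticallyDependsOn: asking about the ISA also records the dependency.
    bool compOpportunisticallyDependsOn()
    {
        recordedAvx10v1 = true;
        return hwSupportsAvx10v1;
    }

    // Cheap structural checks run first; the ISA query runs last so that
    // unrelated structs never tag the method as benefiting from AVX10v1.
    bool isNativeHalfStructTypeSketch(unsigned sizeBytes, bool isIntrinsicHalf)
    {
        assert(sizeBytes != 0);

        if (sizeBytes != 2)
        {
            return false;
        }

        if (!isIntrinsicHalf)
        {
            return false;
        }

        return compOpportunisticallyDependsOn();
    }
};
```

With this ordering, a 4-byte struct is rejected before the ISA is ever queried, so nothing is recorded for it.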

src/coreclr/jit/compiler.cpp

tannergooding · 2026-01-06T16:47:37Z

src/coreclr/jit/emitxarch.cpp

-               // kmov instructions reach this path with EA_8BYTE size, even on x86
-               || IsKMOVInstruction(ins)


What's the reason for removing this part of the assert?

Think that was an error, will fix.

tannergooding · 2026-01-06T16:49:34Z

src/coreclr/jit/emitxarch.cpp


+        case INS_vmovsh:
+        {
+            hasSideEffect = false;


Doesn't this have a side effect of clearing the upper-bits?

That is, it always does DEST[MAXVL:128] := 0

You are correct, I will change.
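To illustrate why a `vmovsh` from a register to itself is not a no-op, here is a simplified software model of the register-to-register form. It assumes a 512-bit MAXVL and ignores write masking; `VecReg` and `modelVmovshRegReg` are illustrative names only, not JIT or emitter APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 512-bit vector register modeled as 32 lanes of 16 bits (MAXVL = 512 is an assumption).
using VecReg = std::array<uint16_t, 32>;

// Simplified model of `vmovsh dst, src1, src2` without masking:
// lane 0 comes from src2, the rest of the low 128 bits come from src1,
// and every bit above 127 is cleared.
VecReg modelVmovshRegReg(const VecReg& src1, const VecReg& src2)
{
    VecReg dst{}; // value-initialized to zero, which covers DEST[MAXVL:128] := 0

    dst[0] = src2[0];

    for (std::size_t i = 1; i < 8; i++) // lanes 1..7 hold bits 16..127
    {
        dst[i] = src1[i];
    }

    return dst;
}
```

If the source register has any non-zero bits above bit 127, the result differs from the input even when all three operands name the same register, which is the side effect the emitter has to account for.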

tannergooding · 2026-01-06T16:51:02Z

src/coreclr/jit/emitxarch.cpp


 #if defined(TARGET_AMD64)
        case INS_movsxd:
+        case INS_vmovsh:


This isn't TARGET_AMD64 exclusive as vmovsh is listed with V/V for support, so is valid for both 64 and 32-bit mode.

tannergooding · 2026-01-06T16:51:35Z

src/coreclr/jit/emitxarch.cpp

+            if (IsXMMReg(reg))
+            {
+                return emitXMMregName(reg);
+            }


This shouldn't be TARGET_AMD64 exclusive either.

tannergooding · 2026-01-06T16:52:44Z

src/coreclr/jit/emitxarch.cpp

    else if (code & 0xFF000000)
    {
-        if (size == EA_2BYTE)
+        if (size == EA_2BYTE && (ins != INS_vmovsh && ins != INS_vaddsh))


Can we just use && !IsSimdInstruction(ins)?

tannergooding · 2026-01-06T16:54:14Z

src/coreclr/jit/emitxarch.cpp

        case INS_movapd:
        case INS_movupd:
+        // todo-xarch-half: come back to fix
+        case INS_vmovsh:


Shouldn't this be grouped with vmovss and vmovsd? While we may not have exact numbers, I'd expect it to have identical perf/latency to those rather than the more general movaps and friends.

tannergooding · 2026-01-06T16:55:22Z

src/coreclr/jit/emitxarch.cpp

            float insLatency = insLatencyInfos[ins];

+            // todo-xarch-half: hacking an exit on the unhandled ins to make prototyping easier
+            if (ins == INS_vcvtss2sh || ins == INS_vcvtsh2ss || ins == INS_vaddsh || ins == INS_vsubsh ||


I think we want to put most of these with the v*ss and v*sd equivalents prior to merging this PR.

Yes, and for the above, I will get the proper numbers before putting the PR in non-draft.

tannergooding · 2026-01-06T16:59:38Z

src/coreclr/jit/gentree.cpp

+                // todo-half: this is only to create zero constant half nodes for use in instrincis, anything
+                // else will not work


Not sure I understand this comment.

Presumably we just need a FloatingPointUtils::convertDoubleToHalf(...) method which returns a float16_t type (these were added in C++23, which is newer than our baseline, so we'd just typedef uint16_t float16_t; for the time being).

We then vecCon->gtSimdVal.f16[i] = cnsVal
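A rough sketch of what such a conversion helper could look like, assuming IEEE 754 binary16 with round-to-nearest, ties-to-even. The name `convertDoubleToHalfSketch` is hypothetical and this is not the implementation from the PR; the `float16_t` typedef follows the suggestion above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in until the C++23 type is usable, as suggested in the review.
typedef uint16_t float16_t;

// Hypothetical helper: returns the binary16 bit pattern for a double,
// rounding to nearest with ties to even.
float16_t convertDoubleToHalfSketch(double value)
{
    uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));

    const uint16_t sign     = static_cast<uint16_t>((bits >> 48) & 0x8000);
    const int      exponent = static_cast<int>((bits >> 52) & 0x7FF);
    const uint64_t mantissa = bits & 0x000FFFFFFFFFFFFFull;

    if (exponent == 0x7FF)
    {
        if (mantissa == 0)
        {
            return static_cast<float16_t>(sign | 0x7C00); // infinity
        }
        // NaN: force the quiet bit and keep the top payload bits.
        return static_cast<float16_t>(sign | 0x7E00 | (mantissa >> 42));
    }

    const int halfExponent = exponent - 1023 + 15; // rebias from double to half

    if (halfExponent >= 0x1F)
    {
        return static_cast<float16_t>(sign | 0x7C00); // too large: overflows to infinity
    }
    if (halfExponent < -10)
    {
        return sign; // below half of the smallest subnormal: rounds to signed zero
    }

    uint64_t significand;
    int      shift;
    uint32_t result;

    if (halfExponent > 0)
    {
        // Normal half: keep the top 10 mantissa bits under the rebiased exponent.
        significand = mantissa;
        shift       = 42;
        result      = static_cast<uint32_t>(halfExponent) << 10;
    }
    else
    {
        // Subnormal half: make the implicit leading one explicit and shift further.
        significand = mantissa | (1ull << 52);
        shift       = 43 - halfExponent;
        result      = 0;
    }

    const uint64_t halfway   = 1ull << (shift - 1);
    const uint64_t remainder = significand & ((1ull << shift) - 1);

    result += static_cast<uint32_t>(significand >> shift);

    // A carry out of the mantissa correctly bumps the exponent (up to infinity).
    if ((remainder > halfway) || ((remainder == halfway) && ((result & 1) != 0)))
    {
        result++;
    }

    return static_cast<float16_t>(sign | result);
}
```

The returned value is a bit pattern, so it can be stored directly into a 16-bit lane such as `gtSimdVal.f16[i]`.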

tannergooding · 2026-01-06T17:04:24Z

src/coreclr/jit/gentree.h

+            {
+                if (arg->IsCnsFltOrDbl())
+                {
+                    simdVal.f16[argIdx] = static_cast<uint16_t>(arg->AsDblCon()->DconValue());


This looks incorrect as it does a double->uint16_t cast, when we rather need double->float16_t
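A small illustration of the difference, using hypothetical names: the cast performs a numeric double-to-integer conversion, so the lane receives an integer value rather than the binary16 encoding of the constant.

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t float16_t; // same stand-in typedef as suggested in the review

// What the flagged line computes: a numeric conversion that truncates toward zero.
inline uint16_t laneViaIntegerCast(double value)
{
    return static_cast<uint16_t>(value); // 1.0 becomes the integer 1, 0.5 becomes 0
}

// The binary16 encodings the lane should hold instead.
constexpr float16_t HALF_ONE_BITS      = 0x3C00; // 1.0
constexpr float16_t HALF_ONE_HALF_BITS = 0x3800; // 0.5
```

Because only zero has the same representation under both interpretations, the cast happens to work for zero constants and nothing else.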

tannergooding · 2026-01-06T17:07:37Z

src/coreclr/jit/hwintrinsiccodegenxarch.cpp

                    }
                }
-                else if (node->TypeIs(TYP_VOID))
+                else if (node->TypeIs(TYP_VOID) || node->TypeIs(TYP_INT))


What's the reason for this change?

Think it was also a bug, I have removed.

src/coreclr/jit/importer.cpp

tannergooding · 2026-01-06T17:10:08Z

src/coreclr/jit/importer.cpp

+                if (sizeBytes < getMinVectorByteLength())
                {
-                    *pSimdBaseJitType = simdBaseType;
+                    // The struct itself is accelerated, in this case, it is `Half`.


Add an assert(sizeBytes == 2) in case we add other sizes in the future?

tannergooding · 2026-01-06T17:12:45Z

src/coreclr/jit/importercalls.cpp

+                break;
+            }
+
+            case NI_System_Half_op_Increment:


Some of these, like Increment/Decrement, could be merged as well using lookupHalfIntrinsic

tannergooding · 2026-01-06T17:20:58Z

src/coreclr/jit/instr.cpp

+    if (srcSize == 2)
+        return INS_vmovsh;


General convention is to have braces, particularly if it is part of an if/else chain:

Suggested change

-    if (srcSize == 2)
-        return INS_vmovsh;
+    if (srcSize == 2)
+    {
+        return INS_vmovsh;
+    }

tannergooding · 2026-01-06T17:23:15Z

src/coreclr/jit/lower.cpp

+    // if (node->TypeGet() == TYP_HALF)
+    //{
+    //     return false;
+    // }


tannergooding · 2026-01-06T17:24:23Z

src/coreclr/jit/lsrabuild.cpp

+                    case TYP_HALF:
+#ifdef TARGET_X86
+                        useCandidates = RBM_FLOATRET;
+#else
+                    useCandidates = RBM_FLOATRET.GetFloatRegSet();
+#endif
+                        break;


This looks to be identical to the TYP_FLOAT path and can be collapsed to share it:

Suggested change

-                    case TYP_HALF:
-#ifdef TARGET_X86
-                        useCandidates = RBM_FLOATRET;
-#else
-                    useCandidates = RBM_FLOATRET.GetFloatRegSet();
-#endif
-                        break;
+                    case TYP_HALF:
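As a sketch of the collapsed form, the two labels can share one body through ordinary `switch` fall-through. The enum and mask values below are made up for illustration and do not match the JIT's real definitions.

```cpp
#include <cassert>

// Hypothetical miniature versions of the JIT's type enum and register masks.
enum VarTypeSketch
{
    SKETCH_TYP_INT,
    SKETCH_TYP_HALF,
    SKETCH_TYP_FLOAT
};

using RegMaskSketch = unsigned;

constexpr RegMaskSketch SKETCH_RBM_INTRET   = 1u << 0;
constexpr RegMaskSketch SKETCH_RBM_FLOATRET = 1u << 1;

// TYP_HALF shares the TYP_FLOAT body instead of duplicating it.
RegMaskSketch returnCandidatesSketch(VarTypeSketch type)
{
    switch (type)
    {
        case SKETCH_TYP_HALF:
        case SKETCH_TYP_FLOAT:
            return SKETCH_RBM_FLOATRET;

        default:
            return SKETCH_RBM_INTRET;
    }
}
```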

tannergooding · 2026-01-06T17:24:59Z

src/coreclr/jit/lsrabuild.cpp

                        // We ONLY want the valid double register in the RBM_DOUBLERET mask.
 #ifdef TARGET_AMD64
                        useCandidates = (RBM_DOUBLERET & RBM_ALLDOUBLE).GetFloatRegSet();
 #else
                    useCandidates = (RBM_DOUBLERET & RBM_ALLDOUBLE).GetFloatRegSet();
 #endif // TARGET_AMD64


not related to this PR, but these two paths are the same

anthonycanino · 2026-01-08T23:52:19Z

@tannergooding I've made a number of changes for the PR.

I think I will go ahead and add the F16C conversions, and then bottle this up as one PR. That should cover most of the initial acceleration for Half, which would address #123017 and #123018

anthonycanino · 2026-01-09T16:20:31Z

I was incorrect about F16C: it looks like it is for vectorized fp16 conversions.

I think we are converging on a first PR now. I am looking into whether any remaining operations can be covered with the FP16 ISA instructions.

tannergooding · 2026-01-09T16:57:01Z

I was incorrect about F16C: it looks like it is for vectorized fp16 conversions.

It can still be used to accelerate a lot of functionality for scalars (and is essentially the same support needed where we generate ConvertToHalf(CreateScalarUnsafe(value)).ToScalar())

However, I think it's fine to wait for a subsequent PR to do that work (we do want to do it since that covers all x86-64-v3, i.e. AVX2 capable, hardware).

dotnet-policy-service · 2026-02-12T22:15:55Z

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

Copilot

Pull request overview

Draft work to enable System.Half acceleration on xarch by introducing a dedicated TYP_HALF JIT type and mapping key Half operations/conversions to AVX10v1 FP16 scalar instructions, while updating VM calling-convention plumbing to match the new ABI behavior.

Changes:

Mark System.Half and several operators/properties/conversions as [Intrinsic] to enable JIT recognition and expansion.
Extend CoreCLR VM + JIT ABI paths so Half can be passed/returned in FP registers on xarch when AVX10v1 is available.
Add broad JIT support for TYP_HALF across SIMD/type normalization, codegen/emitter, HW intrinsics tables, and value numbering.

Reviewed changes

Copilot reviewed 44 out of 45 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
src/libraries/System.Private.CoreLib/src/System/Half.cs	Marks `Half` members as intrinsics to enable JIT recognition/expansion.
src/coreclr/vm/reflectioninvocation.cpp	Adjusts reg type map so reflection invocation passes `Half` like `float` on xarch.
src/coreclr/vm/methodtable.h	Declares `MethodTable::IsNativeHalfType()` for xarch ABI checks.
src/coreclr/vm/invokeutil.cpp	Ensures `Half` args are copied/extended appropriately for call dispatch on xarch.
src/coreclr/vm/class.cpp	Implements `IsNativeHalfType()` gated by intrinsic-ness, layout, and AVX10v1.
src/coreclr/vm/callingconvention.h	Treats `Half` as FP-reg passed/returned in arg iterator/return flags on xarch.
src/coreclr/vm/callhelpers.cpp	Updates call descriptor reg map generation to treat `Half` like `R4` when applicable.
src/coreclr/vm/amd64/profiler.cpp	Updates profiler arg/return handling to treat native `Half` as FP register data.
src/coreclr/jit/vartype.h	Adds helper `varTypeIsAccelerated` and updates float arg-reg usage for `TYP_HALF`.
src/coreclr/jit/valuenumfuncs.h	Expands xarch HW intrinsic VN macro shape to include a `TYP_HALF` slot.
src/coreclr/jit/valuenum.h	Adds `VNForHalfCon` and type conversion traits for `TYP_HALF`.
src/coreclr/jit/valuenum.cpp	Implements Half constant VN allocation and extends various VN helpers for `TYP_HALF`.
src/coreclr/jit/utils.h	Adds `FloatingPointUtils::convertDoubleToFloat16` declaration.
src/coreclr/jit/utils.cpp	Implements software `double -> float16` conversion used in vector constant materialization.
src/coreclr/jit/typelist.h	Defines `TYP_HALF` in the core type list with FP-reg classification.
src/coreclr/jit/simd.h	Adds `float16_t` lanes to SIMD value unions; introduces `SIZE_UNKNOWN`.
src/coreclr/jit/simd.cpp	Extends SIMD type recognition to normalize `System.Half` to `TYP_HALF` (xarch+AVX10v1).
src/coreclr/jit/scopeinfo.cpp	Extends variable location encoding to handle `TYP_HALF` in stack/register locs.
src/coreclr/jit/registeropswasm.cpp	Marks `TYP_HALF` as invalid for wasm value types.
src/coreclr/jit/regalloc.cpp	Allows register allocation candidacy for `TYP_HALF`.
src/coreclr/jit/namedintrinsiclist.h	Adds `NI_System_Half_*` named intrinsics and expands HW intrinsic macro shape.
src/coreclr/jit/morph.cpp	Updates struct/SIMD size checks to include accelerated types; excludes `TYP_HALF` from struct promotion.
src/coreclr/jit/lsraxarch.cpp	Extends LSRA handling for new AVX10v1 FP16 FMA scalar intrinsic.
src/coreclr/jit/lsrabuild.cpp	Ensures return handling includes `TYP_HALF` in float return candidates.
src/coreclr/jit/lowerxarch.cpp	Adds lowering for AVX10v1 half-compare helpers and updates scalar base-type asserts.
src/coreclr/jit/lower.cpp	Treats `TYP_HALF` similarly to SIMD for some lowering paths; excludes from FP store retyping.
src/coreclr/jit/lclvars.cpp	Updates struct promotion helper to use accelerated-type sizing predicate.
src/coreclr/jit/instrsxarch.h	Updates instruction metadata/flags for FP16 scalar ops and defines AVX10v1 FMA range markers.
src/coreclr/jit/instr.cpp	Selects `INS_vmovsh` for 2-byte FP-reg load/store/copy (`TYP_HALF`) on xarch.
src/coreclr/jit/importercalls.cpp	Adds importer expansions for `System.Half` ops/conversions/properties to AVX10v1 scalar intrinsics; adjusts Half arg normalization.
src/coreclr/jit/importer.cpp	Extends struct normalization logic to treat intrinsic 2-byte `Half` as accelerated `TYP_HALF`.
src/coreclr/jit/hwintrinsiccodegenxarch.cpp	Enables AVX10v1 family codegen path and relaxes base-type asserts for `TYP_HALF`.
src/coreclr/jit/hwintrinsic.h	Expands instruction table storage on xarch to include a `TYP_HALF` instruction slot.
src/coreclr/jit/hwintrinsic.cpp	Updates HW intrinsic macro expansion and type-range checks to include `TYP_HALF`.
src/coreclr/jit/gentree.h	Allows `TYP_HALF` in some floating-constant assertions and adds vector-constant population for half lanes.
src/coreclr/jit/gentree.cpp	Extends zero constants, scalar create, to-scalar asserts, and embedded rounding handling to include `TYP_HALF`.
src/coreclr/jit/float16.h	Adds shared `float16_t` typedef for JIT components without relying on C++23.
src/coreclr/jit/emitxarch.cpp	Extends xarch emitter for AVX10v1 ranges, EVEX prefix maps, `vmovsh`, and perf scoring for FP16 instructions.
src/coreclr/jit/emit.h	Adds perf-score throughput constants used by new FP16 perf modeling.
src/coreclr/jit/compiler.h	Adds Half intrinsic helper declarations and renames SIMD-size predicate to “accelerated”.
src/coreclr/jit/compiler.cpp	Implements `isNativeHalfStructType` and uses it to map 2-byte structs to `TYP_HALF` when applicable.
src/coreclr/jit/codegenxarch.cpp	Treats `TYP_HALF` like floating for return registers and stack arg emission in key paths.
src/coreclr/jit/codegencommon.cpp	Updates struct-return assertions to allow `TYP_HALF` special casing.
src/coreclr/jit/abi.cpp	Maps 2-byte ABI passing segments to `TYP_HALF`.

Copilot · 2026-02-19T21:05:08Z

src/coreclr/jit/float16.h

+ Both simd.cpp, gentree.cpp, and utils.cpp need a definition of float16_t
+ but do not share a common header.
+
+ Defining here so as to not create accidental implict include dependencies.


Typo in comment: "implict" should be "implicit".

Suggested change

- Defining here so as to not create accidental implict include dependencies.
+ Defining here so as to not create accidental implicit include dependencies.

Copilot · 2026-02-19T21:05:08Z

src/coreclr/jit/importercalls.cpp

+            case NI_System_Half_FusedMultiplyAdd:
+            {
+#if defined(TARGET_XARCH)
+                if (compOpportunisticallyDependsOn(InstructionSet_AVX10v1))
+                {
+                    // We are constructing a chain of intrinsics similar to:
+                    //    return FMA.MultiplyAddScalar(
+                    //        Vector128.CreateScalarUnsafe(x),


New JIT intrinsic expansion for System.Half is introduced here (AVX10v1-based lowering), but the PR doesn't add corresponding JIT/HardwareIntrinsics tests. Please add targeted tests (correctness + codegen) under the existing AVX10v1 HW-intrinsics test projects so regressions/call-conv mismatches are caught.

Copilot · 2026-02-19T21:05:08Z

src/libraries/System.Private.CoreLib/src/System/Half.cs

 using System.Numerics;
 using System.Runtime.CompilerServices;
 using System.Runtime.InteropServices;
+using System.Runtime.Intrinsics;


using System.Runtime.Intrinsics; appears unused in this file (the [Intrinsic] attribute comes from System.Runtime.CompilerServices). Unused using directives typically trigger CS8019, which may fail the build if warnings are treated as errors.

Consider removing this using unless another Intrinsics type is going to be referenced from Half.cs.

Copilot · 2026-02-19T21:05:09Z

src/coreclr/jit/importercalls.cpp

+        if (sigType == TYP_STRUCT)
+        {
+            var_types normSigType = impNormStructType(classHnd);
+            sigType == (normSigType == TYP_HALF) ? TYP_HALF : sigType;


sigType is not updated here because the ternary expression uses == instead of assignment. This means Half arguments will still be treated as TYP_STRUCT, which can cause incorrect implicit-coercion decisions and GDV incompatibility checks.

Change this to an assignment (and keep the conditional expression) so the signature type is actually normalized to TYP_HALF when appropriate.

Suggested change

-            sigType == (normSigType == TYP_HALF) ? TYP_HALF : sigType;
+            sigType = (normSigType == TYP_HALF) ? TYP_HALF : sigType;
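For reference, a self-contained sketch of the corrected normalization, using a hypothetical miniature `VarTypeSketch` enum and helper name rather than the JIT's real `var_types`. The comment spells out how the flagged line parses.

```cpp
#include <cassert>

// Hypothetical miniature of the JIT's var_types, for illustration only.
enum VarTypeSketch
{
    SKETCH_TYP_INT,
    SKETCH_TYP_STRUCT,
    SKETCH_TYP_HALF
};

// normSigType stands in for the result of impNormStructType(classHnd).
VarTypeSketch normalizeSigTypeSketch(VarTypeSketch sigType, VarTypeSketch normSigType)
{
    if (sigType == SKETCH_TYP_STRUCT)
    {
        // The flagged line parses as
        //     (sigType == (normSigType == TYP_HALF)) ? TYP_HALF : sigType;
        // which is an expression statement whose value is discarded, so sigType
        // never changes. The assignment below is what actually updates it.
        sigType = (normSigType == SKETCH_TYP_HALF) ? SKETCH_TYP_HALF : sigType;
    }

    return sigType;
}
```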

Copilot · 2026-02-19T21:05:09Z

src/coreclr/jit/valuenum.cpp

+// todo-half: we need to make this work properly
+ValueNum ValueNumStore::VNForHalfCon(float cnsVal)
+{
+    return VnForConst(cnsVal, GetFloatCnsMap(), TYP_HALF);
+}


VNForHalfCon currently uses GetFloatCnsMap() as its lookup table. This will cause Half constants and float constants with the same bit pattern to share the same ValueNum, which breaks the invariant that a VN’s constant storage type matches its var_types (e.g., you can end up with a VN allocated in a TYP_FLOAT chunk but later used as TYP_HALF).

Introduce a dedicated Half constant map (keyed by the 16-bit Half payload or a distinct key type) so Half constants cannot collide with float constants.
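One way the suggestion could look, sketched with standard containers instead of the JIT's own map types. Everything here (`ValueNumStoreSketch`, the member names, keying by the 16-bit payload) is an assumption for illustration, not the real `ValueNumStore` API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>

using ValueNumSketch = unsigned;

// Hypothetical miniature of a value-number store with one constant map per type.
class ValueNumStoreSketch
{
    std::unordered_map<uint32_t, ValueNumSketch> m_floatCnsMap; // keyed by float bits
    std::unordered_map<uint16_t, ValueNumSketch> m_halfCnsMap;  // keyed by half bits
    ValueNumSketch                               m_nextVN = 1;

    // Returns the existing value number for the key or allocates a fresh one.
    template <typename Key>
    ValueNumSketch lookupOrAdd(std::unordered_map<Key, ValueNumSketch>& map, Key key)
    {
        auto it = map.find(key);
        if (it != map.end())
        {
            return it->second;
        }

        ValueNumSketch vn = m_nextVN++;
        map.emplace(key, vn);
        return vn;
    }

public:
    ValueNumSketch VNForFloatCon(float cnsVal)
    {
        uint32_t bits;
        std::memcpy(&bits, &cnsVal, sizeof(bits));
        return lookupOrAdd(m_floatCnsMap, bits);
    }

    // Takes the raw 16-bit payload so a Half constant never goes through the float map.
    ValueNumSketch VNForHalfCon(uint16_t halfBits)
    {
        return lookupOrAdd(m_halfCnsMap, halfBits);
    }
};
```

With separate maps, the Half constant 1.0 (`0x3C00`) and the float constant `1.0f` get distinct value numbers, while repeated lookups of the same constant still return the same one.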

Copilot · 2026-02-19T21:05:09Z

src/coreclr/jit/utils.cpp

+static constexpr uint64_t HALF_POSITIVE_INFINITY_BITS = 0x7C00;
+static constexpr uint64_t HALF_NEGATIVE_INFINITY_BITS = 0xFC00;
+


HALF_POSITIVE_INFINITY_BITS/HALF_NEGATIVE_INFINITY_BITS are declared as uint64_t but are returned from helpers that return float16_t (uint16_t). This introduces implicit narrowing conversions that are likely to trigger warnings (and may fail the build under /WX).

Consider making these constants uint16_t (or explicitly casting at the return sites) so the return type matches without narrowing.
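A minimal sketch of the first option, assuming the `float16_t` typedef from `float16.h` is a `uint16_t`. The helper name `halfInfinitySketch` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t float16_t; // assumed to match the typedef in float16.h

// Declaring the constants with the 16-bit type means returning them needs no narrowing.
static constexpr float16_t HALF_POSITIVE_INFINITY_BITS = 0x7C00;
static constexpr float16_t HALF_NEGATIVE_INFINITY_BITS = 0xFC00;

static_assert(sizeof(float16_t) == 2, "float16_t must be exactly 16 bits wide");

// Hypothetical helper returning the infinity bit pattern for the requested sign.
float16_t halfInfinitySketch(bool negative)
{
    return negative ? HALF_NEGATIVE_INFINITY_BITS : HALF_POSITIVE_INFINITY_BITS;
}
```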

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 18, 2025

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 18, 2025

build-analysis bot mentioned this pull request Dec 19, 2025

[android][arm64] System.Net.Sockets.Tests.SendTo_SyncForceNonBlocking.Datagram_UDP_ShouldImplicitlyBindLocalEndpoint fails with NetworkUnreachable #120526

Open

jakobbotsch reviewed Jan 5, 2026

src/coreclr/jit/codegencommon.cpp

anthonycanino force-pushed the half-xmm-struct-abi branch from 3b8abaa to f633726 Compare January 5, 2026 19:52

This was referenced Jan 5, 2026

[mono] mono_thread_info_install_interrupt: previous_token should be INTERRUPT_STATE #122669

Open

iOS.Device test WorkItemExecutions #122874

Open

tannergooding reviewed Jan 6, 2026

src/coreclr/jit/codegenxarch.cpp

tannergooding reviewed Jan 6, 2026

src/coreclr/jit/compiler.cpp

tannergooding reviewed Jan 6, 2026

src/coreclr/jit/importer.cpp

tannergooding reviewed Jan 6, 2026


build-analysis bot mentioned this pull request Jan 8, 2026

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

3 tasks

This was referenced Jan 8, 2026

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

[android-arm64] The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#6408

Open

anthonycanino force-pushed the half-xmm-struct-abi branch from 3537f96 to 4235f30 Compare January 13, 2026 21:39

dotnet-policy-service bot closed this Feb 12, 2026

anthonycanino reopened this Feb 19, 2026

anthonycanino added 2 commits February 19, 2026 09:47

Prototype changes for TYP_HALF.

511f342

Attempting Half VM ABI edits.

9b0edca

Copilot AI review requested due to automatic review settings February 19, 2026 20:56

anthonycanino force-pushed the half-xmm-struct-abi branch from af542eb to 9b0edca Compare February 19, 2026 20:56

Copilot started reviewing on behalf of anthonycanino February 19, 2026 20:58

Copilot AI reviewed Feb 19, 2026
