[Backport to LLVM 17] Reland "Remove CoordX and CoordY arguments of OpCooperativeMatrixPrefetchINTEL" (#2560) (#3637) #3646

Closed
obrotowy wants to merge 151 commits into KhronosGroup:main from obrotowy:interim-cp

@obrotowy

Original PR: #2560

svenvh and others added 30 commits August 2, 2023 18:28
Co-authored-by: Stanley Gambarin <stanley.gambarin@intel.com>
…NTEL capability after Headers change (KhronosGroup#2258) (KhronosGroup#2308)

* Bump SPIRV-Headers to 1c6bb2743599e6eb6f37b2969acc0aef812e32e3
* replace internal SPV_INTEL_long_composites ext with the published SPV_INTEL_long_composites
* don't rename extension for now
This closes: KhronosGroup#2261

Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
Co-authored-by: Wlodarczyk, Bertrand <bertrand.wlodarczyk@intel.com>
…pirv translator (KhronosGroup#2210) (KhronosGroup#2334)

This PR aims to add f16 type support for atomicrmw in the llvm-spirv translator, with reference to the extension documented in [1].
There are two concerns related to the subject:

SPIRVAtomicFAddEXTInst::getRequiredExtension() should return a list of required extensions, because both SPV_EXT_shader_atomic_float16_add and SPV_EXT_shader_atomic_float_add must be listed in the module (see the "Extension Name" section of ref [1]). However, the return type is std::optional<ExtensionID>, and returning a vector would need a bigger rework.
Including SPV_EXT_shader_atomic_float16_add in the --spirv-ext argument of llvm-spirv doesn't produce the corresponding capability (AtomicFloat16AddEXT) and extension in the SPIR-V output:
$ llvm-spirv AtomicFAddEXT.ll.tmp.bc --spirv-ext=+SPV_EXT_shader_atomic_float_add,+SPV_EXT_shader_atomic_float16_add -o AtomicFAddEXT.ll.tmp.spv
$ llvm-spirv -to-text AtomicFAddEXT.ll.tmp.spv -o /dev/stdout
...
2 Capability AtomicFloat32AddEXT
2 Capability AtomicFloat64AddEXT
9 Extension "SPV_EXT_shader_atomic_float_add"
...
This prevents extending the test case of AtomicFAddEXT.ll in EXT/SPV_EXT_shader_atomic_float.

References:
[1] https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_shader_atomic_float16_add.asciidoc

Co-authored-by: Vyacheslav Levytskyy <89994100+VyacheslavLevytskyy@users.noreply.github.com>
…roup#2273) (KhronosGroup#2336)

Return type of image read and Texel type of image write builtins may be
unsigned. Before this PR, the builtin names in SPIR-V Friendly IR were
always mangled with signed type.

(cherry picked from commit e9b95fb)
…Expression (KhronosGroup#2326)

Ensure that SPIR-V that uses a DebugGlobalVariable's Variable field to hold an Expression
can be reverse translated. A Variable field can be used to hold an Expression in order to
preserve a DIExpression in a DIGlobalVariableExpression in LLVM IR.

Signed-off-by: Lu, John <john.lu@intel.com>
)

The SPIR-V Specification allows `OpConstantNull` types to be scalar or
vector booleans, integers, or floats.  Update an assert for this and
add a SPIR-V -> LLVM IR test.

(cherry picked from commit 9ec969c)
This commit implements bidirectional translation of the llvm.uadd.with.overflow intrinsic and the IAddCarry instruction.
The llvm.uadd.with.overflow intrinsic returns a struct whose second element has type i1.
In llvm-spirv, the LLVM type i1 is translated directly to BoolType.
The SPIR-V specification requires both elements of the composite returned by IAddCarry to have the same type.
As a result, the current implementation is not compliant and should be considered temporary.
This commit implements bidirectional translation of the llvm.usub.with.overflow intrinsic and the ISubBorrow instruction.
The llvm.usub.with.overflow intrinsic returns a struct whose second element has type i1.
In llvm-spirv, the LLVM type i1 is translated directly to BoolType.
The SPIR-V specification requires both elements of the composite returned by ISubBorrow to have the same type.
As a result, the current implementation is not compliant and should be considered temporary.
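
The type mismatch behind the two messages above can be illustrated outside the translator. The following Python sketch models a 32-bit unsigned add and subtract both ways: the LLVM-style result carries a boolean flag, while the SPIR-V-style result carries a second integer of the same width. All function names are hypothetical and none of this is translator code.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # models 32-bit unsigned wraparound


def uadd_with_overflow_i32(a: int, b: int) -> Tuple[int, bool]:
    """Model of llvm.uadd.with.overflow.i32: {i32 sum, i1 overflow}."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total > MASK32


def iadd_carry_u32(a: int, b: int) -> Tuple[int, int]:
    """Model of IAddCarry: both members have the operand type."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32  # carry is 0 or 1, as a 32-bit integer


def usub_with_overflow_i32(a: int, b: int) -> Tuple[int, bool]:
    """Model of llvm.usub.with.overflow.i32: {i32 difference, i1 overflow}."""
    a, b = a & MASK32, b & MASK32
    return (a - b) & MASK32, b > a


def isub_borrow_u32(a: int, b: int) -> Tuple[int, int]:
    """Model of ISubBorrow: both members have the operand type."""
    a, b = a & MASK32, b & MASK32
    return (a - b) & MASK32, 1 if b > a else 0
```

The numeric results agree in both models; only the type of the second member differs, which is the noncompliance the messages refer to.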
…p#2381) (KhronosGroup#2386)

Usually sret parameters are accessed by a memory instruction, which
tells SPIRVTypeScavenger which type to use for this function
parameter. But if the sret parameter is unused later in the module, the
scavenger would fail when attempting to deduce the type from the mangled name.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
…hronosGroup#2391) (KhronosGroup#2410)

The SPIR-V to LLVM conversion would bail out when encountering an
`OpVectorShuffle` whose vector operands differ in size.  SPIR-V
allows differing vector sizes, but LLVM's `shufflevector` does not.

Remove the assert and insert an additional `shufflevector` to align
the vector operands when needed.

(cherry picked from commit 3df5e38)
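
One way to picture the alignment step is sketched below in Python. It is an assumption about the general technique (pad the shorter operand with undefined lanes, then shift the indices that selected from the second operand), not a copy of the translator's code. The names are hypothetical, and the special "undefined" component value is left out for brevity.

```python
from typing import List, Optional, Sequence

UNDEF = None  # stands in for an undef lane


def widen(vec: Sequence[int], width: int) -> List[Optional[int]]:
    """Pad a vector with undef lanes, as an extra shufflevector could."""
    return list(vec) + [UNDEF] * (width - len(vec))


def shuffle_mixed_sizes(v1: Sequence[int], v2: Sequence[int],
                        components: Sequence[int]) -> List[Optional[int]]:
    """Select lanes from v1 ++ v2 using operands of equal width."""
    width = max(len(v1), len(v2))
    a, b = widen(v1, width), widen(v2, width)
    # Indices that pointed into v2 shift, because v1 now occupies `width` lanes.
    mask = [c if c < len(v1) else c - len(v1) + width for c in components]
    concat = a + b
    return [concat[m] for m in mask]
```

For example, `shuffle_mixed_sizes([10, 11], [20, 21, 22, 23], [0, 2, 5, 1])` selects the same lanes as indexing the unpadded concatenation `[10, 11, 20, 21, 22, 23]` directly.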
…ing load types (KhronosGroup#2160) (KhronosGroup#2245)

In some cases, we will see IR like the following:

@__spirv_BuiltInGlobalInvocationId = external dso_local local_unnamed_addr addrspace(1) constant <3 x i64>, align 32

...

%0 = load <6 x i32>, ptr addrspace(1) @__spirv_BuiltInGlobalInvocationId, align 32
%1 = extractelement <6 x i32> %0, i64 0
Note the global type and load type are different. Change the handling of vector loads from vector globals to reconstruct the global vector type and then bitcast to the load type.
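
As a rough illustration of what the bitcast means for the values involved, the Python sketch below reinterprets three 64-bit lanes as six 32-bit lanes. It assumes a little-endian layout, and the helper name is hypothetical.

```python
import struct
from typing import List, Sequence


def bitcast_v3i64_to_v6i32(lanes: Sequence[int]) -> List[int]:
    """Reinterpret <3 x i64> as <6 x i32>, assuming little-endian layout."""
    raw = struct.pack("<3Q", *lanes)        # 24 bytes of the global's value
    return list(struct.unpack("<6I", raw))  # the same bytes, viewed as i32 lanes
```

Under that assumption, element 0 of the `<6 x i32>` view is the low half of the first 64-bit lane, which is what the `extractelement` in the IR above reads.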

Thanks to @jcranmer-intel for helping me find the simplest solution.

Co-authored-by: Nick Sarnie <sarnex@users.noreply.github.com>
…or debug types (KhronosGroup#2341)

The OpenCL and NonSemantic DebugInfo specifications allow any debug information to be replaced with DebugInfoNone, so various SPIR-V producers generate it for the base types of several debug instructions and leave SPIR-V consumers to handle it. By default the translator replaces missing debug info with tag: null, which is correct in most cases. In some situations, however, neither LLVM nor DWARF allows this; for example, the DWARF spec makes the DW_AT_type attribute mandatory for DW_TAG_array_type. For such cases a new transNonNullDebugType wrapper function was added to the translator; it generates "DIBasicType(tag: DW_TAG_unspecified_type, name: "SPIRV unknown type")" where DebugInfoNone was used as the type. This function doesn't replace all calls to transDebugInst<DIType>, because in some cases a null type is acceptable: DWARF doesn't require a type for DW_TAG_typedef, so the translation flow is unchanged there. Likewise, while DWARF requires a type attribute for DW_TAG_pointer_type, LLVM does not, so the translation flow is unchanged in that case as well.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
It should have tested DebugInfoNone base type

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
…sGroup#2440)

If a global is initialized with a ConstantExpr returning a pointer
(getelementptr), the right type has to be deduced instead of defaulting
it to i8*.
…2455)

The fix adds support for the IR builtin calls
__spirv_IAddCarry and __spirv_ISubBorrow.
It is also the first part of the fix that removes the noncompliance of the uadd/sub_with_overflow intrinsics.
The SPIRVUtil changes are needed to support situations where a builtin doesn't have a corresponding store instruction.

Co-authored-by: bwlodarcz <bertrand.wlodarczyk@intel.com>
Changes were cherry-picked from the following commit:
KhronosGroup@c6fe12b

Also cherry picked fixes from:
KhronosGroup#2208
KhronosGroup#2192

This change adds SPIR-V translator support for the SPIR-V extension documented here: KhronosGroup/SPIRV-Registry#193.
This extension adds one decoration to represent maximum error for FP operations and adds the related Capability.
SPIRV Headers support for representing this in SPIR-V: KhronosGroup/SPIRV-Headers#363

intel/llvm#8134 added a new call-site attribute associated with FP builtin intrinsics. This attribute is named 'fpbuiltin-max-error'.
The following example shows how this extension is supported in the translator. The input LLVM IR uses new LLVM builtin calls to represent FP operations. An attribute named 'fpbuiltin-max-error' represents the maximum error allowed in the FP operation.
Example
Input LLVM:
%t6 = call float @llvm.fpbuiltin.sin.f32(float %f1) #2
attributes #2 = { "fpbuiltin-max-error"="2.5" }

This is translated into a SPIR-V instruction (for add/sub/mul/div/rem) or an OpenCL extended instruction (for other operations). A decoration to represent the max-error is attached to the SPIR-V instruction.

SPIR-V code:
4 Decorate 97 FPMaxErrorDecorationINTEL 1075838976
6 ExtInst 2 97 1 sin 88
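
The decoration operand in the listing above reads as the bit pattern of a 32-bit IEEE-754 float. A small Python sketch, assuming that encoding and using hypothetical helper names, converts between the printed literal and the max-error value:

```python
import struct


def float32_from_literal(word: int) -> float:
    """Interpret a 32-bit literal word as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


def literal_from_float32(value: float) -> int:
    """Encode a float as the 32-bit literal word that would be printed."""
    return struct.unpack("<I", struct.pack("<f", value))[0]
```

With these helpers, `float32_from_literal(1075838976)` gives `2.5`, matching the `"fpbuiltin-max-error"="2.5"` attribute in the input LLVM IR.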

No new support is added for translating this SPIR-V back to LLVM; existing support is used. The decoration is translated back into named metadata associated with the LLVM instruction. This can be readily consumed by backends.

Based on input from @andykaylor, we emit attributes when the FP operation is translated back to a call to a builtin function and emit metadata otherwise.

Translated LLVM code for basic math functions (add/sub/mul/div/rem):
%t6 = fmul float %f1, %f2, !fpbuiltin-max-error !7
!7 = !{!"2.500000"}

Translated LLVM code for other math functions:
%t6 = call spir_func float @_Z3sinf(float %f1) #3
attributes #3 = { "fpbuiltin-max-error"="4.000000" }
…hronosGroup#2463)

This PR aims to introduce CooperativeMatrixPrefetchINTEL capability and operation, and make initial introduction of entities in llvm-spirv translator.

Co-authored-by: Vyacheslav Levytskyy <89994100+VyacheslavLevytskyy@users.noreply.github.com>
…up#2277)

Add support for load/store operations for a cooperative matrix such that the original matrix shape is known and implementations are able to reason about how to deal with out-of-bounds accesses.

CapabilityCooperativeMatrixCheckedInstructionsINTEL = 6192
CooperativeMatrixLoadCheckedINTEL = 6193
CooperativeMatrixStoreCheckedINTEL = 6194
…p#2174) (KhronosGroup#2492)

This change is basically an update of KhronosGroup#1389 for spec changes.

Implementation of the feature was based on an Intel extension which was not officially published to Khronos.
Now it has been split, updated, and published to Khronos by KhronosGroup/SPIRV-Registry#205

Summary of the things that have changed:

- Capability names have changed and a new capability was added
- Values for decorations have been updated to enums
- Decoration names and IDs have been changed

Specs:
https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_global_variable_fpga_decorations.asciidoc
https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_global_variable_host_access.asciidoc

Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
…k with legacy TypeJointMatrixINTEL (KhronosGroup#2318) (KhronosGroup#2518)

This PR is to ensure that CooperativeMatrixLoadCheckedINTEL, CooperativeMatrixStoreCheckedINTEL and CooperativeMatrixConstructCheckedINTEL instructions can correctly accept TypeJointMatrixINTEL as an input during both forward and reverse translation.

Co-authored-by: Vyacheslav Levytskyy <89994100+VyacheslavLevytskyy@users.noreply.github.com>
…hronosGroup#2490) (KhronosGroup#2523)

Fixed a mismatched ParentIdx for the DebugTypeInheritance type in the context of NonSemantic.Shader.DebugInfo.100. That led to a heap buffer overflow in the SPIRVExtInst::getExtOp getter when an instruction was incorrectly cast to SPIRVExtInst as the parent of the culprit instruction in SPIRVToLLVMDbgTran::getDIBuilder. The mismatch happened because in the previously used standard, OpenCL.DebugInfo.100, DebugTypeInheritance had the Child field as its zero-indexed argument. In the newer standard that field is removed.
…2529) (KhronosGroup#2538)

When a temporary `OpForward` instruction is needed during translation
to SPIR-V, do not add the decorations yet, as that would result in
duplicate decorations when the actual instruction is visited and the
`OpForward` is replaced by a real SPIR-V instruction.

The SPIR-V Validator has recently started checking for duplicate
decorations; this fixes some but not all issues arising from the new
checks.

Contributes to KhronosGroup#2509

(cherry picked from commit a278313)
…2537) (KhronosGroup#2550)

The SPIR-V Validator has recently started checking for duplicate
decorations.  This commit fixes duplicate Alignment decorations that
affected the `test/read_image.cl` test.

Alignment decorations have two potential sources during LLVM to SPIR-V
translation: the instruction's alignment property and
`spirv.Decorations` metadata.  Handle both of these through the
`setAlignment` method, so that duplicates can be avoided.

Calling `setAlignment` with different alignments for the same entity
is probably an error, so add an assert.
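
The idea of funnelling both sources through one setter can be sketched in Python as follows. The class and method names are hypothetical stand-ins, not the translator's API; the sketch only shows how a single stored value plus an assert avoids duplicate Alignment decorations.

```python
from typing import List, Optional, Tuple


class DecoratedEntity:
    """Hypothetical stand-in for an entity that can carry an Alignment."""

    def __init__(self) -> None:
        self._alignment: Optional[int] = None

    def set_alignment(self, alignment: int) -> None:
        # A second call with a different value is treated as a bug.
        assert self._alignment in (None, alignment), "conflicting alignments"
        self._alignment = alignment

    def decorations(self) -> List[Tuple[str, int]]:
        # At most one Alignment decoration is ever emitted.
        if self._alignment is None:
            return []
        return [("Alignment", self._alignment)]
```

Calling `set_alignment(8)` once for the instruction's alignment property and once for the metadata leaves a single `("Alignment", 8)` entry, while a conflicting second value trips the assert.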

Contributes to KhronosGroup#2509

(cherry picked from commit 926ca2a)
…riable extensions (KhronosGroup#2228) (KhronosGroup#2573)

This is done to give drivers a smooth transition to the official
extensions.
Also updated tests for new extensions to use new tokens after
KhronosGroup/SPIRV-Headers@cca08c6

Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
github-actions bot and others added 28 commits September 26, 2025 16:46
Backport of PR KhronosGroup#3355 into `llvm_release_170`.

All commits applied cleanly.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Co-authored-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
…Group#3370) (KhronosGroup#3381)

This extension adds predicated load and store instructions. Predicated
load performs load from memory if predicate is true; otherwise, it uses
default_value as a result. Predicated store performs store of value to
memory if predicate is true; otherwise, it does nothing.
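
The semantics described above can be modelled with a few lines of Python. This is only a behavioural sketch over a dictionary standing in for memory; the function names are hypothetical and do not correspond to the instruction names in the extension.

```python
from typing import Dict


def predicated_load(memory: Dict[int, int], address: int,
                    predicate: bool, default_value: int) -> int:
    """Load from memory if predicate is true; otherwise yield default_value."""
    return memory[address] if predicate else default_value


def predicated_store(memory: Dict[int, int], address: int,
                     value: int, predicate: bool) -> None:
    """Store value to memory if predicate is true; otherwise do nothing."""
    if predicate:
        memory[address] = value
```

Note that when the predicate is false the load never touches `memory`, so an address that is absent from the dictionary causes no error in this model.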

Spec: intel/llvm#20158

Signed-off-by: Zhang, Yixing <yixing.zhang@intel.com>
…builtin ptr arg type. (KhronosGroup#3390) (KhronosGroup#3396)

Add missing case so that pointer argument types for
`OpAtomicCompareExchangeWeak` are resolved correctly. Without this patch,
the newly added test would introduce an invalid bitcast just before
the atomic builtin.
…3424)

This continues KhronosGroup#3343 and reflects specification update, including extension renaming. 
Specification: intel/llvm#20009
…R calls (KhronosGroup#3398)

Map all cooperative matrix type conversions to SPIR-V friendly IR calls,
regardless of the environment specified.

In particular, do not attempt to map such conversions to the OpenCL
`convert` builtin. The SPIR-V TargetExtType is already encoded in the
function suffix, so the previous translation was an odd hybrid between
OpenCL and SPIR-V friendly IR.

(cherry picked from commit 9d56f01)
…#3374)

The way they are implemented is described in:
KhronosGroup#3221

The PR also adds SPV_EXT_float8 extension and packed conversions for
SPV_INTEL_int4

Currently only conversion instructions (and internal builtins) are
covered.

TODO: in the following PR Saturation decoration will be added.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>

---------

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
…ons extensions (KhronosGroup#3419)

As well as their appropriate conversions via __builtin_spirv mechanism.

Specification: intel/llvm#20467

Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
…hronosGroup#3488)

The extension adds support for the `OpFmaKHR` instruction, which
provides a native SPIR-V instruction for fused multiply-add operations
as an alternative to using OpenCL.std::Fma extended instruction.

Translate both LLVM fma intrinsics as well as OCL builtins to `OpFmaKHR`
if the extension is available.

Specification:

https://github.khronos.org/SPIRV-Registry/extensions/KHR/SPV_KHR_fma.html
…KhronosGroup#3504)

Previous check in 199d2e0 translated __spirv_ocl_fmax to FMA.

(cherry picked from commit 64b7a07)
KhronosGroup#3512)

Instead of the hardcoded `-j2`, use the number of available processing
units.

Fixes KhronosGroup#3481

(cherry picked from commit f20a37d)
Changed in llvm-project commit f0fa2d7c292853b79b5bcd16be97940859a800ec
Changed in llvm-project commits:
- bb7feae218745c666718b7a16b1f57d0d2165cf1
- 3b01fa264c36c1c7bb293f001579e0f459a92b84
- 61e1c3d80db6e94e8b5b83b3819afefeec4d357b
Changed in llvm-project commit ee19fabc984747b0ce971d1d47662d89b63fa0ab
…ositeConstruct"

* Fix reverse translation of non-constant values of OpCompositeConstruct (KhronosGroup#2256)

This patch introduces a way to use runtime values for structure fields.

Co-authored-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>

* Fix reverse translation of non-constant values of OpCompositeConstruct pt.2 (KhronosGroup#2296)

This patch introduces a way to use runtime values for array and vector
types. It continues KhronosGroup#2256

---------

Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
Co-authored-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
…loat16ArithmeticINTEL` capability (KhronosGroup#3577)

Backport of PR KhronosGroup#3567 into `llvm_release_170`.

All commits applied cleanly.

Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
…upMatrixMultiplyAccumulateINTEL (KhronosGroup#3609) (KhronosGroup#3631)

Extend SubgroupMatrixMultiplyAccumulateINTEL to support packed 4-bit and
8-bit floating-point matrix operands by implementing extensions:
- SPV_INTEL_subgroup_matrix_multiply_accumulate_float4
- SPV_INTEL_subgroup_matrix_multiply_accumulate_float8

These extensions add operand flags that interpret packed integer data as
FP4/FP8 without requiring actual FP4/FP8 type support added by
SPV_INTEL_float4 or SPV_EXT_float8.

FP4 operands:
`MatrixAPackedFloat4E2M1INTEL` (0x40000) /
`MatrixBPackedFloat4E2M1INTEL` (0x80000)
FP8 operands:
`MatrixAPackedFloat8E4M3INTEL` (0x4000) / `MatrixBPackedFloat8E4M3INTEL`
(0x8000)
`MatrixAPackedFloat8E5M2INTEL` (0x10000) /
`MatrixBPackedFloat8E5M2INTEL` (0x20000)
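
Since each operand flag listed above occupies its own bit, the values can be combined into one operands mask. The Python sketch below uses the flag names and values from this message; the enum class name is a hypothetical container chosen for illustration.

```python
from enum import IntFlag


class MatrixOperandFlags(IntFlag):
    """Hypothetical container for the operand flags listed above."""
    MatrixAPackedFloat8E4M3INTEL = 0x4000
    MatrixBPackedFloat8E4M3INTEL = 0x8000
    MatrixAPackedFloat8E5M2INTEL = 0x10000
    MatrixBPackedFloat8E5M2INTEL = 0x20000
    MatrixAPackedFloat4E2M1INTEL = 0x40000
    MatrixBPackedFloat4E2M1INTEL = 0x80000


# Example mask: both operands hold packed FP4 data.
fp4_operands = (MatrixOperandFlags.MatrixAPackedFloat4E2M1INTEL
                | MatrixOperandFlags.MatrixBPackedFloat4E2M1INTEL)
```

A mask such as `fp4_operands` marks both the A and B matrices as packed FP4, and membership can be checked with the usual `flag in mask` test.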

Specs:
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_matrix_multiply_accumulate_float4.asciidoc
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_matrix_multiply_accumulate_float8.asciidoc

Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
…e upon mismatch (KhronosGroup#3255) (KhronosGroup#3612)

The source element type used in a GEP may differ from the actual type of
the pointer operand (e.g., ptr i8 vs. ptr [N x T]). This mismatch can
lead to incorrect address computations during translation to SPIR-V of
GEP used in constexpr context, which requires that pointer types match
the type of the object being accessed.

This patch inserts an explicit bitcast to convert the GEP pointer
operand to the expected type, derived from the GEP’s source element
type, before emitting a PtrAccessChain. This ensures the resulting
SPIR-V instruction has a correctly typed base pointer and produces valid
indexing behavior.

For example:
Before this change, the following GEP was translated incorrectly:
getelementptr(i8, ptr addrspace(1) @a_var, i64 2)
Whereas this nearly equivalent GEP was handled correctly:
getelementptr inbounds ([2 x i8], ptr @a_var, i64 0, i64 1)

Previously, the first form was incorrectly interpreted as:
getelementptr inbounds ([2 x i8], ptr @a_var, i64 0, i64 2)

(cherry picked from commit 1be9366)

Co-authored-by: Karol Zwolak <karolzwolak7@gmail.com>
…anslation (KhronosGroup#3524)

Co-authored-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
…KhronosGroup#3542)

Backport of PR KhronosGroup#3529 into `llvm_release_170`.

All commits applied cleanly.

Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
…o 1-length arrays in SPIR-V (KhronosGroup#3546)

Backport of PR KhronosGroup#2743 into `llvm_release_170`.

All commits applied cleanly.

Co-authored-by: Lorenc Bushi <lorenc.bushi@intel.com>
This reverts commit 310c785.

Previously reverted before implementing fpclass llvm intrinsic lowering.
…td lib"

This reverts commit 4619248.

Previously reverted before implementing fpclass llvm intrinsic lowering.
@obrotowy obrotowy closed this Mar 12, 2026
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
9 out of 12 committers have signed the CLA.

✅ jaladreips
✅ pvelesko
✅ YuriPlyakhin
✅ vsemenov368
✅ jzadnik
✅ YixingZhang007
✅ KanclerzPiotr
✅ bashbaug
✅ obrotowy
❌ amielcza
❌ michalkr52
❌ Dmitry Sidorov


Dmitry Sidorov seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.