Obj2yaml/root descriptor #129096

joaosaffran · 2025-02-27T18:28:26Z

No description provided.

The TOSA specification allows the zero point of conv ops to be variable when the dynamic extension is being used, but information about which extensions are in use is only known when the validation pass is run. A variable zero point should be allowed in the conv ops verifiers. In terms of testing, there didn't seem to be an existing set of tests for the verifiers to add this check to, so the opportunity has been taken to run the verifiers on the tests in `ops.mlir`. Since the conv2d test there had variable zero points, this change in functionality is being tested. Signed-off-by: Luke Hutton <[email protected]> Co-authored-by: Georgios Pinitas <[email protected]>

- Removed assertion for duplicate values as adding them is valid.
- Fix parsing: reject strings for unknown tags, allow any value for Tag_PAuth_Platform and Tag_PAuth_Schema.
- Print tags by using numbers with comments to reduce compiler-assembler dependencies.
- Parsing error messages now only point to the symbol (^) instead of printing it.

This patch adds MLIR to LLVM IR translation support for standalone `omp.distribute` operations, as well as `distribute simd` through ignoring SIMD information (similarly to `do/for simd`). Co-authored-by: Dominik Adamski <[email protected]>

…ts (llvm#127818) This patch adds codegen for `kmpc_dist_for_static_init` runtime calls, used to support worksharing a single loop across teams and threads. This can be used to implement `distribute parallel for/do` support.

This patch adds support for translating composite `omp.parallel` + `omp.distribute` + `omp.wsloop` loops to LLVM IR on the host. This is done by passing an updated `WorksharingLoopType` to the call to `applyWorkshareLoop` associated to the lowering of the `omp.wsloop` operation, so that `__kmpc_dist_for_static_init` is called at runtime in place of `__kmpc_for_static_init`. Existing translation rules take care of creating a parallel region to hold the workshared and workdistributed loop.

…llvm#127820) This patch splits off the calculation of canonical loop trip counts from the creation of canonical loops. This makes it possible to reuse this logic to, for instance, populate the `__tgt_target_kernel` runtime call for SPMD kernels. This feature is used to simplify one of the existing OpenMPIRBuilder tests.
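As a rough illustration of what it means to compute a trip count separately from creating the loop, the sketch below derives the iteration count from the bounds and step alone. This is plain Python, not OpenMPIRBuilder code: the function name `trip_count` and its parameters are hypothetical, and it assumes integer bounds with a nonzero step.

```python
def trip_count(start: int, stop: int, step: int, inclusive_stop: bool = False) -> int:
    """Count iterations of a loop running from start toward stop by step.

    Hypothetical helper for illustration only. With inclusive_stop=False the
    loop condition is `i < stop` (or `i > stop` for a negative step).
    """
    if step == 0:
        raise ValueError("step must be nonzero")
    if inclusive_stop:
        # Turn an inclusive bound into the equivalent exclusive one.
        stop += 1 if step > 0 else -1
    # Distance to cover in the direction of travel.
    span = stop - start if step > 0 else start - stop
    if span <= 0:
        return 0  # The loop body never runs.
    stride = abs(step)
    # Ceiling division: a partial final stride still counts as an iteration.
    return (span + stride - 1) // stride


if __name__ == "__main__":
    print(trip_count(0, 10, 3))
    print(trip_count(1, 10, 1, inclusive_stop=True))
```

Because the function depends only on the bounds and step, the same value could be handed to any consumer that needs it before the loop exists, which is the kind of reuse the patch describes.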

…lvm#127822) This patch adds `target teams distribute [simd]` and equivalent construct nests to the list of cases where loop bounds can be evaluated in the host, as they represent kernels for which the trip count must also be evaluated in advance of the kernel call.

…#127624)
* Document the remaining test cases, add a note that these are exercising `TransferOpReduceRank` (addresses an existing TODO).
* Add missing cases (for fixed-width and scalable vectors).
* Remove scalable vectors from the negative test (the masked case) - this test will also fail with fixed-width vectors. For consistency, let's make all negative tests use fixed-width vectors.

This commit also enables fp16 log, which was previously missing. Other than that, no changes to codegen for AMDGPU/Nvidia targets. Note that for simplicity this commit doesn't try to refactor or optimize the implementations. Notably, each log is only implemented for scalar types; vector types are scalarized. It doesn't look too difficult to make the implementations suitable for vector codegen, so I'll try that in a future commit. There's also an unused implementation of log in clc_log_base.h, whereas the implementation currently used by libclc targets re-uses log2 with an additional multiplication. That should also be cleaned up, as on first inspection it looks like a more optimal implementation, though it would have to be checked against the OpenCL CTS for good measure.

…lvm#128159) The SPIR-V Backend uses the same set of utility functions, mostly though not entirely from SPIRVGlobalRegistry, to generate gMIR and SPIR-V opcodes, depending on the current stage of translation. This is controlled by an explicit EmitIR flag rather than by the current translation pass, and there are legacy pieces of code where the EmitIR flag is declared with a default value of true, which allows utility functions to be used without explicitly declaring whether they are meant to work in the gMIR or in the SPIR-V part of the lowering process. While it may be OK to leave this default EmitIR flag as is when generating scalar integer/float types, as we don't expect to see any dependent opcodes derived from such OpTypeXXX instructions, using EmitIR by default for aggregate types is a source of hidden logical flaws and actual issues. This PR provides a partial fix to the problem: it removes the default status of EmitIR, requiring each call site to explicitly announce its intent to generate gMIR or SPIR-V code; fixes several cases of misuse of EmitIR; and, most importantly, fixes a nasty logical error in the `findSPIRVType` call, where the explicitly requested EmitIR value was replaced by the default value in the middle of the chain of calls. The latter error was a source of issues in the post-instruction selection pass, which had been getting gMIR code where SPIR-V was explicitly requested, because the internal API in SPIRVGlobalRegistry (most notably `findSPIRVType`) is overloaded with default parameters.
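The class of bug described above, where a defaulted flag silently replaces the caller's explicit value partway down a call chain, can be sketched in a few lines. This is an illustrative Python model only; `find_type`, `get_type_buggy`, `find_type_explicit` and `get_type_fixed` are hypothetical names and do not correspond to the actual SPIRVGlobalRegistry API.

```python
def find_type(name: str, emit_ir: bool = True) -> str:
    """Hypothetical lookup helper with a defaulted flag (the risky shape)."""
    stage = "gMIR" if emit_ir else "SPIR-V"
    return f"{name}@{stage}"


def get_type_buggy(name: str, emit_ir: bool = True) -> str:
    # Bug: emit_ir is not forwarded, so find_type silently uses its default.
    return find_type(name)


def find_type_explicit(name: str, *, emit_ir: bool) -> str:
    """Same helper without a default: every caller must state its intent."""
    stage = "gMIR" if emit_ir else "SPIR-V"
    return f"{name}@{stage}"


def get_type_fixed(name: str, *, emit_ir: bool) -> str:
    # The flag is required at every level, so it cannot be dropped unnoticed.
    return find_type_explicit(name, emit_ir=emit_ir)


if __name__ == "__main__":
    print(get_type_buggy("vec4", emit_ir=False))  # caller asked for SPIR-V
    print(get_type_fixed("vec4", emit_ir=False))
```

With the default removed, forgetting to forward the flag becomes an immediate error at the call site rather than a wrong result discovered in a later pass.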

…d` (llvm#128695) Address review comment llvm#128466 (comment) Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=72781f58efddecee19feb07fec4e6104ef4c4812&to=3853aee61626b0eda06671b4cbbc4cdd1344440c&stat=instructions:u

…llvm#127679) This patch changes the input_zp and weight_zp for convolution operators to be required inputs in order to align with the TOSA Spec 1.0. Convolution operators affected are: CONV2D, CONV3D, DEPTHWISE_CONV2D, and TRANSPOSE_CONV2D. Signed-off-by: Tai Ly <[email protected]>

…8060) Enables 16-bit values to be spilled to scratch. Note, the memory instructions used are defined as reading and writing VGPR_32, but do not clobber the unspecified 16-bits of those registers, and so spills and reloads of lo and hi halves of the registers work.
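The claim that lo and hi halves can be spilled and reloaded independently rests on each 16-bit write leaving the other half of the 32-bit register untouched. The sketch below models that property with integer masks. It is only an arithmetic illustration, not a description of the actual AMDGPU memory instructions, and all function names are hypothetical.

```python
MASK16 = 0xFFFF


def read_lo16(reg32: int) -> int:
    """Return bits 0..15 of a 32-bit register value."""
    return reg32 & MASK16


def read_hi16(reg32: int) -> int:
    """Return bits 16..31 of a 32-bit register value."""
    return (reg32 >> 16) & MASK16


def write_lo16(reg32: int, value16: int) -> int:
    """Replace the low half, preserving the high half."""
    return (reg32 & (MASK16 << 16)) | (value16 & MASK16)


def write_hi16(reg32: int, value16: int) -> int:
    """Replace the high half, preserving the low half."""
    return (reg32 & MASK16) | ((value16 & MASK16) << 16)


if __name__ == "__main__":
    reg = 0xABCD1234
    slot = read_lo16(reg)        # "spill" the low half to a scratch slot
    reg = write_lo16(reg, 0)     # the low half is reused for something else
    reg = write_lo16(reg, slot)  # "reload" the low half
    print(hex(reg))              # the high half was never disturbed
```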

…re current working directory (llvm#128446) This PR explicitly sets `DebugCompilationDir` to the system's root directory if it is safe to ignore the current working directory. This fixes a problem where a PCM file's embedded debug information can lead to compilation failure. The compiler may have decided it is indeed safe to ignore the current working directory. In this case, the PCM file's content is functionally correct regardless of the current working directory because no inputs use relative paths (see llvm#124786). However, a PCM may contain debug info. If debug info is requested, the compiler uses the current working directory value to set `DW_AT_comp_dir`. This may lead to the following situation:
1. Two different compilations need the same PCM file.
2. The PCM file is compiled assuming a working directory, which is embedded in the debug info, but otherwise has no effect.
3. The second compilation assumes a different working directory, and expects an identically-sized PCM file. However, it cannot find such a PCM, because the existing PCM file has been compiled assuming a different `DW_AT_comp_dir`, which is embedded in the debug info.

This PR resets the `DebugCompilationDir` if it is functionally safe to ignore the working directory so the above situation is avoided, since all debug information will share the same working directory. rdar://145249881

…ine with `vector.extract` (llvm#128915) This is doing the same as llvm#117731 did for `vector.extract`, but for `vector.insert`. It is a bit more complicated as the insertion destination may itself need to be extracted. As the test shows, this fixes two previously unsupported cases:
- Dynamic indices
- 0-D vectors.

--------- Signed-off-by: Benoit Jacob <[email protected]>

…attr. (llvm#125594) This commit adds support for casting memrefs into fat raw buffer pointers to the AMDGPU dialect. Fat raw buffer pointers (or, in LLVM terms, ptr addrspace(7)) allow encapsulating a buffer descriptor (as produced by the make.buffer.rsrc intrinsic or provided from some API) into a pointer that supports ordinary pointer operations like load or store. This allows people to take advantage of the additional semantics that buffer_load and similar instructions provide without forcing the use of entirely separate amdgpu.raw_buffer_* operations. Operations on fat raw buffer pointers are translated to the corresponding LLVM intrinsics by the backend. This commit also defines a #amdgpu.address_space<> attribute so that AMDGPU-specific memory spaces can be represented. Only #amdgpu.address_space<fat_raw_buffer> will work correctly with the memref dialect, but the other possible address spaces are included for completeness. --------- Co-authored-by: Jakub Kuderski <[email protected]> Co-authored-by: Prashant Kumar <[email protected]>

) Since LowerBufferFatPointers runs before PreISelIntrinsicLowering, which normally handles unsupported memcpy()s, and since you can't have a `noalias {ptr addrspace(8), i32}` because it crashes later passes, manually expand memcpy()s involving buffer fat pointers to loops. Additionally, though they're unlikely to be used, this commit adds support for memset(). This commit doesn't implement writing direct-to-LDS loads as the intrinsics, but leaves the option open for the future.

…ssible (llvm#128564) This change effectively reverts 296ccef (https://reviews.llvm.org/D77192). Most of these symbols are just normal C symbols that get imported from either libcompiler-rt or from emscripten's JS library code. In most cases it should not be necessary to give them explicit import names. The advantage of doing this is that wasm-ld can/will fail with a useful error message when these symbols are missing, as opposed to today, where it will simply import them and defer errors until later (when they are less specific).

This separates out parsing of modulemaps from updating the `clang::ModuleMap` information. Currently this has no effect other than slightly changing diagnostics. Upcoming changes will use this to allow searching for modules without fully processing modulemaps. This creates a new `modulemap` namespace because there are too many things called ModuleMap* right now that mean different things. I'd like to clean this up, but I'm not sure yet what I want to call everything. This also drops the `SourceLocation` from `moduleMapFileRead`. This is never used in tree, and in future patches I plan to make the modulemap parser use a different `SourceManager` so that we can share modulemap parsing between `CompilerInstance`s. This will make the `SourceLocation` meaningless.

@xlauko

…llvm#128626) Currently, the llvm importer can only cover intrinsics that have a first class representation in an MLIR dialect (arm-neon, etc). This PR introduces a fallback mechanism that allows "unregistered" intrinsics to be imported by using the generic `llvm.intrinsic_call` operation. This is useful in several ways:
1. Allows round-tripping the LLVM dialect output lowered from other dialects (example: ClangIR)
2. Enables MLIR-linking tools to operate on imported LLVM IR without requiring new operations to be added to dozens of different targets (cc @xlauko @smeenai).

If multiple dialects implement this interface hook, the last one to register is the one converting all unregistered intrinsics. --------- Co-authored-by: Tobias Gysi <[email protected]>
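The "last one to register wins" rule for the fallback hook can be pictured with a small registry model. This is a hedged Python sketch, not the MLIR import interface: the class `IntrinsicImportRegistry`, its methods, and the returned strings are all hypothetical, chosen only to show how dedicated handlers take priority and how a later fallback replaces an earlier one.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


class IntrinsicImportRegistry:
    """Toy model: dedicated handlers first, then a single fallback."""

    def __init__(self) -> None:
        self._specific: Dict[str, Handler] = {}
        self._fallback: Optional[Handler] = None

    def register_specific(self, intrinsic: str, handler: Handler) -> None:
        self._specific[intrinsic] = handler

    def register_fallback(self, handler: Handler) -> None:
        # A later registration replaces any earlier fallback.
        self._fallback = handler

    def convert(self, intrinsic: str) -> str:
        handler = self._specific.get(intrinsic, self._fallback)
        if handler is None:
            raise LookupError(f"no handler for {intrinsic}")
        return handler(intrinsic)


if __name__ == "__main__":
    registry = IntrinsicImportRegistry()
    registry.register_specific("llvm.fabs", lambda name: "dedicated_op")
    registry.register_fallback(lambda name: f"first_generic({name})")
    registry.register_fallback(lambda name: f"second_generic({name})")
    print(registry.convert("llvm.fabs"))
    print(registry.convert("llvm.some.unregistered"))
```

In this model an intrinsic with a dedicated handler never reaches the fallback, and only the most recently registered fallback sees the unregistered ones.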

lhutton1 and others added 30 commits February 25, 2025 09:38

[clang][x86] Support -masm=intel in cpuid.h (llvm#127331)

547a8bc

Fixes llvm#127271 Testing mostly done in Compiler Explorer https://godbolt.org/z/q1h3ohxr7

[AArch64] Improve codegen for some fixed-width partial reductions (ll…

85cf958

…vm#126529) This patch teaches optimizeExtendOrTruncateConversion to bail out if the user of a zero-extend is a partial reduction intrinsic that we know will get lowered efficiently to a udot instruction.

[GVN][NFC] Match coding standards (llvm#128683)

2a0946b

As per LLVM coding standards "Variable names should be nouns (as they represent state). The name should be camel case, and start with an upper case letter (e.g. Leader or Boats)."

[X86][NFC] Added/Updated Trigonometric functions testcases (llvm#127094)

9fc2f78

- Added sin/cos testcases.
- Added i686 checks for all testcases.
- Moved fp16 and fp128 cases into separate files.
- Dropped tests for ppc_fp128 type.
- Added global-isel runs as precommit testing for llvm#126931

[AArch64] Add cost model for REV shuffles. (llvm#128498)

48397fe

These patterns represent rev instructions, which reverse inside a portion of the full vector. See llvm/test/CodeGen/AArch64/arm64-rev.ll for codegen tests.

[MLIR][OpenMP] Support target SPMD (llvm#127821)

29e1495

This patch implements MLIR to LLVM IR translation of host-evaluated loop bounds, completing initial support for `target teams distribute parallel do [simd]` and `target teams distribute [simd]`.

[clang][bytecode] Expand subscript base if of pointer type (llvm#128511)

dfa3af9

This is similar to what we do in the AddOffset instruction when adding an offset to a pointer.

AMDGPU: Mark v_mov_b64_pseudo as a VOP1 instruction (llvm#128677)

f95ad44

This is mostly true, and it tricks the rematerialization code into handling this without special casing it.

libclc: Stop using asm declarations for r600 on amdgcn for get_global…

b57e63b

…_size (llvm#128692) Comparing the case where each dimension is used alone, the only codegen difference is a missed addressing mode fold for the constant offset in the old version due to an ancient bug.

[ConstraintElim] Test for llvm#128588

6aeec5e

[clang][bytecode] Add special case for anonymous unions (llvm#128681)

dff2ca4

This fixes the expected output to match the one of the current interpreter.

[clang] Add alternative email for steakhal (llvm#128558)

70de57e

Both steakhal and balazs-benics-sonarsource accounts are mine. See llvm#125859

[bazel] port 29e1495

0f9720a

[libc++] Don't try to wait on a thread that hasn't started in std::as…

11766a4

…ync (llvm#125433) If the creation of a thread fails, this causes an idle loop that will never end because the thread wasn't started in the first place. Fixes llvm#125428

[X86] combineX86ShuffleChain - pull out repeated getOpcode() calls. NFC.

a93cda4

[X86] combineX86ShuffleChain - pass IsMaskedShuffle flag as argument …

e47cd46

…from combineX86ShufflesRecursively instead of computing it internally. NFC. Prep work toward better handling of shuffle combining across different vector widths.

[ConstraintElim] Preserve analyses when IR is unchanged. (llvm#128588)

4b29c28

[libc] Fix defaulting the full build

089f988

Summary: This was missing the architecture macros as they were defined just below.

[X86][DAGCombiner] Skip x87 fp80 values in `combineFMulOrFDivWithIntP…

44d1dbd

…ow2` (llvm#128618) f80 is not a valid IEEE floating-point type. Closes llvm#128528.

fhahn and others added 29 commits February 26, 2025 20:39

[LV] Generate check lines for if-conversion.ll

be28365

The limited check lines make it difficult to reason about test changes in llvm#128375.

[mlir][tosa] Rename ReduceProd to ReduceProduct (llvm#128751)

177ede2

This patch renames TOSA ReduceProd operator to ReduceProduct to align with the TOSA Spec 1.0 Signed-off-by: Tai Ly <[email protected]>

[libc][bazel] Add targets for stdbit functions (llvm#128934)

579ead1

Adds targets for the stdbit functions. Since the names follow a strict pattern, this is done via list comprehensions. I don't want to handwrite all 50.
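To show why a strict naming pattern makes list comprehensions attractive here, the sketch below generates function names from a family list and a suffix list. It is written in Python, whose comprehension syntax Bazel's Starlark shares. The families shown are only a subset of the C23 `<stdbit.h>` functions picked for illustration, and the variable names are hypothetical; this is not the actual BUILD file content.

```python
# A subset of C23 <stdbit.h> function families (illustrative, not complete).
FAMILIES = [
    "leading_zeros",
    "leading_ones",
    "trailing_zeros",
    "trailing_ones",
    "count_zeros",
    "count_ones",
]

# Type suffixes: unsigned char, short, int, long, long long.
SUFFIXES = ["uc", "us", "ui", "ul", "ull"]

# One name per (family, suffix) pair, in family-major order.
STDBIT_NAMES = [
    f"stdc_{family}_{suffix}"
    for family in FAMILIES
    for suffix in SUFFIXES
]

if __name__ == "__main__":
    for name in STDBIT_NAMES:
        print(name)
```

Adding a family or a suffix then extends the generated set without any hand-written entries.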

[LV] Remove stray check lines after be28365.

7b6abd8

[SLP]Do not use node, if it is a subvector or buildvector node

418a987

If the buildvector has some matches with another node, which is a subvector of another buildvector node, we need to check for this and cancel matching to avoid incorrect ordering of the nodes. Fixes llvm#128770

[flang][cuda] Add more math intrinsic interfaces in cudadevice (llvm#…

eb84c11

…128931)

Revert "DAG: Preserve range metadata when load is narrowed" (llvm#128948

0212834

) Reverts llvm#128144 Breaks clang prod x64 build (seen in Fuchsia toolchain)

[SLP]Check if the operand for removal is the reduction operand, await…

39bab1d

…ing for the reduction If the operand of the instruction-to-be-removed is a reduction value, which is not reduced yet, and, thus, it has no users, it may be removed during operands analysis. Fixes llvm#128736

Revert "[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (l…

1559a65

…lvm#126621)" This reverts commit 469757e. Multiple buildbot failures have been reported: llvm#126621

addressing changes

ec1dd87

Merge branch 'main' into users/joaosaffran/127840

eb9d7d3

fix test

f2a4f04

parsing root constant

426e6b8

add root constant support

4e23541

clean up

f4daf97

change test

da08d98

adding support in yaml2obj

1f42c8d

finish yaml2obj support

84791a8

clean up

af5b817

fix conflicts

4bedfc0

joaosaffran closed this Feb 27, 2025

113 participants
