@joaosaffran
Contributor

No description provided.

lhutton1 and others added 30 commits February 25, 2025 09:38
The TOSA specification allows the zero point of conv ops to be variable
when the dynamic extension is being used, but information about which
extensions are in use is only known when the validation pass is run. A
variable zero point should be allowed in the conv ops verifiers.

In terms of testing, there didn't seem to be an existing set of tests
for the verifiers to add this check to, so the opportunity has been
taken to run the verifiers on the tests in `ops.mlir`. Since the conv2d
test there had variable zero points, this change in functionality is
being tested.

Signed-off-by: Luke Hutton <[email protected]>
Co-authored-by: Georgios Pinitas <[email protected]>
- Removed assertion for duplicate values as adding them is valid.
- Fix parsing: reject strings for unknown tags, allow any value for
Tag_PAuth_Platform and Tag_PAuth_Schema.
- Print tags by using numbers with comments to reduce compiler-assembler
dependencies.
- Parsing error messages now only point to the symbol (^) instead of
printing it.
…vm#126529)

This patch teaches optimizeExtendOrTruncateConversion to bail out
if the user of a zero-extend is a partial reduction intrinsic
that we know will get lowered efficiently to a udot instruction.
As per LLVM coding standards
"Variable names should be nouns (as they represent state).
 The name should be camel case, and start with an upper
 case letter (e.g. Leader or Boats)."
This patch adds MLIR to LLVM IR translation support for standalone
`omp.distribute` operations, as well as `distribute simd` by
ignoring SIMD information (similarly to `do/for simd`).

Co-authored-by: Dominik Adamski <[email protected]>
…ts (llvm#127818)

This patch adds codegen for `kmpc_dist_for_static_init` runtime calls,
used to support worksharing a single loop across teams and threads. This
can be used to implement `distribute parallel for/do` support.
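
As a rough illustration of worksharing one loop across teams and threads, the sketch below statically splits an iteration range first across teams and then across the threads of each team. The helper names (`split_evenly`, `dist_for_static`) and the even-split policy are assumptions made for this example; it does not reproduce the scheduling actually performed by the OpenMP runtime.

```python
def split_evenly(lower, upper, parts, index):
    """Return the half-open sub-range [lo, hi) of [lower, upper) owned by `index`."""
    total = upper - lower
    base, extra = divmod(total, parts)
    # The first `extra` owners each take one additional iteration.
    lo = lower + index * base + min(index, extra)
    hi = lo + base + (1 if index < extra else 0)
    return lo, hi


def dist_for_static(lower, upper, num_teams, num_threads, team, thread):
    """Hypothetical two-level split: distribute across teams, then across threads."""
    team_lo, team_hi = split_evenly(lower, upper, num_teams, team)
    return split_evenly(team_lo, team_hi, num_threads, thread)


if __name__ == "__main__":
    for team in range(2):
        for thread in range(3):
            print(team, thread, dist_for_static(0, 10, 2, 3, team, thread))
```

Every iteration ends up owned by exactly one (team, thread) pair, which is the property a combined distribute/worksharing loop relies on.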
- Added sin/cos testcases.
- Added i686 checks for all testcases.
- Moved fp16 and fp128 cases into separate files.
- Dropped tests for ppc_fp128 type.
- Added global-isel runs as precommit testing for llvm#126931
This patch adds support for translating composite `omp.parallel` +
`omp.distribute` + `omp.wsloop` loops to LLVM IR on the host. This is
done by passing an updated `WorksharingLoopType` to the call to
`applyWorkshareLoop` associated with the lowering of the `omp.wsloop`
operation, so that `__kmpc_dist_for_static_init` is called at runtime in
place of `__kmpc_for_static_init`.

Existing translation rules take care of creating a parallel region to
hold the workshared and workdistributed loop.
These patterns represent rev instructions, which reverse inside a
portion of the full vector. See llvm/test/CodeGen/AArch64/arm64-rev.ll
for codegen tests.
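
To make "reverse inside a portion of the full vector" concrete, here is a minimal sketch that reverses element order within fixed-size chunks of a list. The function name `rev_within_chunks` is hypothetical, and the list merely stands in for a vector register; the sketch is not the pattern definition from the patch.

```python
def rev_within_chunks(elements, chunk_len):
    """Reverse element order inside each consecutive chunk of `chunk_len` elements."""
    if chunk_len <= 0 or len(elements) % chunk_len != 0:
        raise ValueError("vector length must be a positive multiple of chunk_len")
    out = []
    for start in range(0, len(elements), chunk_len):
        # Only the elements of this chunk swap places; chunks keep their order.
        out.extend(reversed(elements[start:start + chunk_len]))
    return out


if __name__ == "__main__":
    print(rev_within_chunks(list(range(8)), 4))
```

With eight elements and a chunk length of four, the two halves are reversed independently rather than the vector as a whole.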
…llvm#127820)

This patch splits off the calculation of canonical loop trip counts from
the creation of canonical loops. This makes it possible to reuse this
logic to, for instance, populate the `__tgt_target_kernel` runtime call
for SPMD kernels.

This feature is used to simplify one of the existing OpenMPIRBuilder
tests.
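
For readers unfamiliar with the term, the trip count of a canonical loop is the number of iterations it executes, and it can be computed from the bounds and step alone without building the loop. The sketch below is an assumption-laden illustration in Python (the name `trip_count` and its parameters are hypothetical), not the OpenMPIRBuilder code.

```python
def trip_count(start, stop, step, inclusive_stop=False):
    """Number of iterations of a loop from `start` towards `stop` in steps of `step`."""
    if step == 0:
        raise ValueError("step must be non-zero")
    if inclusive_stop:
        # Turn an inclusive bound into the equivalent exclusive one.
        stop += 1 if step > 0 else -1
    span = stop - start if step > 0 else start - stop
    if span <= 0:
        return 0  # the loop body never runs
    stride = abs(step)
    return (span + stride - 1) // stride  # ceiling division


if __name__ == "__main__":
    print(trip_count(0, 10, 3))
    print(trip_count(10, 0, -2))
```

Because the computation depends only on the bounds and the step, the same value can be reused wherever it is needed before the loop itself exists, for instance when it has to be passed to a kernel launch.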
This patch implements MLIR to LLVM IR translation of host-evaluated loop
bounds, completing initial support for `target teams distribute parallel
do [simd]` and `target teams distribute [simd]`.
…lvm#127822)

This patch adds `target teams distribute [simd]` and equivalent
construct nests to the list of cases where loop bounds can be evaluated
in the host, as they represent kernels for which the trip count must
also be evaluated in advance of the kernel call.
This is similar to what we do in the AddOffset instruction when adding
an offset to a pointer.
…#127624)

* Document the remaining test cases, add a note that these are
  exercising `TransferOpReduceRank` (addresses an existing TODO).
* Add missing cases (for fixed-width and scalable vectors).
* Remove scalable vectors from the negative test (the masked case) - this test
  will also fail with fixed-width vectors. For consistency, let's make all
  negative tests use fixed-width vectors.
This is mostly true, and it tricks the rematerialization
code into handling this without special casing it.
…_size (llvm#128692)

Comparing the case where each dimension is used alone, the only codegen
difference is a missed addressing mode fold for the constant offset in the old
version due to an ancient bug.
This commit also enables fp16 log, which was previously missing.

Other than that, no changes to codegen for AMDGPU/Nvidia targets.

Note that for simplicity this commit doesn't try to refactor or optimize
the implementations. Notably, each log is only implemented for scalar
types; vector types are scalarized. It doesn't look too difficult to
make the implementations suitable for vector codegen, so I'll try that
in a future commit.

There's also an unused implementation of log in clc_log_base.h, whereas
the implementation currently used by libclc targets re-uses log2 with an
additional multiplication. That should also be cleaned up, as on first
inspection it looks like a more optimal implementation, though it would
have to be checked against the OpenCL CTS for good measure.
This fixes the expected output to match the one of the current
interpreter.
Both steakhal and balazs-benics-sonarsource accounts are mine. See
llvm#125859
…ync (llvm#125433)

If the creation of a thread fails, this causes an idle loop that will
never end because the thread wasn't started in the first place.

Fixes llvm#125428
…from combineX86ShufflesRecursively instead of computing it internally. NFC.

Prep work toward better handling of shuffle combining across different vector widths.
Summary:
This was missing the architecture macros as they were defined just
below.
…lvm#128159)

The SPIR-V Backend uses the same set of utility functions, mostly though
not entirely from SPIRVGlobalRegistry, to generate gMIR and SPIR-V
opcodes, depending on the current stage of translation. This is
controlled by an explicit EmitIR flag rather than by the current
translation pass, and in some legacy code the EmitIR flag is declared
with a default value of true, which allows utility functions to be
called without explicitly stating whether they are meant to work in the
gMIR or in the SPIR-V part of the lowering process.

While it may be OK to leave this default EmitIR flag as is when
generating scalar integer/float types, as we don't expect to see any
dependent opcodes derived from such OpTypeXXX instructions, using EmitIR
by default for aggregation types is a source of hidden logical flaws and
actual issues.

This PR provides a partial fix to the problem by removing the default
value of EmitIR, so that each call site must explicitly state its intent
to generate gMIR or SPIR-V code. It also fixes several cases of misuse
of EmitIR and, most importantly, fixes a nasty logical error in the
`findSPIRVType` call, where the EmitIR value that was actually requested
was overridden by the default value in the middle of the chain of calls.
The latter error was a source of issues in the post-instruction
selection pass, which was receiving gMIR code where SPIR-V had been
explicitly requested, because the internal API in SPIRVGlobalRegistry
(most notably `findSPIRVType`) was overloaded with default parameters.
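
The class of bug described here, a defaulted flag silently replacing the caller's value partway down a call chain, can be shown with a small sketch. All function names below are hypothetical stand-ins written in Python; they are not the SPIRVGlobalRegistry API, and the returned tuples only label which kind of code would be produced.

```python
def find_type_buggy(name, emit_ir=True):
    """Innermost helper; the flag defaults to True."""
    return ("gMIR" if emit_ir else "SPIR-V", name)


def get_or_create_buggy(name, emit_ir=True):
    # Bug: emit_ir is not forwarded, so the callee's default (True) wins.
    return find_type_buggy(name)


def find_type(name, *, emit_ir):
    """Same helper, but the flag is a required keyword argument."""
    return ("gMIR" if emit_ir else "SPIR-V", name)


def get_or_create(name, *, emit_ir):
    # With no default available, forgetting to forward emit_ir is a TypeError.
    return find_type(name, emit_ir=emit_ir)


if __name__ == "__main__":
    print(get_or_create_buggy("vec4", emit_ir=False))
    print(get_or_create("vec4", emit_ir=False))
```

Removing the default turns a silent wrong result into an immediate error at the call site that omitted the flag, which is the motivation stated for dropping the default in this PR.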
…ow2` (llvm#128618)

f80 is not a valid IEEE floating-point type.
Closes llvm#128528.
fhahn and others added 29 commits February 26, 2025 20:39
The limited check lines make it difficult to reason about test changes
in llvm#128375.
…llvm#127679)

This patch changes the input_zp and weight_zp for convolution operators
to be required inputs in order to align with the TOSA Spec 1.0.

Convolution operators affected are:
	CONV2D, CONV3D, DEPTHWISE_CONV2D, and TRANSPOSE_CONV2D.


Signed-off-by: Tai Ly <[email protected]>
This patch renames TOSA ReduceProd operator to ReduceProduct to align
with the TOSA Spec 1.0

Signed-off-by: Tai Ly <[email protected]>
Adds targets for the stdbit functions. Since the names follow a strict
pattern, this is done via list comprehensions. I don't want to handwrite
all 50.
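
The kind of list comprehension meant here can be sketched as follows. The two function families and the type suffixes are real C23 `<stdbit.h>` names, but the subset chosen and the variable names are assumptions for illustration; the actual build file from this commit is not reproduced.

```python
# Illustrative subset of stdbit function families (the commit covers more).
STDBIT_PREFIXES = ["stdc_leading_zeros", "stdc_count_ones"]

# Suffixes for unsigned char, short, int, long and long long variants.
TYPE_SUFFIXES = ["uc", "us", "ui", "ul", "ull"]

# One target name per (function family, type) pair.
stdbit_names = [
    f"{prefix}_{suffix}"
    for prefix in STDBIT_PREFIXES
    for suffix in TYPE_SUFFIXES
]

if __name__ == "__main__":
    for name in stdbit_names:
        print(name)
```

Adding another function family then means appending one prefix rather than writing five more entries by hand.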
…8060)

Enables 16-bit values to be spilled to scratch.

Note, the memory instructions used are defined as reading and writing
VGPR_32, but do not clobber the unspecified 16-bits of those registers,
and so spills and reloads of lo and hi halves of the registers work.
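
The property relied on above, that writing one 16-bit half leaves the other half of the 32-bit register intact, can be modelled with plain bit arithmetic. This is only a toy model in Python with hypothetical helper names; it does not describe the actual AMDGPU memory instructions.

```python
MASK16 = 0xFFFF


def write_lo16(reg32, value16):
    """Replace bits 0..15 and preserve bits 16..31."""
    return (reg32 & 0xFFFF0000) | (value16 & MASK16)


def write_hi16(reg32, value16):
    """Replace bits 16..31 and preserve bits 0..15."""
    return (reg32 & 0x0000FFFF) | ((value16 & MASK16) << 16)


def read_lo16(reg32):
    return reg32 & MASK16


def read_hi16(reg32):
    return (reg32 >> 16) & MASK16


if __name__ == "__main__":
    reg = 0xAAAABBBB
    spilled_lo = read_lo16(reg)                   # "spill" the low half
    reg = write_lo16(reg, 0x0000)                 # low half gets reused
    reg = write_lo16(reg, spilled_lo)             # "reload" the low half
    print(hex(reg))
```

Because the untouched half survives each write, the two halves can be spilled and reloaded independently of one another.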
…re current working directory (llvm#128446)

This PR explicitly sets `DebugCompilationDir` to the system's root
directory if it is safe to ignore the current working directory.

This fixes a problem where a PCM file's embedded debug information can
lead to compilation failure. The compiler may have decided it is indeed
safe to ignore the current working directory. In this case, the PCM
file's content is functionally correct regardless of the current working
directory because no inputs use relative paths (see
llvm#124786). However, a PCM may
contain debug info. If debug info is requested, the compiler uses the
current working directory value to set `DW_AT_comp_dir`. This may lead
to the following situation:
1. Two different compilations need the same PCM file.
2. The first compilation builds the PCM file assuming a working
directory, which is embedded in the debug info, but otherwise has no
effect.
3. The second compilation assumes a different working directory, and
expects an identically-sized PCM file. However, it cannot find such a
PCM, because the existing PCM file has been compiled assuming a
different `DW_AT_comp_dir`, which is embedded in the debug info.

This PR resets the `DebugCompilationDir` if it is functionally safe to
ignore the working directory so the above situation is avoided, since
all debug information will share the same working directory.
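
A toy model may help show why resetting the directory avoids the mismatch. In the sketch below a "PCM" is reduced to a digest over its module contents plus the embedded compilation directory; the function names, the `/` root and the digest scheme are all assumptions for illustration and say nothing about the real PCM format or how clang locates PCM files.

```python
import hashlib


def effective_comp_dir(cwd, safe_to_ignore_cwd, root="/"):
    """Pick the directory to embed: the root when the cwd is irrelevant."""
    return root if safe_to_ignore_cwd else cwd


def pcm_digest(module_contents, comp_dir):
    """Stand-in for a PCM's bytes: contents plus the embedded comp dir."""
    payload = f"{module_contents}\0DW_AT_comp_dir={comp_dir}".encode()
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    contents = "module A { header \"/abs/include/a.h\" }"
    for reset in (False, True):
        first = pcm_digest(contents, effective_comp_dir("/build/one", reset))
        second = pcm_digest(contents, effective_comp_dir("/build/two", reset))
        print("reset" if reset else "no reset", first == second)
```

Without the reset, the two builds disagree only because of the embedded directory; with it, they agree.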

rdar://145249881
If the buildvector has some matches with another node, which is
a subvector of another buildvector node, we need to check for this and
cancel matching to avoid incorrect ordering of the nodes.

Fixes llvm#128770
…ine with `vector.extract` (llvm#128915)

This is doing the same as
llvm#117731 did for
`vector.extract`, but for `vector.insert`.

It is a bit more complicated as the insertion destination may itself
need to be extracted.

As the test shows, this fixes two previously unsupported cases:
- Dynamic indices
- 0-D vectors.

---------

Signed-off-by: Benoit Jacob <[email protected]>
…attr. (llvm#125594)

This commit adds support for casting memrefs into fat raw buffer
pointers to the AMDGPU dialect.

Fat raw buffer pointers (in LLVM terms, `ptr addrspace(7)`) allow
encapsulating a buffer descriptor (as produced by the make.buffer.rsrc
intrinsic or provided from some API) into a pointer that supports
ordinary pointer operations like load or store. This allows people to
take advantage of the additional semantics that buffer_load and similar
instructions provide without forcing the use of entirely separate
amdgpu.raw_buffer_* operations.

Operations on fat raw buffer pointers are translated to the
corresponding LLVM intrinsics by the backend.

This commit also defines a #amdgpu.address_space<>
attribute so that AMDGPU-specific memory spaces can be represented. Only
#amdgpu.address_space<fat_raw_buffer> will work correctly with the
memref dialect, but the other possible address spaces are included for
completeness.

---------

Co-authored-by: Jakub Kuderski <[email protected]>
Co-authored-by: Prashant Kumar <[email protected]>
)

Since LowerBufferFatPointers runs before PreISelIntrinsicLowering, which
normally handles unsupported memcpy()s, and since you can't have a
`noalias {ptr addrspace(8), i32}` because it crashes later passes,
manually expand memcpy()s involving buffer fat pointers to loops.

Additionally, though they're unlikely to be used, this commit adds
support for memset().

This commit doesn't implement writing direct-to-LDS loads as the
intrinsics, but leaves that option open for the future.
…ssible (llvm#128564)

This change effectively reverts 296ccef
(https://reviews.llvm.org/D77192)

Most of these symbols are just normal C symbols that get imported from
either libcompiler-rt or emscripten's JS library code. In most
cases it should not be necessary to give them explicit import names.

The advantage of doing this is that wasm-ld can/will fail with a
useful error message when these symbols are missing, as opposed to today,
where it will simply import them and defer errors until later (when they
are less specific).
)

Reverts llvm#128144

Breaks clang prod x64 build (seen in Fuchsia toolchain)
…ing for the reduction

If the operand of the instruction-to-be-removed is a reduction value,
which is not reduced yet, and, thus, it has no users, it may be removed
during operands analysis.

Fixes llvm#128736
This separates out parsing of modulemaps from updating the
`clang::ModuleMap` information.

Currently this has no effect other than slightly changing diagnostics.
Upcoming changes will use this to allow searching for modules without
fully processing modulemaps.


This creates a new `modulemap` namespace because there are too many
things called ModuleMap* right now that mean different things. I'd like
to clean this up, but I'm not sure yet what I want to call everything.

This also drops the `SourceLocation` from `moduleMapFileRead`. This is
never used in tree, and in future patches I plan to make the modulemap
parser use a different `SourceManager` so that we can share modulemap
parsing between `CompilerInstance`s. This will make the `SourceLocation`
meaningless.
…llvm#128626)

Currently, the LLVM importer can only cover intrinsics that have a first
class representation in an MLIR dialect (arm-neon, etc). This PR
introduces a fallback mechanism that allows "unregistered" intrinsics to
be imported by using the generic `llvm.intrinsic_call` operation. This
is useful in several ways:

1. Allows round-tripping the LLVM dialect output lowered from other
dialects (example: ClangIR).
2. Enables MLIR-linking tools to operate on imported LLVM IR without
requiring new operations to be added for dozens of different targets (cc
@xlauko @smeenai).

If multiple dialects implement this interface hook, the last one to
register is the one converting all unregistered intrinsics.
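
The "last one to register wins" behaviour can be pictured with a small registry sketch. The class and method names below are hypothetical Python stand-ins, not the MLIR import interface; converters simply return tuples describing what they would emit.

```python
class IntrinsicImporter:
    """Toy importer: dedicated converters per intrinsic plus one fallback hook."""

    def __init__(self):
        self._registered = {}
        self._fallback = None

    def register(self, name, converter):
        self._registered[name] = converter

    def register_fallback(self, converter):
        # A later registration replaces the earlier one: the last hook wins.
        self._fallback = converter

    def convert(self, name, operands):
        converter = self._registered.get(name, self._fallback)
        if converter is None:
            raise KeyError(f"no converter for intrinsic {name!r}")
        return converter(name, operands)


if __name__ == "__main__":
    importer = IntrinsicImporter()
    importer.register("llvm.sqrt", lambda n, ops: ("dialect_op", n, ops))
    importer.register_fallback(lambda n, ops: ("generic_call_a", n, ops))
    importer.register_fallback(lambda n, ops: ("generic_call_b", n, ops))
    print(importer.convert("llvm.sqrt", [1.0]))
    print(importer.convert("llvm.some.unknown", [2]))
```

Intrinsics with a dedicated converter are unaffected by the fallback; only names with no registered converter reach the most recently installed hook.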

---------

Co-authored-by: Tobias Gysi <[email protected]>
…lvm#126621)"

This reverts commit 469757e.

Multiple buildbot failures have been reported:
llvm#126621