Obj2yaml/root descriptor #129096
Closed: joaosaffran wants to merge 829 commits into llvm:users/joaosaffran/127932 from joaosaffran:obj2yaml/root-descriptor
The TOSA specification allows the zero point of conv ops to be variable when the dynamic extension is being used, but information about which extensions are in use is only known when the validation pass is run. A variable zero point should be allowed in the conv ops verifiers. In terms of testing, there didn't seem to be an existing set of tests for the verifiers to add this check to, so the opportunity has been taken to run the verifiers on the tests in `ops.mlir`. Since the conv2d test there had variable zero points, this change in functionality is being tested. Signed-off-by: Luke Hutton <[email protected]> Co-authored-by: Georgios Pinitas <[email protected]>
- Removed assertion for duplicate values as adding them is valid. - Fix parsing: reject strings for unknown tags, allow any value for Tag_PAuth_Platform and Tag_PAuth_Schema. - Print tags by using numbers with comments to reduce compiler-assembler dependencies. - Parsing error messages now only point to the symbol (^) instead of printing it.
Fixes llvm#127271 Testing mostly done in Compiler Explorer https://godbolt.org/z/q1h3ohxr7
…vm#126529) This patch teaches optimizeExtendOrTruncateConversion to bail out if the user of a zero-extend is a partial reduction intrinsic that we know will get lowered efficiently to a udot instruction.
As per LLVM coding standards "Variable names should be nouns (as they represent state). The name should be camel case, and start with an upper case letter (e.g. Leader or Boats)."
This patch adds MLIR to LLVM IR translation support for standalone `omp.distribute` operations, as well as `distribute simd` through ignoring SIMD information (similarly to `do/for simd`). Co-authored-by: Dominik Adamski <[email protected]>
…ts (llvm#127818) This patch adds codegen for `kmpc_dist_for_static_init` runtime calls, used to support worksharing a single loop across teams and threads. This can be used to implement `distribute parallel for/do` support.
- Added sin/cos testcases. - Added i686 checks for all testcases. - Moved fp16 and fp128 cases into separate files. - Dropped tests for ppc_fp128 type. - Added global-isel runs as precommit testing for llvm#126931
This patch adds support for translating composite `omp.parallel` + `omp.distribute` + `omp.wsloop` loops to LLVM IR on the host. This is done by passing an updated `WorksharingLoopType` to the call to `applyWorkshareLoop` associated with the lowering of the `omp.wsloop` operation, so that `__kmpc_dist_for_static_init` is called at runtime in place of `__kmpc_for_static_init`. Existing translation rules take care of creating a parallel region to hold the workshared and workdistributed loop.
These patterns represent rev instructions, which reverse inside a portion of the full vector. See llvm/test/CodeGen/AArch64/arm64-rev.ll for codegen tests.
…llvm#127820) This patch splits off the calculation of canonical loop trip counts from the creation of canonical loops. This makes it possible to reuse this logic to, for instance, populate the `__tgt_target_kernel` runtime call for SPMD kernels. This feature is used to simplify one of the existing OpenMPIRBuilder tests.
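As a rough illustration of what a standalone trip-count calculation looks like once it is separated from loop creation, here is a minimal sketch. It assumes an integer loop with a positive step; the function name `trip_count` and its parameters are hypothetical and do not mirror the OpenMPIRBuilder API.

```python
def trip_count(start: int, stop: int, step: int = 1, inclusive_stop: bool = False) -> int:
    """Number of iterations of `for (i = start; i < stop; i += step)`.

    Hypothetical helper for illustration only: handles positive steps and
    optionally treats `stop` as an inclusive bound.
    """
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    if inclusive_stop:
        stop += 1  # turn `i <= stop` into `i < stop + 1`
    span = stop - start
    if span <= 0:
        return 0  # the loop body never runs
    # Ceiling division: a partial final stride still counts as one iteration.
    return (span + step - 1) // step
```

Because the result depends only on the bounds and the step, a caller can compute it before any loop is built, which is the kind of reuse the commit message describes.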
This patch implements MLIR to LLVM IR translation of host-evaluated loop bounds, completing initial support for `target teams distribute parallel do [simd]` and `target teams distribute [simd]`.
…lvm#127822) This patch adds `target teams distribute [simd]` and equivalent construct nests to the list of cases where loop bounds can be evaluated in the host, as they represent kernels for which the trip count must also be evaluated in advance of the kernel call.
This is similar to what we do in the AddOffset instruction when adding an offset to a pointer.
…#127624) * Document the remaining test cases, add a note that these are exercising `TransferOpReduceRank` (addresses an existing TODO). * Add missing cases (for fixed-width and scalable vectors). * Remove scalable vectors from the negative test (the masked case) - this test will also fail with fixed-width vectors. For consistency, let's make all negative tests use fixed-width vectors.
This is mostly true, and it tricks the rematerialization code into handling this without special casing it.
…_size (llvm#128692) Comparing the case where each dimension is used alone, the only codegen difference is a missed addressing mode fold for the constant offset in the old version due to an ancient bug.
This commit also enables fp16 log, which was previously missing. Other than that, no changes to codegen for AMDGPU/Nvidia targets. Note that for simplicity this commit doesn't try to refactor or optimize the implementations. Notably, each log is only implemented for scalar types; vector types are scalarized. It doesn't look too difficult to make the implementations suitable for vector codegen, so I'll try that in a future commit. There's also an unused implementation of log in clc_log_base.h, whereas the implementation currently used by libclc targets re-uses log2 with an additional multiplication. That should also be cleaned up, as on first inspection it looks like a more optimal implementation, though it would have to be checked against the OpenCL CTS for good measure.
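The "log2 with an additional multiplication" approach rests on the identity ln(x) = log2(x) * ln(2). The sketch below shows that identity and a scalarized vector wrapper in plain Python; the names `log_via_log2` and `log_scalarized` are hypothetical and this is not the libclc code.

```python
import math

LN2 = math.log(2.0)  # natural log of 2, the extra multiplication factor


def log_via_log2(x: float) -> float:
    """Natural logarithm computed as log2(x) * ln(2)."""
    return math.log2(x) * LN2


def log_scalarized(values):
    """'Vector' log done by scalarizing: one scalar call per element."""
    return [log_via_log2(v) for v in values]
```

The extra multiplication adds one rounding step compared with a direct implementation, which is one reason a conformance check would be needed before swapping implementations.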
This fixes the expected output to match the one of the current interpreter.
Both steakhal and balazs-benics-sonarsource accounts are mine. See llvm#125859
…ync (llvm#125433) If the creation of a thread fails, this causes an idle loop that will never end because the thread wasn't started in the first place. Fixes llvm#125428
…from combineX86ShufflesRecursively instead of computing it internally. NFC. Prep work toward better handling of shuffle combining across different vector widths.
Summary: This was missing the architecture macros as they were defined just below.
…lvm#128159) The SPIR-V Backend uses the same set of utility functions, mostly though not entirely from SPIRVGlobalRegistry, to generate gMIR and SPIR-V opcodes, depending on the current stage of translation. This is controlled by an explicit EmitIR flag rather than by the current translation pass, and there are legacy pieces of code where the EmitIR flag is declared with a default value of true, which allows utility functions to be used without explicitly declaring whether they are meant to work in the gMIR or in the SPIR-V part of the lowering process. While it may be OK to leave this default EmitIR flag as is when generating scalar integer/float types, as we don't expect to see any dependent opcodes derived from such OpTypeXXX instructions, using EmitIR by default for aggregate types is a source of hidden logical flaws and actual issues. This PR provides a partial fix to the problem: it removes the default value of EmitIR, requiring each call site to explicitly state its intent to generate gMIR or SPIR-V code; it fixes several cases of misuse of EmitIR; and, most importantly, it fixes a nasty logical error in the `findSPIRVType` call, where the explicitly requested EmitIR value was replaced by the default value in the middle of the chain of calls. The latter error was a source of issues in the post-instruction selection pass, which was getting gMIR code where SPIR-V was explicitly requested, because the internal API in SPIRVGlobalRegistry is overloaded with default parameters (most notably, `findSPIRVType`).
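The class of bug described here, where a default parameter in the middle of a call chain silently overrides the caller's explicit choice, can be reproduced in a few lines. The sketch below is a language-neutral model in Python; all function names are hypothetical, and the mapping of `emit_ir=True` to gMIR follows the description in the commit message only.

```python
def find_type_with_default(name, emit_ir=True):
    """Hypothetical leaf helper that still has a default flag."""
    return (name, "gMIR" if emit_ir else "SPIR-V")


def get_or_create_buggy(name, emit_ir=True):
    # Bug: emit_ir is not forwarded, so the callee's default (True) wins
    # even when the caller explicitly asked for emit_ir=False.
    return find_type_with_default(name)


def find_type_explicit(name, emit_ir):
    """Same helper without a default: every caller must state its intent."""
    return (name, "gMIR" if emit_ir else "SPIR-V")


def get_or_create_fixed(name, emit_ir):
    # Forgetting to forward emit_ir here would now raise a TypeError.
    return find_type_explicit(name, emit_ir)
```

Removing the default turns a silent wrong answer into an immediate error at the call site, which is the design choice the PR makes.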
…ow2` (llvm#128618) f80 is not a valid IEEE floating-point type. Closes llvm#128528.
…d` (llvm#128695) Address review comment llvm#128466 (comment) Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=72781f58efddecee19feb07fec4e6104ef4c4812&to=3853aee61626b0eda06671b4cbbc4cdd1344440c&stat=instructions:u
The limited check lines make it difficult to reason about test changes in llvm#128375.
…llvm#127679) This patch changes the input_zp and weight_zp for convolution operators to be required inputs in order to align with the TOSA Spec 1.0. Convolution operators affected are: CONV2D, CONV3D, DEPTHWISE_CONV2D, and TRANSPOSE_CONV2D. Signed-off-by: Tai Ly <[email protected]>
This patch renames TOSA ReduceProd operator to ReduceProduct to align with the TOSA Spec 1.0 Signed-off-by: Tai Ly <[email protected]>
Adds targets for the stdbit functions. Since the names follow a strict pattern, this is done via list comprehensions. I don't want to handwrite all 50.
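A minimal sketch of generating names from a strict pattern with a list comprehension follows. The operation list is an illustrative subset chosen here, not the list used in the commit, and the variable names are hypothetical; the `stdc_<operation>_<suffix>` shape follows the C23 `<stdbit.h>` naming.

```python
# Illustrative subset of operations; the real build file may cover more.
OPERATIONS = ["leading_zeros", "trailing_zeros", "count_ones"]

# Type suffixes: unsigned char, short, int, long, long long.
SUFFIXES = ["uc", "us", "ui", "ul", "ull"]

# One target name per (operation, suffix) pair.
TARGET_NAMES = [
    "stdc_{}_{}".format(operation, suffix)
    for operation in OPERATIONS
    for suffix in SUFFIXES
]
```

Adding an operation to the list yields five more names without any hand-written entries.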
…8060) Enables 16-bit values to be spilled to scratch. Note, the memory instructions used are defined as reading and writing VGPR_32, but do not clobber the unspecified 16-bits of those registers, and so spills and reloads of lo and hi halves of the registers work.
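The property relied on here is that writing one 16-bit half of a 32-bit register leaves the other half intact. The sketch below models that with plain integer masking; it is an arithmetic illustration with hypothetical helper names, not a model of the actual AMDGPU memory instructions.

```python
MASK16 = 0xFFFF


def read_lo16(reg32: int) -> int:
    """Low 16 bits of a 32-bit register value."""
    return reg32 & MASK16


def read_hi16(reg32: int) -> int:
    """High 16 bits of a 32-bit register value."""
    return (reg32 >> 16) & MASK16


def write_lo16(reg32: int, value16: int) -> int:
    """Replace the low half, preserving the high half."""
    return (reg32 & (MASK16 << 16)) | (value16 & MASK16)


def write_hi16(reg32: int, value16: int) -> int:
    """Replace the high half, preserving the low half."""
    return (reg32 & MASK16) | ((value16 & MASK16) << 16)
```

A spill followed by a reload of one half then round-trips that half while whatever currently lives in the other half is untouched.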
…re current working directory (llvm#128446) This PR explicitly sets `DebugCompilationDir` to the system's root directory if it is safe to ignore the current working directory. This fixes a problem where a PCM file's embedded debug information can lead to compilation failure. The compiler may have decided it is indeed safe to ignore the current working directory. In this case, the PCM file's content is functionally correct regardless of the current working directory because no inputs use relative paths (see llvm#124786). However, a PCM may contain debug info. If debug info is requested, the compiler uses the current working directory value to set `DW_AT_comp_dir`. This may lead to the following situation: 1. Two different compilations need the same PCM file. 2. The PCM file is compiled assuming a working directory, which is embedded in the debug info, but otherwise has no effect. 3. The second compilation assumes a different working directory, and expects an identically-sized PCM file. However, it cannot find such a PCM, because the existing PCM file has been compiled assuming a different `DW_AT_comp_dir`, which is embedded in the debug info. This PR resets the `DebugCompilationDir` if it is functionally safe to ignore the working directory, so the above situation is avoided, since all debug information will share the same working directory. rdar://145249881
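The three-step situation above can be modeled with a toy "PCM" whose only working-directory-dependent content is the embedded compilation directory. This is a sketch under stated assumptions: `build_pcm` and `pcm_key` are hypothetical names, the byte layout is invented for illustration, and `/` stands in for the system's root directory.

```python
import hashlib


def build_pcm(module_source: str, cwd: str, safe_to_ignore_cwd: bool) -> bytes:
    """Toy PCM: module content plus an embedded debug compilation directory."""
    # With the fix, the debug compilation directory is reset to the root
    # whenever the working directory has no functional effect.
    comp_dir = "/" if safe_to_ignore_cwd else cwd
    return f"{module_source}\nDW_AT_comp_dir={comp_dir}\n".encode("utf-8")


def pcm_key(pcm: bytes) -> tuple:
    """What a second compilation would compare: size and content digest."""
    return (len(pcm), hashlib.sha256(pcm).hexdigest())
```

Without the reset, two builds from different directories produce artifacts that differ in size and content even though the module itself is identical; with it, both builds agree.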
If the buildvector has some matches with another node that is itself a subvector of another buildvector node, we need to check for this and cancel matching to avoid incorrect ordering of the nodes. Fixes llvm#128770
…ine with `vector.extract` (llvm#128915) This is doing the same as llvm#117731 did for `vector.extract`, but for `vector.insert`. It is a bit more complicated as the insertion destination may itself need to be extracted. As the test shows, this fixes two previously unsupported cases: - Dynamic indices - 0-D vectors. --------- Signed-off-by: Benoit Jacob <[email protected]>
…attr. (llvm#125594) This commit adds support for casting memrefs into fat raw buffer pointers to the AMDGPU dialect. Fat raw buffer pointers (or, in LLVM terms, ptr addrspace(7)) allow encapsulating a buffer descriptor (as produced by the make.buffer.rsrc intrinsic or provided from some API) into a pointer that supports ordinary pointer operations like load or store. This allows people to take advantage of the additional semantics that buffer_load and similar instructions provide without forcing the use of entirely separate amdgpu.raw_buffer_* operations. Operations on fat raw buffer pointers are translated to the corresponding LLVM intrinsics by the backend. This commit also defines a #amdgpu.address_space<> attribute so that AMDGPU-specific memory spaces can be represented. Only #amdgpu.address_space<fat_raw_buffer> will work correctly with the memref dialect, but the other possible address spaces are included for completeness. --------- Co-authored-by: Jakub Kuderski <[email protected]> Co-authored-by: Prashant Kumar <[email protected]>
) Since LowerBufferFatPointers runs before PreISelIntrinsicLowering, which normally handles unsupported memcpy()s, and since you can't have a `noalias {ptr addrspace(8), i32}` because it crashes later passes, manually expand memcpy()s involving buffer fat pointers to loops. Additionally, though they're unlikely to be used, this commit adds support for memset(). This commit doesn't implement writing direct-to-LDS loads as the intrinsics, but leaves the option open for the future.
…ssible (llvm#128564) This change effectively reverts 296ccef (https://reviews.llvm.org/D77192). Most of these symbols are just normal C symbols that get imported from either libcompiler-rt or from emscripten's JS library code. In most cases it should not be necessary to give them explicit import names. The advantage of doing this is that wasm-ld can/will fail with a useful error message when these symbols are missing, as opposed to today, where it will simply import them and defer errors until later (when they are less specific).
) Reverts llvm#128144 Breaks clang prod x64 build (seen in Fuchsia toolchain)
…ing for the reduction If the operand of the instruction-to-be-removed is a reduction value, which is not reduced yet, and, thus, it has no users, it may be removed during operands analysis. Fixes llvm#128736
This separates out parsing of modulemaps from updating the `clang::ModuleMap` information. Currently this has no effect other than slightly changing diagnostics. Upcoming changes will use this to allow searching for modules without fully processing modulemaps. This creates a new `modulemap` namespace because there are too many things called ModuleMap* right now that mean different things. I'd like to clean this up, but I'm not sure yet what I want to call everything. This also drops the `SourceLocation` from `moduleMapFileRead`. This is never used in tree, and in future patches I plan to make the modulemap parser use a different `SourceManager` so that we can share modulemap parsing between `CompilerInstance`s. This will make the `SourceLocation` meaningless.
…llvm#128626) Currently, the LLVM importer can only cover intrinsics that have a first-class representation in an MLIR dialect (arm-neon, etc). This PR introduces a fallback mechanism that allows "unregistered" intrinsics to be imported by using the generic `llvm.intrinsic_call` operation. This is useful in several ways: 1. Allows round-tripping the LLVM dialect output lowered from other dialects (example: ClangIR). 2. Enables MLIR-linking tools to operate on imported LLVM IR without requiring new operations to be added to dozens of different targets (cc @xlauko @smeenai). If multiple dialects implement this interface hook, the last one to register is the one converting all unregistered intrinsics. --------- Co-authored-by: Tobias Gysi <[email protected]>
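The dispatch rule described above (first-class converters take priority, a single fallback handles everything else, and the last fallback to register wins) can be sketched as a small registry. The class and method names below are hypothetical and the result tuples are placeholders; this is not the MLIR import interface.

```python
class IntrinsicImporter:
    """Toy registry: per-intrinsic converters plus one generic fallback."""

    def __init__(self):
        self._first_class = {}  # intrinsic name -> converter(operands)
        self._fallback = None   # converter(name, operands) or None

    def register_first_class(self, intrinsic_name, converter):
        self._first_class[intrinsic_name] = converter

    def register_fallback(self, converter):
        # A later registration replaces an earlier one: last to register wins.
        self._fallback = converter

    def convert(self, intrinsic_name, operands):
        converter = self._first_class.get(intrinsic_name)
        if converter is not None:
            return converter(operands)
        if self._fallback is not None:
            return self._fallback(intrinsic_name, operands)
        raise LookupError(f"no converter for intrinsic {intrinsic_name!r}")
```

Keeping a single fallback slot makes the behavior deterministic given the registration order, at the cost of making that order significant.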
…lvm#126621)" This reverts commit 469757e. Multiple buildbot failures have been reported: llvm#126621
No description provided.