[SYCL][RT upstreaming] Remove uses of XPTI #1
@KseniyaTikhomirova Could you please have a look at these changes to see if I missed anything?
KseniyaTikhomirova
left a comment
Are you going to remove the /xpti and /xptifw folders too?
It is not really clear what this comment relates to, but it is probably better to remove it as well.
same
This class is also used only for XPTI. It is located in ./sycl/include/sycl/detail/common.hpp, and usages can be found in the queue ctor, for instance. XPTI is not mentioned in the class declaration or definition, though, so the class could be kept: useless but present.
Reverts llvm/llvm-project#120864 because it broke building compiler-rt on Mac. https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-mac-arm64/b8726812736235038609/overview
Fix #88650. In addition, delete the unneeded comment. https://sourceware.org/gnu-gabi/program-loading-and-dynamic-linking.txt
The commit 22b7b84 made the symbols provided by shared libraries "defined", and thus effectively made it impossible to generate non-pie dynamically linked executables using --unresolved-symbols=import-dynamic. This commit, based on llvm/llvm-project#109249, fixes it by checking sym->isShared() explicitly. As a bonus, you no longer need to rely on --unresolved-symbols=import-dynamic. Fixes llvm/llvm-project#107387
According to psABI 2.30, the code sequence for tls-desc in the large code model is not expected to be scheduled. A later commit will fix it.
This ensures these classes are visible only to the appropriate translation unit and allows for more optimizations.
The debug section map was using MachO section names (with the "__" prefix), but DWARFContext expects section names with the object format prefix stripped off. This was preventing DWARFContext from accessing the debug_str section, resulting in bogus source name strings.
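The name mismatch described above can be illustrated with a small sketch. The helper name `stripMachOPrefix` is hypothetical and is not taken from the patch; it only shows the idea of dropping the MachO `__` prefix so that a lookup keyed on `debug_str` succeeds.

```cpp
#include <string_view>

// Hypothetical helper (not the actual LLVM code): drop the MachO "__"
// prefix so that "__debug_str" is exposed under the name "debug_str".
constexpr std::string_view stripMachOPrefix(std::string_view name) {
  if (name.substr(0, 2) == "__")
    name.remove_prefix(2);
  return name;
}

// Names without the prefix pass through unchanged.
static_assert(stripMachOPrefix("__debug_str") == "debug_str");
static_assert(stripMachOPrefix("debug_info") == "debug_info");
```

A map keyed on the stripped names is what a consumer that asks for `debug_str` would need; how the real patch performs the stripping is not shown here.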
CONFLICT (content): Merge conflict in clang/lib/Driver/Driver.cpp
It wraps the body of a namespace with additional newlines, turning this code:
```
namespace N {
int function();
}
```
into the following:
```
namespace N {

int function();

}
```
---------
Co-authored-by: Owen Pan <[email protected]>
This optimization was introduced by #70469. Like AArch64, we allow tail expansions for 3 on RV32 and 3/5/6 on RV64. This can simplify the comparison and reduce the number of blocks.
Similar to a9e75b1: During MachOPlatform bootstrap we need to defer actions until essential platform functionality has been loaded, but the platform itself may be loaded under a concurrent dispatcher so we have to guard against the deferred actions vector being accessed concurrently. This fixes a probabilistic failure in the ORC runtime regression tests on Darwin/x86-64 that was spotted after edca1d9 (which turned on concurrent linking by default in llvm-jitlink).
The gcov version is set to 11.1 (compatible with gcov 9) even if the version specified with `-Xclang -coverage-version=` is less than 11.1. Therefore, we can drop producer support for versions < 11.1.
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logic that adjusts the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, as the comment "This limit must be adjusted dynamically for reserved registers" says. Separated from llvm/llvm-project#118787
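The guarding described above might look roughly like the following. This is a minimal sketch, not the ORC runtime code: the class name `DeferredActions` and its members are hypothetical, and it only shows a mutex protecting the deferred actions vector while actions are queued or drained.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch: actions are queued until the platform is ready,
// and every access to the queue happens under a mutex.
class DeferredActions {
public:
  // Runs the action immediately once ready, otherwise queues it.
  void runOrDefer(std::function<void()> action) {
    assert(action && "action must be callable");
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!ready_) {
        pending_.push_back(std::move(action));
        return;
      }
    }
    action(); // Run outside the lock to avoid holding it during user code.
  }

  // Marks the platform as ready and runs everything queued so far.
  void markReady() {
    std::vector<std::function<void()>> toRun;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ready_ = true;
      toRun.swap(pending_);
    }
    for (auto &action : toRun)
      action();
  }

private:
  std::mutex mutex_;
  bool ready_ = false;
  std::vector<std::function<void()>> pending_;
};
```

Swapping the vector out under the lock and running the actions afterwards is one possible design; it keeps the critical section short, but the real fix may be structured differently.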
All the sources of `llvm-min-tblgen` are also used for `llvm-tblgen`, with identical compilation flags. Reuse the object files of `llvm-min-tblgen` for `llvm-tblgen` by applying the usual source structure of an executable: one file per executable, named after the executable, containing the (in this case trivial) main function, which just calls the tblgen_main in TableGen.cpp. This should also clear up any confusion (including mine) about where each executable's main function is.
While this slightly reduces build time, the main motivation is ccache. With the hard_link option, building the object files for `llvm-tblgen` results in a hard link to the same object file already used for `llvm-min-tblgen`. To signal to the build system that the file is new, ccache updates the file's time stamp. Unfortunately, time stamps are shared between all hard-linked files, so this indirectly also updates the time stamps of the object files used for `llvm-min-tblgen`. At the next run, Ninja will notice the discrepancy between this time stamp and the expected stamp recorded in `.ninja_log` and rebuild those object files for `llvm-min-tblgen`, which in turn updates the stamp for `llvm-tblgen`, and so on. This is especially annoying for tablegen because it means Ninja will re-run all tablegenning in every build.
I am using the hard_link option because it reduces the cost of having multiple build trees of the LLVM sources and reduces the wear on the SSD they are stored on.
V_MAX3/MIN3_NUM_F16 are the GFX12 aliases of the GFX11 V_MAX3/MIN3_F16 instructions, and they should be updated together. This fixes a bug introduced in llvm/llvm-project#113603 where only V_MAX3/MIN3_F16 were replaced in true16 format. Also adds GFX12 run lines for the CodeGen test.
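The shared time stamp behaviour behind the ccache problem can be reproduced with a short experiment. This is only a sketch: the function name and the file names are hypothetical, and it assumes a file system that supports hard links (for example a typical Linux temp directory).

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical experiment: create a file, hard-link it, change the time
// stamp through the link, and report whether the original file's time
// stamp changed as well.
bool hardLinksShareTimestamp(const fs::path &dir) {
  assert(fs::is_directory(dir) && "dir must be an existing directory");
  const fs::path first = dir / "min-tblgen-obj.tmp"; // illustrative names
  const fs::path second = dir / "tblgen-obj.tmp";
  fs::remove(first);
  fs::remove(second);
  {
    std::ofstream out(first);
    out << "object file contents";
  }
  fs::create_hard_link(first, second);

  const auto before = fs::last_write_time(first);
  // "Touch" only the second name, as ccache would for the new object file.
  fs::last_write_time(second, before + std::chrono::hours(1));
  const bool shared = fs::last_write_time(first) != before;

  fs::remove(first);
  fs::remove(second);
  return shared;
}
```

Calling `hardLinksShareTimestamp(fs::temp_directory_path())` exercises the experiment; both names refer to the same underlying file, which is why a build system that compares recorded time stamps sees the first object file as modified.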
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This mirrors a
similar update made for `tensor::PackOp` in #114559. Below is the updated
rationale tailored to `tensor::UnPackOp`.
ISSUE DESCRIPTION
Currently, `FoldTensorCastProducerOp` incorrectly folds the following:
```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
// Note: `%c8` and `?`.
%unpack = tensor.unpack %cast
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:
```mlir
// Note: `%c8` and `8`.
%unpack = tensor.unpack %cast
inner_dims_pos = [0, 1]
inner_tiles = [%c8, 1]
into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```
This triggers an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch addresses the
issue by ensuring proper handling of inner tile sizes.
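The idea of keeping the inner tile sizes in sync with the folded type can be sketched without MLIR. Everything below is a toy model under stated assumptions: `Dim` and `refreshInnerTiles` are hypothetical names, `std::nullopt` stands for a dynamic size (`?` in a type, or an SSA value such as `%c8`), and the tile dimensions are assumed to be the trailing dimensions of the packed shape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// nullopt models a dynamic size; a value models a static size.
using Dim = std::optional<std::int64_t>;

// Toy model: once the packed (source) shape is known after folding the
// cast, any tile whose dimension became static takes that static size.
std::vector<Dim> refreshInnerTiles(const std::vector<Dim> &packedShape,
                                   std::vector<Dim> innerTiles) {
  assert(innerTiles.size() <= packedShape.size());
  const std::size_t firstTileDim = packedShape.size() - innerTiles.size();
  for (std::size_t i = 0; i < innerTiles.size(); ++i) {
    const Dim &dim = packedShape[firstTileDim + i];
    if (dim)
      innerTiles[i] = dim; // Static in the type: the tile must agree.
  }
  return innerTiles;
}
```

With the shape `1x1x8x1` and tiles `[%c8, 1]` from the example above, the first tile would become the static value 8 in this model, so the type and the tile sizes no longer disagree. The actual pattern works on MLIR values and attributes and is not reproduced here.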
ADDITIONAL CHANGES
* invalid.mlir: Fixed a typo.
* TensorOps.cpp:
  * Removed unnecessary `(void)tileSize`.
  * Added comments following the discussion in PR #115772.
  * Made minor updates to `FoldTensorCastPackOp` for consistency with the newly
    introduced `FoldTensorCastUnPackOp`.
* Tensor/canonicalize.mlir: Ensured consistent usage of `test_attr` (e.g.,
  replaced mixed use of `test_attr` and `some_attr`).
…#121400) This PR is a follow-up to #116373 and #116439, where a Transform Dialect (TD) operation was introduced to collect patterns for decomposing tensor.pack. The second patch renamed the patterns and the TD Op. Originally, adding patterns for `tensor.unpack` was marked as a TODO, which this PR addresses.
No new tests are introduced in this PR. Instead, existing tests from:
* "decompose-tensor-unpack.mlir"
are reused. To achieve this:
* The test is updated to use the TD operation `transform.apply_patterns.linalg.decompose_pack_unpack` instead of the flag `--test-linalg-transform-patterns="test-decompose-tensor-unpack"`, avoiding artificial tests created solely for the TD Op.
* The TD sequence is saved to a new file, "decompose_unpack.mlir", and preloaded using the option.
`Sdext` and `Sdtrig` are RISC-V extensions related to debugging. The full specification can be found at https://github.com/riscv/riscv-debug-spec/releases/download/1.0.0-rc4/riscv-debug-specification.pdf
This test is failing on Windows (see e.g. https://lab.llvm.org/buildbot/#/builders/146/builds/1983), probably due to incomplete debugger support there (the test registers debug info in-process, so non-Darwin builds shouldn't be expected to have the right symbols).
When a single #embed directive is used to initialize a char array, the case is optimized by swapping the EmbedExpr for the underlying StringLiteral, resulting in better performance in AST consumers. While browsing through the code, I realized that 7122b70, which changed the type of EmbedExpr, made the "fast path" unreachable. This patch fixes this unfortunate situation.
…huffle(z)),binop(shuffle(y),shuffle(w)) -> binop(shuffle(x,z),shuffle(y,w)) (#120984) Some patterns (in particular horizontal style patterns) can end up with shuffles straddling both sides of a binop/cmp. Where individually the folds aren't worth it, by merging the (oneuse) shuffles we can notably reduce the net instruction count and cost. One of the final steps towards finally addressing #34072
UR Part: oneapi-src/unified-runtime#2520 --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
We are currently using `24.52.32224.5` and the corresponding IGC version is v.2.5.6 as per [here](https://github.com/intel/compute-runtime/releases/tag/24.52.32224.5). The script doesn't update it, I'll fix it tomorrow but for now just manually bump it. Signed-off-by: Sarnie, Nick <[email protected]>
B320 & B410 were deleted - https://github.com/PyCQA/bandit/releases/tag/1.8.1 This patch fixes the action - https://github.com/intel/llvm/actions/workflows/bandit.yml
This is a draft for a new extension to add the notion of a default device. It adds a method to query the current device from the calling thread as well as a method to change it. --------- Signed-off-by: James Brodman <[email protected]> Co-authored-by: John Pennycook <[email protected]>
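As a rough illustration of a per-thread default device, the sketch below uses a `thread_local` slot. It is not the extension's API: `Device`, `getCurrentDevice`, and `setCurrentDevice` are hypothetical stand-ins, and the assumption that each thread starts with device 0 is made up for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for a device handle.
struct Device {
  int id;
};

namespace sketch {

// Each thread gets its own slot; device 0 is an assumed initial default.
inline Device &currentDeviceSlot() {
  thread_local Device current{0};
  return current;
}

// Query the current device of the calling thread.
inline Device getCurrentDevice() { return currentDeviceSlot(); }

// Change the current device of the calling thread only.
inline void setCurrentDevice(Device device) {
  assert(device.id >= 0 && "device id must be non-negative");
  currentDeviceSlot() = device;
}

} // namespace sketch
```

Because the slot is `thread_local`, a call to `sketch::setCurrentDevice` in one thread does not change what another thread observes; whether the draft extension specifies exactly these semantics should be checked against its text.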
3ee8318 to cc953c3
Tested in intel#16011