Skip to content

Conversation

@sergey-semenov
Copy link
Owner

@sergey-semenov sergey-semenov commented Nov 7, 2024

Tested in intel#16011

@sergey-semenov
Copy link
Owner Author

@KseniyaTikhomirova Could you please have a look at these changes to see if I missed anything?

Copy link

@KseniyaTikhomirova KseniyaTikhomirova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you going to remove /xpti & /xptifw folders too?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really clear what this comment relates to but probably it is better to remove it also.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class is also used for XPTI only and located in ./sycl/include/sycl/detail/common.hpp, usages could be found in queue ctor, for instance. Although we do not have xpti mentioned in that class declaration & definition so it could be kept useless but present.

gulfemsavrun and others added 26 commits January 2, 2025 18:34
The commit 22b7b84
made the symbols provided by shared libraries "defined",
and thus effectively made it impossible to generate non-pie
dynamically linked executables using
--unresolved-symbols=import-dynamic.

This commit, based on llvm/llvm-project#109249,
fixes it by checking sym->isShared() explictly.
(as a bonus, you don't need to rely on
--unresolved-symbols=import-dynamic
anymore.)

Fixes llvm/llvm-project#107387
Code sequence for tls-desc in large code model is not expected to be
scheduled according to psABI 2.30.

A later commit will fix it.
This ensures these classes are visible only to the appropriate
translation unit and allows for more optimizations.
The debug section map was using MachO section names (with the "__" prefix), but
DWARFContext expects section names with the object format prefix stripped off.
This was preventing DWARFContext from accessing the debug_str section,
resulting in bogus source name strings.
  CONFLICT (content): Merge conflict in clang/lib/Driver/Driver.cpp
It wraps the body of namespace with additional newlines, turning this code:
```
namespace N {
int function();
}
```
into the following:
```
namespace N {

int function();

}
```

---------

Co-authored-by: Owen Pan <[email protected]>
This optimization was introduced by #70469.

Like AArch64, we allow tail expansions for 3 on RV32 and 3/5/6
on RV64.

This can simplify the comparison and reduce the number of blocks.
Similar to a9e75b1: During MachOPlatform bootstrap we need to defer actions
until essential platform functionality has been loaded, but the platform itself
may be loaded under a concurrent dispatcher so we have to guard against the
deferred actions vector being accessed concurrently.

This fixes a probablistic failure in the ORC runtime regression tests on
Darwin/x86-64 that was spotted after edca1d9 (which turned on concurrent
linking by default in llvm-jitlink).
The gcov version is set to 11.1 (compatible with gcov 9) even if
`-Xclang -coverage-version=` specified version is less than 11.1.

Therefore, we can drop producer support for version < 11.1.
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
adjust the limit by removing reserved registers.

It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just like the comment "This limit must be adjusted
dynamically for reserved registers" said.

Separate from llvm/llvm-project#118787
All the sources of `llvm-min-tblgen` are also used for `llvm-tblgen`,
with identical compilation flags. Reuse the object files of
`llvm-min-tblgen` for `llvm-tblgen` by applying the usual source
structure of an executable: One file per executable which named after
the executable name containing the (in this case trivial) main function,
which just calls the tblgen_main in TableGen.cpp. This should also clear
up any confusion (including mine) of where each executable's main
function is.

While this slightly reduces build time, the main motivation is ccache.
Using the hard_link
option, building the object files for `llvm-tblgen` will result in a
hard link to the same object file already used for `llvm-min-tblgen`. To
signal the build system that the file is new, ccache will update the
file's time stamp. Unfortunately, time stamps are shared between all
hard-linked files s.t. this will indirectly also update the time stamps
for the object files used for `llvm-tblgen`. At the next run, Ninja will
recognize this time stamp discrepancy to the expected stamp recorded in
`.ninja_log` and rebuild those object files for `llvm-min-tblgen`, which
again will also update the stamp for the `llvm-tblgen`... . This is
especially annoying for tablegen because it means Ninja will re-run all
tablegenning in every build.

I am using the hard_link option because it reduces the cost of having
multiple build-trees of the LLVM sources and reduces the wear to the SSD
they are stored on.
V_MAX3/MIN3_NUM_F16 are alias GFX12 instructions with V_MAX3/MIN3_F16 in
GFX11 and they should be updated together.

This fix a bug introduced in
llvm/llvm-project#113603 such that only
V_MAX3/MIN3_F16 are replaced in true16 format. Also added GFX12 runlines
for CodeGen test
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This mirrors a
similar update made for `tensor::PackOp` in #114559. Below is the updated
rationale tailored to `tensor::UnPackOp`.

ISSUE DESCRIPTION

Currently, `FoldTensorCastProducerOp` incorrectly folds the following:

```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
// Note: `%c8` and `?`.
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:

```mlir
// Note: `%c8` and `8`.
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```
This triggers an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch addresses the
issue by ensuring proper handling of inner tile sizes.

ADDITIONAL CHANGES

* invalid.mlir: Fixed a typo.
* TensorOps.cpp:
  * Removed unnecessary `(void)tileSize`.
  * Added comments following the discussion in PR #115772.
  * Made minor updates to `FoldTensorCastPackOp` for consistency with the newly
    introduced `FoldTensorCastUnPackOp`.
* Tensor/canonicalize.mlir: Ensured consistent usage of `test_attr` (e.g.,
  replaced mixed use of `test_attr` and `some_attr`).
…#121400)

This PR is a follow-up to #116373 and #116439, where a Transform Dialect
(TD) operation was introduced to collect patterns for decomposing
tensor.pack. The second patch renamed the patterns and the TD Op.
Originally, adding patterns for `tensor.unpack` was marked as a TODO,
which this PR addresses.

No new tests are introduced in this PR. Instead, existing tests from:

* "decompose-tensor-unpack.mlir"

are reused. To achieve this:

* The test is updated to use the TD operation
  `transform.apply_patterns.linalg.decompose_pack_unpack` instead of the
  flag `--test-linalg-transform-patterns="test-decompose-tensor-unpack"`,
  avoiding artificial tests created solely for the TD Op.
* The TD sequence is saved to a new file, "decompose_unpack.mlir", and
  preloaded using the option.
`Sdext` and `Sdtrig` are RISC-V extensions related to debugging.

The full specification can be found at

https://github.com/riscv/riscv-debug-spec/releases/download/1.0.0-rc4/riscv-debug-specification.pdf
This test is failing on Windows (see e.g.
https://lab.llvm.org/buildbot/#/builders/146/builds/1983), probably due to
incomplete debugger support there (the test registers debug info in-process, so
non-Darwin builds shouldn't be expected to have the right symbols).
When a single #embed directive is used to initialize a char array, the
case is optimized via swap of EmbedExpr to underlying StringLiteral,
resulting in better performance in AST consumers.
While browsing through the code, I realized that
7122b70
which changed type of EmbedExpr made the "fast path" unreachable. This
patch fixes this unfortunate situation.
…huffle(z)),binop(shuffle(y),shuffle(w)) -> binop(shuffle(x,z),shuffle(y,w)) (#120984)

Some patterns (in particular horizontal style patterns) can end up with shuffles straddling both sides of a binop/cmp.

Where individually the folds aren't worth it, by merging the (oneuse) shuffles we can notably reduce the net instruction count and cost.

One of the final steps towards finally addressing #34072
zhaomaosu and others added 12 commits January 15, 2025 14:11
We are currently using `24.52.32224.5` and the corresponding IGC version
is v.2.5.6 as per
[here](https://github.com/intel/compute-runtime/releases/tag/24.52.32224.5).
The script doesn't update it, I'll fix it tomorrow but for now just
manually bump it.

Signed-off-by: Sarnie, Nick <[email protected]>
This is a draft for a new extension to add the notion of a default
device. It adds a method to query the current device from the calling
thread as well as a method to change it.

---------

Signed-off-by: James Brodman <[email protected]>
Co-authored-by: John Pennycook <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet