[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan #127085

hassnaaHamdi · 2025-02-13T16:00:21Z

This patch adds an initial implementation of VPInstruction::computeCost with support for only one instruction so far - VPInstruction::AnyOf. This is only used when vectorising loops with uncountable early exits.

While attempting to teach ScalarEvolution about samesign in another effort, a complicated test case with nested loops and zero-extends of the induction variable regresses, but only when the target datalayout is present. The regression was originally reported on IndVarSimplify, but an improvement of symbolic BTC was also observed on SCEV. Check the test into both IndVarSimplify and SCEV to ease investigation and further development.


… value in libcxx container summary (llvm#125294) This has two changes: 1. Set show value for libcxx and libstdcxx summary provider. This will print the pointer value for both pointer type and reference type. 2. Remove pointer value printing in libcxx container summary. Discussion: https://discourse.llvm.org/t/lldb-hides-raw-pointer-value-for-libcxx-and-libstdcxx-pointer-types-in-summary-string/84226


…d0[ofs]) combine. (llvm#124871)

Happened to notice some odd things related to chains in this code.

The code calls hasOneUse on LoadSDNode*, which checks users of both the data and the chain. I think this was trying to check that the data had one use so that one of the loads would definitely be removed by the transform. Load chains don't always have users, so our testing may not have noticed that the chains being used would block the transform.

The code makes all users of ld1's chain use the new load's chain, but we don't know that ld1 becomes dead. This can cause incorrect dependencies if ld1's chain is used and it isn't deleted. I think the better thing to do is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1 depend on the new load and the original loads. If the old loads become dead, their chains will be cleaned up later.

I'm having trouble getting a test for any ordering issue with the current code. areNonVolatileConsecutiveLoads requires the two loads to have the same input chain. Given that, I don't know how to use one of the load chain results without also using the other. If they are both used, we don't do the transform because SDNode::hasOneUse will return false for both.
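The difference between counting uses of a whole node and counting uses of one result can be illustrated with a toy model. This is only a sketch: `ToyLoadNode`, `DataResult`, `ChainResult` and `makeLoadWithChainUser` are hypothetical names invented here, not the SelectionDAG classes, and the model merely mimics the behaviour the commit message describes (a node-level query sees chain users, a value-level query does not).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a DAG load node (NOT the LLVM classes).
// Result 0 is the loaded data, result 1 is the output chain.
constexpr unsigned DataResult = 0;
constexpr unsigned ChainResult = 1;

struct ToyLoadNode {
  // One entry per use; each entry records which result number is used.
  std::vector<unsigned> Uses;

  void addUse(unsigned ResNo) { Uses.push_back(ResNo); }

  // Node-level query: counts uses of every result, data and chain alike.
  bool hasOneUse() const { return Uses.size() == 1; }

  // Value-level query: counts only the uses of one particular result.
  bool hasNUsesOfValue(std::size_t N, unsigned ResNo) const {
    std::size_t Count = 0;
    for (unsigned U : Uses)
      if (U == ResNo)
        ++Count;
    return Count == N;
  }
};

// Build a load whose data has one user and whose chain also has a user.
ToyLoadNode makeLoadWithChainUser() {
  ToyLoadNode N;
  N.addUse(DataResult);
  N.addUse(ChainResult);
  return N;
}
```

In this model a load with one data user and one chain user fails the node-level `hasOneUse()` check even though its data has exactly one use, which matches the situation described above where a used chain blocks the transform.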

…m#125526)

This is attempt 2 to merge this, the first one is llvm#117622. This properly disables the tests when building for playstation, since the warning is disabled there.

When a hidden object is built into multiple shared libraries, each instance of the library will get its own copy. If the object was supposed to be globally unique (e.g. a global variable or static data member), this can cause very subtle bugs.

An object might be incorrectly duplicated if it:

* Is defined in a header (so it might appear in multiple TUs), and
* Has external linkage (otherwise it's supposed to be duplicated), and
* Has hidden visibility (or else the dynamic linker will handle it)

The duplication is only a problem semantically if one of the following is true:

1. The object is mutable (the copies won't be in sync), or
2. Its initialization has side effects (it may now run more than once), or
3. The value of its address is used (different copies have different addresses).

To detect this, we add a new -Wunique-object-duplication warning. It warns on cases (1) and (2) above. To be conservative, we only warn in case (2) if we are certain the initializer has side effects, and we don't warn on `new` because the only side effect is some extra memory usage.

We don't currently warn on case (3) because doing so is prone to false positives: there are many reasons for taking the address which aren't inherently problematic (e.g. passing to a function that expects a pointer). We only run into problems if the code inspects the value of the address.

The check is currently disabled for windows, which uses its own analogue of visibility (dllimport/dllexport). The check is also disabled inside templates, since it can give false positives if a template is never instantiated.

Resolving the warning

The warning can be fixed in several ways:

* If the object in question doesn't need to be mutable, it should be made const. Note that the variable must be completely immutable, e.g. we'll warn on `const int* p` because the pointer itself is mutable. To silence the warning, it should instead be `const int* const p`.
* If the object must be mutable, it (or the enclosing function, in the case of static local variables) should be made visible using `__attribute__((visibility("default")))`.
* If the object is supposed to be duplicated, it should be given internal linkage.

Testing

I've tested the warning by running it on clang itself, as well as on chromium. Compiling clang resulted in [10 warnings across 6 files](https://github.com/user-attachments/files/17908069/clang-warnings.txt), while Chromium resulted in [160 warnings across 85 files](https://github.com/user-attachments/files/17908072/chromium-warnings.txt), mostly in third-party code. Almost all warnings were due to mutable variables.

I evaluated the warnings by manual inspection. I believe all the resulting warnings are true positives, i.e. they represent potentially-problematic code where duplication might cause a problem. For the clang warnings, I also validated them by either adding const or visibility annotations as appropriate.

Limitations

I am aware of four main limitations with the current warning:

1. We do not warn when the address of a duplicated object is taken, since doing so is prone to false positives. I'm hopeful that we can create a refined version in the future, however.
2. We only warn for side-effectful initialization if we are certain side effects exist. Warning on potential side effects produced a huge number of false positives; I don't expect there's much that can be done about this in modern C++ code bases, since proving a lack of side effects is difficult.
3. Windows uses a different system (dllexport/dllimport) instead of visibility. From manual testing, it seems to behave analogously to the visibility system for the purposes of this warning, but to keep things simple the warning is disabled on windows for now.
4. We don't warn on code inside templates. This is unfortunate, since it masks many real issues, e.g. a templated variable which is implicitly instantiated the same way in multiple TUs should be globally unique, but may accidentally be duplicated. Unfortunately, we found some potential false positives during testing that caused us to disable the warning for now.
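The three ways of resolving the warning can be sketched as header-style declarations. This is an illustration under assumptions, not output from the patch: the variable names are hypothetical, the code assumes C++17 (for `inline` variables) and a compiler that accepts the GNU `visibility` attribute, and the comments about which declaration would or would not be diagnosed simply restate the description above for a build that uses hidden visibility by default; they have not been checked against Clang.

```cpp
#include <cassert>

// Imagine these declarations live in a header that is included by several
// shared libraries built with -fvisibility=hidden.

// Mutable pointer to const int: the pointee is const but the pointer can be
// reassigned, so per the description this is the kind of object that would
// be flagged.
inline const int *p_mutable = nullptr;

// Fix 1: make the object completely immutable.
inline const int *const p_const = nullptr;

// Fix 2: the object must stay mutable, so give it default visibility and let
// the dynamic linker keep a single copy.
inline int shared_counter __attribute__((visibility("default"))) = 0;

// Fix 3: the object is meant to be per-TU, so give it internal linkage.
static int per_tu_scratch = 0;

// Small helpers so the declarations above are actually used.
inline int bumpSharedCounter() { return ++shared_counter; }
inline int bumpScratch() { return ++per_tu_scratch; }
```

Which fix applies depends on intent: `const` when the value never changes, default visibility when one shared mutable instance is required, and internal linkage when a copy per translation unit is what was wanted all along.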

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect E to be nonnull.

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect typeDecl to be nonnull. Note that getObjCInterfaceType starts out dereferencing Decl.

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect referent to be nonnull.
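The choice between dyn_cast and dyn_cast_if_present in the notes above comes down to whether a null input is tolerated. A minimal sketch of that contract follows; it uses plain pointers and `dynamic_cast` rather than PointerUnion, and `toy_dyn_cast`, `toy_dyn_cast_if_present`, `ToyDecl`, `ToyTypeDecl` and `ToyVarDecl` are hypothetical names made up for the illustration, not LLVM or Clang APIs.

```cpp
#include <cassert>

// Toy class hierarchy standing in for the real AST node types.
struct ToyDecl {
  virtual ~ToyDecl() = default;
};
struct ToyTypeDecl : ToyDecl {};
struct ToyVarDecl : ToyDecl {};

// Mirrors the dyn_cast contract: the input must be non-null; the result is
// null only when the dynamic type does not match.
template <class To, class From> To *toy_dyn_cast(From *V) {
  assert(V && "toy_dyn_cast on a null pointer");
  return dynamic_cast<To *>(V);
}

// Mirrors the dyn_cast_if_present contract: a null input is allowed and
// simply propagates as a null result.
template <class To, class From> To *toy_dyn_cast_if_present(From *V) {
  return V ? dynamic_cast<To *>(V) : nullptr;
}
```

When the surrounding code already dereferences the value, as the notes say of getObjCInterfaceType, the stricter form documents the non-null expectation instead of silently accepting null.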


This also changes the container version numbers in the tag from unix timestamps to the abbreviated commit hash for the workflow. This ensures that the amd64 and arm64 containers have the same tag.

For amd64 we now generate 4 tags:

* ghcr.io/llvm/ci-ubuntu-22.04:latest
* ghcr.io/llvm/ci-ubuntu-22.04:$GITHUB_SHA
* ghcr.io/llvm/amd64/ci-ubuntu-22.04:latest
* ghcr.io/llvm/amd64/ci-ubuntu-22.04:$GITHUB_SHA

For arm64 we generate 2 tags:

* ghcr.io/tstellar/arm64v8/ci-ubuntu-22.04:latest
* ghcr.io/tstellar/arm64v8/ci-ubuntu-22.04:$GITHUB_SHA


… in getSingleShuffleSrc. (llvm#125455) I have been unsuccessful at further reducing the test. The failure requires a shuffle with 2 scalable->fixed extracts with the same source. 0 is the only valid index for a scalable->fixed extract so the 2 sources must be the same extract. Shuffles with the same source are aggressively canonicalized to a unary shuffle. So it requires the extracts to become identical through other optimizations without the shuffle being canonicalized before it is lowered. Fixes llvm#125306.

…vm#115099) This is similar in spirit to previous changes that made _mm_mfence a builtin to avoid conflicts with winnt.h and other MSVC ecosystem headers that pre-declare compiler intrinsics as extern "C" symbols. Also update the feature flag for _mm_prefetch to sse, which is more accurate than mmx. This should fix issue llvm#87515.


LLDB: correct event when removing all watchpoints

Previously we incorrectly checked for a "breakpoint changed" event listener when removing all watchpoints (e.g. via SBTarget::DeleteAllWatchpoints()), although we would emit a "watchpoint changed" event if there were a listener for 'breakpoint changed'.

This meant that we might not emit a "watchpoint changed" event if there was a listener for this event.

Correct it to check for the "watchpoint changed" event.

---

Updated regression tests, which were also incorrectly peeking for the wrong event type. The 'remove' action actually triggers 2 events, which the test didn't allow, so I updated it to allow specifically what was requested.

The test fails (expectedly) at the line following "DeleteAllWatchpoints" prior to this patch, and passes after.

…#125302)

Generally speaking, process plugins (e.g. ProcessGDBRemote) should not be aware of OS plugin threads. However, ProcessGDBRemote attempts to check for the existence of OS threads when calculating stop info. When OS threads are present, it sets the stop info directly on the OS plugin thread and leaves the ThreadGDBRemote without a StopInfo.

This is problematic for a few reasons:

1. No other process plugins do this, as they shouldn't. They should set the stop info for their own process threads, and let the abstractions built on top propagate StopInfos.
2. This conflicts with the expectations of ThreadMemory, which checks for the backing thread's info, and then attempts to propagate it (in the future, it should probably ask the plugin itself too...). We see this happening in the code below. The `if` condition will not trigger, because `backing_stop_info_sp` will be null (remember, ProcessGDBRemote is ignoring its own threads), and then this method returns false.

```
bool ThreadMemory::CalculateStopInfo() {
...
  lldb::StopInfoSP backing_stop_info_sp(
      m_backing_thread_sp->GetPrivateStopInfo());
  if (backing_stop_info_sp &&
      backing_stop_info_sp->IsValidForOperatingSystemThread(*this)) {
    backing_stop_info_sp->SetThread(shared_from_this());
```

```
Thread::GetPrivateStopInfo
...
  if (!CalculateStopInfo())
    SetStopInfo(StopInfoSP());
```

To solve this, we change ProcessGDBRemote so that it does the principled thing: it now only sets the stop info of its own threads. This change by itself breaks the tests TestPythonOSPlugin.py and TestOSPluginStepping.py and probably explains why ProcessGDBRemote had originally "violated" this isolation of layers.

To make this work, BreakpointSites must be aware of BackingThreads when answering the question: "Is this breakpoint valid for this thread?". Why? Breakpoints are created on top of the OS threads (that's what the user sees), but breakpoints are hit by process threads. In the presence of OS threads, a TID-specific breakpoint is valid for a process thread if it is backing an OS thread with that TID.
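The validity rule stated above can be sketched with a toy model. Everything here is hypothetical: `ToyThread`, its `Backed` field and `isBreakpointValidForThread` are invented for illustration and are not LLDB APIs, and treating a direct TID match as valid is an assumption added to make the sketch complete rather than something the description states.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a thread (NOT the LLDB Thread class).
struct ToyThread {
  std::uint64_t TID;
  // For a process thread: the OS-plugin thread it backs, or nullptr.
  const ToyThread *Backed;
};

// A breakpoint restricted to BreakpointTID is considered valid for a thread
// when the thread itself has that TID (assumption) or when the thread is
// backing an OS-plugin thread with that TID (the rule from the description).
bool isBreakpointValidForThread(std::uint64_t BreakpointTID,
                                const ToyThread &T) {
  if (T.TID == BreakpointTID)
    return true;
  return T.Backed != nullptr && T.Backed->TID == BreakpointTID;
}
```

The indirection is needed because the user sets the breakpoint against the OS thread's TID, while the stop is reported on the process thread that backs it.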


This patch adds an initial implementation of VPInstruction::computeCost with support for only one instruction so far - VPInstruction::AnyOf. This is only used when vectorising loops with uncountable early exits. Change-Id: I5076e92f2cfe6df3d65c5c9f9fbb1731e919b912

artagnon and others added 30 commits February 3, 2025 19:28

IndVarSimplify: strip redundant getDataLayout (NFC) (llvm#125546)

a29ed04

DataLayout is already available as a member variable.

[X86] Add test case for llvm#124871. NFC

c3b7894

This shows missed opportunity to fold (fshl ld1, ld0, c) -> (ld0[ofs]) if the load chain results are used.

Fix "not all control paths return a value" warning; NFC

070c338

[Hexagon] Avoid repeated hash lookups (NFC) (llvm#125459)

546d03c

[Analysis] Avoid repeated hash lookups (NFC) (llvm#125462)

36fb886

[CodeGen] Avoid repeated hash lookups (NFC) (llvm#125463)

22bc029

[ProfileData] Avoid repeated hash lookups (NFC) (llvm#125464)

09d945d

workflows/release-tasks: Re-use release-binaries-all workflow (llvm#1…

d194c6b

…25378) This way we don't need to duplicate the list of supported targets in the release-tasks workflow.

[RISCV] Precommit test for llvm#124932

d156b85

Signed-off-by: Mikhail R. Gadelha <[email protected]>

workflows/premerge: Cancel in progress jobs when a PR is merged (llvm…

2deba08

…#125329)

[AArch64] Move arith-fp-sve.ll to sve-arith-fp.ll. NFC

a9b3e11

[CIR] Initial implementation of CIR-to-LLVM IR lowering pass (llvm#12…

622ee03

…5260) This change introduces lowering from CIR to LLVM IR of global integer and floating-point variables, using defaults for attributes that aren't yet implemented.

[lldb] Implement bidirectional access for backing<->backed thread rel…

90a51a4

…ationship (llvm#125300) This enables finding the backed thread from the backing thread without going through the thread list, and it will be useful for subsequent commits.

[clang] Unbreak build

5dccfd9

From llvm#115099

[libc][NFC] Silence C99 extension warnings on clang for RPC

36d6b63

Summary: This only shows up during the build of the server, silence it.

[OpenMP] Guard OpenMP specific entry handling

a284a6e

[LV] Add VPBuilder::insert, use to insert created vector pointer (NFC).

f8fa931

Split off from llvm#124432 as suggested. Adds VPBuilder::insert, inspired by IRBuilderBase.

[Hexagon][Docs] document the change in the default target (llvm#125584)

1abe7e8

ampandey-1995 and others added 2 commits February 5, 2025 20:16

[NFC] Fix Lit test 'amdgpu-openmp-sanitize-options.c'. (llvm#125857)

56a4988

Using 'compiler-rt' in 'LLVM_ENABLE_PROJECTS' causes the clang runtime libraries to be built and installed with arch-suffixed names, i.e. `*-<arch>.a` and `*-<arch>.so`.

hassnaaHamdi requested review from a team, JDevlieghere, QuietMisdreavus, aaupov, antiagainst, ayermolo, bcardosolopes, daniel-grumberg, dcaballe, dcci, ftynse, grypp, kuhar, lanza, maksfb, nicolasvasilache, nikic, rafaelauler, rengolin and yota9 as code owners February 13, 2025 16:00

hassnaaHamdi marked this pull request as draft February 13, 2025 16:01

hassnaaHamdi closed this Feb 13, 2025

hassnaaHamdi deleted the ee_costs1 branch February 13, 2025 16:02

ldionne removed request for a team, ayermolo, daniel-grumberg, dcaballe and yota9 February 15, 2025 10:45
