Skip to content

[CIR][Transforms] Introduce StdVectorCtorOp & StdVectorDtorOp #1769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2,635 commits into
base: main
Choose a base branch
from

Conversation

bruteforceboy
Copy link
Contributor

In this PR, I continue work on Issue#1653.

This PR introduces std.vector_cxx_ctor and std.vector_cxx_dtor building on the special member attributes from PR#1711. Extending the implementation for other std constructors/destructors is quite easy too. For example, to introduce std::map, the only changes needed are:

+++ b/clang/include/clang/CIR/Dialect/IR/CIRStdOps.td
+def CIR_StdMapCtorOp: CIR_StdOp<"map_cxx_ctor",
+  (ins CIR_AnyType:$first),
+  (outs Optional<CIR_AnyType>:$result)>;

+++ b/clang/lib/CIR/Dialect/Transforms/IdiomRecognizer.cpp
@@ -239,7 +239,7 @@ void IdiomRecognizerPass::recognizeCall(CallOp call) {
   using StdFunctionsRecognizer =
       std::tuple<StdRecognizer<StdFindOp>, StdRecognizer<StdVectorCtorOp>,
-                 StdRecognizer<StdVectorDtorOp>>;
+                 StdRecognizer<StdVectorDtorOp>, StdRecognizer<StdMapCtorOp>>;

To get this to work, I updated the CXXCtorAttr and CXXDtorAttr special member attributes to also have the source clang::RecordDecl. A few things that may be worth discussing are:

  • Do we get rid of the mlir::Type in both attributes now that we have clang::RecordDecl?
  • Should we also print/parse the clang::RecordDecl? For now, I think it's unnecessary, because it is pretty much the same information as the mlir::Type.

I have also added one test to demo the added operations. Please, let me know your thoughts. Thanks!

FantasqueX and others added 30 commits April 9, 2025 15:18
This PR removes a useless argument `convertToInt` and removes hardcoded
`Sint32Type`.

I realized I committed a new file with CRLF before. Really sorry about
that >_<
There are some subtleties here.

This is the code in OG:
```cpp
// note: this is different from default ABI
if (!RetTy->isScalarType())
  return ABIArgInfo::getDirect();
```
which says we should return structs directly. It's correct, has have the
same behaviour as `nvcc`, and it obeys the PTX ABI as well.
The comment dates back to 2013 (see [this
commit](llvm/llvm-project@f9329ff)
-- it didn't provide any explanation either), so I believe it's
outdated. I didn't include this comment in the PR.
…lvm#1486)

The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces
surrounding `{{.*}}`. This patch removes the first space to fix the
failure.
…1487)

This PR resolves an assertion failure in
`CIRGenTypes::isFuncParamTypeConvertible`, which is involved when trying
to emit a vtable entry to a virtual function whose type includes a
pointer-to-member-function.
…lvm#1431)

Implements `::verify` for operations cir.atomic.xchg and
cir.atomic.cmp_xchg

I believe the existing regression tests don't get to the CIR level type
check failure and I was not able to implement a case that does.

Most attempts of reproducing cir.atomic.xchg type check failure were
along the lines of:
```
int a;
long long b,c;
__atomic_exchange(&a, &b, &c, memory_order_seq_cst);
```

And they seem to never trigger the failure on `::verify` because they
fail earlier in function parameter checking:
```
exmp.cpp:7:27: error: cannot initialize a parameter of type 'int *' with an rvalue of type 'long long *'
    7 |     __atomic_exchange(&a, &b, &c, memory_order_seq_cst);
      |                           ^~
```

Closes llvm#1378 .
This PR adds a new boolean flag to the `cir.load` and the `cir.store`
operation that distinguishes nontemporal loads and stores. Besides, this
PR also adds support for the `__builtin_nontemporal_load` and the
`__builtin_nontemporal_store` intrinsic function.
This PR adds a new boolean flag to the `cir.load` and the `cir.store`
operation that distinguishes nontemporal loads and stores. Besides, this
PR also adds support for the `__builtin_nontemporal_load` and the
`__builtin_nontemporal_store` intrinsic function.
This PR adds an insertion guard for the try body scope for try-catch.
Currently, the following code snippet fails during CodeGen:

```
void foo() {
  int r = 1;
  try {
    ++r;
    return;
  } catch (...) {
  }
}
```

The insertion point doesn't get reset properly and the cleanup is being
ran for a wrong/deleted block causing a segmentation fault. I also added
a test.
The comments suggested that we should use TableGen to generate the
recognizing functions. However, I think templates might be more suitable
for generating them -- and I can't find any existing TableGen backends
that let us generate arbitrary functions.

My choice of design is to offer a template to match standard library
functions:
```cpp
// matches std::find with 3 arguments, and raise it into StdFindOp
StdRecognizer<3, StdFindOp, StdFuncsID::Find>
```
I have to use a TableGen'd enum to map names to IDs, as we can't pass
string literals to template arguments easily in C++17.

This also constraints design of future `StdXXXOp`s: they must take
operands the same way of StdFindOp, where the first one is the original
function, and the rest are function arguments.

I'm not sure if this approach is the best way. Please tell me if you
have concerns or any alternative ways.
…was set explicitly (llvm#1482)

This is backported from a change made in
llvm/llvm-project#131181

---------

Co-authored-by: Morris Hafner <[email protected]>
…R attribute. (llvm#1467)

Started decorating CUDA shadow variables with the shadow_name CIR
attribute which will be used for registering the globals.
… target was set explicitly" (llvm#1509)

Reverts llvm#1482

@mmha this is crashing on macos on asserts build:
```
FAIL: Clang :: CIR/Tools/cir-translate/warn-default-triple.cir (472 of 552)
******************** TEST 'Clang :: CIR/Tools/cir-translate/warn-default-triple.cir' FAILED ********************
Exit Code: 134

Command Output (stdout):
--
Assertion failed: (!DataLayoutString.empty() && "Uninitialized DataLayout!"), function getDataLayoutString, file TargetInfo.h, line 1282.
```

Perhaps besides picking a default you maybe need to do some missing
datalayout init?
…lementwise_acos (llvm#1507)

Closes: llvm#1374

Replaces LLVMIntrinsicCallOp with ACosOp in __builtin_elementwise_acos.
Sub-issue of llvm#1192. Adds
CIR_ASinOp and support for __builtin_elementwise_asin.
This un-xfails the 6 files in llvm#1497 related to variadic calls.
Sub-issue of llvm#1192. Adds
CIR_ATanOp and support for __builtin_elementwise_atan.
Part of llvm#258 .

1. Added `AddressPointAttr`
2. Change all occurrences of `VTableAddrPointOp` into using the
attribute
3. Update tests

---------

Co-authored-by: Sirui Mu <[email protected]>
Co-authored-by: Morris Hafner <[email protected]>
Co-authored-by: Morris Hafner <[email protected]>
Co-authored-by: Sharp-Edged <[email protected]>
Co-authored-by: Amr Hesham <[email protected]>
Co-authored-by: Bruno Cardoso Lopes <[email protected]>
Co-authored-by: Letu Ren <[email protected]>
tommymcm and others added 15 commits July 25, 2025 13:41
…ent (llvm#1748)

## Overview
Currently, getting the pointer to an element of an array requires a
pointer decay and a (possible) pointer stride. A similar pattern for
records has been eliminated with the `cir.get_member` operation. This PR
provides a similar level of abstraction for arrays with the
`get_element` operation.
`get_element` replaces the above pattern with a single operation, which
takes a pointer to an array and an index, and produces a pointer to the
element at that index.
There are many places in CIR analysis and lowering where the
`ptr_stride(array_to_ptrdecay(x), i)` pattern is handled as a special
case. By subsuming the special case pattern with an explicit operation,
we make these analyses and lowering more robust.

## Changes
Adds the `cir.get_element` operation.
Extends CIRGen to emit `cir.get_element` for array subscript
expressions.
Updated LifetimeCheck to handle `get_element` operation, subsuming
special case analysis of `cir.ptr_stride` operation (did not remove the
special case).
Extends CIR-to-LLVM lowering to lower `cir.get_element` to
`llvm.getelementptr`
Extends CIR-to-MLIR lowering to lower `cir.get_element` to `memref`
operations, matching existing special case `cir.ptr_stride` lowering.

## Additional Notes
Currently, 47.6% of `cir.ptr_stride` operations in the llvm-test-suite
(SingleSource and MultiSource) can be replaced by `cir.get_element`
operations.

### Operator Breakdown (current)
name | count | %
-- | -- | --
cir.load | 825221 | 22.27%
cir.br | 429822 | 11.60%
cir.const | 380381 | 10.26%
cir.cast | 325646 | 8.79%
cir.store | 309586 | 8.35%
cir.get_member | 226895 | 6.12%
cir.get_global | 186851 | 5.04%
cir.ptr_stride | 158094 | 4.27%
cir.call | 144522 | 3.90%
cir.binop | 141142 | 3.81%
cir.alloca | 134346 | 3.63%
cir.brcond | 112864 | 3.05%
cir.cmp | 83532 | 2.25%

### Operator Breakdown (with `get_element`)
name | count | %
-- | -- | --
cir.load | 825221 | 22.74%
cir.br | 429822 | 11.84%
cir.const | 380381 | 10.48%
cir.store | 309586 | 8.53%
cir.cast | 248645 | 6.85%
cir.get_member | 226895 | 6.25%
cir.get_global | 186851 | 5.15%
cir.call | 144522 | 3.98%
cir.binop | 141142 | 3.89%
cir.alloca | 134346 | 3.70%
cir.brcond | 112864 | 3.11%
cir.cmp | 83532 | 2.30%
cir.ptr_stride | 81093 | 2.23%
cir.get_elem | 77001 | 2.12%

---------

Co-authored-by: Andy Kaylor <[email protected]>
Co-authored-by: Henrich Lauko <[email protected]>
Implemented `noexcept` expression handling in CIR generation. 
Added a `noexcept.cpp` test based on cppreference. There was no OG test to base it off of, so I used the example code from [cppreference](https://en.cppreference.com/w/cpp/language/noexcept.html).
I think this one is self-explanatory, so I will not write much 🙂‍

Adding this attribute helps in optimizations like
[llvm#1653](llvm#1653), and using the
attribute it's easy to create operations like
`cir.std.vector.ctor`/`cir.std.vector.dtor` by just modifying
`IdiomRecognizer` a bit. I believe it will also be useful for future
optimizations. Finally, I updated quite a number of tests so they now
reflect this attribute.

Please, let me know if you see any issues.
Implemented opportunistic vtable emission, which marks vtables as
`available_externally` to enable inlining if optimizations are enabled.
Added `GlobalOp` verifier support `available_externally` linkage type,
all cases are covered now, so I removed the `default` case.
Added the `vtable-available-externally` CIRGen test.
Fix lowering Complex to Complex cast, backported from
llvm/llvm-project#149717
…r` (llvm#1753)

Implemented CIR code generation for `CXXPseudoDestructorExpr`. 
Added a pseudo destructor test to `CIR/CodeGen/dtors.cpp`.
…trdecay` to `get_element` when possible (llvm#1761)

Extended the `CIRCanonicalizePass` with new rewrite rules: 
- Rewrite `ptr_stride (cast array_to_ptrdecay %base), %index` to
`get_element %base[%index]`
- Rewrite `ptr_stride (get_element %base[%index]), %stride` to
`get_element %base[%index + %stride]`
- Rewrite `cast array_to_ptrdecay %base, ptr<T>` to `get_element
%base[0], ptr<T>` if it is only used by `load %ptr : T`, `store %val :
T, %ptr`, or `get_member %ptr[field] : ptr<T> -> U`

Updated CodeGen tests, and extended CIR-to-CIR test.

---------

Co-authored-by: Henrich Lauko <[email protected]>
)

`cir::PointerType` was not included in the applicability guard for
`cir::VAArg` lowering during `LoweringPrepare`.
Since we don't have generic LLVM `cir::VAArgOp` (see [more
info](llvm#1088 (comment)))
this causes an NYI error during lowering that doesn't need to happen.
To fix this I added the missing `cir::PointerType` to the `isa`. 
There is probably a more comprehensive fix to this if someone is
interested, this check should be removed and let the (possible) error
occur at the actual NYI site.
- Replaces  dyn_cast<cir::ConstantOp>(v.getDefiningOp()) and similar with v.getDefiningOp<cir::ConstantOp>()
- Adds `getValueAttr`, `getIntValue` and `getBoolValue` methods to ConstantOp
…1747)

(Copied from my question on Discord)
 
I’ve been working on the vector to bit-mask related intrinsics for X86.
I’ve been stuck specifically on
`X86::BI__builtin_ia32_cvtb2mask128(_mm256_movepi16_mask`) and its
variations with different vector/mask sizes.

In this case, we perform a vector comparison of `vector<16xi16>` and
bitcast the resulting `vector<16xi1>` directly into a scalar integer
mask (i16).

I’m successfully able to lower to cir:
```
    ...
    %5 = cir.vec.cmp(lt, %3, %4) : !cir.vector<!s16i x 16>, !cir.vector<!cir.int<u, 1> x 16>
    %6 = cir.cast(bitcast, %5 : !cir.vector<!cir.int<u, 1> x 16>), !u16i
    ...
```

There's an issue arises when lowering this to LLVM, the error message
I'm getting is:

```
error: integer width of the output type is smaller or equal to the integer width of the input type
```

By looking at the test cases on the llvm dialect, this is related to the
sext / zext instruction.

This is the cir → llvm dialect lowered for the latter:

```
        ...
    %14 = "llvm.icmp"(%12, %13) <{predicate = 2 : i64}> : (vector<16xi16>, vector<16xi16>) -> vector<16xi1>
    %15 = "llvm.sext"(%14) : (vector<16xi1>) -> vector<16xi1>
    %16 = "llvm.bitcast"(%15) : (vector<16xi1>) -> i16
        ...
```

This is seems to be the cause:

```
 %15 = "llvm.sext"(%14) : (vector<16xi1>) -> vector<16xi1>
 ```
 
 **The fix**: Added a type check: if the result type does not differ from the expected type, we won't insert a sextOp
Implemented `CXXDeleteExpr` for concrete and virtual destructors.
NYI, global delete, i.e., `::delete`.
Added tests for both destructor types.
For these intrinsics there only seems to be one function where the IR
emmited seems to diverge:

for `_mm_load_sbh` loads a single 16-bit bfloat (__bf16) value from
memory into the lowest element of a 128-bit bfloat vector (__m128bh),
leaving the remaining lanes unchanged or filled with a passthrough
value. It is implemented using a masked load with only the first lane
enabled.

[source for intrinsics with similar
behaviour](https://gist.github.com/leopck/86799fee6ceb9649d0ebe32c1c6e5b85)

In the CIR lowering of `_mm_load_sbh`, we are currently emitting the
mask of intrinsic (`llvm.masked.load`) operand as an explicit constant
vector:

``` llvm
<8 x i1> <true, false, false, false, false, false, false, false>
```
whereas OG lowers:
```llvm
<8 x i1> bitcast (<1 x i8> splat (i8 1) to <8 x i1>)
```
I believe both things are semantically equal so:

Is it acceptable for CIR and OG to diverge in this way for masked loads,
or should we aim for parity in how the mask is represented, even if that
reduces readability in CIR?
Implement supporting for CK_LValueToRValueBitCast for ComplexType
Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While here, can we also add statistics counts (https://mlir.llvm.org/docs/PassManagement/#pass-statistics) to these? It's going to be super useful for finding opportunities soon. It's fine it comes in a next PR.

@@ -61,5 +61,11 @@ def CIR_IterBeginOp: CIR_StdOp<"begin",
def CIR_IterEndOp: CIR_StdOp<"end",
(ins CIR_AnyType:$container),
(outs CIR_AnyType:$result)>;
def CIR_StdVectorCtorOp: CIR_StdOp<"vector_cxx_ctor",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vector_cxx_ctor -> vector.ctor

def CIR_StdVectorCtorOp: CIR_StdOp<"vector_cxx_ctor",
(ins CIR_AnyType:$first),
(outs Optional<CIR_AnyType>:$result)>;
def CIR_StdVectorDtorOp: CIR_StdOp<"vector_cxx_dtor",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar for dtor!

namespace std {
template <typename T> class vector {
public:
vector() {} // expected-remark {{found call to std::vector_cxx_ctor()}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These remarks are great, thanks! I think we need to adjust the names a bit given std::vector_cxx_ctor() doesn't really exist. Perhaps found call to std::vector constructor and similar for dtor?

@bcardosolopes
Copy link
Member

It's also possible if in the future we want to generalize ctor/dtors in a different way, but I think this is a good start to get more data on how these things are going to be useful.

(cc @tommymcm)

@bcardosolopes
Copy link
Member

  • Do we get rid of the mlir::Type in both attributes now that we have clang::RecordDecl?

How hard would be to use mlir::Type to get the same information this PR gets from clang::RecordDecl? Would we need to augment cir.record much? We probably want to keep the record type lightweight, but it's a tradeoff.

@bcardosolopes
Copy link
Member

  • Should we also print/parse the clang::RecordDecl? For now, I think it's unnecessary, because it is pretty much the same information as the mlir::Type.

This is harder than it seems, because for that we'd want to use AST serialization/deserialization (and be able to roundtrip tests, etc) - this is a known limitation of keeping the AST around. As you noticed, it forces us into writing tests from source instead of CIR to CIR.

@bcardosolopes
Copy link
Member

(cc @andykaylor)

@tommymcm
Copy link
Contributor

tommymcm commented Aug 7, 2025

  • Do we get rid of the mlir::Type in both attributes now that we have clang::RecordDecl?

How hard would be to use mlir::Type to get the same information this PR gets from clang::RecordDecl? Would we need to augment cir.record much? We probably want to keep the record type lightweight, but it's a tradeoff.

Would it be better to just introduce a cir.std.vector type at this point? This would also let us clean up duplicated ops: instead of having a ctor/dtor op for each std type we care about, we could have a single cir.ctor with a type specifier , so cir.std.vector.ctor would become cir.ctor ... : cir.ptr<cir.std.vector<...>>.

I am interested in helping out with this if folks think its a good direction.

(I don't think that this is necessary for this PR, I agree with Bruno that having something working with statistics will help us know what needs to be changed/refined)

Copy link
Collaborator

@andykaylor andykaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize I'm jumping in to the middle of the design/implement cycle, but I have a series of nested questions about why this isn't more general.

Why do we need a vector-specific CIR_StdVectorCtorOp rather than a general CIR_StdCtrOp that takes the class type as an argument/attribute. Which leads me to, why does this need to be limited to std rather than just a general CIR_CallCtorOp with an attribute telling you what it's constructing in a way that the constructee could easily be recognized as a standard libary class. And then, why do we need a dedicated op at all, as opposed to just attaching the cxx_ctor attribute to the callsite?

The two primary advantages I see for the latter are (1) you wouldn't need any extra handling for call vs. invoke, and (2) any transformation that was trying to operate on calls would find these structors without needing special handling.

@@ -61,5 +61,11 @@ def CIR_IterBeginOp: CIR_StdOp<"begin",
def CIR_IterEndOp: CIR_StdOp<"end",
(ins CIR_AnyType:$container),
(outs CIR_AnyType:$result)>;
def CIR_StdVectorCtorOp: CIR_StdOp<"vector_cxx_ctor",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see examples here in some form. A form that would put the example in the generated HTML pages for the dialect would be best, but at a minimum, some comment here would be helpful, especially if the tests aren't going to show the full CIR generated.

@bcardosolopes
Copy link
Member

Why do we need a vector-specific CIR_StdVectorCtorOp rather than a general CIR_StdCtrOp that takes the class type as an argument/attribute. Which leads me to, why does this need to be limited to std rather than just a general CIR_CallCtorOp with an attribute telling you what it's constructing in a way that the constructee could easily be recognized as a standard libary class.

I tend more towards the CIR_CallCtorOp design (with maybe richer input type that already decoded this-is-a-std-vector-class, like suggested by @tommymcm). It's worth noting that by using the more generic form CIR_CallCtorOp instead of CIR_StdVectorCtorOp we'd be basically shifting the idiom recognition of a std vector ctor to be done whenever we want to check for a specific ctor versus doing that logic only once during creation time of the operation.

The only existing use of ctor information so far is in the lifetime checker and it gets by easily by looking into the called function for the ctor attribute. @bruteforceboy should we wait until you introduce the uses so we can see how much we'd benefit from this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.