[CIR][WIP] Add ABI lowering pass #1471
Conversation
The direct motivation for this new pass is the proper CallConvLowering of functions like this one:

```
cir.func @test(%arg0: !cir.method) {
  // ...
}
```

Following the current lowering approach, which keeps the `!cir.method` argument as a single struct value, we get:

```
define dso_local @test({ i64, i64 } %0) {
  ; ...
}
```

But we actually expect the following LLVM IR (note the differences in the function signatures):

```
define dso_local @test(i64 %0, i64 %1) {
  ; ...
}
```

To achieve this, I have 3 choices: …

At the beginning I thought option 1 would be the easiest way, but as I dug down the rabbit hole I found some tricky stuff behind the scenes. The problem comes from the CodeGen of the function prologue and epilogue. In the prologue, each argument is assigned a stack slot and stored there. For an argument of type …

The problem with option 3 is that the LoweringPrepare pass is not a conversion pass, which makes it really tricky to do type conversion in it. In my case I would have to convert every appearance of …

Anyway, this PR is still very incomplete and under construction; I'd like to hear some early comments about it from the community.
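To make the intended transformation concrete, here is a hedged sketch of the signature change the pass aims for at the CIR level (the assembly below is illustrative, not taken from this PR):

```
// Before ABI lowering: one aggregate-typed argument.
cir.func @test(%arg0: !cir.method) {
  // ...
}

// After ABI lowering: the member-function pointer is expanded into two
// pointer-sized integers, matching what the Itanium C++ ABI expects.
cir.func @test(%arg0: !u64i, %arg1: !u64i) {
  // ...
}
```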
There are some subtleties here. This is the code in OG:

```cpp
// note: this is different from default ABI
if (!RetTy->isScalarType())
  return ABIArgInfo::getDirect();
```

which says we should return structs directly. It's correct: it has the same behaviour as `nvcc`, and it obeys the PTX ABI as well. The comment dates back to 2013 (see [this commit](llvm/llvm-project@f9329ff) -- it didn't provide any explanation either), so I believe it's outdated. I didn't include this comment in the PR.
…lvm#1486) The pattern `call {{.*}} i32` fails to match `call i32` because of the double spaces surrounding `{{.*}}`: each literal space must match at least one whitespace character, so when `{{.*}}` matches the empty string the pattern requires two spaces where the output has only one. This patch removes the first space to fix the failure.
…1487) This PR resolves an assertion failure in `CIRGenTypes::isFuncParamTypeConvertible`, which is exercised when emitting a vtable entry for a virtual function whose type includes a pointer-to-member-function.
Lower neon vabsd_s64
…lvm#1431) Implements `::verify` for the operations `cir.atomic.xchg` and `cir.atomic.cmp_xchg`. I believe the existing regression tests don't get to the CIR-level type check failure, and I was not able to construct a case that does. Most attempts at reproducing the `cir.atomic.xchg` type check failure were along the lines of:

```
int a;
long long b, c;
__atomic_exchange(&a, &b, &c, memory_order_seq_cst);
```

And they never seem to trigger the failure in `::verify` because they fail earlier, in function parameter checking:

```
exmp.cpp:7:27: error: cannot initialize a parameter of type 'int *' with an rvalue of type 'long long *'
    7 |   __atomic_exchange(&a, &b, &c, memory_order_seq_cst);
      |                         ^~
```

Closes llvm#1378.
Lower neon vcaled_f64
This PR adds a new boolean flag to the `cir.load` and `cir.store` operations that distinguishes nontemporal loads and stores. It also adds support for the `__builtin_nontemporal_load` and `__builtin_nontemporal_store` intrinsic functions.
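As a usage illustration of these builtins (standard Clang builtins; the helper function name here is made up):

```cpp
// copy_stream is a hypothetical helper that streams data through memory
// without polluting the cache, using Clang's nontemporal builtins.
void copy_stream(int *dst, const int *src, int n) {
  for (int i = 0; i < n; ++i) {
    int v = __builtin_nontemporal_load(&src[i]); // nontemporal read
    __builtin_nontemporal_store(v, &dst[i]);     // nontemporal write
  }
}
```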
Lower vcales_f32
This PR adds an insertion guard for the try body scope in try-catch. Currently, the following code snippet fails during CodeGen:

```
void foo() {
  int r = 1;
  try {
    ++r;
    return;
  } catch (...) {
  }
}
```

The insertion point doesn't get reset properly, and the cleanup is run for a wrong/deleted block, causing a segmentation fault. I also added a test.
The comments suggested that we should use TableGen to generate the recognizing functions. However, I think templates might be more suitable for generating them -- and I can't find any existing TableGen backend that lets us generate arbitrary functions. My design choice is to offer a template that matches standard library functions:

```cpp
// matches std::find with 3 arguments, and raises it into StdFindOp
StdRecognizer<3, StdFindOp, StdFuncsID::Find>
```

I have to use a TableGen'd enum to map names to IDs, as we can't easily pass string literals as template arguments in C++17. This also constrains the design of future `StdXXXOp`s: they must take operands the same way as StdFindOp, where the first one is the original function and the rest are the function arguments. I'm not sure this approach is the best way. Please tell me if you have concerns or any alternative ideas.
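For illustration, here is a minimal self-contained analog of the template approach (this is not the PR's actual code: the op type parameter is omitted and all names are invented):

```cpp
#include <cstring>

// Stand-in for the TableGen'd enum that maps std function names to IDs;
// C++17 cannot take string literals as non-type template arguments, hence
// the enum indirection.
enum class StdFuncsID { Find, Search };

constexpr const char *nameFor(StdFuncsID id) {
  return id == StdFuncsID::Find ? "std::find" : "std::search";
}

// Matches a callee of fixed arity against the named std function.
template <unsigned NumArgs, StdFuncsID ID>
struct StdRecognizer {
  static bool match(const char *calleeName, unsigned argCount) {
    return argCount == NumArgs &&
           std::strcmp(calleeName, nameFor(ID)) == 0;
  }
};

// e.g. StdRecognizer<3, StdFuncsID::Find>::match("std::find", 3) == true
```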
…was set explicitly (llvm#1482) This is backported from a change made in llvm/llvm-project#131181 --------- Co-authored-by: Morris Hafner <[email protected]>
…R attribute. (llvm#1467) Started decorating CUDA shadow variables with the shadow_name CIR attribute which will be used for registering the globals.
Lower neon vcaltd_f64
… target was set explicitly" (llvm#1509) Reverts llvm#1482. @mmha this is crashing on macOS on an asserts build:

```
FAIL: Clang :: CIR/Tools/cir-translate/warn-default-triple.cir (472 of 552)
******************** TEST 'Clang :: CIR/Tools/cir-translate/warn-default-triple.cir' FAILED ********************
Exit Code: 134

Command Output (stdout):
--
Assertion failed: (!DataLayoutString.empty() && "Uninitialized DataLayout!"), function getDataLayoutString, file TargetInfo.h, line 1282.
```

Perhaps besides picking a default triple you also need to do some missing data layout initialization?
Sub-issue of llvm#1192. Adds CIR_ASinOp and support for __builtin_elementwise_asin.
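A usage sketch of the builtin (the vector typedef is just for illustration):

```cpp
typedef float float4 __attribute__((ext_vector_type(4)));

float4 vasin(float4 v) {
  // __builtin_elementwise_asin applies asin to each lane; it also accepts
  // plain floating-point scalars.
  return __builtin_elementwise_asin(v);
}
```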
This un-xfails the 6 files in llvm#1497 related to variadic calls.
Lower vcalts_f32
The LoweringPrepare pass was generating the wrong condition for loops when lowering the ArrayCtor op, causing only one element in an array of objects to be constructed. This fixes that problem.
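A minimal reproducer shape, inferred from the description (the type name is hypothetical):

```cpp
struct Widget { Widget(); }; // any type with a nontrivial constructor

void makeWidgets() {
  // ArrayCtor lowering must emit a loop that constructs all four
  // elements; the bad loop condition constructed only the first one.
  Widget ws[4];
}
```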
Backport: pass the enum kind directly to the complex cast helpers.
…ent (llvm#1748)

## Overview

Currently, getting the pointer to an element of an array requires a pointer decay and a (possible) pointer stride. A similar pattern for records has been eliminated with the `cir.get_member` operation. This PR provides a similar level of abstraction for arrays with the `get_element` operation. `get_element` replaces the above pattern with a single operation, which takes a pointer to an array and an index, and produces a pointer to the element at that index. There are many places in CIR analysis and lowering where the `ptr_stride(array_to_ptrdecay(x), i)` pattern is handled as a special case. By subsuming the special-case pattern with an explicit operation, we make these analyses and lowerings more robust (see the sketch after this entry).

## Changes

- Adds the `cir.get_element` operation.
- Extends CIRGen to emit `cir.get_element` for array subscript expressions.
- Updates LifetimeCheck to handle the `get_element` operation, subsuming the special-case analysis of the `cir.ptr_stride` operation (did not remove the special case).
- Extends CIR-to-LLVM lowering to lower `cir.get_element` to `llvm.getelementptr`.
- Extends CIR-to-MLIR lowering to lower `cir.get_element` to `memref` operations, matching the existing special-case `cir.ptr_stride` lowering.

## Additional Notes

Currently, 47.6% of `cir.ptr_stride` operations in the llvm-test-suite (SingleSource and MultiSource) can be replaced by `cir.get_element` operations.

### Operator Breakdown (current)

name | count | %
-- | -- | --
cir.load | 825221 | 22.27%
cir.br | 429822 | 11.60%
cir.const | 380381 | 10.26%
cir.cast | 325646 | 8.79%
cir.store | 309586 | 8.35%
cir.get_member | 226895 | 6.12%
cir.get_global | 186851 | 5.04%
cir.ptr_stride | 158094 | 4.27%
cir.call | 144522 | 3.90%
cir.binop | 141142 | 3.81%
cir.alloca | 134346 | 3.63%
cir.brcond | 112864 | 3.05%
cir.cmp | 83532 | 2.25%

### Operator Breakdown (with `get_element`)

name | count | %
-- | -- | --
cir.load | 825221 | 22.74%
cir.br | 429822 | 11.84%
cir.const | 380381 | 10.48%
cir.store | 309586 | 8.53%
cir.cast | 248645 | 6.85%
cir.get_member | 226895 | 6.25%
cir.get_global | 186851 | 5.15%
cir.call | 144522 | 3.98%
cir.binop | 141142 | 3.89%
cir.alloca | 134346 | 3.70%
cir.brcond | 112864 | 3.11%
cir.cmp | 83532 | 2.30%
cir.ptr_stride | 81093 | 2.23%
cir.get_elem | 77001 | 2.12%

---------

Co-authored-by: Andy Kaylor <[email protected]>
Co-authored-by: Henrich Lauko <[email protected]>
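As a hedged sketch of the rewrite (the exact CIR assembly syntax is illustrative, not copied from the PR):

```
// Before: decay the array to an element pointer, then apply a stride.
%p = cir.cast(array_to_ptrdecay, %arr : !cir.ptr<!cir.array<!s32i x 10>>), !cir.ptr<!s32i>
%e = cir.ptr_stride(%p : !cir.ptr<!s32i>, %i : !s64i), !cir.ptr<!s32i>

// After: one operation from array pointer + index to element pointer.
%e = cir.get_element %arr[%i] : (!cir.ptr<!cir.array<!s32i x 10>>, !s64i) -> !cir.ptr<!s32i>
```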
Implemented `noexcept` expression handling in CIR generation. Added a `noexcept.cpp` test based on cppreference. There was no OG test to base it off of, so I used the example code from [cppreference](https://en.cppreference.com/w/cpp/language/noexcept.html).
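For reference, the `noexcept` operator yields a compile-time boolean, so CIRGen can lower it to a constant; a small example in the spirit of the cppreference sample:

```cpp
void may_throw();
void no_throw() noexcept;

// The operand is not evaluated; only its exception specification matters.
static_assert(!noexcept(may_throw()), "");
static_assert(noexcept(no_throw()), "");
```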
I think this one is self-explanatory, so I will not write much 🙂 Adding this attribute helps in optimizations like [llvm#1653](llvm#1653), and using the attribute it's easy to create operations like `cir.std.vector.ctor`/`cir.std.vector.dtor` by just modifying `IdiomRecognizer` a bit. I believe it will also be useful for future optimizations. Finally, I updated quite a number of tests so they now reflect this attribute. Please, let me know if you see any issues.
Implemented opportunistic vtable emission, which marks vtables as `available_externally` to enable inlining when optimizations are enabled. Added `GlobalOp` verifier support for the `available_externally` linkage type; all cases are covered now, so I removed the `default` case. Added the `vtable-available-externally` CIRGen test.
Fix lowering Complex to Complex cast, backported from llvm/llvm-project#149717
…r` (llvm#1753) Implemented CIR code generation for `CXXPseudoDestructorExpr`. Added a pseudo destructor test to `CIR/CodeGen/dtors.cpp`.
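A minimal example of the expression in question (the function name is hypothetical):

```cpp
typedef int I;

void destroy(I *p) {
  // A pseudo-destructor call on a scalar type: the "destruction" itself is
  // a no-op, but the postfix expression must still be code-generated.
  p->~I();
}
```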
…trdecay` to `get_element` when possible (llvm#1761) Extended the `CIRCanonicalizePass` with new rewrite rules:

- Rewrite `ptr_stride (cast array_to_ptrdecay %base), %index` to `get_element %base[%index]`
- Rewrite `ptr_stride (get_element %base[%index]), %stride` to `get_element %base[%index + %stride]`
- Rewrite `cast array_to_ptrdecay %base, ptr<T>` to `get_element %base[0], ptr<T>` if it is only used by `load %ptr : T`, `store %val : T, %ptr`, or `get_member %ptr[field] : ptr<T> -> U`

Updated CodeGen tests, and extended the CIR-to-CIR test.

---------

Co-authored-by: Henrich Lauko <[email protected]>
) `cir::PointerType` was not included in the applicability guard for `cir::VAArg` lowering during `LoweringPrepare`. Since we don't have a generic LLVM lowering for `cir::VAArgOp` (see [more info](llvm#1088 (comment))), this causes an NYI error during lowering that doesn't need to happen. To fix this, I added the missing `cir::PointerType` to the `isa` check. There is probably a more comprehensive fix if someone is interested: this check should be removed, letting the (possible) error occur at the actual NYI site.
- Replaces `dyn_cast<cir::ConstantOp>(v.getDefiningOp())` and similar with `v.getDefiningOp<cir::ConstantOp>()`
- Adds `getValueAttr`, `getIntValue` and `getBoolValue` methods to ConstantOp
…Op()) (NFC) (llvm#1765) This applies similar changes to llvm/llvm-project#150428
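A before/after fragment of the `getDefiningOp` cleanup described in the entries above (`mlir::Value::getDefiningOp<OpTy>()` returns null when the value is a block argument or is defined by a different op):

```cpp
// Before: getDefiningOp() may return null for block arguments, which
// dyn_cast does not tolerate without an extra guard.
auto c1 = dyn_cast<cir::ConstantOp>(v.getDefiningOp());

// After: the templated accessor folds the null check and the cast.
auto c2 = v.getDefiningOp<cir::ConstantOp>();
```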
…1747) (Copied from my question on Discord.) I've been working on the vector to bit-mask related intrinsics for X86. I've been stuck specifically on `X86::BI__builtin_ia32_cvtb2mask128` (`_mm256_movepi16_mask`) and its variations with different vector/mask sizes. In this case, we perform a vector comparison of `vector<16xi16>` and bitcast the resulting `vector<16xi1>` directly into a scalar integer mask (i16). I'm successfully able to lower to CIR:

```
...
%5 = cir.vec.cmp(lt, %3, %4) : !cir.vector<!s16i x 16>, !cir.vector<!cir.int<u, 1> x 16>
%6 = cir.cast(bitcast, %5 : !cir.vector<!cir.int<u, 1> x 16>), !u16i
...
```

An issue arises when lowering this to LLVM; the error message I'm getting is:

```
error: integer width of the output type is smaller or equal to the integer width of the input type
```

Looking at the test cases of the LLVM dialect, this is related to the sext/zext instructions. This is the CIR → LLVM dialect lowering for the code above:

```
...
%14 = "llvm.icmp"(%12, %13) <{predicate = 2 : i64}> : (vector<16xi16>, vector<16xi16>) -> vector<16xi1>
%15 = "llvm.sext"(%14) : (vector<16xi1>) -> vector<16xi1>
%16 = "llvm.bitcast"(%15) : (vector<16xi1>) -> i16
...
```

This seems to be the cause:

```
%15 = "llvm.sext"(%14) : (vector<16xi1>) -> vector<16xi1>
```

**The fix**: added a type check: if the result type does not differ from the expected type, we don't insert a SExtOp.
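A hedged sketch of what such a guard might look like in the lowering code (variable names are illustrative, not from the patch):

```cpp
// Only widen the comparison result when the types actually differ;
// emitting a same-type sext is invalid in the LLVM dialect.
mlir::Value v = cmpResult;
if (v.getType() != expectedTy)
  v = rewriter.create<mlir::LLVM::SExtOp>(loc, expectedTy, v);
```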
Implemented `CXXDeleteExpr` for concrete and virtual destructors. Global delete, i.e., `::delete`, is still NYI. Added tests for both destructor types.
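A small example covering the two supported cases (names invented for illustration):

```cpp
struct Base { virtual ~Base(); };

void dispose(Base *b, int *p) {
  delete b; // virtual destructor: dispatched through the vtable
  delete p; // concrete/trivial case
  // ::delete (global delete) remains NYI per the description above.
}
```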
For these intrinsics there only seems to be one function where the emitted IR diverges: `_mm_load_sbh` loads a single 16-bit bfloat (`__bf16`) value from memory into the lowest element of a 128-bit bfloat vector (`__m128bh`), leaving the remaining lanes unchanged or filled with a passthrough value. It is implemented using a masked load with only the first lane enabled. [source for intrinsics with similar behaviour](https://gist.github.com/leopck/86799fee6ceb9649d0ebe32c1c6e5b85) In the CIR lowering of `_mm_load_sbh`, we currently emit the mask operand of the intrinsic (`llvm.masked.load`) as an explicit constant vector:

```llvm
<8 x i1> <true, false, false, false, false, false, false, false>
```

whereas OG lowers:

```llvm
<8 x i1> bitcast (<1 x i8> splat (i8 1) to <8 x i1>)
```

I believe both are semantically equal, so: is it acceptable for CIR and OG to diverge in this way for masked loads, or should we aim for parity in how the mask is represented, even if that reduces readability in CIR?
Implement support for `CK_LValueToRValueBitCast` for ComplexType.
This patch adds a new pass cir-abi-lowering to the CIR dialect. This pass runs before the CallConvLowering pass, and it expands all ABI-dependent types and operations inside a function to their ABI-independent equivalents according to the ABI specification. This patch also moves the lowering code for the following types and operations from the LLVM lowering conversion to the new pass:

- The pointer-to-data-member type `cir.data_member`;
- The pointer-to-member-function type `cir.method`;
- All operations working on operands of the above types.
Sorry for the delay here, I finally got the bandwidth to update this PR. The primary goal of this PR is to introduce the `cir-abi-lowering` pass, which performs ABI lowering for globals and inside each function. The `cir-abi-lowering` pass converts ABI-dependent types to more "fundamental" CIR types, and replaces operations that act on these ABI-dependent types with more "fundamental" CIR operations. The idea of `cir-abi-lowering` is to avoid mixing such ABI lowering logic into the LLVM lowering code, which could be confusing and make that code more complex. With the introduction of `cir-abi-lowering`, we get the following benefits: …
I like the direction overall, thanks for working on this. I feel it's a bit confusing to have both `cir-abi-lowering` and `cir-call-conv-lowering` as distinct passes. Can we merge `cir-call-conv-lowering` into the former while keeping a switch to enable/disable the call conv lowering part (it should default to off for now)? It's fine if you do that in a follow-up PR, but I'd like to have a plan before we land this.
That makes sense to me, and I'll do that in a follow-up PR. Actually, in the short term I also plan to migrate part of LoweringPrepare as well, since some ABI lowering code is put there for now (e.g. the handling of …).
Yeah, since we have no need to postpone any of the ABI decisions, we just make them right away, duplicating AST information only when necessary.