-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[flang] Simplify the comparison of characters #154593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Because character comparison appends spaces to the shorter character, calls to trim() that are used only in the comparison can be eliminated. Example: `trim(x) == trim(y)` can be simplified to `x == y` This makes 527.cam4_r about 3% faster, measured on Neoverse V2. This patch implements the optimization above in a new pass: ExpressionSimplification. Although no other expression simplifications are planned at the moment, they could be easily added to the new pass. The ExpressionSimplification pass runs early in the HLFIR pipeline, to make it easy to identify expressions before any transformations occur.
|
@llvm/pr-subscribers-flang-driver @llvm/pr-subscribers-flang-fir-hlfir Author: Leandro Lupori (luporl) ChangesBecause character comparison appends spaces to the shorter character, Example: This makes 527.cam4_r about 3% faster, measured on Neoverse V2. This patch implements the optimization above in a new pass: The ExpressionSimplification pass runs early in the HLFIR pipeline, to Patch is 25.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154593.diff 6 Files Affected:
diff --git a/flang/include/flang/Optimizer/HLFIR/Passes.td b/flang/include/flang/Optimizer/HLFIR/Passes.td
index 04d7aec5fe489..43646c6c05ead 100644
--- a/flang/include/flang/Optimizer/HLFIR/Passes.td
+++ b/flang/include/flang/Optimizer/HLFIR/Passes.td
@@ -61,6 +61,11 @@ def SimplifyHLFIRIntrinsics : Pass<"simplify-hlfir-intrinsics"> {
"the hlfir.matmul.">];
}
+def ExpressionSimplification
+ : Pass<"expression-simplification", "::mlir::ModuleOp"> {
+ let summary = "Simplify Fortran expressions";
+}
+
def InlineElementals : Pass<"inline-elementals"> {
let summary = "Inline chained hlfir.elemental operations";
}
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt b/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
index cc74273d9c5d9..28153076df0fb 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
@@ -3,6 +3,7 @@ get_property(dialect_libs GLOBAL PROPERTY MLIR_DIALECT_LIBS)
add_flang_library(HLFIRTransforms
BufferizeHLFIR.cpp
ConvertToFIR.cpp
+ ExpressionSimplification.cpp
InlineElementals.cpp
InlineHLFIRAssign.cpp
InlineHLFIRCopyIn.cpp
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/ExpressionSimplification.cpp b/flang/lib/Optimizer/HLFIR/Transforms/ExpressionSimplification.cpp
new file mode 100644
index 0000000000000..d9e73482020f8
--- /dev/null
+++ b/flang/lib/Optimizer/HLFIR/Transforms/ExpressionSimplification.cpp
@@ -0,0 +1,294 @@
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Runtime/entry-names.h"
+#include "llvm/Support/DebugLog.h"
+
+namespace hlfir {
+#define GEN_PASS_DEF_EXPRESSIONSIMPLIFICATION
+#include "flang/Optimizer/HLFIR/Passes.h.inc"
+} // namespace hlfir
+
+#define DEBUG_TYPE "expression-simplification"
+
+#define INDENT(n) std::string((n) * 2, ' ')
+
+static void removeOperands(mlir::Operation *op, int nestLevel);
+
+static void removeOp(mlir::Operation *op, int parentUses, int nestLevel) {
+ int opUses = std::distance(op->getUses().begin(), op->getUses().end());
+
+ if (opUses <= parentUses) {
+ LDBG() << INDENT(nestLevel) << "remove: " << *op;
+ removeOperands(op, nestLevel);
+ op->dropAllReferences();
+ op->dropAllUses();
+ op->erase();
+ }
+}
+
+static void removeOp(mlir::Operation *op) {
+ removeOp(op, /*parentUses=*/0, /*nestLevel=*/0);
+ LDBG();
+}
+
+static void removeOperands(mlir::Operation *op, int nestLevel) {
+ for (mlir::Value operand : op->getOperands()) {
+ if (!operand)
+ // Already removed.
+ continue;
+ if (mlir::Operation *operandOp = operand.getDefiningOp()) {
+ int uses = 0;
+ for (mlir::Operation *user : operandOp->getUsers())
+ if (user == op)
+ ++uses;
+ removeOp(operandOp, uses, nestLevel + 1);
+ }
+ }
+}
+
+template <typename UserOp>
+static UserOp getFirstUser(mlir::Operation *op) {
+ auto it = op->user_begin(), end = op->user_end(), prev = it;
+ for (; it != end; prev = it++)
+ ;
+ if (prev != end)
+ if (auto userOp = mlir::dyn_cast<UserOp>(*prev))
+ return userOp;
+ return {};
+}
+
+template <typename UserOp>
+static UserOp getLastUser(mlir::Operation *op) {
+ if (!op->getUsers().empty())
+ if (auto userOp = mlir::dyn_cast<UserOp>(*op->user_begin()))
+ return userOp;
+ return {};
+}
+
+template <typename UserOp>
+static UserOp getPreviousUser(mlir::Operation *op, mlir::Operation *curUser) {
+ for (auto user = op->user_begin(), end = op->user_end(); user != end;
+ ++user) {
+ if (*user == curUser) {
+ if (++user == end)
+ break;
+ if (auto userOp = mlir::dyn_cast<UserOp>(*user))
+ return userOp;
+ break;
+ }
+ }
+ return {};
+}
+
+// Check if operation has the expected number of uses.
+static bool expectUses(mlir::Operation *op, int expUses) {
+ int uses = std::distance(op->use_begin(), op->use_end());
+ if (uses != expUses) {
+ LDBG() << "expectUses: expected " << expUses << ", got " << uses;
+ for (mlir::Operation *user : op->getUsers())
+ LDBG() << "\t" << *user;
+ }
+ return uses == expUses;
+}
+
+template <typename Op>
+static Op expectOp(mlir::Value val) {
+ if (Op op = mlir::dyn_cast<Op>(val.getDefiningOp())) {
+ LDBG() << op;
+ return op;
+ }
+ return {};
+}
+
+static mlir::Value findBoxDef(mlir::Value val) {
+ if (auto op = expectOp<fir::ConvertOp>(val)) {
+ assert(op->getOperands().size() != 0);
+ if (auto boxOp = expectOp<fir::EmboxOp>(op->getOperand(0)))
+ return boxOp.getResult();
+ }
+ return {};
+}
+
+namespace {
+
+// This class analyzes a trimmed character and removes the call to trim() (and
+// its dependencies) if its result is not used elsewhere.
+class TrimRemover {
+public:
+ TrimRemover(fir::FirOpBuilder &builder, mlir::Value charVal,
+ mlir::Value charLenVal)
+ : builder(builder), charVal(charVal), charLenVal(charLenVal) {}
+ TrimRemover(const TrimRemover &) = delete;
+
+ bool charWasTrimmed();
+ void removeTrim();
+
+private:
+ // Class inputs.
+ fir::FirOpBuilder &builder;
+ mlir::Value charVal;
+ mlir::Value charLenVal;
+ // Operations found while analyzing inputs, that are needed when removing
+ // the trim call.
+ hlfir::DeclareOp charDeclOp; // Trim input character.
+ fir::CallOp trimCallOp; // Trim call.
+ hlfir::EndAssociateOp endAssocOp; // Trim result association.
+ hlfir::DestroyOp destroyExprOp; // Trim result expression.
+ fir::AllocaOp allocaOp; // Trim result alloca.
+};
+
+bool TrimRemover::charWasTrimmed() {
+ LDBG() << "\ncharWasTrimmed: " << charVal;
+
+ // Get the declare and expression operations associated to `charVal`, that
+ // correspond to the result of trim.
+ auto charCvtOp = expectOp<fir::ConvertOp>(charVal);
+ auto charLenCvtOp = expectOp<fir::ConvertOp>(charLenVal);
+ if (!charCvtOp || !charLenCvtOp || !expectUses(charCvtOp, 1) ||
+ !expectUses(charLenCvtOp, 1))
+ return false;
+ auto assocOp = expectOp<hlfir::AssociateOp>(charCvtOp.getOperand());
+ if (!assocOp || !expectUses(assocOp, 3)) // end_associate uses assocOp twice
+ return false;
+ endAssocOp = getLastUser<hlfir::EndAssociateOp>(assocOp);
+ if (!endAssocOp)
+ return false;
+ auto asExprOp = expectOp<hlfir::AsExprOp>(assocOp.getOperand(0));
+ if (!asExprOp || !expectUses(asExprOp, 2))
+ return false;
+ destroyExprOp = getLastUser<hlfir::DestroyOp>(asExprOp);
+ if (!destroyExprOp)
+ return false;
+ auto declOp = expectOp<hlfir::DeclareOp>(asExprOp.getOperand(0));
+ if (!declOp || !expectUses(declOp, 1))
+ return false;
+
+ // Get associated box and alloca.
+ auto boxAddrOp = expectOp<fir::BoxAddrOp>(declOp.getMemref());
+ if (!boxAddrOp || !expectUses(boxAddrOp, 1))
+ return false;
+ auto loadOp = expectOp<fir::LoadOp>(boxAddrOp.getOperand());
+ if (!loadOp || !getFirstUser<fir::BoxEleSizeOp>(loadOp) ||
+ !expectUses(loadOp, 2))
+ return false;
+ // The allocaOp is initialized by a store.
+ // Besides its use by the store and loadOp, it's also converted and used by
+ // the trim call.
+ allocaOp = expectOp<fir::AllocaOp>(loadOp.getMemref());
+ if (!allocaOp || !getFirstUser<fir::StoreOp>(allocaOp) ||
+ !expectUses(allocaOp, 3))
+ return false;
+
+ // Find the trim call that uses the allocaOp.
+ if (auto userOp = getPreviousUser<fir::ConvertOp>(allocaOp, loadOp))
+ if (userOp->hasOneUse())
+ trimCallOp = mlir::dyn_cast<fir::CallOp>(*userOp->user_begin());
+ if (!trimCallOp)
+ return false;
+ LDBG() << "call: " << trimCallOp;
+ llvm::StringRef calleeName =
+ trimCallOp.getCalleeAttr().getLeafReference().getValue();
+ LDBG() << "callee: " << calleeName;
+ if (calleeName != RTNAME_STRING(Trim))
+ return false;
+
+ // Get trim input character.
+ auto chrEmboxOp =
+ expectOp<fir::EmboxOp>(findBoxDef(trimCallOp.getOperand(1)));
+ if (!chrEmboxOp)
+ return false;
+ charDeclOp = expectOp<hlfir::DeclareOp>(chrEmboxOp.getMemref());
+ if (!charDeclOp)
+ return false;
+
+ // Found everything as expected.
+ return true;
+}
+
+void TrimRemover::removeTrim() {
+ LDBG() << "\nremoveTrim:";
+
+ auto charCvtOp = expectOp<fir::ConvertOp>(charVal);
+ auto charLenCvtOp = expectOp<fir::ConvertOp>(charLenVal);
+ assert(charCvtOp && charLenCvtOp);
+
+ // Replace trim output char with its input.
+ mlir::Location loc = charVal.getLoc();
+ auto cvtOp = fir::ConvertOp::create(builder, loc, charCvtOp.getType(),
+ charDeclOp.getOriginalBase());
+ charCvtOp.replaceAllUsesWith(cvtOp.getResult());
+
+ // Replace trim output length with its input.
+ mlir::Value chrLen = charDeclOp.getTypeparams().back();
+ auto cvtLenOp =
+ fir::ConvertOp::create(builder, loc, charLenCvtOp.getType(), chrLen);
+ charLenCvtOp.replaceAllUsesWith(cvtLenOp.getResult());
+
+ // Remove trim call and old conversions.
+ removeOp(charCvtOp);
+ removeOp(charLenCvtOp);
+ removeOp(trimCallOp);
+ // Remove association and expression.
+ removeOp(endAssocOp);
+ removeOp(destroyExprOp);
+ // The only remaining use of allocaOp should be its initialization.
+ // Remove the store and alloca operations.
+ if (auto userOp = getLastUser<fir::StoreOp>(allocaOp))
+ removeOp(userOp);
+}
+
+} // namespace
+
+namespace {
+
+class ExpressionSimplification
+ : public hlfir::impl::ExpressionSimplificationBase<
+ ExpressionSimplification> {
+public:
+ using ExpressionSimplificationBase<
+ ExpressionSimplification>::ExpressionSimplificationBase;
+
+ void runOnOperation() override;
+
+private:
+ // Simplify character comparisons.
+ // Because character comparison appends spaces to the shorter character,
+ // calls to trim() that are used only in the comparison can be eliminated.
+ //
+ // Example:
+ // `trim(x) == trim(y)`
+ // can be simplified to
+ // `x == y`
+ void simplifyCharCompare(fir::CallOp call, const fir::KindMapping &kindMap);
+};
+
+void ExpressionSimplification::simplifyCharCompare(
+ fir::CallOp call, const fir::KindMapping &kindMap) {
+ fir::FirOpBuilder builder{call, kindMap};
+ mlir::Operation::operand_range args = call.getArgs();
+ TrimRemover lhsTrimRem(builder, args[0], args[2]);
+ TrimRemover rhsTrimRem(builder, args[1], args[3]);
+
+ if (lhsTrimRem.charWasTrimmed())
+ lhsTrimRem.removeTrim();
+ if (rhsTrimRem.charWasTrimmed())
+ rhsTrimRem.removeTrim();
+}
+
+void ExpressionSimplification::runOnOperation() {
+ mlir::ModuleOp module = getOperation();
+ fir::KindMapping kindMap = fir::getKindMapping(module);
+ module.walk([&](mlir::Operation *op) {
+ if (auto call = mlir::dyn_cast<fir::CallOp>(op)) {
+ if (mlir::SymbolRefAttr callee = call.getCalleeAttr()) {
+ mlir::StringRef funcName = callee.getLeafReference().getValue();
+ if (funcName.starts_with(RTNAME_STRING(CharacterCompareScalar))) {
+ simplifyCharCompare(call, kindMap);
+ }
+ }
+ }
+ });
+}
+
+} // namespace
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index ca8e820608688..3b8f3e3add6e9 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -244,6 +244,9 @@ void createDefaultFIROptimizerPassPipeline(mlir::PassManager &pm,
/// passes pipeline
void createHLFIRToFIRPassPipeline(mlir::PassManager &pm, bool enableOpenMP,
llvm::OptimizationLevel optLevel) {
+ if (optLevel.getSizeLevel() > 0 || optLevel.getSpeedupLevel() > 0) {
+ pm.addPass(hlfir::createExpressionSimplification());
+ }
if (optLevel.isOptimizingForSpeed()) {
addCanonicalizerPassWithoutRegionSimplification(pm);
addNestedPassToAllTopLevelOperations<PassConstructor>(
diff --git a/flang/test/Driver/mlir-pass-pipeline.f90 b/flang/test/Driver/mlir-pass-pipeline.f90
index 0bcd055a84b87..3d19a0a5700f5 100644
--- a/flang/test/Driver/mlir-pass-pipeline.f90
+++ b/flang/test/Driver/mlir-pass-pipeline.f90
@@ -15,6 +15,7 @@
! ALL: Pass statistics report
! ALL: Fortran::lower::VerifierPass
+! O2-NEXT: ExpressionSimplification
! O2-NEXT: Canonicalizer
! ALL: Pipeline Collection : ['fir.global', 'func.func', 'omp.declare_reduction', 'omp.private']
! ALL-NEXT:'fir.global' Pipeline
diff --git a/flang/test/HLFIR/expression-simplification.fir b/flang/test/HLFIR/expression-simplification.fir
new file mode 100644
index 0000000000000..28e17de381461
--- /dev/null
+++ b/flang/test/HLFIR/expression-simplification.fir
@@ -0,0 +1,173 @@
+// RUN: fir-opt %s --expression-simplification | FileCheck %s
+
+// Test removal of trim() calls.
+
+// logical function cmp(x, y)
+// character(*) :: x, y
+// cmp = trim(x) == trim(y)
+// end function
+
+func.func @test_char_cmp(%arg0: !fir.boxchar<1> {fir.bindc_name = "x"},
+ %arg1: !fir.boxchar<1> {fir.bindc_name = "y"}) -> !fir.logical<4> {
+ %0 = fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>>
+ %1 = fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>>
+ %2 = fir.dummy_scope : !fir.dscope
+ %3 = fir.alloca !fir.logical<4> {bindc_name = "cmp", uniq_name = "_QFcmpEcmp"}
+ %4:2 = hlfir.declare %3 {uniq_name = "_QFcmpEcmp"} : (!fir.ref<!fir.logical<4>>) -> (!fir.ref<!fir.logical<4>>, !fir.ref<!fir.logical<4>>)
+ %5:2 = fir.unboxchar %arg0 : (!fir.boxchar<1>) -> (!fir.ref<!fir.char<1,?>>, index)
+ %6:2 = hlfir.declare %5#0 typeparams %5#1 dummy_scope %2 {uniq_name = "_QFcmpEx"} : (!fir.ref<!fir.char<1,?>>, index, !fir.dscope) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>)
+ %7:2 = fir.unboxchar %arg1 : (!fir.boxchar<1>) -> (!fir.ref<!fir.char<1,?>>, index)
+ %8:2 = hlfir.declare %7#0 typeparams %7#1 dummy_scope %2 {uniq_name = "_QFcmpEy"} : (!fir.ref<!fir.char<1,?>>, index, !fir.dscope) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>)
+ %9 = fir.embox %6#1 typeparams %5#1 : (!fir.ref<!fir.char<1,?>>, index) -> !fir.box<!fir.char<1,?>>
+ %10 = fir.zero_bits !fir.heap<!fir.char<1,?>>
+ %c0 = arith.constant 0 : index
+ %11 = fir.embox %10 typeparams %c0 : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
+ fir.store %11 to %1 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+ %12 = fir.address_of(@_QQclX710b72634dc58109795123e3cbc3b5ed) : !fir.ref<!fir.char<1,65>>
+ %c5_i32 = arith.constant 5 : i32
+ %13 = fir.convert %1 : (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>) -> !fir.ref<!fir.box<none>>
+ %14 = fir.convert %9 : (!fir.box<!fir.char<1,?>>) -> !fir.box<none>
+ %15 = fir.convert %12 : (!fir.ref<!fir.char<1,65>>) -> !fir.ref<i8>
+ fir.call @_FortranATrim(%13, %14, %15, %c5_i32) fastmath<contract> : (!fir.ref<!fir.box<none>>, !fir.box<none>, !fir.ref<i8>, i32) -> ()
+ %16 = fir.load %1 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+ %17 = fir.box_elesize %16 : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> index
+ %18 = fir.box_addr %16 : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> !fir.heap<!fir.char<1,?>>
+ %19:2 = hlfir.declare %18 typeparams %17 {uniq_name = ".tmp.intrinsic_result"} : (!fir.heap<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.heap<!fir.char<1,?>>)
+ %true = arith.constant true
+ %20 = hlfir.as_expr %19#0 move %true : (!fir.boxchar<1>, i1) -> !hlfir.expr<!fir.char<1,?>>
+ %21 = fir.embox %8#1 typeparams %7#1 : (!fir.ref<!fir.char<1,?>>, index) -> !fir.box<!fir.char<1,?>>
+ %22 = fir.zero_bits !fir.heap<!fir.char<1,?>>
+ %c0_0 = arith.constant 0 : index
+ %23 = fir.embox %22 typeparams %c0_0 : (!fir.heap<!fir.char<1,?>>, index) -> !fir.box<!fir.heap<!fir.char<1,?>>>
+ fir.store %23 to %0 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+ %24 = fir.address_of(@_QQclX710b72634dc58109795123e3cbc3b5ed) : !fir.ref<!fir.char<1,65>>
+ %c5_i32_1 = arith.constant 5 : i32
+ %25 = fir.convert %0 : (!fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>) -> !fir.ref<!fir.box<none>>
+ %26 = fir.convert %21 : (!fir.box<!fir.char<1,?>>) -> !fir.box<none>
+ %27 = fir.convert %24 : (!fir.ref<!fir.char<1,65>>) -> !fir.ref<i8>
+ fir.call @_FortranATrim(%25, %26, %27, %c5_i32_1) fastmath<contract> : (!fir.ref<!fir.box<none>>, !fir.box<none>, !fir.ref<i8>, i32) -> ()
+ %28 = fir.load %0 : !fir.ref<!fir.box<!fir.heap<!fir.char<1,?>>>>
+ %29 = fir.box_elesize %28 : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> index
+ %30 = fir.box_addr %28 : (!fir.box<!fir.heap<!fir.char<1,?>>>) -> !fir.heap<!fir.char<1,?>>
+ %31:2 = hlfir.declare %30 typeparams %29 {uniq_name = ".tmp.intrinsic_result"} : (!fir.heap<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.heap<!fir.char<1,?>>)
+ %true_2 = arith.constant true
+ %32 = hlfir.as_expr %31#0 move %true_2 : (!fir.boxchar<1>, i1) -> !hlfir.expr<!fir.char<1,?>>
+ %33:3 = hlfir.associate %20 typeparams %17 {adapt.valuebyref} : (!hlfir.expr<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>, i1)
+ %34:3 = hlfir.associate %32 typeparams %29 {adapt.valuebyref} : (!hlfir.expr<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>, i1)
+ %35 = fir.convert %33#1 : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
+ %36 = fir.convert %34#1 : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
+ %37 = fir.convert %17 : (index) -> i64
+ %38 = fir.convert %29 : (index) -> i64
+ %39 = fir.call @_FortranACharacterCompareScalar1(%35, %36, %37, %38) fastmath<contract> : (!fir.ref<i8>, !fir.ref<i8>, i64, i64) -> i32
+ %c0_i32 = arith.constant 0 : i32
+ %40 = arith.cmpi eq, %39, %c0_i32 : i32
+ hlfir.end_associate %33#1, %33#2 : !fir.ref<!fir.char<1,?>>, i1
+ hlfir.end_associate %34#1, %34#2 : !fir.ref<!fir.char<1,?>>, i1
+ %41 = fir.convert %40 : (i1) -> !fir.logical<4>
+ hlfir.assign %41 to %4#0 : !fir.logical<4>, !fir.ref<!fir.logical<4>>
+ hlfir.destroy %32 : !hlfir.expr<!fir.char<1,?>>
+ hlfir.destroy %20 : !hlfir.expr<!fir.char<1,?>>
+ %42 = fir.load %4#0 : !fir.ref<!fir.logical<4>>
+ return %42 : !fir.logical<4>
+}
+
+// CHECK-LABEL: func.func @test_char_cmp(
+// CHECK-SAME: %[[VAL_4:.*]]: !fir.boxchar<1> {fir.bindc_name = "x"},
+// CHECK-SAME: %[[VAL_7:.*]]: !fir.boxchar<1> {fir.bindc_name = "y"}) -> !fir.logical<4> {
+// CHECK: %[[VAL_0:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK: %[[VAL_1:.*]] = fir.alloca !fir.logical<4> {bindc_name = "cmp", uniq_name = "_QFcmpEcmp"}
+// CHECK: %[[VAL_2:.*]]:2 = hlfir.declare %[[VAL_1]] {uniq_name = "_QFcmpEcmp"} : (!fir.ref<!fir.logical<4>>) -> (!fir.ref<!fir.logical<4>>, !fir.ref<!fir.logical<4>>)
+// CHECK: %[[VAL_3:.*]]:2 = fir.unboxchar %[[VAL_4]] : (!fir.boxchar<1>) -> (!fir.ref<!fir.char<1,?>>, index)
+// CHECK: %[[VAL_5:.*]]:2 = hlfir.declare %[[VAL_3]]#0 typeparams %[[VAL_3]]#1 dummy_scope %[[VAL_0]] {uniq_name = "_QFcmpEx"} : (!fir.ref<!fir.char<1,?>>, index, !fir.dscope) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>)
+// CHECK: %[[VAL_6:.*]]:2 = fir.unboxchar %[[VAL_7]] : (!fir.boxchar<1>) -> (!fir.ref<!fir.char<1,?>>, index)
+// CHECK: %[[VAL_8:.*]]:2 = hlfir.declare %[[VAL_6]]#0 typeparams %[[VAL_6]]#1 dummy_scope %[[VAL_0]] {uniq_name = "_QFcmpEy"} : (!fir.ref<!fir.char<1,?>>, index, !fir.dscope) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>)
+// CHECK: %[[VAL_9:.*]] = fir.convert %[[VAL_5]]#1 : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
+// CHECK: %[[VAL_10:.*]] = fir.convert %[[VAL_3]]#1 : (index) -> i64
+// CHECK: %[[VAL_11:.*]] = fir.convert %[[VAL_8]]#1 : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
+// CHECK: %[[VAL_12:.*]] = fir.convert %[[VAL_6]]#1 : (index) -> i64
+// CHECK: %[[VAL_13:.*]] = fir.call @_FortranACharacterCompareScalar1(%[[VAL_9]], %[[VAL_11]], %[[VAL_10]], %[[VAL_12]]) fastmath<contract> : (!fir.ref<i8>, !fir.ref<i8>, i64, i64) -> i32
+// CHECK: %[[VAL_14:.*]] = arith.constant 0 : i32
+// CHECK: %[[VAL_15:.*]] = arith.cmpi eq, %[[VAL_13]], %[[VAL_14]] : i32
+// CHECK: %[[VAL_16:.*]] = fir.convert %[[VAL_15]] : (i1) -> !fir.logical<4>
+// CHECK: hlfir.assign %[[VAL_16]] to %[[VAL_2]]#0 : !fir.logical<4>, !fir.ref<!fir.logical<4>>
+// CHECK: %[[VAL_17:.*]] = fir.load %[[VAL_2]]#0 : !fir.ref<!fir.logical<4>>
+// CHECK: return %[[VAL_17]] : !fir.logical<4>
+// CHECK: }
+
+// Check that trim() is not removed when its result is stored.
+
+// logical function eq_use3(x, y) result(res)
+// char...
[truncated]
|
vzakhari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes!
I think it is better to introduce an hlfir.char_trim operation and then recognize the same pattern using the new operation rather than recognizing the runtime calls.
FYI, @valerydmit is wroking on adding an HLFIR operation for scalar character comparison, so we won't have _FortranACharacterCompareScalar1 soon.
|
Thanks for the quick review and suggestions! I'll try to add an |
|
Sounds good. Thank you! |
flang/lib/Optimizer/HLFIR/Transforms/ExpressionSimplification.cpp
Outdated
Show resolved
Hide resolved
flang/lib/Optimizer/HLFIR/Transforms/ExpressionSimplification.cpp
Outdated
Show resolved
Hide resolved
Fortran character trim is currently lowered directly into a runtime call, which makes it more complex to simplify expressions using it. With this patch trim is first lowered into an hlfir.char_trim operation, that is only later transformed into a runtime call. This makes it easier to remove unnecessary calls to trim, as proposed in llvm#154593.
Fortran character trim is currently lowered directly into a runtime call, which makes it more complex to simplify expressions using it. With this patch trim is first lowered into an hlfir.char_trim operation, that is only later transformed into a runtime call. This makes it easier to remove unnecessary calls to trim, as proposed in #154593.
vzakhari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update! It looks much clearer now!
I have a couple of suggestions.
flang/lib/Optimizer/HLFIR/Transforms/ExpressionSimplification.cpp
Outdated
Show resolved
Hide resolved
|
|
||
| void ExpressionSimplification::runOnOperation() { | ||
| mlir::ModuleOp module = getOperation(); | ||
| module.walk([&](hlfir::CmpCharOp cmpChar) { simplifyCmpChar(cmpChar); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend walking all trim operations instead of the comparisons, e.g. consider the case where the same trim result is used in multiple character comparisons. We may not have such cases now, but the CSE may be able to optimize it in codes like (trim(x) == trim(y)).and.(trim(x) == trim(z)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks for the suggestion, it simplified the code even further.
At least for now the pass is able to erase all trims in (trim(x) == trim(y)).and.(trim(x) == trim(z)).
But for each trim an hlfir.char_trim is generated, no result is currently reused.
If this becomes the case in the future, we would need to check all users to see if they are composed only of hlfir.cmpchars and one hlfir.destroy. To keep the code simple, I have not made this change.
Let me know if you think it would be better to implement it already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed the notification. I will review now.
tblah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very clean now. Good stuff.
vzakhari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with one comment for the LIT test.
jeanPerier
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks great.
|
Thank you all for the reviews. |
Because character comparison appends spaces to the shorter character, calls to trim() that are used only in the comparison can be eliminated. Example: `trim(x) == trim(y)` can be simplified to `x == y` This makes 527.cam4_r about 3% faster, measured on Neoverse V2. This patch implements the optimization above in a new pass: ExpressionSimplification. Although no other expression simplifications are planned at the moment, they could be easily added to the new pass. The ExpressionSimplification pass runs early in the HLFIR pipeline, to make it easy to identify expressions before any transformations occur.
Fortran character trim is currently lowered directly into a runtime call, which makes it more complex to simplify expressions using it. With this patch trim is first lowered into an hlfir.char_trim operation, that is only later transformed into a runtime call. This makes it easier to remove unnecessary calls to trim, as proposed in llvm#154593.
Because character comparison appends spaces to the shorter character, calls to trim() that are used only in the comparison can be eliminated. Example: `trim(x) == trim(y)` can be simplified to `x == y` This makes 527.cam4_r about 3% faster, measured on Neoverse V2. This patch implements the optimization above in a new pass: ExpressionSimplification. Although no other expression simplifications are planned at the moment, they could be easily added to the new pass. The ExpressionSimplification pass runs early in the HLFIR pipeline, to make it easy to identify expressions before any transformations occur.
Because character comparison appends spaces to the shorter character,
calls to trim() that are used only in the comparison can be eliminated.
Example:
trim(x) == trim(y)can be simplified to
x == yThis makes 527.cam4_r about 3% faster, measured on Neoverse V2.
This patch implements the optimization above in a new pass:
ExpressionSimplification. Although no other expression simplifications
are planned at the moment, they could be easily added to the new pass.
The ExpressionSimplification pass runs early in the HLFIR pipeline, to
make it easy to identify expressions before any transformations occur.