Skip to content

Conversation

@PhilippRados
Copy link
Contributor

This pull request aims to fix #117853.

General idea

It extends the existing MergeICmps pass to not only merge comparisons like: a.a == b.a && a.b == b.b but also comparisons with arbitrary constants such as a.a == 245 && a.b == -1.

Changes

Since the original pass only worked under the assumption that a single comparison could happen per basic block this had to be altered to allow multiple comparisons in a single basic block. This is because constant comparisons get flattened into a single block using a select instruction before the MergeICmps pass is run.

How it works

Whenever a matching comparison is encountered it adds it to the cmp-chain. Then when all comparisons have been found it
sorts them meaning all const-comparisons are followed by all bce-comparisons (depends on which was first). Then it goes through all comparisons in the chain and merges the ones adjacent to each other. Comparisons inside a flattened select-block can only be merged if every comparison in that block is merged (this is a rather defensive approach).

The const merging works by building a global constant struct for every merge. This needs to be a global-const in order to be constant folded by the expand-memcmp pass where it is then also removed.

Example

A single comparison chain can now be made up of both BCE-comparisons (two offsets to the same base) and const-comparisons (contiguous offsets to the same base with a constant).
This means that the expression:

struct S {
    int a;
    unsigned char b;
    unsigned char c;
    uint_16_t d;
    int e;
    int f;
    int g;
};
bool cmp(S& a, S& b) {
    return a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
}

can be turned into:

// simplified representation, for exact implementation see llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@memcmp_const_op = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
define @cmp(...) {
BB1:
   memcmp(a,b,6);
   br BB2, BB_end
BB2:
   offset = gep ptr %a, 8
   memcmp(offset, memcmp_const_op, 12)
   br BB_end
...
}

Issues in the current implementation, waiting for feedback on this

  • This implementation currently doesn't handle single select block like the one mentioned in the issue above. The only question I have is when to launch it since for this all instructions would have to be traversed again which is a slowdown for all functions running -O3. I think a good tradeoff would be to only check if the branch condition is a select and then start the optimization from there instead of checking every single instruction.
  • This pattern is pretty strict, it only works when the function returns a bool and the parameters have to dereferenceable. These restrictions basically render this optimization obsolete in C. Otherwise this implementation could also merge vectors/arrays.
  • Some testcases fail alive2 when the memory is accessed differently. I think these are known issues though ([MergeICmps] Correctness related to eq-chain comparisons #62459 and MergeICmps reorders comparisons and introduces UB #51187)

PhilippRados and others added 23 commits January 30, 2025 17:25
…; Is deleted when folded during expand-memcmp pass
@PhilippRados
Copy link
Contributor Author

@nikic Can you review this PR (since you commented on the original issue) or can you add someone else to look at it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missed optimization: per-byte comparisons are not always combined

1 participant