Inefficient code generated for "if (Operator *O = dyn_cast<Operator>(V)) { foo(O); }"

|  |  |
| --- | --- |
| Bugzilla Link | [28430](https://llvm.org/bz28430) |
| Version | trunk |
| OS | All |
| CC | @zmodem,@hfinkel,@slacka,@rnk,@rotateright |

## Extended Description 
I would expect the following two functions to compile to (roughly) the same
x86-64 machine code.
```c++
  void foo(Operator *O);

  void test1(Value *V) {
    if (Operator *O = dyn_cast<Operator>(V))
      foo(O);
  }   
  void test2(Value *V) {
    if (isa<Operator>(V))
      foo(cast<Operator>(V));
  }
```
But they don't:
```asm
_Z5test1PN4llvm5ValueE:
        mov     cl, byte ptr [rdi + 24]
        cmp     cl, 23
        seta    al
        cmp     cl, 10
        setne   cl
        test    rdi, rdi
        je      .LBB0_2
        xor     al, cl
        jne     .LBB0_2
        jmp     _Z3fooPN4llvm8OperatorE@PLT # TAILCALL
.LBB0_2:
        ret

_Z5test2PN4llvm5ValueE:
        mov     al, byte ptr [rdi + 24]
        cmp     al, 23
        ja      .LBB0_3
        cmp     al, 10
        je      .LBB0_3
        ret
.LBB0_3:
        jmp     _Z3fooPN4llvm8OperatorE@PLT # TAILCALL
```
It seems to me that there are two problems with the code generated for test1:

First, the null check on `rdi` shouldn't be necessary. The `isa` call inside
`dyn_cast` dereferences the pointer, so (I think?) we should be able to assume
that it's not null.  (Adding an explicit `__builtin_assume` to the implementatin
of `dyn_cast` gets rid of this check.)

Second, although I'm not enough of an x86 expert to say whether the two-jump
version (as in `test2`) or one-jump version (as in `test1`) is preferable, since
they do the same thing, presumably we should prefer one over the other, if only
due to code size differences.  (I also wouldn't be surprised if there were
performance differences.)

gcc 4.8.4 emits the same code for both functions, and it's basically identical
to the code clang generates for `test2`:
```asm
        movzx   eax, BYTE PTR 24[rdi]
        cmp     al, 23
        ja      .L9
        cmp     al, 10
        je      .L9
        rep ret
.L9:
        jmp     _Z3fooPN4llvm8OperatorE@PLT
```
Here are the `-print-after-all -debug` logs for the two functions:

`test1`: https://gist.github.com/fbbabc5299aca5a275b2b38a6d9d217f
`test2`: https://gist.github.com/24dee5133a003c12b0c127e5e5f2a3c9

Since we use this idiom everywhere in clang and llvm, it seems like a good bet
that we'd get a performance improvement from improving the codegen here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Inefficient code generated for "if (Operator *O = dyn_cast<Operator>(V)) { foo(O); }" #28804

Extended Description

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development


Bugzilla Link	28430
Version	trunk
OS	All
CC	@zmodem,@hfinkel,@slacka,@rnk,@rotateright

Inefficient code generated for "if (Operator *O = dyn_cast<Operator>(V)) { foo(O); }" #28804

Description

Extended Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions