-
Notifications
You must be signed in to change notification settings - Fork 15.4k
Description
| Bugzilla Link | 22893 |
| Version | trunk |
| OS | All |
| CC | @chandlerc,@jmolloy |
Extended Description
Consider the following code:
int f (int n, int **g_h1) {
int *h1 = *g_h1;
int i1, s1=0;
*h1++ = 2;
for (i1 = 2; i1 < n; ++i1) {
s1 += *h1++;
}
*g_h1 = h1;
return s1;
}
We end up with a PHI recurrence for h1:
entry:
h1_next = gep h1, 1
loop:
h1_ind = phi h1_next, h1_ind_next
...
h1_ind_next = gep h1_ind, 1
br loop
GEP merging sees this, and converts it into:
entry:
h1_next = gep h1, 1
loop:
h1_prev = phi h1_this, h1
h1_this = phi h1_ind_next, h1_next
...
h1_ind_next = gep h1_prev, 2
An extra PHI node has been created to hold the value of h1 in the previous iteration, and that is being used to calculate the next iteration. This is obviously crazy.
The faulty transform is in InstructionCombining.cpp:1392. I see two things iffy here:
- The "Is GEP similar?" logic allows two GEPs that differ only by the base operand. I can't imagine that's particularly useful especially for trivial GEPs.
- The log message clearly says it's trying to deal with merge PHIs / diamond PHIs, rather than loop backedge PHIs, but it acts on loop backedge PHIs too.
Perhaps (2) above is wrong, and the cost model "is this newly created GEP going to be merged efficiently?" is wrong? I'm not sure.
Generated IR shown below for posterity.
define i32 @f(i32 %n, i32** nocapture %g_h1, i32** nocapture readnone %g_h2, i32** nocapture readnone %g_c) #0 {
entry:
%0 = load i32*, i32** %g_h1, align 8, !tbaa !2
store i32 2, i32* %0, align 4, !tbaa !6
%h1.06 = getelementptr inbounds i32, i32* %0, i64 1
%cmp7 = icmp sgt i32 %n, 2
br i1 %cmp7, label %for.body.preheader, label %for.end
for.body.preheader: ; preds = %entry
br label %for.body
for.body: ; preds = %for.body.preheader, %for.body
%1 = phi i32* [ %h1.010, %for.body ], [ %0, %for.body.preheader ]
%h1.010 = phi i32* [ %h1.0, %for.body ], [ %h1.06, %for.body.preheader ]
%s1.09 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
%i1.08 = phi i32 [ %inc, %for.body ], [ 2, %for.body.preheader ]
%2 = load i32, i32* %h1.010, align 4, !tbaa !6
%add = add nsw i32 %2, %s1.09
%inc = add nuw nsw i32 %i1.08, 1
%h1.0 = getelementptr inbounds i32, i32* %1, i64 2
%exitcond = icmp eq i32 %inc, %n
br i1 %exitcond, label %for.end.loopexit, label %for.body
for.end.loopexit: ; preds = %for.body
%h1.0.lcssa15 = phi i32* [ %h1.0, %for.body ]
%add.lcssa = phi i32 [ %add, %for.body ]
br label %for.end
for.end: ; preds = %for.end.loopexit, %entry
%h1.0.lcssa = phi i32* [ %h1.06, %entry ], [ %h1.0.lcssa15, %for.end.loopexit ]
%s1.0.lcssa = phi i32 [ 0, %entry ], [ %add.lcssa, %for.end.loopexit ]
store i32* %h1.0.lcssa, i32** %g_h1, align 8, !tbaa !2
ret i32 %s1.0.lcssa
}