Skip to content

Conversation

cabbaken
Copy link
Contributor

@cabbaken cabbaken commented Jul 15, 2025

This patch provide some basic numeric label support for IR-printing pass(StandardInstrumentations).

The name based design of this pass may be unsuitable here. Some classes in this pass are highly relying on the label name
When BB's entry has numberic lable, generateFunctionData() can't create FuncDataT with a correct EntryBlockName.
And DotCfgDiffDisplayGraph may add Nodes (in createDisplayGraph()) which don't have labels, causing the failure of label parse when emitting dot (in DotCfgDiffNode::getBodyContent()).

In BlockDataT, the Body should include the Label.
There are two possiblities:

  1. Label is the beginning of Body(Most of the time)
    such as Label = "entry:", Body = "entry:\n %a = add i32 2, 3\n ret i32 %a"
  2. Label is not the beginning of Body(When the BB has a numberic label)
    such as Label = "1:", Body = "%a = add i32 2, 3\n ret i32 %a"

With the 2nd circumstance, getNodeLabel() will return a BB without label. then the result tends to be:
image
It will lose its label and first instruction.

Fixes: #148579

Remove unnecessary comparison of Before and After `DCData`,
which will cause an error when create `DotCfgDiff`.

Signed-off-by: Ruoyu Qiu <[email protected]>
@cabbaken
Copy link
Contributor Author

I'm sorry I forgot to include a test. I will add it later.

Copy link

github-actions bot commented Jul 22, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@cabbaken cabbaken changed the title Fix clang crash with -print-changed=dot-cfg [Clang]Fix clang crash with -print-changed=dot-cfg Jul 22, 2025
Signed-off-by: Ruoyu Qiu <[email protected]>
@cabbaken cabbaken changed the title [Clang]Fix clang crash with -print-changed=dot-cfg [StandardInstrumentations]Fix clang crash with -print-changed=dot-cfg Jul 22, 2025
@cabbaken cabbaken changed the title [StandardInstrumentations]Fix clang crash with -print-changed=dot-cfg [StandardInstrumentations]Add support for numeric label Jul 22, 2025
Copy link
Contributor

@jamieschmeiser jamieschmeiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to add this before completing my review

StringRef Label = BS1.take_until([](char C) { return C == ':'; });
// drop predecessors as they can be big and are redundant
BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front();
if (BS1.str().find(Label) != std::string::npos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions:

  1. why call str() when you can just use find on the StringRef?
  2. why are you adding a condition to dropping the predecessors? They are dropped since they can be large which can clutter up the diagram.
  3. Are you going to find Label if it has been manufactured with a number? In this case, won't you then preserve the predecessors, leading to a cluttered diagram, or is that the whole point of storing the label -- so that you know when to retain predecessors since there is no label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I was not familiar with StringRef that time, I'm sorry I didn't look for the native implementation of find. I will correct it.
  2. With the line BS1 = BS1.drop_front(1);, we have erase all leading '\n', the line here is intend to erase the label line. To the BB without label, it will erase the first line of code. So we need to check if there is a label line here.
  3. In fact, the label number assign happens only in both saveIRBeforePass and handleIRAfterPass, called before and after a pass. The sequence of BB decides the value of label(I know the sequence may be changed in some passes, but I haven't thought of a better way yet)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. +1
  2. this will only remove the first nl, if present, not all of them.
    IIRC (and also based on the comment), the line
    BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front(); was intended to remove the line listing the predecessor blocks. Consider a block that has many predecessors (eg 10 predecessors). The graph shows 10 arrows in so you can easily determine all predecessors. However, now you are going to get a comment listing the 10 predecessors, making this particular node graphically large, making it difficult to see the other nodes in the graph. Plus the comment is redundant since you can graphically see the predecessors or you would be able to if the comment weren't there. Therefore, I removed it. This line would drop the comment by dropping everything until a nl is reached.
    If there were no label, I suspect that there might also not be a comment with the predecessors, resulting in the line erasing the first line of code, as you state. That is why I find this test troubling...it is dropping the line if there is a label, which is testing a symptom rather than the intention. With an added label, is there still the comment with the predecessors? If not, something else might still be dropped.
    Would it work if you instead tested for a comment since the point of the line is to remove the comment with predecessors? I think this would be clearer.
  3. Okay...Here is an idea and I don't know if would work or if it is worth the effort... Since the label is artificial and only seen here (right?), then a different ordering could trigger a spurious reporting of a change (since the label is different). Would it work if you used a negative integer for this initial label (with a comment stating it is artificial) and then ignored a difference of a negative initial label when checking if a change occurred? If I understand correctly, you would only need to check the initial label for this special case so it shouldn't be too expensive. Of course, if this label persists or is visible elsewhere, then this would not work. I wouldn't do that in this PR however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could you clarify what the comment here refers to? I previously thought that the in-memory LLVM IR does not have a representation for comments.
    And I'm sorry about the misunderstanding of drop_until.
  2. It's right that different ordering will cause a bad output.
    The passes do not record the label change history, so it is difficult to locate the corresponding non-label BB before and after a pass. In fact, in my patch, we will apply the numeric label to the BBs before and after the passes. So the negative integer may not work properly in this context.
    This issue may need to be addressed in a separate PR.

Copy link
Contributor

@jamieschmeiser jamieschmeiser Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Consider the following program with a loop:
extern "C" int printf(const char*, ...);
int main() {
  for (int i = 0; i <= 5; ++i)
    printf("%d\n", i);
  return 0;
}

compiled with clang++ -mllvm -print-changed -O f.C. I'm using print-changed since the comment with predecessors is removed in the dot-cfg representation. Here is abbreviated final IR that I got:

bb.0.entry:
  successors: %bb.1(0x80000000); %bb.1(100.00%)
  liveins: $r29, $r30, $r31
  $r0 = MFLR implicit $lr
  ...

bb.1.for.body (align 32):
; predecessors: %bb.0, %bb.1
  successors: %bb.1(0x7c000000), %bb.2(0x04000000); %bb.1(96.88%), %bb.2(3.12%)
 ...

bb.2.for.cond.cleanup:
; predecessors: %bb.1

  $r31 = LWZ 76, $r1 :: (load (s32) from %fixed-stack.0)
...

The comment I am referring to is the ; predecessors:... line in bb.1.for.body and bb.2.for.code.cleanup. This comment is there to indicate flow of control in the IR but it is redundant in the graph since the flow of control between the blocks is shown graphically.
3. I'm not surprised that it doesn't work...It was just an idea that came to me as I was reading your comments. I agree that that should be dealt with in a different PR, if at all; there is a trade-off between effort and result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the code by compiling it with clang++ -mllvm -print-changed -O t.c, and there were indeed some comments in the output.
Then, I added some instrumentation:

diff --git a/llvm/lib/Passes/StandardInstrumentations.cpp b/llvm/lib/Passes/StandardInstrumentations.cpp
index f67d4de4b..307efd8f6 100644
--- a/llvm/lib/Passes/StandardInstrumentations.cpp
+++ b/llvm/lib/Passes/StandardInstrumentations.cpp
@@ -1959,8 +1960,10 @@ std::string DotCfgDiffNode::getBodyContent() const {
   if (BS.front() == '\n')
     BS1 = BS1.drop_front(1);
   // drop predecessors as they can be big and are redundant
+  dbgs() << "Before_BS1:\n" << BS1 << "\n";
   BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front();
+  dbgs() << "After_BS1:\n" << BS1 << "\n";

When I compiled with clang++ -mllvm -print-changed -O t.c, this code path was not triggered.
However, when compiling with clang++ -mllvm -print-changed=dot-cfg -O t.c, the output was:

Before_BS1:
for.cond.cleanup:                                 ; preds = %for.cond
  call void @llvm.lifetime.end.p0(i64 4, ptr %i) #3
  br label %for.end

After_BS1:
  call void @llvm.lifetime.end.p0(i64 4, ptr %i) #3
  br label %for.end

There is no comment here.
Another reason I think the line is label is from the original code:

  // Get label.
  StringRef Label = BS1.take_until([](char C) { return C == ':'; });
  // drop predecessors as they can be big and are redundant
  BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front();

It gets the label before erase the line. And the way it gets the label may indicate the line is the label.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this code would only be triggered by -print-changed=dot-cfg; I was just showing comments in the IR. There is a comment there in the before version, it is on the same line as the label ; preds = %for.cond
I do not recall the code intentionally removing the label. As I recall, the IR used to be of the form

<optional nl>
; preds = ...
label:
  <code>

and the code was designed to remove the predecessors comment. I suspect that the form of the IR has changed with the comment being moved and the code needs fixing...either way, the point of the line was to remove ; preds = %for.cond since it is redundant.
It appears that the pass PowerPC DAG->DAG Pattern Instruction Selection (ppc-isel) is changing the placement of the predecessors comment by injecting a nl, so it appears that the IR is morphing and it isn't surprising that this code has become broken...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, you mean that the comment should be removed while keeping the label.
Would it be acceptable to remove all such comments here as a solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, remove the comment while keeping the label. Perhaps removing a comment of the form ; preds = ...\n within the first couple of lines would work but be careful about scanning too far looking for it as that could be expensive. That said, this is probably a debug run anyway and efficiency is probably not the top priority here as this output is already expensive...

Add additional label check to avoid mis-deletion
of the first line of code.

Signed-off-by: Ruoyu Qiu <[email protected]>
Signed-off-by: Ruoyu Qiu <[email protected]>
Signed-off-by: Ruoyu Qiu <[email protected]>
Signed-off-by: Ruoyu Qiu <[email protected]>
Copy link
Contributor

@jamieschmeiser jamieschmeiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@cabbaken
Copy link
Contributor Author

cabbaken commented Oct 4, 2025

Could you please help me merge this? I don’t have merge access.
@jamieschmeiser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Clang] Clang crashed when compiling with -mllvm -print-changed=dot-cfg

2 participants