Skip to content
28 changes: 20 additions & 8 deletions llvm/lib/Passes/StandardInstrumentations.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -727,8 +727,13 @@ template <typename T>
template <typename FunctionT>
bool IRComparer<T>::generateFunctionData(IRDataT<T> &Data, const FunctionT &F) {
if (shouldGenerateData(F)) {
FuncDataT<T> FD(F.front().getName().str());
int I = 0;
// Basic block with numberic label will be empty here.
std::string FDEntryBlockName = F.front().getName().str();
if (FDEntryBlockName.empty())
FDEntryBlockName = formatv("{0}", I);

FuncDataT<T> FD(FDEntryBlockName);
for (const auto &B : F) {
std::string BBName = B.getName().str();
if (BBName.empty()) {
Expand Down Expand Up @@ -1791,10 +1796,10 @@ class DotCfgDiffNode {
// Create a node in Dot difference graph \p G representing the basic block
// represented by \p BD with colour \p Colour (where it exists).
DotCfgDiffNode(DotCfgDiff &G, unsigned N, const BlockDataT<DCData> &BD,
StringRef Colour)
: Graph(G), N(N), Data{&BD, nullptr}, Colour(Colour) {}
StringRef Label, StringRef Colour)
: Graph(G), N(N), Data{&BD, nullptr}, Label(Label), Colour(Colour) {}
DotCfgDiffNode(const DotCfgDiffNode &DN)
: Graph(DN.Graph), N(DN.N), Data{DN.Data[0], DN.Data[1]},
: Graph(DN.Graph), N(DN.N), Data{DN.Data[0], DN.Data[1]}, Label(DN.Label),
Colour(DN.Colour), EdgesMap(DN.EdgesMap), Children(DN.Children),
Edges(DN.Edges) {}

Expand All @@ -1803,6 +1808,8 @@ class DotCfgDiffNode {
// The label of the basic block
StringRef getLabel() const {
assert(Data[0] && "Expected Data[0] to be set.");
assert((Data[0]->getLabel().empty() || Data[0]->getLabel() == Label) &&
"Unexpected label");
return Data[0]->getLabel();
}
// Return the colour for this block
Expand Down Expand Up @@ -1840,6 +1847,7 @@ class DotCfgDiffNode {
DotCfgDiff &Graph;
const unsigned N;
const BlockDataT<DCData> *Data[2];
StringRef Label;
StringRef Colour;
std::map<const unsigned, std::pair<std::string, StringRef>> EdgesMap;
std::vector<unsigned> Children;
Expand Down Expand Up @@ -1888,7 +1896,7 @@ class DotCfgDiff {

void createNode(StringRef Label, const BlockDataT<DCData> &BD, StringRef C) {
unsigned Pos = Nodes.size();
Nodes.emplace_back(*this, Pos, BD, C);
Nodes.emplace_back(*this, Pos, BD, Label, C);
NodePosition.insert({Label, Pos});
}

Expand Down Expand Up @@ -1950,10 +1958,9 @@ std::string DotCfgDiffNode::getBodyContent() const {
// Drop leading newline, if present.
if (BS.front() == '\n')
BS1 = BS1.drop_front(1);
// Get label.
StringRef Label = BS1.take_until([](char C) { return C == ':'; });
// drop predecessors as they can be big and are redundant
BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front();
if (BS1.str().find(Label) != std::string::npos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions:

  1. why call str() when you can just use find on the StringRef?
  2. why are you adding a condition to dropping the predecessors? They are dropped since they can be large which can clutter up the diagram.
  3. Are you going to find Label if it has been manufactured with a number? In this case, won't you then preserve the predecessors, leading to a cluttered diagram, or is that the whole point of storing the label -- so that you know when to retain predecessors since there is no label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I was not familiar with StringRef that time, I'm sorry I didn't look for the native implementation of find. I will correct it.
  2. With the line BS1 = BS1.drop_front(1);, we have erase all leading '\n', the line here is intend to erase the label line. To the BB without label, it will erase the first line of code. So we need to check if there is a label line here.
  3. In fact, the label number assign happens only in both saveIRBeforePass and handleIRAfterPass, called before and after a pass. The sequence of BB decides the value of label(I know the sequence may be changed in some passes, but I haven't thought of a better way yet)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. +1
  2. this will only remove the first nl, if present, not all of them.
    IIRC (and also based on the comment), the line
    BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front(); was intended to remove the line listing the predecessor blocks. Consider a block that has many predecessors (eg 10 predecessors). The graph shows 10 arrows in so you can easily determine all predecessors. However, now you are going to get a comment listing the 10 predecessors, making this particular node graphically large, making it difficult to see the other nodes in the graph. Plus the comment is redundant since you can graphically see the predecessors or you would be able to if the comment weren't there. Therefore, I removed it. This line would drop the comment by dropping everything until a nl is reached.
    If there were no label, I suspect that there might also not be a comment with the predecessors, resulting in the line erasing the first line of code, as you state. That is why I find this test troubling...it is dropping the line if there is a label, which is testing a symptom rather than the intention. With an added label, is there still the comment with the predecessors? If not, something else might still be dropped.
    Would it work if you instead tested for a comment since the point of the line is to remove the comment with predecessors? I think this would be clearer.
  3. Okay...Here is an idea and I don't know if would work or if it is worth the effort... Since the label is artificial and only seen here (right?), then a different ordering could trigger a spurious reporting of a change (since the label is different). Would it work if you used a negative integer for this initial label (with a comment stating it is artificial) and then ignored a difference of a negative initial label when checking if a change occurred? If I understand correctly, you would only need to check the initial label for this special case so it shouldn't be too expensive. Of course, if this label persists or is visible elsewhere, then this would not work. I wouldn't do that in this PR however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could you clarify what the comment here refers to? I previously thought that the in-memory LLVM IR does not have a representation for comments.
    And I'm sorry about the misunderstanding of drop_until.
  2. It's right that different ordering will cause a bad output.
    The passes do not record the label change history, so it is difficult to locate the corresponding non-label BB before and after a pass. In fact, in my patch, we will apply the numeric label to the BBs before and after the passes. So the negative integer may not work properly in this context.
    This issue may need to be addressed in a separate PR.

Copy link
Contributor

@jamieschmeiser jamieschmeiser Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Consider the following program with a loop:
extern "C" int printf(const char*, ...);
int main() {
  for (int i = 0; i <= 5; ++i)
    printf("%d\n", i);
  return 0;
}

compiled with clang++ -mllvm -print-changed -O f.C. I'm using print-changed since the comment with predecessors is removed in the dot-cfg representation. Here is abbreviated final IR that I got:

bb.0.entry:
  successors: %bb.1(0x80000000); %bb.1(100.00%)
  liveins: $r29, $r30, $r31
  $r0 = MFLR implicit $lr
  ...

bb.1.for.body (align 32):
; predecessors: %bb.0, %bb.1
  successors: %bb.1(0x7c000000), %bb.2(0x04000000); %bb.1(96.88%), %bb.2(3.12%)
 ...

bb.2.for.cond.cleanup:
; predecessors: %bb.1

  $r31 = LWZ 76, $r1 :: (load (s32) from %fixed-stack.0)
...

The comment I am referring to is the ; predecessors:... line in bb.1.for.body and bb.2.for.code.cleanup. This comment is there to indicate flow of control in the IR but it is redundant in the graph since the flow of control between the blocks is shown graphically.
3. I'm not surprised that it doesn't work...It was just an idea that came to me as I was reading your comments. I agree that that should be dealt with in a different PR, if at all; there is a trade-off between effort and result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the code by compiling it with clang++ -mllvm -print-changed -O t.c, and there were indeed some comments in the output.
Then, I added some instrumentation:

diff --git a/llvm/lib/Passes/StandardInstrumentations.cpp b/llvm/lib/Passes/StandardInstrumentations.cpp
index f67d4de4b..307efd8f6 100644
--- a/llvm/lib/Passes/StandardInstrumentations.cpp
+++ b/llvm/lib/Passes/StandardInstrumentations.cpp
@@ -1959,8 +1960,10 @@ std::string DotCfgDiffNode::getBodyContent() const {
   if (BS.front() == '\n')
     BS1 = BS1.drop_front(1);
   // drop predecessors as they can be big and are redundant
+  dbgs() << "Before_BS1:\n" << BS1 << "\n";
   BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front();
+  dbgs() << "After_BS1:\n" << BS1 << "\n";

When I compiled with clang++ -mllvm -print-changed -O t.c, this code path was not triggered.
However, when compiling with clang++ -mllvm -print-changed=dot-cfg -O t.c, the output was:

Before_BS1:
for.cond.cleanup:                                 ; preds = %for.cond
  call void @llvm.lifetime.end.p0(i64 4, ptr %i) #3
  br label %for.end

After_BS1:
  call void @llvm.lifetime.end.p0(i64 4, ptr %i) #3
  br label %for.end

There is no comment here.
Another reason I think the line is label is from the original code:

  // Get label.
  StringRef Label = BS1.take_until([](char C) { return C == ':'; });
  // drop predecessors as they can be big and are redundant
  BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front();

It gets the label before erase the line. And the way it gets the label may indicate the line is the label.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this code would only be triggered by -print-changed=dot-cfg; I was just showing comments in the IR. There is a comment there in the before version, it is on the same line as the label ; preds = %for.cond
I do not recall the code intentionally removing the label. As I recall, the IR used to be of the form

<optional nl>
; preds = ...
label:
  <code>

and the code was designed to remove the predecessors comment. I suspect that the form of the IR has changed with the comment being moved and the code needs fixing...either way, the point of the line was to remove ; preds = %for.cond since it is redundant.
It appears that the pass PowerPC DAG->DAG Pattern Instruction Selection (ppc-isel) is changing the placement of the predecessors comment by injecting a nl, so it appears that the IR is morphing and it isn't surprising that this code has become broken...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, you mean that the comment should be removed while keeping the label.
Would it be acceptable to remove all such comments here as a solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, remove the comment while keeping the label. Perhaps removing a comment of the form ; preds = ...\n within the first couple of lines would work but be careful about scanning too far looking for it as that could be expensive. That said, this is probably a debug run anyway and efficiency is probably not the top priority here as this output is already expensive...

BS1 = BS1.drop_until([](char C) { return C == '\n'; }).drop_front();

std::string S = "<FONT COLOR=\"" + Colour.str() + "\">" + Label.str() + ":";

Expand Down Expand Up @@ -1995,6 +2002,11 @@ DotCfgDiff::DotCfgDiff(StringRef Title, const FuncDataT<DCData> &Before,
EdgesMap.insert({Key, BeforeColour});
}
}
if (Before == After) {
for (auto &I : Nodes)
I.finalize(*this);
return;
}

// Handle each basic block in the after IR
for (auto &A : After.getData()) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@
; RUN: FileCheck %s -input-file=%t/passes.html --check-prefix=CHECK-DOT-CFG-QUIET-MULT-PASSES-FILTER-FUNC

define i32 @g() {
entry:
1:
%a = add i32 2, 3
ret i32 %a
}
Expand Down