Skip to content

Conversation

@aaupov
Copy link
Contributor

@aaupov aaupov commented Apr 7, 2025

For indirect jumps with unknown control flow, BOLT will not set a tail
call annotation, which results in skipping such sites in call graph
construction, even if the jump has CallProfile annotation.

Missing call graph edges in turn hurt function ordering, especially when
they are hot.

Example from mkl_blas_xsgemm:

jmpq *%rax # UNKNOWN CONTROL FLOW # CallProfile: 506787 (0 misses) :
                                  { mkl_blas_avx512_xsgemm: 506787 (0 misses) }

Explicitly check the presence of a CallProfile annotation to avoid
missing call graph edge.

Test Plan: added unknown-cf-call-graph.s

Created using spr 1.3.4
Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please add a comment in the code.

Created using spr 1.3.4
@aaupov aaupov marked this pull request as ready for review April 8, 2025 20:50
@llvmbot llvmbot added the BOLT label Apr 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2025

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)

Changes

For indirect jumps with unknown control flow, BOLT will not set a tail
call annotation, which results in skipping such sites in call graph
construction, even if the jump has CallProfile annotation.

Missing call graph edges in turn hurt function ordering, especially when
they are hot.

Example from mkl_blas_xsgemm:

jmpq *%rax # UNKNOWN CONTROL FLOW # CallProfile: 506787 (0 misses) :
                                  { mkl_blas_avx512_xsgemm: 506787 (0 misses) }

Explicitly check the presence of a CallProfile annotation to avoid
missing call graph edge.

Test Plan: added unknown-cf-call-graph.s


Full diff: https://github.com/llvm/llvm-project/pull/134733.diff

2 Files Affected:

  • (modified) bolt/lib/Core/BinaryFunctionCallGraph.cpp (+4-1)
  • (added) bolt/test/X86/unknown-cf-call-graph.s (+67)
diff --git a/bolt/lib/Core/BinaryFunctionCallGraph.cpp b/bolt/lib/Core/BinaryFunctionCallGraph.cpp
index b4b7897aa426a..5f99d008ebfe1 100644
--- a/bolt/lib/Core/BinaryFunctionCallGraph.cpp
+++ b/bolt/lib/Core/BinaryFunctionCallGraph.cpp
@@ -252,7 +252,10 @@ buildCallGraph(BinaryContext &BC, CgFilterFunction Filter, bool CgFromPerfData,
 
         for (MCInst &Inst : *BB) {
           // Find call instructions and extract target symbols from each one.
-          if (BC.MIB->isCall(Inst)) {
+          // Check CallProfile annotation if the instruction is not recognized
+          // as a call, e.g. unknown control flow indirect jump.
+          if (BC.MIB->isCall(Inst) ||
+              BC.MIB->hasAnnotation(Inst, "CallProfile")) {
             const CallInfoTy CallInfo = getCallInfo(BB, Inst);
 
             if (!CallInfo.empty()) {
diff --git a/bolt/test/X86/unknown-cf-call-graph.s b/bolt/test/X86/unknown-cf-call-graph.s
new file mode 100644
index 0000000000000..07b8ce335fbde
--- /dev/null
+++ b/bolt/test/X86/unknown-cf-call-graph.s
@@ -0,0 +1,67 @@
+## Check that indirect jumps with unknown control flow and set call profile are
+## handled in call graph construction.
+
+# RUN: split-file %s %t
+# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown %t/src -o %t.o
+# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q
+# RUN: llvm-bolt %t.exe -o %t.out --lite=0 -print-cfg -data %t/yaml \
+# RUN:   --print-only=main --reorder-functions=cdsort --dump-cg=%t.dot \
+# RUN:   --profile-ignore-hash | FileCheck %s
+# RUN: FileCheck --input-file %t.dot --check-prefix=CHECK-CG %s
+# CHECK-CG:      digraph g {
+# CHECK-CG-NEXT: f0 [label="main
+# CHECK-CG-NEXT: f1 [label="foo
+# CHECK-CG-NEXT: f0 -> f1 [label="normWgt=0.000,weight=1000,callOffset=0.0"];
+# CHECK-CG-NEXT: }
+#--- src
+  .globl main
+  .type main, %function
+  .p2align 2
+main:
+   jmpq *%rax
+# CHECK: jmpq *%rax # UNKNOWN CONTROL FLOW # CallProfile: 1000 (0 misses) :
+# CHECK-NEXT:                              { foo: 1000 (0 misses) }
+   jmpq *%rbx
+.size main, .-main
+
+  .globl foo
+  .type foo, %function
+  .p2align 2
+foo:
+   ud2
+.size foo, .-foo
+.reloc 0, R_X86_64_NONE
+#--- yaml
+---
+header:
+  profile-version: 1
+  binary-name:     'test'
+  binary-build-id: 0
+  profile-flags:   [ lbr ]
+  profile-origin:  perf data aggregator
+  profile-events:  ''
+  dfs-order:       false
+  hash-func:       xxh3
+functions:
+  - name:    main
+    fid:     1
+    hash:    0x1
+    exec:    1000
+    nblocks: 1
+    blocks:
+      - bid:   0
+        insns: 1
+        hash:  0x1
+        exec:  1000
+        calls: [ { off: 0x0, fid: 2, cnt: 1000 } ]
+  - name:    foo
+    fid:     2
+    hash:    0x2
+    exec:    1000
+    nblocks: 1
+    blocks:
+      - bid:   0
+        insns: 1
+        hash:  0x1
+        exec:  1000
+...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants