Commit 8b715ab

[GR-2092] Add perf-based profiling support for runtime-compiled methods.
PullRequest: graal/21851
2 parents 138c3cb + 418f70d commit 8b715ab

File tree

21 files changed

+1625
-254
lines changed

docs/reference-manual/native-image/PerfProfiling.md

Lines changed: 124 additions & 1 deletion
@@ -69,11 +69,61 @@ The following command assumes that `native-image` is on the system path and avai
6969
If it is not installed, refer to the [Getting Started](README.md).
7070

7171
```bash
72-
native-image -g <entry_class>
72+
native-image -g -H:+PreserveFramePointer <entry_class>
7373
```
7474

7575
The `-g` option instructs Native Image to produce debug information for the generated binary.
7676
`perf` can use this debug information, for example, to provide proper names for types and methods in traces.
77+
The `-H:+PreserveFramePointer` option instructs Native Image to save frame pointers on the stack.
78+
This allows `perf` to reliably unwind stack frames and reconstruct the call hierarchy.
79+
80+
### Profiling of Runtime-Compiled Methods
81+
82+
Native Image can generate detailed runtime compilation metadata for perf in the [jitdump](https://github.com/torvalds/linux/blob/46a51f4f5edade43ba66b3c151f0e25ec8b69cb6/tools/perf/Documentation/jitdump-specification.txt) format.
83+
This enables `perf` profiling of runtime-compiled methods, such as Truffle compilations.
84+
85+
#### jitdump
86+
87+
The jitdump format stores detailed metadata for runtime compiled code.
88+
Using it requires post-processing the perf data to inject the runtime compilation metadata.
89+
90+
1. Build with jitdump support:
91+
92+
```bash
93+
native-image -g -H:+PreserveFramePointer -H:+RuntimeDebugInfo -H:RuntimeDebugInfoFormat=jitdump ...
94+
```
95+
96+
At image run time, the jitdump file _<jitdump_dir>/jit-<pid>.dump_ is created, and runtime compilation metadata is written to it.
97+
The output directory can be configured with `-R:RuntimeJitdumpDir=<jitdump_dir>` (defaults to _./jitdump_).
98+
99+
2. Record with perf:
100+
101+
When recording profiling data, use the `-k 1` option to ensure time-based events are ordered correctly for injection:
102+
103+
```bash
104+
perf record -k 1 -o perf.data <your-application>
105+
```
106+
107+
If the perf data was not recorded with `-k 1`, injecting runtime compilation metadata from a jitdump file will fail.
108+
109+
3. Inject jitdump into perf data:
110+
111+
```bash
112+
perf inject -j -i perf.data -o perf.jit.data
113+
```
114+
115+
This step:
116+
- Locates the jitdump file.
117+
- Generates a _.so_ file for each runtime compilation entry in the jitdump file.
118+
- Injects runtime compilation metadata into the profiling data and stores it in _perf.jit.data_.
119+
120+
4. Inspect profiling data:
121+
122+
```bash
123+
perf report -i perf.jit.data
124+
```
125+
126+
Symbols from the jitdump file appear as coming from _jitted-<pid>-<code_id>.so_, where `code_id` is the index of a compilation entry in the jitdump file.
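
As a small illustration of the naming scheme described above, the following Python sketch derives the expected jitdump file path and recognizes _jitted-<pid>-<code_id>.so_ names. The helper names (`jitdump_path`, `parse_jitted_so`) are hypothetical and are not part of GraalVM or perf; only the file name patterns and the _./jitdump_ default are taken from the text above.

```python
import re
from pathlib import Path

# Pattern for the per-compilation shared objects described above.
JITTED_SO = re.compile(r"^jitted-(\d+)-(\d+)\.so$")


def jitdump_path(pid, jitdump_dir="./jitdump"):
    """Return the expected jitdump file path for a process (hypothetical helper)."""
    return Path(jitdump_dir) / f"jit-{pid}.dump"


def parse_jitted_so(name):
    """Return (pid, code_id) for a jitted-<pid>-<code_id>.so name, else None."""
    match = JITTED_SO.match(Path(name).name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A helper like this could be used to separate runtime-compiled symbols from statically compiled ones when post-processing `perf report` output.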
77127

78128
## Basic Operations
79129

@@ -141,6 +191,79 @@ The `-g` option instructs Native Image to produce debug information for the gene
141191

142192
This command generates a script that can be used for analyzing the recorded trace data.
143193

194+
## Generating Flame Graphs from Profiling Data
195+
196+
[FlameGraph](https://github.com/brendangregg/FlameGraph) is a tool written in Perl that can be used to produce flame graphs from perf profiling data.
197+
Flame graphs generated by this tool visualize stack samples as interactive SVGs, making it easy to identify hot code paths in an application.
198+
199+
1. Download the tool and record profiling data as described in [Basic Operations](#basic-operations).
200+
201+
Make sure the profiling data was recorded with `perf record -g` to capture call graphs; otherwise, the flame graph will be flat.
202+
203+
2. Fold stacks:
204+
```bash
205+
perf script -i perf.data | ./stackcollapse-perf.pl > perf.data.folded
206+
```
207+
208+
3. Render an SVG:
209+
```bash
210+
./flamegraph.pl perf.data.folded > perf.data.svg
211+
```
212+
213+
4. Open the flame graph:
214+
215+
Use an application to view the generated SVG file (for example, `firefox`, `chromium`).
216+
```bash
217+
firefox perf.data.svg
218+
```
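
To make the "fold stacks" step above more concrete, here is a minimal Python sketch of the folded format that _flamegraph.pl_ consumes: one line per unique stack, frames joined by `;` from root to leaf, followed by a sample count. This is not the implementation of _stackcollapse-perf.pl_; the function name `fold_stacks` and the sample data are made up for illustration.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled stacks (root frame first) into folded lines."""
    counts = Counter(";".join(frames) for frames in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


# Three hypothetical samples; two of them share the same call stack.
samples = [
    ["main", "parse", "readToken"],
    ["main", "parse", "readToken"],
    ["main", "render"],
]
for line in fold_stacks(samples):
    print(line)
```

Identical stacks are merged and counted, which is why the width of a frame in the rendered flame graph reflects how often it was sampled.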
219+
220+
### Highlighting Runtime-Compiled Methods
221+
222+
If the native image supports [profiling of runtime-compiled methods](#profiling-of-runtime-compiled-methods), it is possible to highlight runtime-compiled symbols in the flame graph.
223+
224+
1. Build the native image with jitdump support, record profiling data and inject the jitdump information as described in [jitdump](#jitdump).
225+
226+
2. Fold stacks:
227+
228+
This involves folding the stacks for the non-jitdump-injected _perf.data_ and the jitdump-injected _perf.jit.data_.
229+
```bash
230+
perf script -i perf.data | ./stackcollapse-perf.pl > perf.data.folded
231+
perf script -i perf.jit.data | ./stackcollapse-perf.pl > perf.jit.data.folded
232+
```
233+
234+
3. Generate a consistent color palette map:
235+
236+
Use the non-jitdump-injected _perf.data.folded_ to create a consistent palette map in _palette.map_ for events in _perf.data_.
237+
The first call with `--cp` will create the map while subsequent calls with `--cp` reuse the map for consistent coloring of known events.
238+
This also produces a flame graph for the non-jitdump-injected data.
239+
```bash
240+
./flamegraph.pl --cp perf.data.folded > perf.data.svg
241+
```
242+
243+
4. Reuse the color palette map:
244+
245+
Use the consistent palette for already known events with `--cp` for the jitdump-injected _perf.jit.data.folded_.
246+
That is, events already seen in the non-jitdump-injected _perf.data.folded_ keep their fixed coloring.
247+
New events get a random coloring from the palette selected with the `--color` option (e.g. `mem`).
248+
```bash
249+
./flamegraph.pl --cp --color mem perf.jit.data.folded > perf.jit.data.svg
250+
```
251+
252+
5. Open the flame graph:
253+
```bash
254+
firefox perf.jit.data.svg
255+
```
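
The effect of reusing the palette map in the steps above can be pictured with a small Python sketch. It is not FlameGraph's implementation: the function `assign_colors` and the color names are hypothetical, and the sketch keeps the map in memory instead of reading _palette.map_.

```python
import random


def assign_colors(functions, palette_map, new_colors, rng=None):
    """Keep colors of known functions; pick a color for each new one."""
    rng = rng if rng is not None else random.Random(0)
    for name in functions:
        if name not in palette_map:
            palette_map[name] = rng.choice(new_colors)
    return palette_map


# First pass: functions seen in perf.data.folded.
palette = assign_colors(["main", "parse"], {}, ["orange", "red"])
before = dict(palette)

# Second pass: perf.jit.data.folded adds a runtime-compiled symbol,
# which is colored from a different palette.
palette = assign_colors(["main", "parse", "jitted_fn"], palette, ["green", "aqua"])
```

Because known entries are never reassigned, only the symbols that first appear in the jitdump-injected data receive colors from the second palette, which is what makes them stand out.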
256+
257+
### Generate an Invocation-Time-Ordered Flame Graph
258+
259+
Generate a stack-reversed flame graph with the topmost frames shown at the bottom of the flame graph in order of invocation time.
260+
Calls appear left-to-right in chronological order, with stack frames in each call arranged top-to-bottom from oldest to newest.
261+
Events from all threads contributing to the profiling data are shown interleaved.
262+
```bash
263+
./flamegraph.pl --reverse perf.data.folded > perf.data.svg
264+
firefox perf.data.svg
265+
```
266+
144267
### Related Documentation
145268

146269
* [Debug Information](DebugInfo.md)

substratevm/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ This changelog summarizes major changes to GraalVM Native Image.
66
* (GR-44384) Add size warnings for bundles when individual or cumulative file sizes exceed limits. Configure with options `size-warning-file-limit` and `size-warning-total-limit` to `bundle-create`, sizes in MiB.
77
* (GR-43070) Add a new API flag `-Werror` to treat warnings as errors.
88
* (GR-69280) Allow use of the `graal.` prefix for options without issuing a warning.
9+
* (GR-2092) Add jitdump support for recording run-time compilation metadata for perf (see PerfProfiling.md). Can be enabled with `-g -H:+RuntimeDebugInfo -H:RuntimeDebugInfoFormat=jitdump`.
910

1011
## GraalVM 25
1112
* (GR-52276) (GR-61959) Add support for Arena.ofShared().

substratevm/debug/include/gdb_jit_compilation_interface.h

Lines changed: 5 additions & 1 deletion
@@ -27,7 +27,9 @@
2727
#define SVM_NATIVE_GDBJITCOMPILATIONINTERFACE_H
2828

2929
// This header specifies the types used by the GDB JIT compilation interface (see https://sourceware.org/gdb/current/onlinedocs/gdb.html/Declarations.html#Declarations)
30-
// The implementation of the JIT compilation interface is located in com.oracle.svm.core.debug.GdbJitInterface.
30+
// The implementation of the JIT compilation interface is located in com.oracle.svm.core.debug.gdb.GdbJitInterface.
31+
32+
#ifdef __linux__
3133

3234
#include <stdint.h>
3335

@@ -56,4 +58,6 @@ struct jit_descriptor
5658
struct jit_code_entry *first_entry;
5759
};
5860

61+
#endif /* __linux__ */
62+
5963
#endif
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
/*
2+
* Copyright (c) 2025, 2025, Oracle and/or its affiliates. All rights reserved.
3+
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
4+
*
5+
* This code is free software; you can redistribute it and/or modify it
6+
* under the terms of the GNU General Public License version 2 only, as
7+
* published by the Free Software Foundation. Oracle designates this
8+
* particular file as subject to the "Classpath" exception as provided
9+
* by Oracle in the LICENSE file that accompanied this code.
10+
*
11+
* This code is distributed in the hope that it will be useful, but WITHOUT
12+
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
13+
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
14+
* version 2 for more details (a copy is included in the LICENSE file that
15+
* accompanied this code).
16+
*
17+
* You should have received a copy of the GNU General Public License version
18+
* 2 along with this work; if not, write to the Free Software Foundation,
19+
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
20+
*
21+
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
22+
* or visit www.oracle.com if you need additional information or have any
23+
* questions.
24+
*/
25+
26+
#ifndef SVM_NATIVE_JITDUMPRECORDS_H
27+
#define SVM_NATIVE_JITDUMPRECORDS_H
28+
29+
#ifdef __linux__
30+
31+
// This header specifies the jitdump entries described in the Jitdump specification (see https://github.com/torvalds/linux/blob/46a51f4f5edade43ba66b3c151f0e25ec8b69cb6/tools/perf/Documentation/jitdump-specification.txt)
32+
// The implementation of the Jitdump provider generating these entries is located in com.oracle.svm.core.posix.debug.jitdump.JitdumpProvider.
33+
34+
#include <stdint.h>
35+
36+
typedef enum
37+
{
38+
JIT_CODE_LOAD = 0,
39+
JIT_CODE_MOVE = 1,
40+
JIT_CODE_DEBUG_INFO = 2,
41+
JIT_CODE_CLOSE = 3,
42+
JIT_CODE_UNWINDING_INFO = 4
43+
} record_type;
44+
45+
struct file_header
46+
{
47+
uint32_t magic;
48+
uint32_t version;
49+
uint32_t total_size;
50+
uint32_t elf_mach;
51+
uint32_t pad1;
52+
uint32_t pid;
53+
uint64_t timestamp;
54+
uint64_t flags;
55+
};
56+
57+
struct record_header
58+
{
59+
uint32_t id;
60+
uint32_t total_size;
61+
uint64_t timestamp;
62+
};
63+
64+
struct code_load_record
65+
{
66+
struct record_header header;
67+
uint32_t pid;
68+
uint32_t tid;
69+
uint64_t vma;
70+
uint64_t code_addr;
71+
uint64_t code_size;
72+
uint64_t code_index;
73+
};
74+
75+
struct debug_entry
76+
{
77+
uint64_t code_addr;
78+
uint32_t line;
79+
uint32_t discrim;
80+
};
81+
82+
struct debug_info_record
83+
{
84+
struct record_header header;
85+
uint64_t code_addr;
86+
uint64_t nr_entry;
87+
};
88+
89+
#endif /* __linux__ */
90+
91+
#endif
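
For readers who want to inspect a jitdump file by hand, the following Python sketch decodes the fields of `struct file_header` as declared above. It assumes a little-endian file written on the same architecture; the magic value `0x4A695444` ("JiTD") comes from the jitdump specification, and the function name `parse_file_header` is hypothetical.

```python
import struct

# Layout of `struct file_header`: six uint32_t fields followed by two uint64_t fields.
FILE_HEADER = struct.Struct("<6I2Q")
JITDUMP_MAGIC = 0x4A695444  # "JiTD", per the jitdump specification
FIELD_NAMES = ("magic", "version", "total_size", "elf_mach",
               "pad1", "pid", "timestamp", "flags")


def parse_file_header(data):
    """Decode the leading bytes of a jitdump file into a dict of header fields."""
    header = dict(zip(FIELD_NAMES, FILE_HEADER.unpack_from(data)))
    if header["magic"] != JITDUMP_MAGIC:
        # A byte-swapped magic would indicate a file of the opposite endianness.
        raise ValueError("not a little-endian jitdump file")
    return header


# Build a synthetic header (EM_X86_64 = 62, made-up pid) and decode it again.
sample = FILE_HEADER.pack(JITDUMP_MAGIC, 1, FILE_HEADER.size, 62, 0, 1234, 0, 0)
header = parse_file_header(sample)
```

On a real file, the same function could be applied to the first 40 bytes of _jit-<pid>.dump_; the records that follow each start with a `struct record_header`.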

substratevm/mx.substratevm/mx_substratevm.py

Lines changed: 7 additions & 4 deletions
@@ -793,9 +793,10 @@ def jvm_unittest(args):
793793
return mx_unittest.unittest(['--suite', 'substratevm'] + args)
794794

795795

796-
def js_image_test(jslib, bench_location, name, warmup_iterations, iterations, timeout=None, bin_args=None):
796+
def js_image_test(jslib, bench_location, name, warmup_iterations, iterations, timeout=None, bin_args=None, pre_args=None):
797797
bin_args = bin_args if bin_args is not None else []
798-
jsruncmd = [get_js_launcher(jslib)] + bin_args + [join(bench_location, 'harness.js'), '--', join(bench_location, name + '.js'),
798+
pre_args = pre_args if pre_args is not None else []
799+
jsruncmd = pre_args + [get_js_launcher(jslib)] + bin_args + [join(bench_location, 'harness.js'), '--', join(bench_location, name + '.js'),
799800
'--', '--warmup-time=' + str(15_000),
800801
'--warmup-iterations=' + str(warmup_iterations),
801802
'--iterations=' + str(iterations)]
@@ -833,10 +834,10 @@ def build_js_lib(native_image):
833834
def get_js_launcher(jslib):
834835
return os.path.join(os.path.dirname(jslib), "..", "bin", "js")
835836

836-
def test_js(js, benchmarks, bin_args=None):
837+
def test_js(js, benchmarks, bin_args=None, pre_args=None):
837838
bench_location = join(suite.dir, '..', '..', 'js-benchmarks')
838839
for benchmark_name, warmup_iterations, iterations, timeout in benchmarks:
839-
js_image_test(js, bench_location, benchmark_name, warmup_iterations, iterations, timeout, bin_args=bin_args)
840+
js_image_test(js, bench_location, benchmark_name, warmup_iterations, iterations, timeout, bin_args=bin_args, pre_args=pre_args)
840841

841842
def test_run(cmds, expected_stdout, timeout=10, env=None):
842843
stdoutdata = []
@@ -1296,6 +1297,7 @@ def _runtimedebuginfotest(native_image, output_path, with_isolates_only, args=No
12961297
'-H:DebugInfoSourceSearchPath=' + test_source_path,
12971298
'-H:+SourceLevelDebug',
12981299
'-H:+RuntimeDebugInfo',
1300+
# We rely on '-H:RuntimeDebugInfoFormat' to default to 'objfile', which is required for this test
12991301
]) + args
13001302

13011303
mx.log(f"native-image {' '.join(build_args)}")
@@ -1316,6 +1318,7 @@ def run_js_test(eager: bool = False):
13161318
'-H:+SourceLevelDebug',
13171319
'-H:+RuntimeDebugInfo',
13181320
'-H:-LazyDeoptimization' if eager else '-H:+LazyDeoptimization',
1321+
# We rely on '-H:RuntimeDebugInfoFormat' to default to 'objfile', which is required for this test
13191322
]) +
13201323
['-g', '--macro:jsvm-library']
13211324
))
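
The new `pre_args` parameter in this file is prepended to the launcher command line. The Python sketch below mirrors that concatenation order in a standalone function. The function name `build_command` is hypothetical, and wrapping the launcher in `perf record -k 1` is an assumption about how the parameter might be used, not something shown in this hunk.

```python
def build_command(launcher, script_args, bin_args=None, pre_args=None):
    """Assemble a command line as pre_args + [launcher] + bin_args + script_args."""
    bin_args = bin_args if bin_args is not None else []
    pre_args = pre_args if pre_args is not None else []
    return pre_args + [launcher] + bin_args + script_args


# Assumed usage: run the launcher under perf so samples can later be
# combined with the jitdump file.
cmd = build_command(
    "bin/js",
    ["harness.js"],
    pre_args=["perf", "record", "-k", "1", "-o", "perf.data"],
)
```

Defaulting both optional lists to `None` and replacing them inside the function avoids sharing a mutable default between calls, which matches the pattern used for `bin_args` in the diff.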

substratevm/mx.substratevm/suite.py

Lines changed: 1 addition & 0 deletions
@@ -1968,6 +1968,7 @@
19681968
"com.oracle.objectfile.debugentry",
19691969
"com.oracle.objectfile.debugentry.range",
19701970
"com.oracle.objectfile.macho",
1971+
"com.oracle.objectfile.elf",
19711972
],
19721973

19731974
"requiresConcealed" : {
