Skip to content

Commit 2fcd5af

Browse files
committed
tracing/Documentation: Start a document on how to debug with tracing
Add a new document Documentation/trace/debugging.rst that will hold various ways to debug tracing. This initial version mentions trace_printk and how to create persistent buffers that can last across bootups. Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Vincent Donnefort <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vineeth Pillai <[email protected]> Cc: Beau Belgrave <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: David Howells <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Tony Luck <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Kees Cook <[email protected]> Cc: Alexander Aring <[email protected]> Cc: "Luis Claudio R. Goncalves" <[email protected]> Cc: Tomas Glozar <[email protected]> Cc: John Kacur <[email protected]> Cc: Clark Williams <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: "Jonathan Corbet" <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
1 parent ef2bd81 commit 2fcd5af

File tree

2 files changed

+161
-0
lines changed

2 files changed

+161
-0
lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6785,6 +6785,8 @@
67856785

67866786
reserve_mem=12M:4096:trace trace_instance=boot_map^traceoff^traceprintk@trace,sched,irq
67876787

6788+
See also Documentation/trace/debugging.rst
6789+
67886790

67896791
trace_options=[option-list]
67906792
[FTRACE] Enable or disable tracer options at boot.

Documentation/trace/debugging.rst

Lines changed: 159 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,159 @@
1+
==============================
2+
Using the tracer for debugging
3+
==============================
4+
5+
Copyright 2024 Google LLC.
6+
7+
:Author: Steven Rostedt <[email protected]>
8+
:License: The GNU Free Documentation License, Version 1.2
9+
(dual licensed under the GPL v2)
10+
11+
- Written for: 6.12
12+
13+
Introduction
14+
------------
15+
The tracing infrastructure can be very useful for debugging the Linux
16+
kernel. This document is a place to add various methods of using the tracer
17+
for debugging.
18+
19+
First, make sure that the tracefs file system is mounted::
20+
21+
$ sudo mount -t tracefs tracefs /sys/kernel/tracing
22+
23+
24+
Using trace_printk()
25+
--------------------
26+
27+
trace_printk() is a very lightweight utility that can be used in any context
28+
inside the kernel, with the exception of "noinstr" sections. It can be used
29+
in normal, softirq, interrupt and even NMI context. The trace data is
30+
written to the tracing ring buffer in a lockless way. To make it even
31+
lighter weight, when possible, it will only record the pointer to the format
32+
string, and save the raw arguments into the buffer. The format and the
33+
arguments will be post processed when the ring buffer is read. This way the
34+
trace_printk() format conversions are not done during the hot path, where
35+
the trace is being recorded.
36+
37+
trace_printk() is meant only for debugging, and should never be added into
38+
a subsystem of the kernel. If you need debugging traces, add trace events
39+
instead. If a trace_printk() is found in the kernel, the following will
40+
appear in the dmesg::
41+
42+
**********************************************************
43+
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
44+
** **
45+
** trace_printk() being used. Allocating extra memory. **
46+
** **
47+
** This means that this is a DEBUG kernel and it is **
48+
** unsafe for production use. **
49+
** **
50+
** If you see this message and you are not debugging **
51+
** the kernel, report this immediately to your vendor! **
52+
** **
53+
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
54+
**********************************************************
55+
56+
Debugging kernel crashes
57+
------------------------
58+
There is various methods of acquiring the state of the system when a kernel
59+
crash occurs. This could be from the oops message in printk, or one could
60+
use kexec/kdump. But these just show what happened at the time of the crash.
61+
It can be very useful in knowing what happened up to the point of the crash.
62+
The tracing ring buffer, by default, is a circular buffer than will
63+
overwrite older events with newer ones. When a crash happens, the content of
64+
the ring buffer will be all the events that lead up to the crash.
65+
66+
There are several kernel command line parameters that can be used to help in
67+
this. The first is "ftrace_dump_on_oops". This will dump the tracing ring
68+
buffer when a oops occurs to the console. This can be useful if the console
69+
is being logged somewhere. If a serial console is used, it may be prudent to
70+
make sure the ring buffer is relatively small, otherwise the dumping of the
71+
ring buffer may take several minutes to hours to finish. Here's an example
72+
of the kernel command line::
73+
74+
ftrace_dump_on_oops trace_buf_size=50K
75+
76+
Note, the tracing buffer is made up of per CPU buffers where each of these
77+
buffers is broken up into sub-buffers that are by default PAGE_SIZE. The
78+
above trace_buf_size option above sets each of the per CPU buffers to 50K,
79+
so, on a machine with 8 CPUs, that's actually 400K total.
80+
81+
Persistent buffers across boots
82+
-------------------------------
83+
If the system memory allows it, the tracing ring buffer can be specified at
84+
a specific location in memory. If the location is the same across boots and
85+
the memory is not modified, the tracing buffer can be retrieved from the
86+
following boot. There's two ways to reserve memory for the use of the ring
87+
buffer.
88+
89+
The more reliable way (on x86) is to reserve memory with the "memmap" kernel
90+
command line option and then use that memory for the trace_instance. This
91+
requires a bit of knowledge of the physical memory layout of the system. The
92+
advantage of using this method, is that the memory for the ring buffer will
93+
always be the same::
94+
95+
memmap==12M$0x284500000 trace_instance=boot_map@0x284500000:12M
96+
97+
The memmap above reserves 12 megabytes of memory at the physical memory
98+
location 0x284500000. Then the trace_instance option will create a trace
99+
instance "boot_map" at that same location with the same amount of memory
100+
reserved. As the ring buffer is broke up into per CPU buffers, the 12
101+
megabytes will be broken up evenly between those CPUs. If you have 8 CPUs,
102+
each per CPU ring buffer will be 1.5 megabytes in size. Note, that also
103+
includes meta data, so the amount of memory actually used by the ring buffer
104+
will be slightly smaller.
105+
106+
Another more generic but less robust way to allocate a ring buffer mapping
107+
at boot is with the "reserve_mem" option::
108+
109+
reserve_mem=12M:4096:trace trace_instance=boot_map@trace
110+
111+
The reserve_mem option above will find 12 megabytes that are available at
112+
boot up, and align it by 4096 bytes. It will label this memory as "trace"
113+
that can be used by later command line options.
114+
115+
The trace_instance option creates a "boot_map" instance and will use the
116+
memory reserved by reserve_mem that was labeled as "trace". This method is
117+
more generic but may not be as reliable. Due to KASLR, the memory reserved
118+
by reserve_mem may not be located at the same location. If this happens,
119+
then the ring buffer will not be from the previous boot and will be reset.
120+
121+
Sometimes, by using a larger alignment, it can keep KASLR from moving things
122+
around in such a way that it will move the location of the reserve_mem. By
123+
using a larger alignment, you may find better that the buffer is more
124+
consistent to where it is placed::
125+
126+
reserve_mem=12M:0x2000000:trace trace_instance=boot_map@trace
127+
128+
On boot up, the memory reserved for the ring buffer is validated. It will go
129+
through a series of tests to make sure that the ring buffer contains valid
130+
data. If it is, it will then set it up to be available to read from the
131+
instance. If it fails any of the tests, it will clear the entire ring buffer
132+
and initialize it as new.
133+
134+
The layout of this mapped memory may not be consistent from kernel to
135+
kernel, so only the same kernel is guaranteed to work if the mapping is
136+
preserved. Switching to a different kernel version may find a different
137+
layout and mark the buffer as invalid.
138+
139+
Using trace_printk() in the boot instance
140+
-----------------------------------------
141+
By default, the content of trace_printk() goes into the top level tracing
142+
instance. But this instance is never preserved across boots. To have the
143+
trace_printk() content, and some other internal tracing go to the preserved
144+
buffer (like dump stacks), either set the instance to be the trace_printk()
145+
destination from the kernel command line, or set it after boot up via the
146+
trace_printk_dest option.
147+
148+
After boot up::
149+
150+
echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest
151+
152+
From the kernel command line::
153+
154+
reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace
155+
156+
If setting it from the kernel command line, it is recommended to also
157+
disable tracing with the "traceoff" flag, and enable tracing after boot up.
158+
Otherwise the trace from the most recent boot will be mixed with the trace
159+
from the previous boot, and may make it confusing to read.

0 commit comments

Comments
 (0)