Commit b47d1fc

Linu Cherian authored and Suzuki K Poulose committed
Documentation: coresight: Panic support
Add documentation on using coresight during panic and watchdog.

Signed-off-by: Linu Cherian <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent 4b7e626 commit b47d1fc

1 file changed

+362
-0
lines changed

1 file changed

+362
-0
lines changed
Lines changed: 362 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,362 @@
===================================================
Using Coresight for Kernel panic and Watchdog reset
===================================================

Introduction
------------
This documentation is about using Linux coresight trace support to
debug kernel panic and watchdog reset scenarios.

Coresight trace during Kernel panic
-----------------------------------
From the coresight driver point of view, addressing the kernel panic
situation has four main requirements.

a. Support for allocation of trace buffer pages from a reserved memory area.
   A platform can advertise this using a new device tree property added to
   the relevant coresight nodes.

b. Support for stopping coresight blocks at the time of panic

c. Saving required metadata in the specified format

d. Support for reading trace data captured at the time of panic

Allocation of trace buffer pages from reserved RAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A new optional device tree property "memory-region" is added to the
Coresight TMC device nodes, which gives the base address and size of the
trace buffer.

Static allocation of trace buffers ensures that both IOMMU enabled
and disabled cases are handled. Also, platforms that support persistent
RAM will allow users to read trace data in the subsequent boot without
booting the crashdump kernel.

Note:
For ETR sink devices, this reserved region will be used for both trace
capture and trace data retrieval.
For ETF sink devices, internal SRAM would be used for trace capture,
and the captured data would be synced to the reserved region for retrieval.

Disabling coresight blocks at the time of panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid losing relevant trace data after a kernel panic, it is
desirable to stop the coresight blocks at the time of panic.

This can be achieved by configuring the comparator, CTI and sink
devices as below::

           Trigger on panic
    Comparator --->External out --->CTI -->External In---->ETR/ETF stop

Saving metadata at the time of kernel panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Coresight metadata covers all additional data that is required for a
successful trace decode in addition to the trace data. This includes
the ETR/ETF/ETB register snapshot etc.

A new optional device tree property "memory-region" is added to
the ETR/ETF/ETB device nodes for this.

Reading trace data captured at the time of panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trace data captured at the time of panic can be read from the rebooted
kernel or from the crashdump kernel using a special device file
/dev/crash_tmc_xxx. This device file is created only when valid crash
data is available.

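Because the device file exists only when valid crash data is available, a quick check after reboot is to look for it. The following is a minimal sketch: the helper name `list_crash_devices` and its optional directory argument are illustrative and not part of the coresight driver; only the /dev/crash_tmc_* naming is taken from the text above.

```shell
#!/bin/sh
# List crash trace device files, if the kernel created any.
# $1: directory to search (hypothetical parameter, defaults to /dev)
list_crash_devices() {
    dev_dir="${1:-/dev}"
    found=0
    for node in "$dev_dir"/crash_tmc_*; do
        # An unmatched glob stays literal, so test that the node exists.
        [ -e "$node" ] || continue
        echo "$node"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no crash trace data found in $dev_dir" >&2
        return 1
    fi
}
```

Calling `list_crash_devices` without arguments inspects /dev; a non-zero return status means that no crash data was detected in this boot.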
69+
70+
General flow of trace capture and decode incase of kernel panic
71+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
72+
1. Enable source and sink on all the cores using the sysfs interface.
73+
ETR sinks should have trace buffers allocated from reserved memory,
74+
by selecting "resrv" buffer mode from sysfs.
75+
76+
2. Run relevant tests.
77+
78+
3. On a kernel panic, all coresight blocks are disabled, necessary
79+
metadata is synced by kernel panic handler.
80+
81+
System would eventually reboot or boot a crashdump kernel.
82+
83+
4. For platforms that supports crashdump kernel, raw trace data can be
84+
dumped using the coresight sysfs interface from the crashdump kernel
85+
itself. Persistent RAM is not a requirement in this case.
86+
87+
5. For platforms that supports persistent RAM, trace data can be dumped
88+
using the coresight sysfs interface in the subsequent Linux boot.
89+
Crashdump kernel is not a requirement in this case. Persistent RAM
90+
ensures that trace data is intact across reboot.
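The crashdump kernel path in step 4 relies on the panic handler running before the crashdump kernel is started; the sample commands in this document add "crash_kexec_post_notifiers" to the kernel bootargs for that purpose. A minimal sketch of a check for that parameter follows. The helper name `has_bootarg` and its optional file argument are illustrative assumptions; /proc/cmdline is the standard location of the running kernel's command line.

```shell
#!/bin/sh
# Return success if a kernel command line contains the given parameter,
# either bare ("param") or with a value ("param=value").
# $1: parameter name
# $2: command line file (hypothetical parameter, defaults to /proc/cmdline)
has_bootarg() {
    cmdline_file="${2:-/proc/cmdline}"
    awk -v p="$1" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == p || index($i, p "=") == 1)
                    found = 1
        }
        END { exit !found }
    ' "$cmdline_file"
}
```

For example, `has_bootarg crash_kexec_post_notifiers || echo "bootarg missing"` could be run before starting a test.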
Coresight trace during Watchdog reset
-------------------------------------
The main differences between addressing the watchdog reset and kernel panic
cases are as follows:

a. Saving coresight metadata needs to be taken care of by the
   SCP (system control processor) firmware in the specified format,
   instead of the kernel.

b. The reserved memory region given by firmware for the trace buffer and
   metadata has to be in persistent RAM.
   Note: This is a requirement for the watchdog reset case but optional
   in the kernel panic case.

Watchdog reset can be supported only on platforms that meet the above
two requirements.

Sample commands for testing a Kernel panic case with ETR sink
-------------------------------------------------------------

1. Boot Linux kernel with "crash_kexec_post_notifiers" added to the kernel
   bootargs. This is mandatory if the user would like to read the trace data
   from the crashdump kernel.

2. Enable the preloaded ETM configuration::

    #echo 1 > /sys/kernel/config/cs-syscfg/configurations/panicstop/enable

3. Configure CTI using sysfs interface::

    #./cti_setup.sh

    #cat cti_setup.sh


    cd /sys/bus/coresight/devices/

    ap_cti_config () {
      #ETM trig out[0] trigger to Channel 0
      echo 0 4 > channels/trigin_attach
    }

    etf_cti_config () {
      #ETF Flush in trigger from Channel 0
      echo 0 1 > channels/trigout_attach
      echo 1 > channels/trig_filter_enable
    }

    etr_cti_config () {
      #ETR Flush in from Channel 0
      echo 0 1 > channels/trigout_attach
      echo 1 > channels/trig_filter_enable
    }

    ctidevs=$(find . -name "cti*")

    #Configure each CTI according to the device it is connected to
    for i in $ctidevs
    do
            cd "$i"

            connection=$(find . -name "ete*")
            if [ -n "$connection" ]
            then
                    echo "AP CTI config for $i"
                    ap_cti_config
            fi

            connection=$(find . -name "tmc_etf*")
            if [ -n "$connection" ]
            then
                    echo "ETF CTI config for $i"
                    etf_cti_config
            fi

            connection=$(find . -name "tmc_etr*")
            if [ -n "$connection" ]
            then
                    echo "ETR CTI config for $i"
                    etr_cti_config
            fi

            cd ..
    done

Note: CTI connections are SoC specific and hence the above script is
added just for reference.

4. Choose reserved buffer mode for ETR buffer::

    #echo "resrv" > /sys/bus/coresight/devices/tmc_etr0/buf_mode_preferred

5. Enable stop on flush trigger configuration::

    #echo 1 > /sys/bus/coresight/devices/tmc_etr0/stop_on_flush

6. Start Coresight tracing on cores 1 and 2 using sysfs interface

7. Run some application on core 1::

    #taskset -c 1 dd if=/dev/urandom of=/dev/null &

8. Invoke kernel panic on core 2::

    #echo 1 > /proc/sys/kernel/panic
    #taskset -c 2 echo c > /proc/sysrq-trigger

9. From rebooted kernel or crashdump kernel, read crashdata::

    #dd if=/dev/crash_tmc_etr0 of=/trace/cstrace.bin

10. Run opencsd decoder tools/scripts to generate the instruction trace.

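Step 6 does not list commands. The following is a minimal sketch of one way to do it, assuming the coresight sysfs attributes `enable_sink` and `enable_source` and assuming that the trace sources for cores 1 and 2 appear as `ete1` and `ete2`. Device names are platform specific, and the helper names and the `CS_DEVICES` variable are illustrative only.

```shell
#!/bin/sh
# Sketch for step 6: enable one sink, then the per-CPU trace sources.
# CS_DEVICES is an illustrative override of the coresight sysfs directory.
CS_DEVICES="${CS_DEVICES:-/sys/bus/coresight/devices}"

enable_sink() {
    # $1: sink device name, for example tmc_etr0
    echo 1 > "$CS_DEVICES/$1/enable_sink"
}

enable_source() {
    # $1: source device name, for example ete1 (assumed name)
    echo 1 > "$CS_DEVICES/$1/enable_source"
}

start_tracing() {
    # $1: sink name; remaining arguments: source names
    sink="$1"
    shift
    # The sink is enabled first so that a path exists for each source.
    enable_sink "$sink" || return 1
    for src in "$@"; do
        enable_source "$src" || return 1
    done
}
```

With these helpers, `start_tracing tmc_etr0 ete1 ete2` would start tracing on cores 1 and 2 into the ETR configured in steps 4 and 5, provided the devices carry those names on the platform in use.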
203+
204+
Sample instruction trace dump
205+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
206+
207+
Core1 dump::
208+
209+
A etm4_enable_hw: ffff800008ae1dd4
210+
CONTEXT EL2 etm4_enable_hw: ffff800008ae1dd4
211+
I etm4_enable_hw: ffff800008ae1dd4:
212+
d503201f nop
213+
I etm4_enable_hw: ffff800008ae1dd8:
214+
d503201f nop
215+
I etm4_enable_hw: ffff800008ae1ddc:
216+
d503201f nop
217+
I etm4_enable_hw: ffff800008ae1de0:
218+
d503201f nop
219+
I etm4_enable_hw: ffff800008ae1de4:
220+
d503201f nop
221+
I etm4_enable_hw: ffff800008ae1de8:
222+
d503233f paciasp
223+
I etm4_enable_hw: ffff800008ae1dec:
224+
a9be7bfd stp x29, x30, [sp, #-32]!
225+
I etm4_enable_hw: ffff800008ae1df0:
226+
910003fd mov x29, sp
227+
I etm4_enable_hw: ffff800008ae1df4:
228+
a90153f3 stp x19, x20, [sp, #16]
229+
I etm4_enable_hw: ffff800008ae1df8:
230+
2a0003f4 mov w20, w0
231+
I etm4_enable_hw: ffff800008ae1dfc:
232+
900085b3 adrp x19, ffff800009b95000 <reserved_mem+0xc48>
233+
I etm4_enable_hw: ffff800008ae1e00:
234+
910f4273 add x19, x19, #0x3d0
235+
I etm4_enable_hw: ffff800008ae1e04:
236+
f8747a60 ldr x0, [x19, x20, lsl #3]
237+
E etm4_enable_hw: ffff800008ae1e08:
238+
b4000140 cbz x0, ffff800008ae1e30 <etm4_starting_cpu+0x50>
239+
I 149.039572921 etm4_enable_hw: ffff800008ae1e30:
240+
a94153f3 ldp x19, x20, [sp, #16]
241+
I 149.039572921 etm4_enable_hw: ffff800008ae1e34:
242+
52800000 mov w0, #0x0 // #0
243+
I 149.039572921 etm4_enable_hw: ffff800008ae1e38:
244+
a8c27bfd ldp x29, x30, [sp], #32
245+
246+
..snip
247+
248+
149.052324811 chacha_block_generic: ffff800008642d80:
249+
9100a3e0 add x0,
250+
I 149.052324811 chacha_block_generic: ffff800008642d84:
251+
b86178a2 ldr w2, [x5, x1, lsl #2]
252+
I 149.052324811 chacha_block_generic: ffff800008642d88:
253+
8b010803 add x3, x0, x1, lsl #2
254+
I 149.052324811 chacha_block_generic: ffff800008642d8c:
255+
b85fc063 ldur w3, [x3, #-4]
256+
I 149.052324811 chacha_block_generic: ffff800008642d90:
257+
0b030042 add w2, w2, w3
258+
I 149.052324811 chacha_block_generic: ffff800008642d94:
259+
b8217882 str w2, [x4, x1, lsl #2]
260+
I 149.052324811 chacha_block_generic: ffff800008642d98:
261+
91000421 add x1, x1, #0x1
262+
I 149.052324811 chacha_block_generic: ffff800008642d9c:
263+
f100443f cmp x1, #0x11
264+
265+
Core 2 dump::

    A etm4_enable_hw: ffff800008ae1dd4
    CONTEXT EL2 etm4_enable_hw: ffff800008ae1dd4
    I etm4_enable_hw: ffff800008ae1dd4:
        d503201f nop
    I etm4_enable_hw: ffff800008ae1dd8:
        d503201f nop
    I etm4_enable_hw: ffff800008ae1ddc:
        d503201f nop
    I etm4_enable_hw: ffff800008ae1de0:
        d503201f nop
    I etm4_enable_hw: ffff800008ae1de4:
        d503201f nop
    I etm4_enable_hw: ffff800008ae1de8:
        d503233f paciasp
    I etm4_enable_hw: ffff800008ae1dec:
        a9be7bfd stp x29, x30, [sp, #-32]!
    I etm4_enable_hw: ffff800008ae1df0:
        910003fd mov x29, sp
    I etm4_enable_hw: ffff800008ae1df4:
        a90153f3 stp x19, x20, [sp, #16]
    I etm4_enable_hw: ffff800008ae1df8:
        2a0003f4 mov w20, w0
    I etm4_enable_hw: ffff800008ae1dfc:
        900085b3 adrp x19, ffff800009b95000 <reserved_mem+0xc48>
    I etm4_enable_hw: ffff800008ae1e00:
        910f4273 add x19, x19, #0x3d0
    I etm4_enable_hw: ffff800008ae1e04:
        f8747a60 ldr x0, [x19, x20, lsl #3]
    E etm4_enable_hw: ffff800008ae1e08:
        b4000140 cbz x0, ffff800008ae1e30 <etm4_starting_cpu+0x50>
    I 149.046243445 etm4_enable_hw: ffff800008ae1e30:
        a94153f3 ldp x19, x20, [sp, #16]
    I 149.046243445 etm4_enable_hw: ffff800008ae1e34:
        52800000 mov w0, #0x0 // #0
    I 149.046243445 etm4_enable_hw: ffff800008ae1e38:
        a8c27bfd ldp x29, x30, [sp], #32
    I 149.046243445 etm4_enable_hw: ffff800008ae1e3c:
        d50323bf autiasp
    E 149.046243445 etm4_enable_hw: ffff800008ae1e40:
        d65f03c0 ret
    A ete_sysreg_write: ffff800008adfa18

    ..snip

    I 149.05422547 panic: ffff800008096300:
        a90363f7 stp x23, x24, [sp, #48]
    I 149.05422547 panic: ffff800008096304:
        6b00003f cmp w1, w0
    I 149.05422547 panic: ffff800008096308:
        3a411804 ccmn w0, #0x1, #0x4, ne // ne = any
    N 149.05422547 panic: ffff80000809630c:
        540001e0 b.eq ffff800008096348 <panic+0xe0> // b.none
    I 149.05422547 panic: ffff800008096310:
        f90023f9 str x25, [sp, #64]
    E 149.05422547 panic: ffff800008096314:
        97fe44ef bl ffff8000080276d0 <panic_smp_self_stop>
    A panic: ffff80000809634c
    I 149.05422547 panic: ffff80000809634c:
        910102d5 add x21, x22, #0x40
    I 149.05422547 panic: ffff800008096350:
        52800020 mov w0, #0x1 // #1
    E 149.05422547 panic: ffff800008096354:
        94166b8b bl ffff800008631180 <bust_spinlocks>
    N 149.054225518 bust_spinlocks: ffff800008631180:
        340000c0 cbz w0, ffff800008631198 <bust_spinlocks+0x18>
    I 149.054225518 bust_spinlocks: ffff800008631184:
        f000a321 adrp x1, ffff800009a98000 <pbufs.0+0xbb8>
    I 149.054225518 bust_spinlocks: ffff800008631188:
        b9405c20 ldr w0, [x1, #92]
    I 149.054225518 bust_spinlocks: ffff80000863118c:
        11000400 add w0, w0, #0x1
    I 149.054225518 bust_spinlocks: ffff800008631190:
        b9005c20 str w0, [x1, #92]
    E 149.054225518 bust_spinlocks: ffff800008631194:
        d65f03c0 ret
    A panic: ffff800008096358

Perf based testing
------------------

Starting perf session
~~~~~~~~~~~~~~~~~~~~~
ETF::

    perf record -e cs_etm/panicstop,@tmc_etf1/ -C 1
    perf record -e cs_etm/panicstop,@tmc_etf2/ -C 2

ETR::

    perf record -e cs_etm/panicstop,@tmc_etr0/ -C 1,2

Reading trace data after panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The same sysfs based method explained above can be used to retrieve and
decode the trace data after the reboot that follows a kernel panic.
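
When more than one sink holds crash data (for example the two ETFs used in the perf session above), the retrieval can be looped over every crash device file. This is a minimal sketch under stated assumptions: the helper name `dump_crash_traces`, its arguments and the `.bin` output naming are illustrative; only the /dev/crash_tmc_* naming and the use of dd come from this document.

```shell
#!/bin/sh
# Copy every crash trace device into an output directory with dd.
# $1: output directory
# $2: device directory (hypothetical parameter, defaults to /dev)
dump_crash_traces() {
    out_dir="$1"
    dev_dir="${2:-/dev}"
    mkdir -p "$out_dir" || return 1
    count=0
    for node in "$dev_dir"/crash_tmc_*; do
        # Skip the literal pattern when no device file matches.
        [ -e "$node" ] || continue
        name=$(basename "$node")
        dd if="$node" of="$out_dir/$name.bin" 2>/dev/null || return 1
        echo "saved $out_dir/$name.bin"
        count=$((count + 1))
    done
    # Succeed only if at least one trace was copied.
    [ "$count" -gt 0 ]
}
```

For example, `dump_crash_traces /trace` would leave one raw trace file per sink in /trace, ready to be passed to the opencsd based decoder.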