|
| 1 | +=================================================== |
| 2 | +Using Coresight for Kernel panic and Watchdog reset |
| 3 | +=================================================== |
| 4 | + |
| 5 | +Introduction |
| 6 | +------------ |
| 7 | +This document describes how to use Linux coresight trace support to |
| 8 | +debug kernel panic and watchdog reset scenarios. |
| 9 | + |
| 10 | +Coresight trace during Kernel panic |
| 11 | +----------------------------------- |
| 12 | +From the coresight driver point of view, addressing the kernel panic |
| 13 | +situation has four main requirements. |
| 14 | + |
| 15 | +a. Support for allocation of trace buffer pages from a reserved memory area. |
| 16 | + A platform can advertise this using a new device tree property added to |
| 17 | + the relevant coresight nodes. |
| 18 | + |
| 19 | +b. Support for stopping coresight blocks at the time of panic |
| 20 | + |
| 21 | +c. Saving required metadata in the specified format |
| 22 | + |
| 23 | +d. Support for reading trace data captured at the time of panic |
| 24 | + |
| 25 | +Allocation of trace buffer pages from reserved RAM |
| 26 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 27 | +A new optional device tree property "memory-region" is added to the |
| 28 | +Coresight TMC device nodes. It gives the base address and size of the |
| 29 | +trace buffer. |
| 30 | + |
| 31 | +Static allocation of trace buffers would ensure that both IOMMU enabled |
| 32 | +and disabled cases are handled. Also, platforms that support persistent |
| 33 | +RAM will allow users to read trace data in the subsequent boot without |
| 34 | +booting the crashdump kernel. |
| 35 | + |
| 36 | +Note: |
| 37 | +For ETR sink devices, this reserved region will be used for both trace |
| 38 | +capture and trace data retrieval. |
| 39 | +For ETF sink devices, internal SRAM would be used for trace capture, |
| 40 | +and the captured data would be synced to the reserved region for retrieval. |
| 41 | + |
| 42 | + |
| 43 | +Disabling coresight blocks at the time of panic |
| 44 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 45 | +In order to avoid the situation of losing relevant trace data after a |
| 46 | +kernel panic, it would be desirable to stop the coresight blocks at the |
| 47 | +time of panic. |
| 48 | + |
| 49 | +This can be achieved by configuring the comparator, CTI and sink |
| 50 | +devices as below:: |
| 51 | + |
| 52 | + Trigger on panic |
| 53 | + Comparator --->External out --->CTI -->External In---->ETR/ETF stop |
| 54 | + |
| 55 | +Saving metadata at the time of kernel panic |
| 56 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 57 | +Coresight metadata comprises all the data, other than the trace data |
| 58 | +itself, that is required for a successful trace decode. This includes |
| 59 | +the ETR/ETF/ETB register snapshot, etc. |
| 60 | + |
| 61 | +A new optional device property "memory-region" is added to |
| 62 | +the ETR/ETF/ETB device nodes for this. |
| 63 | + |
| 64 | +Reading trace data captured at the time of panic |
| 65 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 66 | +Trace data captured at the time of panic can be read from the rebooted kernel |
| 67 | +or from the crashdump kernel using a special device file /dev/crash_tmc_xxx. |
| 68 | +This device file is created only when valid crash data is available. |
| 69 | + |
| 70 | +General flow of trace capture and decode in case of kernel panic |
| 71 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 72 | +1. Enable source and sink on all the cores using the sysfs interface. |
| 73 | + ETR sinks should have trace buffers allocated from reserved memory, |
| 74 | + by selecting "resrv" buffer mode from sysfs. |
| 75 | + |
| 76 | +2. Run relevant tests. |
| 77 | + |
| 78 | +3. On a kernel panic, all coresight blocks are disabled and the necessary |
| 79 | + metadata is synced by the kernel panic handler. |
| 80 | + |
| 81 | + The system would eventually reboot or boot a crashdump kernel. |
| 82 | + |
| 83 | +4. For platforms that support a crashdump kernel, raw trace data can be |
| 84 | + dumped using the coresight sysfs interface from the crashdump kernel |
| 85 | + itself. Persistent RAM is not a requirement in this case. |
| 86 | + |
| 87 | +5. For platforms that support persistent RAM, trace data can be dumped |
| 88 | + using the coresight sysfs interface in the subsequent Linux boot. |
| 89 | + A crashdump kernel is not a requirement in this case. Persistent RAM |
| 90 | + ensures that trace data is intact across reboot. |
| 91 | + |
| 92 | +Coresight trace during Watchdog reset |
| 93 | +------------------------------------- |
| 94 | +The main differences between addressing the watchdog reset and kernel panic |
| 95 | +cases are as follows: |
| 96 | + |
| 97 | +a. Saving coresight metadata needs to be taken care of by the |
| 98 | + SCP (system control processor) firmware in the specified format, |
| 99 | + instead of by the kernel. |
| 100 | + |
| 101 | +b. The reserved memory region given by firmware for trace buffer and metadata |
| 102 | + has to be in persistent RAM. |
| 103 | + Note: This is a requirement for the watchdog reset case but optional |
| 104 | + in the kernel panic case. |
| 105 | + |
| 106 | +Watchdog reset can be supported only on platforms that meet the above |
| 107 | +two requirements. |
| 108 | + |
| 109 | +Sample commands for testing a Kernel panic case with ETR sink |
| 110 | +------------------------------------------------------------- |
| 111 | + |
| 112 | +1. Boot the Linux kernel with "crash_kexec_post_notifiers" added to the kernel |
| 113 | + bootargs. This is mandatory if the user would like to read the trace data |
| 114 | + from the crashdump kernel. |
| 115 | + |
| 116 | +2. Enable the preloaded ETM configuration:: |
| 117 | + |
| 118 | + #echo 1 > /sys/kernel/config/cs-syscfg/configurations/panicstop/enable |
| 119 | + |
| 120 | +3. Configure CTI using sysfs interface:: |
| 121 | + |
| 122 | + #./cti_setup.sh |
| 123 | + |
| 124 | + #cat cti_setup.sh |
| 125 | + |
| 126 | + |
| 127 | + cd /sys/bus/coresight/devices/ |
| 128 | + |
| 129 | + ap_cti_config () { |
| 130 | +     # ETM trig out[0] trigger to Channel 0 |
| 131 | +     echo 0 4 > channels/trigin_attach |
| 132 | + } |
| 133 | + |
| 134 | + etf_cti_config () { |
| 135 | +     # ETF Flush in trigger from Channel 0 |
| 136 | +     echo 0 1 > channels/trigout_attach |
| 137 | +     echo 1 > channels/trig_filter_enable |
| 138 | + } |
| 139 | + |
| 140 | + etr_cti_config () { |
| 141 | +     # ETR Flush in from Channel 0 |
| 142 | +     echo 0 1 > channels/trigout_attach |
| 143 | +     echo 1 > channels/trig_filter_enable |
| 144 | + } |
| 145 | + |
| 146 | + ctidevs=`find . -name "cti*"` |
| 147 | + |
| 148 | + for i in $ctidevs |
| 149 | + do |
| 150 | +     cd "$i" || continue |
| 151 | + |
| 152 | +     connection=`find . -name "ete*"` |
| 153 | +     if [ ! -z "$connection" ] |
| 154 | +     then |
| 155 | +         echo "AP CTI config for $i" |
| 156 | +         ap_cti_config |
| 157 | +     fi |
| 158 | + |
| 159 | +     connection=`find . -name "tmc_etf*"` |
| 160 | +     if [ ! -z "$connection" ] |
| 161 | +     then |
| 162 | +         echo "ETF CTI config for $i" |
| 163 | +         etf_cti_config |
| 164 | +     fi |
| 165 | + |
| 166 | +     connection=`find . -name "tmc_etr*"` |
| 167 | +     if [ ! -z "$connection" ] |
| 168 | +     then |
| 169 | +         echo "ETR CTI config for $i" |
| 170 | +         etr_cti_config |
| 171 | +     fi |
| 172 | + |
| 173 | +     cd .. |
| 174 | + done |
| 175 | + |
| 176 | +Note: CTI connections are SoC specific, so the above script is provided |
| 177 | +for reference only. |
| 178 | + |
| 179 | +4. Choose reserved buffer mode for ETR buffer:: |
| 180 | + |
| 181 | + #echo "resrv" > /sys/bus/coresight/devices/tmc_etr0/buf_mode_preferred |
| 182 | + |
| 183 | +5. Enable stop on flush trigger configuration:: |
| 184 | + |
| 185 | + #echo 1 > /sys/bus/coresight/devices/tmc_etr0/stop_on_flush |
| 186 | + |
| 187 | +6. Start Coresight tracing on cores 1 and 2 using the sysfs interface. |
| 188 | + |
| 189 | +7. Run some application on core 1:: |
| 190 | + |
| 191 | + #taskset -c 1 dd if=/dev/urandom of=/dev/null & |
| 192 | + |
| 193 | +8. Invoke kernel panic on core 2:: |
| 194 | + |
| 195 | + #echo 1 > /proc/sys/kernel/panic |
| 196 | + #taskset -c 2 echo c > /proc/sysrq-trigger |
| 197 | + |
| 198 | +9. From rebooted kernel or crashdump kernel, read crashdata:: |
| 199 | + |
| 200 | + #dd if=/dev/crash_tmc_etr0 of=/trace/cstrace.bin |
| 201 | + |
| 202 | +10. Run the OpenCSD decoder tools/scripts to generate the instruction trace. |
| 203 | + |
| 204 | +Sample instruction trace dump |
| 205 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 206 | + |
| 207 | +Core1 dump:: |
| 208 | + |
| 209 | + A etm4_enable_hw: ffff800008ae1dd4 |
| 210 | + CONTEXT EL2 etm4_enable_hw: ffff800008ae1dd4 |
| 211 | + I etm4_enable_hw: ffff800008ae1dd4: |
| 212 | + d503201f nop |
| 213 | + I etm4_enable_hw: ffff800008ae1dd8: |
| 214 | + d503201f nop |
| 215 | + I etm4_enable_hw: ffff800008ae1ddc: |
| 216 | + d503201f nop |
| 217 | + I etm4_enable_hw: ffff800008ae1de0: |
| 218 | + d503201f nop |
| 219 | + I etm4_enable_hw: ffff800008ae1de4: |
| 220 | + d503201f nop |
| 221 | + I etm4_enable_hw: ffff800008ae1de8: |
| 222 | + d503233f paciasp |
| 223 | + I etm4_enable_hw: ffff800008ae1dec: |
| 224 | + a9be7bfd stp x29, x30, [sp, #-32]! |
| 225 | + I etm4_enable_hw: ffff800008ae1df0: |
| 226 | + 910003fd mov x29, sp |
| 227 | + I etm4_enable_hw: ffff800008ae1df4: |
| 228 | + a90153f3 stp x19, x20, [sp, #16] |
| 229 | + I etm4_enable_hw: ffff800008ae1df8: |
| 230 | + 2a0003f4 mov w20, w0 |
| 231 | + I etm4_enable_hw: ffff800008ae1dfc: |
| 232 | + 900085b3 adrp x19, ffff800009b95000 <reserved_mem+0xc48> |
| 233 | + I etm4_enable_hw: ffff800008ae1e00: |
| 234 | + 910f4273 add x19, x19, #0x3d0 |
| 235 | + I etm4_enable_hw: ffff800008ae1e04: |
| 236 | + f8747a60 ldr x0, [x19, x20, lsl #3] |
| 237 | + E etm4_enable_hw: ffff800008ae1e08: |
| 238 | + b4000140 cbz x0, ffff800008ae1e30 <etm4_starting_cpu+0x50> |
| 239 | + I 149.039572921 etm4_enable_hw: ffff800008ae1e30: |
| 240 | + a94153f3 ldp x19, x20, [sp, #16] |
| 241 | + I 149.039572921 etm4_enable_hw: ffff800008ae1e34: |
| 242 | + 52800000 mov w0, #0x0 // #0 |
| 243 | + I 149.039572921 etm4_enable_hw: ffff800008ae1e38: |
| 244 | + a8c27bfd ldp x29, x30, [sp], #32 |
| 245 | + |
| 246 | + ..snip |
| 247 | + |
| 248 | + 149.052324811 chacha_block_generic: ffff800008642d80: |
| 249 | + 9100a3e0 add x0, |
| 250 | + I 149.052324811 chacha_block_generic: ffff800008642d84: |
| 251 | + b86178a2 ldr w2, [x5, x1, lsl #2] |
| 252 | + I 149.052324811 chacha_block_generic: ffff800008642d88: |
| 253 | + 8b010803 add x3, x0, x1, lsl #2 |
| 254 | + I 149.052324811 chacha_block_generic: ffff800008642d8c: |
| 255 | + b85fc063 ldur w3, [x3, #-4] |
| 256 | + I 149.052324811 chacha_block_generic: ffff800008642d90: |
| 257 | + 0b030042 add w2, w2, w3 |
| 258 | + I 149.052324811 chacha_block_generic: ffff800008642d94: |
| 259 | + b8217882 str w2, [x4, x1, lsl #2] |
| 260 | + I 149.052324811 chacha_block_generic: ffff800008642d98: |
| 261 | + 91000421 add x1, x1, #0x1 |
| 262 | + I 149.052324811 chacha_block_generic: ffff800008642d9c: |
| 263 | + f100443f cmp x1, #0x11 |
| 264 | + |
| 265 | + |
| 266 | +Core 2 dump:: |
| 267 | + |
| 268 | + A etm4_enable_hw: ffff800008ae1dd4 |
| 269 | + CONTEXT EL2 etm4_enable_hw: ffff800008ae1dd4 |
| 270 | + I etm4_enable_hw: ffff800008ae1dd4: |
| 271 | + d503201f nop |
| 272 | + I etm4_enable_hw: ffff800008ae1dd8: |
| 273 | + d503201f nop |
| 274 | + I etm4_enable_hw: ffff800008ae1ddc: |
| 275 | + d503201f nop |
| 276 | + I etm4_enable_hw: ffff800008ae1de0: |
| 277 | + d503201f nop |
| 278 | + I etm4_enable_hw: ffff800008ae1de4: |
| 279 | + d503201f nop |
| 280 | + I etm4_enable_hw: ffff800008ae1de8: |
| 281 | + d503233f paciasp |
| 282 | + I etm4_enable_hw: ffff800008ae1dec: |
| 283 | + a9be7bfd stp x29, x30, [sp, #-32]! |
| 284 | + I etm4_enable_hw: ffff800008ae1df0: |
| 285 | + 910003fd mov x29, sp |
| 286 | + I etm4_enable_hw: ffff800008ae1df4: |
| 287 | + a90153f3 stp x19, x20, [sp, #16] |
| 288 | + I etm4_enable_hw: ffff800008ae1df8: |
| 289 | + 2a0003f4 mov w20, w0 |
| 290 | + I etm4_enable_hw: ffff800008ae1dfc: |
| 291 | + 900085b3 adrp x19, ffff800009b95000 <reserved_mem+0xc48> |
| 292 | + I etm4_enable_hw: ffff800008ae1e00: |
| 293 | + 910f4273 add x19, x19, #0x3d0 |
| 294 | + I etm4_enable_hw: ffff800008ae1e04: |
| 295 | + f8747a60 ldr x0, [x19, x20, lsl #3] |
| 296 | + E etm4_enable_hw: ffff800008ae1e08: |
| 297 | + b4000140 cbz x0, ffff800008ae1e30 <etm4_starting_cpu+0x50> |
| 298 | + I 149.046243445 etm4_enable_hw: ffff800008ae1e30: |
| 299 | + a94153f3 ldp x19, x20, [sp, #16] |
| 300 | + I 149.046243445 etm4_enable_hw: ffff800008ae1e34: |
| 301 | + 52800000 mov w0, #0x0 // #0 |
| 302 | + I 149.046243445 etm4_enable_hw: ffff800008ae1e38: |
| 303 | + a8c27bfd ldp x29, x30, [sp], #32 |
| 304 | + I 149.046243445 etm4_enable_hw: ffff800008ae1e3c: |
| 305 | + d50323bf autiasp |
| 306 | + E 149.046243445 etm4_enable_hw: ffff800008ae1e40: |
| 307 | + d65f03c0 ret |
| 308 | + A ete_sysreg_write: ffff800008adfa18 |
| 309 | + |
| 310 | + ..snip |
| 311 | + |
| 312 | + I 149.05422547 panic: ffff800008096300: |
| 313 | + a90363f7 stp x23, x24, [sp, #48] |
| 314 | + I 149.05422547 panic: ffff800008096304: |
| 315 | + 6b00003f cmp w1, w0 |
| 316 | + I 149.05422547 panic: ffff800008096308: |
| 317 | + 3a411804 ccmn w0, #0x1, #0x4, ne // ne = any |
| 318 | + N 149.05422547 panic: ffff80000809630c: |
| 319 | + 540001e0 b.eq ffff800008096348 <panic+0xe0> // b.none |
| 320 | + I 149.05422547 panic: ffff800008096310: |
| 321 | + f90023f9 str x25, [sp, #64] |
| 322 | + E 149.05422547 panic: ffff800008096314: |
| 323 | + 97fe44ef bl ffff8000080276d0 <panic_smp_self_stop> |
| 324 | + A panic: ffff80000809634c |
| 325 | + I 149.05422547 panic: ffff80000809634c: |
| 326 | + 910102d5 add x21, x22, #0x40 |
| 327 | + I 149.05422547 panic: ffff800008096350: |
| 328 | + 52800020 mov w0, #0x1 // #1 |
| 329 | + E 149.05422547 panic: ffff800008096354: |
| 330 | + 94166b8b bl ffff800008631180 <bust_spinlocks> |
| 331 | + N 149.054225518 bust_spinlocks: ffff800008631180: |
| 332 | + 340000c0 cbz w0, ffff800008631198 <bust_spinlocks+0x18> |
| 333 | + I 149.054225518 bust_spinlocks: ffff800008631184: |
| 334 | + f000a321 adrp x1, ffff800009a98000 <pbufs.0+0xbb8> |
| 335 | + I 149.054225518 bust_spinlocks: ffff800008631188: |
| 336 | + b9405c20 ldr w0, [x1, #92] |
| 337 | + I 149.054225518 bust_spinlocks: ffff80000863118c: |
| 338 | + 11000400 add w0, w0, #0x1 |
| 339 | + I 149.054225518 bust_spinlocks: ffff800008631190: |
| 340 | + b9005c20 str w0, [x1, #92] |
| 341 | + E 149.054225518 bust_spinlocks: ffff800008631194: |
| 342 | + d65f03c0 ret |
| 343 | + A panic: ffff800008096358 |
| 344 | + |
| 345 | +Perf based testing |
| 346 | +------------------ |
| 347 | + |
| 348 | +Starting perf session |
| 349 | +~~~~~~~~~~~~~~~~~~~~~ |
| 350 | +ETF:: |
| 351 | + |
| 352 | + perf record -e cs_etm/panicstop,@tmc_etf1/ -C 1 |
| 353 | + perf record -e cs_etm/panicstop,@tmc_etf2/ -C 2 |
| 354 | + |
| 355 | +ETR:: |
| 356 | + |
| 357 | + perf record -e cs_etm/panicstop,@tmc_etr0/ -C 1,2 |
| 358 | + |
| 359 | +Reading trace data after panic |
| 360 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 361 | +The same sysfs based method explained above can be used to retrieve and |
| 362 | +decode the trace data after the reboot that follows a kernel panic. |
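
The retrieval step can be wrapped in a small helper when more than one sink holds crash data. The following is a minimal sketch, not part of the patch: the function name `dump_crash_traces` and its two parameters are hypothetical, and it only assumes what the documentation states, namely that crash data appears as `/dev/crash_tmc_xxx` device files and that these files exist only when valid crash data is available.

```shell
#!/bin/sh
# Sketch only: dump_crash_traces, dev_dir and out_dir are illustrative
# names and are not defined by the patch or by the coresight drivers.
dump_crash_traces () {
    dev_dir=${1:-/dev}      # directory holding the crash_tmc_* files
    out_dir=${2:-/trace}    # directory that receives the raw dumps
    found=0

    mkdir -p "$out_dir" || return 1

    for node in "$dev_dir"/crash_tmc_*
    do
        # The glob stays unexpanded when no crash data is available.
        [ -e "$node" ] || continue
        name=$(basename "$node")
        # Same dd based read as in the sysfs example, one file per sink.
        dd if="$node" of="$out_dir/$name.bin" 2>/dev/null || return 1
        echo "saved $node to $out_dir/$name.bin"
        found=$((found + 1))
    done

    if [ "$found" -eq 0 ]
    then
        echo "no crash_tmc_* device found in $dev_dir" >&2
        return 1
    fi
}
```

Calling `dump_crash_traces /dev /trace` from the rebooted or crashdump kernel would leave one `.bin` file per sink in `/trace`, ready to be passed to the decoder. The non-zero return value when no device file is found makes it easy to tell a boot without crash data from a failed copy.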