|
| 1 | +============================= |
| 2 | +Analyzing Cortex-M Hardfaults |
| 3 | +============================= |
| 4 | + |
| 5 | +.. epigraph:: |
| 6 | + |
| 7 | + > I have a build of PX4 (NuttX 6.29 with some patches) with new |
| 8 | + > lpc43xx chip files on 4337 chip running from FLASH (master |
| 9 | + > vanilla NuttX has no such problem). This gives me a hardfault |
| 10 | + > below if I stress NSH console (UART2) with some big output. |
| 11 | + > |
| 12 | + > I read some threads but can't get a clue how to analyze the |
| 13 | + > dump and where to look first: |
| 14 | + > |
| 15 | + > 1bXXX and 1aXXX addresses are FLASH. 100XXX addresses are RAM |
| 16 | + |
| 17 | +.. code-block:: console |
| 18 | + |
| 19 | + Assertion failed at file:armv7-m/up_hardfault.c line: 184 task: hpwork |
| 20 | + sp: 10001eb4 |
| 21 | + IRQ stack: |
| 22 | + base: 10001f00 |
| 23 | + size: 000003fc |
| 24 | + 10001ea0: 1b02d961 1b03f07e 10001eb4 10005ed8 1a0312ab 1b03f600 000000b8 1b02d961 |
| 25 | + 10001ec0: 00000010 10001f40 00000003 00000000 1a03721d 1a037209 1b02d93b 00000000 |
| 26 | + 10001ee0: 1a0371f5 00000000 00000000 00000000 00000000 00000000 1a0314a5 10005d7c |
| 27 | + sp: 10005e50 |
| 28 | + User stack: |
| 29 | + base: 10005ed8 |
| 30 | + size: 00000f9c |
| 31 | + 10005e40: 00000000 00000000 00000000 1b02d587 10004900 00000000 005b8d7f 00000000 |
| 32 | + 10005e60: 1a030f2e 00000000 00000000 00001388 00000000 00000005 10001994 00000000 |
| 33 | + 10005e80: 00000000 00000000 00000000 1b02c359 00000000 00000000 00000000 004c4b40 |
| 34 | + 10005ea0: 000002ff 00000000 00000000 1a030f2f 00000000 00000000 00000000 00000000 |
| 35 | + 10005ec0: 00000000 1a030f41 00000000 1b02c2a5 00000000 00000000 ffffffff 00bdeb39 |
| 36 | + R0: ffffffff 00000000 00000016 00000000 00000000 00000000 00000000 00000000 |
| 37 | + R8: 100036d8 00000000 00000000 004c4b40 10001370 10005e50 1b02b20b 1b02d596 |
| 38 | + xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000 |
| 39 | + EXC_RETURN: ffffffe9 |
| 40 | + |
| 41 | +This question was asked in the old Yahoo! Group for NuttX, before the |
| 42 | +project joined the Apache Software Foundation. The old forum no longer |
| 43 | +exists, but the thread has been archived at |
| 44 | +`Narkive <https://nuttx.yahoogroups.narkive.com/QNbG3r5l/hardfault-help-analysing-where-to-start>`_ |
| 45 | +(third party external link). |
| 46 | + |
| 47 | +Analyzing the Register Dump |
| 48 | +=========================== |
| 49 | + |
| 50 | +First, in the register dump: |
| 51 | + |
| 52 | +.. code-block:: console |
| 53 | + |
| 54 | + R0: ffffffff 00000000 00000016 00000000 00000000 00000000 00000000 00000000 |
| 55 | + R8: 100036d8 00000000 00000000 004c4b40 10001370 10005e50 1b02b20b 1b02d596 |
| 56 | + xPSR: 41000000 BASEPRI: 00000000 CONTROL: 00000000 |
| 57 | + |
| 58 | +``R15`` (the last word on the ``R8:`` line) is the PC at the time of |
| 59 | +the crash (``1b02d596``). To see where this is in the code, I do this: |
| 60 | + |
| 61 | +.. code-block:: console |
| 62 | + |
| 63 | + arm-none-eabi-objdump -d nuttx | vi - |
| 64 | + |
| 65 | +Of course, you can use any editor you prefer. In any case, this will |
| 66 | +provide a full assembly language listing of your FLASH content along |
| 67 | +with complete symbolic information. |
| 68 | + |
| 69 | +**TIP:** Not comfortable with ARM assembly language? Try the |
| 70 | +``objdump --source`` (or just ``-S``) option. That will intermix the C |
| 71 | +and the assembly language code so that you can see which C statements |
| 72 | +the assembly language is implementing. |
| 73 | + |
| 74 | +Once you have the FLASH image in the editor, it is simple to search |
| 75 | +for the instruction at ``1b02d596``. The symbolic information will |
| 76 | +show you exactly which function the address is in, along with the |
| 77 | +context of the instruction, which you can use to associate it with |
| 78 | +the exact line of code in the original C source file. |
| 79 | + |
| 80 | +You also have all of the register contents, so it is pretty easy to |
| 81 | +see what happened (assuming you have some basic knowledge of Thumb-2 |
| 82 | +assembly language and the ARM EABI). But it is usually not so easy to |
| 83 | +see why it happened. |
| 84 | + |
| 85 | +The rest of these instructions are about finding out why the fault |
| 86 | +happened. |
| 87 | + |
| 88 | +``R14`` often contains the return address to the caller of the |
| 89 | +offending function. Bit 0 is set in this return address, but ignore |
| 90 | +that (i.e., use ``1b02b20a`` instead of ``1b02b20b``). Use the objdump |
| 91 | +command above to see where that is. |
| 92 | + |
| 93 | +Sometimes, however, ``R14`` is not the caller of the offending |
| 94 | +function. If the offending function calls some other function, then |
| 95 | +``R14`` will be overwritten. But no problem: the function will also |
| 96 | +have pushed the return address on the stack, where we can find it by |
| 97 | +analyzing the stack dump. |
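
The register-dump steps above can be sketched in a few lines of Python. This is only an illustration: the helper names ``parse_registers`` and ``code_address`` are made up for this example, and the input is assumed to be the two ``R0:`` and ``R8:`` lines exactly as printed in the dump above.

```python
import re

# Register lines copied verbatim from the dump above.
REGISTER_DUMP = """\
R0: ffffffff 00000000 00000016 00000000 00000000 00000000 00000000 00000000
R8: 100036d8 00000000 00000000 004c4b40 10001370 10005e50 1b02b20b 1b02d596
"""


def parse_registers(dump):
    """Return {"R0": value, ..., "R15": value} from 'Rn: w w w ...' lines."""
    regs = {}
    for line in dump.splitlines():
        match = re.match(r"\s*R(\d+):\s+(.*)", line)
        if not match:
            continue
        first = int(match.group(1))
        # Each word on the line belongs to the next register in sequence.
        for offset, word in enumerate(match.group(2).split()):
            regs[f"R{first + offset}"] = int(word, 16)
    return regs


def code_address(value):
    """Clear bit 0 (the Thumb state bit) to get the address to search for."""
    return value & ~1


regs = parse_registers(REGISTER_DUMP)
print(f"PC (R15)        = {regs['R15']:08x}")
print(f"SP (R13)        = {regs['R13']:08x}")
print(f"LR (R14)        = {regs['R14']:08x}")
print(f"LR, bit 0 clear = {code_address(regs['R14']):08x}")
```

The last two printed values are the ones to search for in the ``objdump`` listing.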
| 98 | + |
| 99 | +Analyzing the Stack Dump |
| 100 | +======================== |
| 101 | + |
| 102 | +The Task Stack |
| 103 | +-------------- |
| 104 | + |
| 105 | +To go further back in time, you have to analyze the stack. It is a |
| 106 | +push-down stack, so older events are at higher stack addresses; the |
| 107 | +most recent things that happened will be at lower stack addresses. |
| 108 | + |
| 109 | +Analyzing the stack is done in basically the same way: |
| 110 | + |
| 111 | +1. Start at the highest stack addresses (oldest) and work forward in |
| 112 | + time (lower addresses). |
| 113 | + |
| 114 | +2. Find interesting addresses. |
| 115 | + |
| 116 | +3. Use ``arm-none-eabi-objdump`` to determine where those addresses |
| 117 | + are in the code. |
| 118 | + |
| 119 | +An interesting address has these properties: |
| 120 | + |
| 121 | +1. It lies in FLASH in your architecture. In your case these are the |
| 122 | + addresses that begin with ``0x1a`` and ``0x1b``. Other |
| 123 | + architectures may have different FLASH addresses or even addresses |
| 124 | + in RAM. |
| 125 | + |
| 126 | +2. The interesting addresses are all odd for Cortex-M, that is, bit 0 |
| 127 | + will be set. This is because as the code progresses, the return |
| 128 | + address (``R14``) will be pushed on the stack. All of the return |
| 129 | + addresses will lie in FLASH and will be odd. |
| 130 | + |
| 131 | +Even-valued FLASH addresses in the stack dump are usually references |
| 132 | +to ``.rodata`` in FLASH, but are sometimes of interest as well. Below |
| 133 | +are examples of interesting addresses (in brackets): |
| 134 | + |
| 135 | +.. code-block:: console |
| 136 | + |
| 137 | + sp: 10005e50 |
| 138 | + User stack: |
| 139 | + base: 10005ed8 |
| 140 | + size: 00000f9c |
| 141 | + 10005e40: 00000000 00000000 00000000 [1b02d587] 10004900 00000000 005b8d7f 00000000 |
| 142 | + 10005e60: 1a030f2e 00000000 00000000 00001388 00000000 00000005 10001994 00000000 |
| 143 | + 10005e80: 00000000 00000000 00000000 [1b02c359] 00000000 00000000 00000000 004c4b40 |
| 144 | + 10005ea0: 000002ff 00000000 00000000 [1a030f2f] 00000000 00000000 00000000 00000000 |
| 145 | + 10005ec0: 00000000 [1a030f41] 00000000 [1b02c2a5] 00000000 00000000 ffffffff 00bdeb39 |
| 146 | + |
| 147 | +That will give the full backtrace up to the point of the failure. |
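
The skim described above (keep the words that lie in FLASH and are odd, oldest first) can be automated. The sketch below is a rough illustration, not part of NuttX: ``in_flash`` and ``interesting_addresses`` are hypothetical helpers, and the FLASH test hard-codes the ``0x1a``/``0x1b`` prefixes that apply to the board in the original question, so it has to be adjusted for any other memory map.

```python
import re

# User stack lines copied verbatim from the dump above (without brackets).
USER_STACK = """\
10005e40: 00000000 00000000 00000000 1b02d587 10004900 00000000 005b8d7f 00000000
10005e60: 1a030f2e 00000000 00000000 00001388 00000000 00000005 10001994 00000000
10005e80: 00000000 00000000 00000000 1b02c359 00000000 00000000 00000000 004c4b40
10005ea0: 000002ff 00000000 00000000 1a030f2f 00000000 00000000 00000000 00000000
10005ec0: 00000000 1a030f41 00000000 1b02c2a5 00000000 00000000 ffffffff 00bdeb39
"""


def in_flash(value):
    """Assumption for this board only: FLASH addresses start with 0x1a or 0x1b."""
    return (value >> 24) in (0x1A, 0x1B)


def interesting_addresses(dump):
    """Return (stack_address, value) pairs for odd FLASH words, oldest first."""
    found = []
    for line in dump.splitlines():
        match = re.match(r"\s*([0-9a-fA-F]{8}):\s+(.*)", line)
        if not match:
            continue
        base = int(match.group(1), 16)
        for index, word in enumerate(match.group(2).split()):
            value = int(word, 16)
            if in_flash(value) and value & 1:
                found.append((base + 4 * index, value))
    # Higher stack addresses hold older entries, so sort in descending order.
    return sorted(found, reverse=True)


hits = interesting_addresses(USER_STACK)
for stack_address, value in hits:
    # Print the value with bit 0 cleared, ready to search for in the listing.
    print(f"{stack_address:08x}: {value:08x} -> look up {value & ~1:08x}")
```

On the user stack above this picks out the same five words that are shown in brackets, and skips the even-valued ``1a030f2e``.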
| 148 | + |
| 149 | +The Interrupt Stack |
| 150 | +------------------- |
| 151 | + |
| 152 | +Note that in some cases there are two stacks listed. The interrupt |
| 153 | +stack will be present if (1) the interrupt stack is enabled, and (2) |
| 154 | +you are in an interrupt handler at the time that the failure occurred: |
| 155 | + |
| 156 | +.. code-block:: console |
| 157 | + |
| 158 | + Assertion failed at file:armv7-m/up_hardfault.c line: 184 task: hpwork |
| 159 | + sp: 10001eb4 |
| 160 | + IRQ stack: |
| 161 | + base: 10001f00 |
| 162 | + size: 000003fc |
| 163 | + 10001ea0: [1b02d961] 1b03f07e 10001eb4 10005ed8 1a0312ab 1b03f600 000000b8 [1b02d961] |
| 164 | + 10001ec0: 00000010 10001f40 00000003 00000000 [1a03721d] [1a037209] [1b02d93b] 00000000 |
| 165 | + 10001ee0: [1a0371f5] 00000000 00000000 00000000 00000000 00000000 [1a0314a5] 10005d7c |
| 166 | + |
| 167 | +(Interesting addresses again in brackets). |
| 168 | + |
| 169 | +The interrupt stack is sometimes interesting, for example when the |
| 170 | +fault was caused by logic operating at the interrupt level. In this |
| 171 | +case, it is probably not so interesting, since the fault was probably |
| 172 | +caused by normal task code and the interrupt stack probably just shows |
| 173 | +the normal operation of the interrupt handling logic. |
| 174 | + |
| 175 | +Full Stack Analysis |
| 176 | +------------------- |
| 177 | + |
| 178 | +What I have proposed here is just skimming through the stack, finding |
| 179 | +and interpreting interesting addresses. Sometimes you need more |
| 180 | +information and you need to analyze the stack in more detail. That is |
| 181 | +also possible because every word on the stack is there because of an |
| 182 | +explicit push instruction in the code (usually a push instruction on |
| 183 | +Cortex-M or an stmdb instruction in other ARM architectures). This is |
| 184 | +painstaking work but can also be done to provide a more detailed |
| 185 | +answer to "what happened?" |
| 186 | + |
| 187 | +Recovering State at the Time of the Hardfault |
| 188 | +============================================= |
| 189 | + |
| 190 | +Here is another tip from Mike Smith: |
| 191 | + |
| 192 | +.. epigraph:: |
| 193 | + |
| 194 | + "... for systems like NuttX where catching hardfaults is difficult, |
| 195 | + you can recover the faulting PC, LR and SP (by examining the |
| 196 | + exception stack), then write these values back into the appropriate |
| 197 | + processor registers (adjust the PC as necessary for the fault). |
| 198 | + |
| 199 | + "This will put you back in the application code at the point at |
| 200 | + which the fault occurred. Some local variables will show as having |
| 201 | + invalid values (because at the time of the fault they were live in |
| 202 | + registers and have been overwritten by the exception handler), but |
| 203 | + the stack frame, function arguments etc. should all show correctly." |
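
As a rough illustration of "examining the exception stack": on exception entry an ARMv7-M core pushes eight words in a fixed order (R0, R1, R2, R3, R12, LR, PC, xPSR, lowest address first). The sketch below assumes that basic frame, with no floating-point state stacked, and the helper name ``decode_exception_frame`` is made up for this example. The eight input words are simply the corresponding values from the register dump above, arranged in stacking order for illustration; they were not read from a real frame in memory.

```python
# Order in which an ARMv7-M core stacks registers on exception entry
# (basic frame without floating-point state, lowest address first).
FRAME_LAYOUT = ("R0", "R1", "R2", "R3", "R12", "LR", "PC", "xPSR")


def decode_exception_frame(words):
    """Map the eight stacked words of a basic exception frame to names."""
    if len(words) != len(FRAME_LAYOUT):
        raise ValueError("a basic exception frame is exactly eight words")
    return dict(zip(FRAME_LAYOUT, words))


# Illustration only: values taken from the register dump above and
# arranged in stacking order.
frame_words = [
    0xFFFFFFFF, 0x00000000, 0x00000016, 0x00000000,  # R0..R3
    0x10001370, 0x1B02B20B, 0x1B02D596, 0x41000000,  # R12, LR, PC, xPSR
]

frame = decode_exception_frame(frame_words)
for name in ("PC", "LR"):
    print(f"{name} = {frame[name]:08x}")
```

For a basic frame with no alignment padding, the SP at the time of the fault is the address just above the frame, that is, the frame address plus ``0x20``.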