|
| 1 | +--- |
| 2 | +title: Coredumps at Memfault Part 1 - Introduction to Linux Coredumps |
| 3 | +description: |
| 4 | + "The basics of Linux coredumps, how they're used at Memfault, and how they're |
| 5 | + captured." |
| 6 | +author: blake |
| 7 | +--- |
| 8 | + |
| 9 | +One of the core features of the Memfault Linux SDK is the ability to capture and |
| 10 | +analyze crashes. Since the inception of the SDK we've been slowly expanding our |
| 11 | +crash capture and analysis capabilities. Starting from the standard ELF |
| 12 | +coredump, we've added support for capturing only the stack memory, and even |
| 13 | +capturing just the stack trace with no registers and locals present. This |
| 14 | +article series will give you a high-level overview of that journey, and give you |
| 15 | +a deeper understanding of how coredumps work on Linux. |
| 16 | + |
| 17 | +<!-- excerpt start --> |
| 18 | + |
| 19 | +In this article we'll start by taking a look at how a Linux coredump is |
| 20 | +formatted, how you capture them, and how we use them at Memfault. |
| 21 | + |
| 22 | +<!-- excerpt end --> |
| 23 | + |
| 24 | +{% include newsletter.html %} |
| 25 | + |
| 26 | +{% include toc.html %} |
| 27 | + |
| 28 | +## What is a Linux Coredump |
| 29 | + |
| 30 | +A Linux coredump represents a snapshot of the crashing process' memory. It can |
| 31 | +be loaded into programs like GDB to inspect the state of the process at the time |
| 32 | +of crash. It is written as an ELF[^elf_format] file. The entirety of the ELF |
| 33 | +format is outside the scope of this article, but we will touch on a few of the |
| 34 | +more important bits when looking at an ELF core file. |
| 35 | + |
| 36 | +## What triggers a coredump |
| 37 | + |
| 38 | +Coredumps are triggered by certain signals generated by or sent to a program. |
| 39 | +The full list of signals can be found in the signal man page[^man_signal]. Here |
| 40 | +are the signals you are most likely to see whose default action is a coredump: |
| 41 | + |
| 42 | +- SIGABRT: Abnormal termination of the program, such as a call to abort. |
| 43 | +- SIGBUS: Bus error (bad memory access). |
| 44 | +- SIGFPE: Floating-point exception. |
| 45 | +- SIGILL: Illegal instruction. |
| 46 | +- SIGQUIT: Quit from keyboard. |
| 47 | +- SIGSEGV: Invalid memory reference. |
| 48 | +- SIGSYS: Bad system call. |
| 49 | +- SIGTRAP: Trace/breakpoint trap. |
| 50 | + |
| 51 | +Of these, the most common culprits you'll likely see are `SIGSEGV`, `SIGBUS`, and |
| 52 | +`SIGABRT`. These signals are generated when a program tries to access memory |
| 53 | +that it doesn't have access to, tries to dereference a null pointer, or calls |
| 54 | +`abort`. They typically indicate a fairly serious bug in either your program or |
| 55 | +the libraries that it uses. |
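
As a minimal sketch of the list above, the snippet below checks whether a signal number is one of the eight listed signals. The helper names `is_listed_core_signal` and `describe_signal` are made up for illustration; only the standard `signal` module is used, and the set contains just the signals named in this article rather than every signal with a core-dumping default.

```python
import signal

# The eight signals listed above. signal(7) gives each of them a default
# action of terminating the process and dumping core.
LISTED_CORE_SIGNALS = frozenset({
    signal.SIGABRT,
    signal.SIGBUS,
    signal.SIGFPE,
    signal.SIGILL,
    signal.SIGQUIT,
    signal.SIGSEGV,
    signal.SIGSYS,
    signal.SIGTRAP,
})


def is_listed_core_signal(signum: int) -> bool:
    """Return True if signum is one of the signals listed in this article."""
    return signum in LISTED_CORE_SIGNALS


def describe_signal(signum: int) -> str:
    """Turn a raw signal number into a short human-readable description."""
    name = signal.Signals(signum).name
    where = "in the list above" if is_listed_core_signal(signum) else "not in the list above"
    return f"{name}: {where}"
```

A crash handler that receives the signal number as a plain integer could call `describe_signal` on it to log something more readable than the raw number.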
| 56 | + |
| 57 | +Coredumps are very useful in these situations, as generally you're going to want |
| 58 | +to inspect the running state of the process at the time of crash. From the |
| 59 | +coredump you can get a backtrace of the crashing thread, the values of the |
| 60 | +registers at the time of crash, and the values of the local variables at each |
| 61 | +frame of the backtrace. |
| 62 | + |
| 63 | +## How are coredumps enabled/collected |
| 64 | + |
| 65 | +Enabling coredumps on your Linux device requires a few configuration options. To |
| 66 | +start with you'll need the following options enabled on your kernel at a |
| 67 | +minimum: |
| 68 | + |
| 69 | +```c |
| 70 | +CONFIG_COREDUMP=y |
| 71 | +CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y |
| 72 | +``` |
| 73 | + |
| 74 | +These settings will enable the kernel to generate coredumps, as well as set the |
| 75 | +default mappings that are present in the coredump. `man core`[^man_core] |
| 76 | +provides a good overview of the options available to you when configuring |
| 77 | +coredumps. |
| 78 | + |
| 79 | +### core_pattern |
| 80 | + |
| 81 | +The kernel provides an interface for controlling where and how coredumps are |
| 82 | +written. The `/proc/sys/kernel/core_pattern`[^man_core] file provides two |
| 83 | +methods for capturing coredumps from crashed processes. A coredump can be |
| 84 | +written directly to a file by providing a path template. For example, if we |
| 85 | +wanted to write the core file to our `/tmp` directory with both the process name |
| 86 | +and the PID, we would write the following to `/proc/sys/kernel/core_pattern`: |
| 87 | + |
| 88 | +```bash |
| 89 | +/tmp/core.%e.%p |
| 90 | +``` |
| 91 | + |
| 92 | +In this example `%e` expands to the name of the crashing process, and `%p` |
| 93 | +expands to the PID of the crashing process. More information on the available |
| 94 | +expansions can be found in the `man core`[^man_core] page. |
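
To make the expansion concrete, here is a small sketch that mimics what the kernel does with a template like the one above. It is an illustration only: `expand_core_pattern` is a hypothetical helper, it models just `%e`, `%p` and `%%`, and it drops every other specifier, whereas the kernel supports many more.

```python
import re


def expand_core_pattern(pattern: str, exe: str, pid: int) -> str:
    """Expand %e, %p and %% in a core_pattern-style template.

    Any other specifier, and a lone % at the end, is dropped in this sketch.
    """
    values = {"e": exe, "p": str(pid), "%": "%"}

    def replace(match):
        # group(1) is the character after '%', or '' for a trailing '%'.
        return values.get(match.group(1), "")

    return re.sub(r"%(.?)", replace, pattern)
```

For a process named `myapp` with PID 1234, `expand_core_pattern("/tmp/core.%e.%p", "myapp", 1234)` yields `/tmp/core.myapp.1234`.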
| 95 | + |
| 96 | +We can also pipe a coredump directly to a program. This is useful when we want |
| 97 | +to modify the coredump in flight. The coredump is streamed to the provided |
| 98 | +program via `stdin`. The configuration is similar to saving directly to a file |
| 99 | +except the first character must be a `|`. This is how we capture coredumps in |
| 100 | +the Memfault SDK, and will be covered more in depth later in the article. |
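
The simplest possible pipe handler just copies `stdin` to a file. The sketch below shows that idea in Python with a fixed-size buffer so the whole coredump never has to sit in memory. It is not how the Memfault handler is implemented (that is written in Rust); `copy_stream` and the chunk size are assumptions made for illustration.

```python
import io


def copy_stream(src: io.BufferedIOBase, dst: io.BufferedIOBase,
                chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in fixed-size chunks and return the bytes copied.

    src is only ever read forwards, so it may be a pipe such as stdin.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

In a real handler this would be called as `copy_stream(sys.stdin.buffer, out)` with `out` opened in binary write mode on whatever path the handler chooses.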
| 101 | + |
| 102 | +#### `procfs` Shallow Dive |
| 103 | + |
| 104 | +An additional benefit to the `core_pattern` pipe interface is that until the |
| 105 | +program that is being piped to exits, we have access to the `procfs` of the |
| 106 | +crashing process. But what is `procfs`, and how does it help us with a coredump? |
| 107 | + |
| 108 | +`procfs` gives us direct, usually read-only, access to some of the kernel's data |
| 109 | +structures[^man_proc]. This can be system-wide information, or information about |
| 110 | +individual processes. For our purposes we are interested mostly in the |
| 111 | +information about the process that is currently crashing. We can get direct |
| 112 | +read-only access to all mapped memory by address through |
| 113 | +`/proc/<pid>/mem`[^man_proc_pid_mem], or look at the command line arguments of |
| 114 | +the process through `/proc/<pid>/cmdline`[^man_proc_pid_cmdline]. |
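
As a quick illustration of reading `procfs`, the sketch below parses `/proc/<pid>/cmdline`, which stores the arguments as NUL-terminated strings. The function names are hypothetical, and decoding as UTF-8 with replacement is an assumption, since arguments are arbitrary bytes.

```python
from pathlib import Path
from typing import List


def parse_cmdline(raw: bytes) -> List[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline."""
    if not raw:
        return []  # empty for processes without a command line
    # Every argument ends with a NUL; drop only the final terminator so
    # that genuinely empty arguments are preserved.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\0")]


def read_cmdline(pid: int) -> List[str]:
    """Read and parse the command line of a process by PID."""
    return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())
```

Inside a pipe handler, `read_cmdline` could be called with the PID passed in by the kernel, as long as the handler has not exited yet.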
| 115 | + |
| 116 | +## ELF Core File Layout |
| 117 | + |
| 118 | +Linux coredumps use a subset of the ELF format. The coredump itself is a |
| 119 | +snapshot of the crashing process' memory, as well as some metadata to help |
| 120 | +debuggers understand the state of the process at the time of crash. We will |
| 121 | +touch on the most important aspects of the core file in this article. We will |
| 122 | +not be doing an exhaustive dive into the ELF format. However, if you are |
| 123 | +interested in learning more about it, the ELF File |
| 124 | +Format[^elf_format] is a great resource. |
| 125 | + |
| 126 | + |
| 127 | + |
| 128 | +### ELF Header |
| 129 | + |
| 130 | +The above image gives us a very high level view of the layout of a coredump. To |
| 131 | +start, the ELF header outlines the layout of the file and source of the file. We |
| 132 | +can see if the producing system was 32-bit or 64-bit, little or big endian, and |
| 133 | +the architecture of the system. Additionally it shows the offset to the program |
| 134 | +headers. Here is the layout of the ELF header[^elf_format]: |
| 135 | + |
| 136 | +```c |
| 137 | +typedef struct { |
| 138 | + unsigned char e_ident[EI_NIDENT]; |
| 139 | + Elf32_Half e_type; |
| 140 | + Elf32_Half e_machine; |
| 141 | + Elf32_Word e_version; |
| 142 | + Elf32_Addr e_entry; |
| 143 | + Elf32_Off e_phoff; |
| 144 | + Elf32_Off e_shoff; |
| 145 | + Elf32_Word e_flags; |
| 146 | + Elf32_Half e_ehsize; |
| 147 | + Elf32_Half e_phentsize; |
| 148 | + Elf32_Half e_phnum; |
| 149 | + Elf32_Half e_shentsize; |
| 150 | + Elf32_Half e_shnum; |
| 151 | + Elf32_Half e_shstrndx; |
| 152 | +} Elf32_Ehdr; |
| 153 | +``` |
| 154 | + |
| 155 | +There is a lot going on here, but the fields that are most important to our |
| 156 | +discussion are broken down below: |
| 157 | + |
| 158 | +- `e_ident`: This field is an array of bytes that identify the file as an ELF |
| 159 | + file. |
| 160 | +- `e_type`: This field tells us what type of file we are looking at. For our |
| 161 | + purposes this will always be `ET_CORE`. |
| 162 | +- `e_machine`: This field tells us the architecture of the system that produced |
| 163 | + the file. Common values here are |
| 164 | + [`EM_ARM`](https://github.com/torvalds/linux/blob/c45323b7560ec87c37c729b703c86ee65f136d75/include/uapi/linux/elf-em.h#L26) |
| 165 | + for 32-bit ARM, and |
| 166 | + [`EM_AARCH64`](https://github.com/torvalds/linux/blob/c45323b7560ec87c37c729b703c86ee65f136d75/include/uapi/linux/elf-em.h#L46) |
| 167 | + for 64-bit ARM (AArch64). |
| 168 | +- `e_phoff`: This field tells us the offset to the program headers. |
| 169 | +- `e_phentsize`: This field tells us the size of each program header. |
| 170 | + |
| 171 | +### Program Headers and Segments |
| 172 | + |
| 173 | +The meat of our coredump exists in the program headers. There are a wide variety |
| 174 | +of program header types defined in the ELF File Format[^elf_format]. From the |
| 175 | +perspective of the coredump, however, we are primarily interested in the |
| 176 | +`PT_NOTE` and `PT_LOAD` program headers. |
| 177 | + |
| 178 | +Program headers have the following layout[^elf_format]: |
| 179 | + |
| 180 | +```c |
| 181 | +typedef struct { |
| 182 | + Elf32_Word p_type; |
| 183 | + Elf32_Off p_offset; |
| 184 | + Elf32_Addr p_vaddr; |
| 185 | + Elf32_Addr p_paddr; |
| 186 | + Elf32_Word p_filesz; |
| 187 | + Elf32_Word p_memsz; |
| 188 | + Elf32_Word p_flags; |
| 189 | + Elf32_Word p_align; |
| 190 | +} Elf32_Phdr; |
| 191 | +``` |
| 192 | + |
| 193 | +Here is a brief breakdown of the fields we care about in the program header: |
| 194 | + |
| 195 | +- `p_type`: This field tells us what type of segment we are looking at. For our |
| 196 | + purposes this will be either `PT_NOTE` or `PT_LOAD`. |
| 197 | +- `p_offset`: This field tells us the offset from the beginning of the file |
| 198 | + where the segment starts. |
| 199 | +- `p_vaddr`: This field tells us the virtual address where the segment is |
| 200 | + loaded. |
| 201 | +- `p_paddr`: This field tells us the physical address where the segment is |
| 202 | + loaded. |
| 203 | +- `p_filesz`: This field tells us the size of the segment in the file. |
| 204 | +- `p_memsz`: This field tells us the size of the segment in memory. |
| 205 | +- `p_align`: This field tells us the alignment of the segment. |
| 206 | + |
| 207 | +We'll start by taking a look at the format of the `PT_NOTE` segments. Below is |
| 208 | +the layout of a `PT_NOTE` segment. |
| 209 | + |
| 210 | + |
| 211 | + |
| 212 | +The first two fields of each note are fairly self-explanatory: they give the |
| 213 | +sizes of the name and the descriptor. The `name` field is a string that |
| 214 | +identifies who defined the note. The `desc` field is a structure that contains |
| 215 | +the actual data of the note. The `type` field is an unsigned integer that tells |
| 216 | +us what kind of note we are looking at, and its meaning depends on the name. |
| 217 | +It's worth noting that the `name` field therefore works as a kind of namespace |
| 218 | +for the `type` field: two notes with the same `type` value can be |
| 219 | +differentiated by their `name` field. |
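
Here is a sketch of walking the notes inside a `PT_NOTE` segment. On disk each note is three 32-bit words (name size, descriptor size, type) followed by the name and then the descriptor, each padded to a 4-byte boundary. The code assumes little-endian data, and `Note` and `iter_notes` are names chosen for this example.

```python
import struct
from typing import Iterator, NamedTuple


class Note(NamedTuple):
    name: str
    note_type: int
    desc: bytes


def _align4(size: int) -> int:
    """Round size up to the next multiple of 4."""
    return (size + 3) & ~3


def iter_notes(segment: bytes) -> Iterator[Note]:
    """Yield every note found in the raw bytes of a PT_NOTE segment."""
    offset = 0
    while offset + 12 <= len(segment):
        namesz, descsz, note_type = struct.unpack_from("<III", segment, offset)
        offset += 12
        # namesz includes the terminating NUL, which we strip for display.
        raw_name = segment[offset:offset + namesz]
        name = raw_name.rstrip(b"\0").decode("ascii", errors="replace")
        offset += _align4(namesz)
        desc = segment[offset:offset + descsz]
        offset += _align4(descsz)
        yield Note(name, note_type, desc)
```

Because the name acts as a namespace, a reader should match on the pair of name and type, not on the type alone.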
| 220 | + |
| 221 | +The `PT_LOAD` segment is a bit more straightforward. This represents a segment |
| 222 | +of memory that was loaded into the process at the time of crash. These can |
| 223 | +represent either the stack, heap, or any other segment of memory that was loaded |
| 224 | +into the process. |
| 225 | + |
| 226 | +## Coredumps at Memfault: Rev. 1 |
| 227 | + |
| 228 | +Our first crack at coredumps at Memfault had one goal: leveraging existing tools |
| 229 | +to capture info about a crashing process. To have feature parity with our |
| 230 | +offering on MCU and Android, we needed a few basic things: |
| 231 | + |
| 232 | +- A symbolicated backtrace for each running thread in the crashing process |
| 233 | +- The values of registers at the time of crash |
| 234 | +- Symbolicated local variables at each frame |
| 235 | + |
| 236 | +Based on what we've learned about Linux core files so far, they are an obvious |
| 237 | +fit for these requirements. We can use an established system to route |
| 238 | +information about crashed processes, add metadata that gives us information |
| 239 | +about the device in question, and do all of this without making any source |
| 240 | +modifications to anything running on the system. For this reason our first pass |
| 241 | +at coredumps leaves them largely untouched from what the kernel provides. The |
| 242 | +only addition is a note that contains the metadata we use to identify devices |
| 243 | +and the version of software they're running. This takes advantage of the fact |
| 244 | +that the `PT_NOTE` segment is a free-form segment that can be used to add any |
| 245 | +metadata we want to the coredump. |
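
To show what adding a custom note involves, here is a sketch that serializes one in the standard note layout. The note name `Example`, the type value and the payload are placeholders, not the values Memfault uses, and `build_note` is a hypothetical helper.

```python
import struct


def _pad4(data: bytes) -> bytes:
    """Pad data with NUL bytes up to a multiple of 4."""
    return data + b"\0" * (-len(data) % 4)


def build_note(name: str, note_type: int, desc: bytes) -> bytes:
    """Serialize one ELF note (little-endian) with a NUL-terminated name."""
    name_bytes = name.encode("ascii") + b"\0"
    # The sizes in the header describe the unpadded name and descriptor.
    header = struct.pack("<III", len(name_bytes), len(desc), note_type)
    return header + _pad4(name_bytes) + _pad4(desc)
```

Something like `build_note("Example", 1, b'{"device":"demo"}')` would produce bytes that can be appended to the note data, provided the `p_filesz` of the `PT_NOTE` program header and any later file offsets are adjusted to match.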
| 246 | + |
| 247 | +Handling the coredump ourselves lets us gather additional information about the |
| 248 | +crashed process, and stream memory to avoid unnecessary allocations or memory usage. |
| 249 | + |
| 250 | +Now that we've covered all the background information we can start to dive into |
| 251 | +the innards of the `memfault-core-handler`. First we use the pipe operation that |
| 252 | +was outlined earlier. |
| 253 | +[Here](https://github.com/memfault/memfault-linux-sdk/blob/49adfe0ce0cb6082360012b0f0092a31e8030048/meta-memfault/recipes-memfault/memfaultd/files/memfaultd/src/coredump/mod.rs#L14) |
| 254 | +is the pattern we write to `/proc/sys/kernel/core_pattern` to pipe the coredump |
| 255 | +to our handler: |
| 256 | + |
| 257 | +```bash |
| 258 | +|/usr/sbin/memfault-core-handler -c /path/to/config %P %e %I %s |
| 259 | +``` |
| 260 | + |
| 261 | +This tells the kernel to pipe the coredump to our handler, and provides the |
| 262 | +handler with the PID of the crashing process (`%P`), the name of the crashing |
| 263 | +process (`%e`), the TID of the thread that triggered the dump (`%I`), and the |
| 264 | +signal that caused the crash (`%s`). |
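
On the receiving side, the handler sees these values as ordinary command line arguments. The sketch below parses an argument list shaped like the pattern above using `argparse`. The program name and the attribute names are invented for this example, and the third positional is treated as a thread ID because `man core` documents `%I` as the TID of the triggering thread in the initial PID namespace.

```python
import argparse
from typing import List


def parse_handler_args(argv: List[str]) -> argparse.Namespace:
    """Parse arguments in the order '-c CONFIG %P %e %I %s'."""
    parser = argparse.ArgumentParser(prog="example-core-handler")
    parser.add_argument("-c", dest="config", required=True,
                        help="path to the handler configuration file")
    parser.add_argument("pid", type=int, help="%%P: PID of the dumped process")
    parser.add_argument("comm", help="%%e: name of the dumped process")
    parser.add_argument("tid", type=int, help="%%I: TID of the triggering thread")
    parser.add_argument("signal", type=int, help="%%s: signal number")
    return parser.parse_args(argv)
```

In a real handler this would be called with `sys.argv[1:]` before any data is read from `stdin`.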
| 265 | + |
| 266 | +When a crash occurs the kernel will write the coredump to the `stdin` of the |
| 267 | +handler. The handler will then read all the program headers into memory. This |
| 268 | +sets us up to do two things. First we'll read all of the `PT_NOTE` segments and |
| 269 | +save them in memory. For the first iteration of the handler, we won't do |
| 270 | +anything further with them until we write them to a file. They'll become more |
| 271 | +important in later articles as we get into more of the special sauce of the |
| 272 | +handler. |
| 273 | + |
| 274 | +The next thing the handler does is read all of the memory ranges for each |
| 275 | +`PT_LOAD` segment in the coredump. Instead of storing this in memory we'll |
| 276 | +stream it directly to the output file from `/proc/<pid>/mem`. This is done to |
| 277 | +reduce the memory footprint of the handler, and prevent any issues where we |
| 278 | +would potentially need to seek backwards in the stream: `stdin` is a one-way |
| 279 | +stream, so we can't seek backwards in it. |
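
The streaming step can be sketched as a seek followed by a bounded copy. The function below copies one address range from a seekable file such as `/proc/<pid>/mem` to an output stream using a small fixed buffer. It is a simplified illustration in Python, not the handler's Rust code; `copy_memory_range` is a hypothetical name, and error handling for unreadable pages is deliberately left out.

```python
import io


def copy_memory_range(mem: io.BufferedIOBase, out: io.BufferedIOBase,
                      start: int, length: int,
                      chunk_size: int = 64 * 1024) -> int:
    """Copy length bytes starting at address start from mem to out.

    Returns the number of bytes actually copied, which may be less than
    length if the source ends early.
    """
    mem.seek(start)  # for /proc/<pid>/mem the file offset is the address
    copied = 0
    while copied < length:
        chunk = mem.read(min(chunk_size, length - copied))
        if not chunk:
            break  # short read; a real handler would need to handle this
        out.write(chunk)
        copied += len(chunk)
    return copied
```

Calling this once per `PT_LOAD` header, with `p_vaddr` as `start` and `p_filesz` as `length`, keeps memory use bounded by `chunk_size` regardless of how large the process is.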
| 280 | + |
| 281 | +After we've written all of the `PT_LOAD` segments to the output file we should |
| 282 | +have an ELF coredump that is largely the same as what the kernel would have |
| 283 | +written. The only difference is that we've added a note to the coredump, the |
| 284 | +contents of which we won't cover in this article, as it's not particularly |
| 285 | +interesting. |
| 286 | + |
| 287 | +Let's take a quick visual look at everything we've accomplished by annotating |
| 288 | +our previous ELF layout diagram with the changes we've made. |
| 289 | + |
| 290 | + |
| 291 | + |
| 292 | +And there we have it! We've copied our coredump over from `stdin` with a few |
| 293 | +minor changes. Now you're probably wondering, why did we go through all of this |
| 294 | +trouble to end up with a file that's largely the same as what the kernel would |
| 295 | +have produced? Well, for one, it allows us to add metadata to the coredump, but it |
| 296 | +also sets the stage for more advanced coredump handling in the future that we'll |
| 297 | +cover in the next article. |
| 298 | + |
| 299 | +## Conclusion |
| 300 | + |
| 301 | +We've covered the basics of coredumps on Linux, and how they're used in the |
| 302 | +Memfault SDK. You should now have a pretty good idea of how things look under |
| 303 | +the hood. While the baseline coredumps are useful, and a known quantity, there |
| 304 | +are a few things that aren't great about them. The biggest issue is that they |
| 305 | +can be quite large for processes that have many threads, or do a large amount of |
| 306 | +memory allocation. This can be a large problem for embedded devices that may not |
| 307 | +have a lot of room to store large files. In the next article we'll take a look |
| 308 | +at the steps we've taken to reduce the size of coredumps. |
| 309 | + |
| 310 | +In the meantime, if you'd like to poke around the source code for the coredump |
| 311 | +handler you can find it |
| 312 | +[here](https://github.com/memfault/memfaultd/tree/main/memfaultd/src/cli/memfault_core_handler). |
| 313 | + |
| 314 | +<!-- Interrupt Keep START --> |
| 315 | + |
| 316 | +{% include newsletter.html %} |
| 317 | + |
| 318 | +{% include submit-pr.html %} |
| 319 | + |
| 320 | +<!-- Interrupt Keep END --> |
| 321 | + |
| 322 | +{:.no_toc} |
| 323 | + |
| 324 | +## References |
| 325 | + |
| 326 | +<!-- prettier-ignore-start --> |
| 327 | +[^elf_format]: [ELF File Format](https://refspecs.linuxfoundation.org/elf/elf.pdf) |
| 328 | +[^man_core]: [`man core`](https://man7.org/linux/man-pages/man5/core.5.html) |
| 329 | +[^man_proc]: [`man proc`](https://man7.org/linux/man-pages/man5/procfs.5.html) |
| 330 | +[^man_proc_pid_mem]: [`man proc_pid_mem`](https://man7.org/linux/man-pages/man5/proc_pid_mem.5.html) |
| 331 | +[^man_proc_pid_cmdline]: [`man proc_pid_cmdline`](https://man7.org/linux/man-pages/man5/proc_pid_cmdline.5.html) |
| 332 | +[^man_ulimit]: [`man ulimit`](https://man7.org/linux/man-pages/man3/ulimit.3.html) |
| 333 | +[^man_signal]: [`man signal`](https://www.man7.org/linux/man-pages/man7/signal.7.html) |
| 334 | +<!-- prettier-ignore-end --> |