Commit 82a3948

bahildebrand, gminn, and franc0is authored
content: linux coredumps (#545)
### Summary

Add an article covering the cause, layout, and use of Linux coredumps in `memfaultd`.

---------

Co-authored-by: Gillian Minnehan <41022382+gminn@users.noreply.github.com>
Co-authored-by: François Baldassari <franc0is@users.noreply.github.com>
1 parent c434575 commit 82a3948

4 files changed

+334
-0
lines changed

_drafts/linux_coredump.md

Lines changed: 334 additions & 0 deletions
@@ -0,0 +1,334 @@
1+
---
2+
title: Coredumps at Memfault Part 1 - Introduction to Linux Coredumps
3+
description:
4+
"The basics of Linux coredumps, how they're used at Memfault, and how they're
5+
captured."
6+
author: blake
7+
---
8+
9+
One of the core features of the Memfault Linux SDK is the ability to capture and
analyze crashes. Since the inception of the SDK we've been slowly expanding our
crash capture and analysis capabilities. Starting from the standard ELF
coredump, we've added support for capturing only the stack memory, and even
capturing just the stack trace with no registers and locals present. This
article series will give you a high-level overview of that journey, and give you
a deeper understanding of how coredumps work on Linux.
16+
17+
<!-- excerpt start -->
18+
19+
In this article we'll start by taking a look at how a Linux coredump is
20+
formatted, how you capture them, and how we use them at Memfault.
21+
22+
<!-- excerpt end -->
23+
24+
{% include newsletter.html %}
25+
26+
{% include toc.html %}
27+
28+
## What is a Linux Coredump
29+
30+
A Linux coredump represents a snapshot of the crashing process' memory. It can
be loaded into programs like GDB to inspect the state of the process at the time
of the crash. It is written as an ELF[^elf_format] file. The entirety of the ELF
format is outside the scope of this article, but we will touch on a few of the
more important bits when looking at an ELF core file.
35+
36+
## What triggers a coredump
37+
38+
Coredumps are triggered by certain signals generated by or sent to a program.
39+
The full list of signals can be found in the signal man page[^man_signal]. Here
40+
are the signals that cause a coredump:
41+
42+
- SIGABRT: Abnormal termination of the program, such as a call to abort.
43+
- SIGBUS: Bus error (bad memory access).
44+
- SIGFPE: Floating-point exception.
45+
- SIGILL: Illegal instruction.
46+
- SIGQUIT: Quit from keyboard.
47+
- SIGSEGV: Invalid memory reference.
48+
- SIGSYS: Bad system call.
49+
- SIGTRAP: Trace/breakpoint trap.
50+
51+
Of these the most common culprits you'll likely see are `SIGSEGV`, `SIGBUS`, and
52+
`SIGABRT`. These are signals that will be generated when a program tries to
53+
access memory that it doesn't have access to, tries to dereference a null
54+
pointer, or when the program calls `abort`. These typically indicate a fairly
55+
serious bug in either your program, or the libraries that it uses.
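
As a rough illustration of the list above, here is a minimal Python sketch that
runs a child process and reports whether it was killed by one of the
core-dumping signals. The helper names `fatal_signal` and `would_dump_core` are
made up for this example, and whether a core file is actually written still
depends on how the system is configured.

```python
import signal
import subprocess
import sys

# Signals from the list above: their default action is to terminate the
# process and dump core.
CORE_SIGNALS = {
    signal.SIGABRT, signal.SIGBUS, signal.SIGFPE, signal.SIGILL,
    signal.SIGQUIT, signal.SIGSEGV, signal.SIGSYS, signal.SIGTRAP,
}


def fatal_signal(argv):
    """Run argv; return the signal that killed it, or None on a normal exit."""
    completed = subprocess.run(argv)
    if completed.returncode < 0:
        # subprocess reports death-by-signal as a negative return code.
        return signal.Signals(-completed.returncode)
    return None


def would_dump_core(sig):
    """True if sig is one of the core-dumping signals listed above."""
    return sig in CORE_SIGNALS


# Example: a child that calls abort() is killed by SIGABRT.
sig = fatal_signal([sys.executable, "-c", "import os; os.abort()"])
```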
56+
57+
Coredumps are very useful in these situations, as generally you're going to want
to inspect the running state of the process at the time of the crash. From the
coredump you can get a backtrace of the crashing thread, the values of the
registers at the time of the crash, and the values of the local variables at
each frame of the backtrace.
62+
63+
## How are coredumps enabled/collected
64+
65+
Enabling coredumps on your Linux device requires a few configuration options. To
66+
start with you'll need the following options enabled on your kernel at a
67+
minimum:
68+
69+
```c
CONFIG_COREDUMP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
```
73+
74+
These settings will enable the kernel to generate coredumps, as well as set the
75+
default mappings that are present in the coredump. `man core`[^man_core]
76+
provides a good overview of the options available to you when configuring
77+
coredumps.
78+
79+
### core_pattern
80+
81+
The kernel provides an interface for controlling where and how coredumps are
written. The `/proc/sys/kernel/core_pattern`[^man_core] file provides two
methods for capturing coredumps from crashed processes. The first is to write
the coredump directly to a file by providing a path template. For example, if we
wanted to write the core file to our `/tmp` directory with both the process name
and the PID in its name, we would write the following to
`/proc/sys/kernel/core_pattern`:
87+
88+
```bash
/tmp/core.%e.%p
```
91+
92+
In this example `%e` expands to the name of the crashing process, and `%p`
93+
expands to the PID of the crashing process. More information on the available
94+
expansions can be found in the `man core`[^man_core] page.
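
To make the expansion concrete, here is a small Python sketch that mimics how
the `%e` and `%p` specifiers are substituted. It is only an illustration: the
function name `expand_core_pattern` is invented for this example, it handles
just `%e`, `%p`, and `%%`, and it leaves any other specifier untouched rather
than reproducing the kernel's exact behavior.

```python
def expand_core_pattern(pattern, *, exe_name, pid):
    """Expand the %e, %p and %% specifiers of a core_pattern template."""
    replacements = {"e": exe_name, "p": str(pid), "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern) and pattern[i + 1] in replacements:
            # Known specifier: substitute it and skip both characters.
            out.append(replacements[pattern[i + 1]])
            i += 2
        else:
            # Ordinary character (or a specifier this sketch does not handle).
            out.append(ch)
            i += 1
    return "".join(out)


path = expand_core_pattern("/tmp/core.%e.%p", exe_name="myapp", pid=1234)
```

With these hypothetical inputs, `path` is `/tmp/core.myapp.1234`.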
95+
96+
We can also pipe a coredump directly to a program. This is useful when we want
97+
to modify the coredump in flight. The coredump is streamed to the provided
98+
program via `stdin`. The configuration is similar to saving directly to a file
99+
except the first character must be a `|`. This is how we capture coredumps in
100+
the Memfault SDK, and will be covered more in depth later in the article.
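
As a sketch of what the receiving end of the pipe could look like, the Python
function below copies a coredump from a binary stream to an output file in
fixed-size chunks. This is not how `memfaultd` is implemented; `save_core` is a
hypothetical helper, and in a real handler the input stream would be
`sys.stdin.buffer` rather than the in-memory stand-in used here.

```python
import io


def save_core(stream, out, chunk_size=64 * 1024):
    """Copy a coredump from a binary stream to `out`; return bytes copied."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream: the kernel has finished writing the core
        out.write(chunk)
        total += len(chunk)
    return total


# Example with an in-memory stream standing in for the handler's stdin.
demo_out = io.BytesIO()
copied = save_core(io.BytesIO(b"\x7fELF" + bytes(60)), demo_out, chunk_size=16)
```

Reading in chunks keeps the handler's memory use bounded regardless of how
large the coredump is.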
101+
102+
#### `procfs` Shallow Dive
103+
104+
An additional benefit to the `core_pattern` pipe interface is that until the
105+
program that is being piped to exits, we have access to the `procfs` of the
106+
crashing process. But what is `procfs`, and how does it help us with a coredump?
107+
108+
`procfs` gives us direct, usually read-only, access to some of the kernel's data
structures[^man_proc]. This can be system-wide information, or information about
individual processes. For our purposes we are mostly interested in the
information about the process that is currently crashing. We can get direct
read-only access to all mapped memory by address through
`/proc/<pid>/mem`[^man_proc_pid_mem], or look at the command line arguments of
the process through `/proc/<pid>/cmdline`[^man_proc_pid_cmdline].
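
For example, the contents of `/proc/<pid>/cmdline` are the process arguments
separated by NUL bytes. A minimal Python sketch for reading them might look like
the following; `parse_cmdline` and `read_cmdline` are names chosen for this
example only.

```python
from pathlib import Path


def parse_cmdline(raw):
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    if not raw:
        return []  # the file can be empty, e.g. for a zombie process
    return [arg.decode(errors="replace") for arg in raw.rstrip(b"\0").split(b"\0")]


def read_cmdline(pid):
    """Read and parse the command line of a running (or crashing) process."""
    return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())


args = parse_cmdline(b"/usr/bin/myapp\0--verbose\0")
```

Here `args` is `["/usr/bin/myapp", "--verbose"]`. A core handler would call
`read_cmdline` with the PID it receives from the kernel.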
115+
116+
## ELF Core File Layout
117+
118+
Linux coredumps use a subset of the ELF format. The coredump itself is a
snapshot of the crashing process' memory, as well as some metadata to help
debuggers understand the state of the process at the time of the crash. We will
touch on the most important aspects of the core file in this article rather than
doing an exhaustive dive into the ELF format. If you are interested in learning
more, the ELF File Format[^elf_format] specification is a great resource.
125+
126+
![]({% img_url linux-coredump/elf-core-layout.png %})
127+
128+
### ELF Header
129+
130+
The above image gives us a very high level view of the layout of a coredump. To
131+
start, the ELF header outlines the layout of the file and source of the file. We
132+
can see if the producing system was 32-bit or 64-bit, little or big endian, and
133+
the architecture of the system. Additionally it shows the offset to the program
134+
headers. Here is the layout of the ELF header[^elf_format]:
135+
136+
```c
typedef struct {
  unsigned char e_ident[EI_NIDENT];
  Elf32_Half e_type;
  Elf32_Half e_machine;
  Elf32_Word e_version;
  Elf32_Addr e_entry;
  Elf32_Off e_phoff;
  Elf32_Off e_shoff;
  Elf32_Word e_flags;
  Elf32_Half e_ehsize;
  Elf32_Half e_phentsize;
  Elf32_Half e_phnum;
  Elf32_Half e_shentsize;
  Elf32_Half e_shnum;
  Elf32_Half e_shstrndx;
} Elf32_Ehdr;
```
154+
155+
There is a lot going on here, but the fields that are most important to our
156+
discussion are broken down below:
157+
158+
- `e_ident`: This field is an array of bytes that identify the file as an ELF
159+
file.
160+
- `e_type`: This field tells us what type of file we are looking at. For our
161+
purposes this will always be `ET_CORE`.
162+
- `e_machine`: This field tells us the architecture of the system that produced
163+
the file. Common values here are
164+
[`EM_ARM`](https://github.com/torvalds/linux/blob/c45323b7560ec87c37c729b703c86ee65f136d75/include/uapi/linux/elf-em.h#L26)
165+
for 32-bit ARM, and
166+
[`EM_AARCH64`](https://github.com/torvalds/linux/blob/c45323b7560ec87c37c729b703c86ee65f136d75/include/uapi/linux/elf-em.h#L46)
167+
for aarch64.
168+
- `e_phoff`: This field tells us the offset to the program headers.
169+
- `e_phentsize`: This field tells us the size of each program header.
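
To see these fields in practice, here is a hedged Python sketch that decodes an
ELF header with the standard `struct` module. The `ElfHeader` tuple and
`parse_elf_header` function are names invented for this example. The constant
values (`ET_CORE = 4`, `EM_ARM = 40`, `EM_AARCH64 = 183`) come from the ELF
specification and the kernel header linked above.

```python
import struct
from collections import namedtuple

ET_CORE = 4       # e_type value for core files
EM_ARM = 40       # e_machine: 32-bit ARM
EM_AARCH64 = 183  # e_machine: AArch64

ElfHeader = namedtuple(
    "ElfHeader",
    "e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def parse_elf_header(data):
    """Parse the ELF header found at the start of `data`."""
    e_ident = data[:16]
    if e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = e_ident[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if e_ident[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    # Address and offset fields are 4 bytes wide in ELF32 and 8 in ELF64.
    fmt = endian + ("HHIQQQIHHHHHH" if is_64 else "HHIIIIIHHHHHH")
    return ElfHeader(*struct.unpack_from(fmt, data, 16))


# A synthetic 32-bit little-endian core header, built only for demonstration.
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
body = struct.pack("<HHIIIIIHHHHHH", ET_CORE, EM_ARM, 1, 0, 52, 0, 0, 52, 32, 3, 0, 0, 0)
header = parse_elf_header(ident + body)
```

In this example `header.e_phoff` is 52 and `header.e_phentsize` is 32, which
tells a reader where the program headers start and how large each one is.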
170+
171+
### Program Headers and Segments
172+
173+
The meat of our coredump is described by the program headers. There are a wide
variety of program header types defined in the ELF File Format[^elf_format].
From the perspective of the coredump, however, we are primarily interested in
the `PT_NOTE` and `PT_LOAD` program headers.
177+
178+
Program headers have the following layout[^elf_format]:
179+
180+
```c
typedef struct {
  Elf32_Word p_type;
  Elf32_Off p_offset;
  Elf32_Addr p_vaddr;
  Elf32_Addr p_paddr;
  Elf32_Word p_filesz;
  Elf32_Word p_memsz;
  Elf32_Word p_flags;
  Elf32_Word p_align;
} Elf32_Phdr;
```
192+
193+
Here is a brief breakdown of the fields we care about in the program header:
194+
195+
- `p_type`: This field tells us what type of segment we are looking at. For our
196+
purposes this will be either `PT_NOTE` or `PT_LOAD`.
197+
- `p_offset`: This field tells us the offset from the beginning of the file
198+
where the segment starts.
199+
- `p_vaddr`: This field tells us the virtual address where the segment is
200+
loaded.
201+
- `p_paddr`: This field tells us the physical address where the segment is
202+
loaded.
203+
- `p_filesz`: This field tells us the size of the segment in the file.
204+
- `p_memsz`: This field tells us the size of the segment in memory.
205+
- `p_align`: This field tells us the alignment of the segment.
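
The following Python sketch shows one way to walk the program header table
using the `e_phoff`, `e_phentsize`, and `e_phnum` values from the ELF header.
The `ProgramHeader` tuple and `parse_program_headers` function are hypothetical
names for this example. `PT_LOAD = 1` and `PT_NOTE = 4` are the values defined
by the ELF specification.

```python
import struct
from collections import namedtuple

PT_LOAD = 1
PT_NOTE = 4

ProgramHeader = namedtuple(
    "ProgramHeader",
    "p_type p_offset p_vaddr p_paddr p_filesz p_memsz p_flags p_align",
)


def parse_program_headers(data, e_phoff, e_phentsize, e_phnum,
                          is_64=False, endian="<"):
    """Return every program header found in the table starting at e_phoff."""
    headers = []
    for i in range(e_phnum):
        offset = e_phoff + i * e_phentsize
        if is_64:
            # Elf64_Phdr places p_flags directly after p_type.
            (p_type, p_flags, p_offset, p_vaddr, p_paddr,
             p_filesz, p_memsz, p_align) = struct.unpack_from(
                endian + "IIQQQQQQ", data, offset)
        else:
            (p_type, p_offset, p_vaddr, p_paddr,
             p_filesz, p_memsz, p_flags, p_align) = struct.unpack_from(
                endian + "IIIIIIII", data, offset)
        headers.append(ProgramHeader(p_type, p_offset, p_vaddr, p_paddr,
                                     p_filesz, p_memsz, p_flags, p_align))
    return headers


# Synthetic 32-bit table: one PT_NOTE and one PT_LOAD entry after a 52-byte header.
table = (bytes(52)
         + struct.pack("<8I", PT_NOTE, 116, 0, 0, 20, 0, 0, 0)
         + struct.pack("<8I", PT_LOAD, 136, 0x10000, 0, 4096, 4096, 6, 4096))
phdrs = parse_program_headers(table, e_phoff=52, e_phentsize=32, e_phnum=2)
```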
206+
207+
We'll start by taking a look at the format of the `PT_NOTE` segments. Below is
208+
the layout of a `PT_NOTE` segment.
209+
210+
![]({% img_url linux-coredump/elf-note-layout.png %})
211+
212+
The first two fields of the note are fairly self-explanatory: they give the
sizes of the name and the descriptor. The `type` field is an unsigned integer
that tells us what kind of note we are looking at. The `name` field is a string
that identifies the originator of the note, and the `desc` field contains the
actual data of the note. It's worth noting that the `name` field works as a kind
of namespace for the `type` field: two notes with the same `type` can be
differentiated by their `name`.
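
As an illustration of this layout, the Python sketch below walks the notes in a
`PT_NOTE` segment. It assumes the name and descriptor are each padded to a
4-byte boundary, which is how Linux core files lay them out in practice, and
`parse_notes` is a name invented for this example.

```python
import struct


def _align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def parse_notes(segment, endian="<"):
    """Return a list of (name, type, desc) tuples from a PT_NOTE segment."""
    notes = []
    offset = 0
    while offset + 12 <= len(segment):
        namesz, descsz, note_type = struct.unpack_from(endian + "III", segment, offset)
        offset += 12
        # namesz includes the terminating NUL, so strip it from the name.
        name = segment[offset:offset + namesz].rstrip(b"\0").decode("ascii", errors="replace")
        offset += _align4(namesz)
        desc = segment[offset:offset + descsz]
        offset += _align4(descsz)
        notes.append((name, note_type, desc))
    return notes


# Two synthetic notes that share a type but differ by name.
segment = (struct.pack("<III", 5, 4, 1) + b"CORE\0\0\0\0" + b"\x01\x02\x03\x04"
           + struct.pack("<III", 6, 2, 1) + b"LINUX\0\0\0" + b"\xaa\xbb\0\0")
notes = parse_notes(segment)
```

Both notes in the example carry type `1`, and only the name tells them apart.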
220+
221+
The `PT_LOAD` segment is a bit more straightforward. Each one represents a
region of memory that was mapped into the process at the time of the crash. This
can be the stack, the heap, or any other region of memory that was mapped into
the process.
225+
226+
## Coredumps at Memfault: Rev. 1
227+
228+
Our first crack at coredumps at Memfault had one goal: leveraging existing tools
229+
to capture info about a crashing process. To have feature parity with our
230+
offering on MCU and Android, we needed a few basic things:
231+
232+
- A symbolicated backtrace for each running thread in the crashing process
233+
- The values of registers at the time of crash
234+
- Symbolicated local variables at each frame
235+
236+
Based on what we've learned about Linux core files so far, they are an obvious
fit for these requirements. We can use an established system to route
information about crashed processes, add metadata that gives us information
about the device in question, and do all of this without making any source
modifications to anything running on the system. For this reason our first pass
at coredumps leaves them largely untouched from what the kernel provides. The
only addition is a note that contains the metadata we use to identify devices
and the version of software they're running. This takes advantage of the fact
that the `PT_NOTE` segment is a free-form segment that can be used to add any
metadata we want to the coredump.
246+
247+
This allows us to gather additional information about the process that crashed,
248+
and more easily stream memory to avoid unnecessary allocations or memory usage.
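
To give a feel for what adding a custom note involves, here is a hedged Python
sketch that serializes a single note. The note name `EXAMPLE`, the type
`0x1234`, and the JSON payload are all made up for illustration and are not the
values Memfault uses. The sketch assumes the 4-byte padding that Linux core
files use for notes.

```python
import struct


def build_note(name, note_type, desc, endian="<"):
    """Serialize one ELF note: sizes and type, then padded name and descriptor."""
    name_bytes = name.encode("ascii") + b"\0"  # namesz counts the trailing NUL

    def pad(data):
        # Pad with zero bytes up to the next 4-byte boundary.
        return data + b"\0" * (-len(data) % 4)

    header = struct.pack(endian + "III", len(name_bytes), len(desc), note_type)
    return header + pad(name_bytes) + pad(desc)


# Hypothetical metadata note; the name, type and payload are placeholders.
note = build_note("EXAMPLE", 0x1234, b'{"device":"demo"}')
```

Because a note carries its own sizes, a reader that does not recognize the name
and type can skip over it without understanding the payload.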
249+
250+
Now that we've covered all the background information we can start to dive into
251+
the innards of the `memfault-core-handler`. First we use the pipe operation that
252+
was outlined earlier.
253+
[Here](https://github.com/memfault/memfault-linux-sdk/blob/49adfe0ce0cb6082360012b0f0092a31e8030048/meta-memfault/recipes-memfault/memfaultd/files/memfaultd/src/coredump/mod.rs#L14)
254+
is the pattern we write to `/proc/sys/kernel/core_pattern` to pipe the coredump
255+
to our handler:
256+
257+
```bash
|/usr/sbin/memfault-core-handler -c /path/to/config %P %e %I %s
```
260+
261+
This tells the kernel to pipe the coredump to our handler, and provides the
handler with the PID of the crashing process (`%P`), the name of the crashing
process (`%e`), the TID of the thread that triggered the coredump (`%I`), and
the number of the signal that caused the crash (`%s`).
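
On the receiving side, a handler invoked with this pattern sees the expanded
values as ordinary command line arguments. The Python sketch below shows how a
hypothetical handler could parse them with `argparse`. It is not the
`memfault-core-handler` implementation (which is written in Rust), and the
argument names are chosen for this example. Note that `man core` documents `%I`
as the ID of the thread that triggered the dump.

```python
import argparse


def parse_handler_args(argv):
    """Parse arguments laid out as: -c CONFIG %P %e %I %s."""
    parser = argparse.ArgumentParser(prog="example-core-handler")
    parser.add_argument("-c", "--config", required=True,
                        help="path to the configuration file")
    parser.add_argument("pid", type=int,
                        help="%%P: PID of the crashing process")
    parser.add_argument("name",
                        help="%%e: name of the crashing process")
    parser.add_argument("tid", type=int,
                        help="%%I: ID of the thread that triggered the dump")
    parser.add_argument("signal", type=int,
                        help="%%s: number of the signal causing the dump")
    return parser.parse_args(argv)


# Example invocation as the kernel might produce it for a SIGSEGV (signal 11).
args = parse_handler_args(["-c", "/path/to/config", "1234", "myapp", "1234", "11"])
```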
265+
266+
When a crash occurs the kernel will write the coredump to the `stdin` of the
267+
handler. The handler will then read all the program headers into memory. This
268+
sets us up to do two things. First we'll read all of the `PT_NOTE` segments and
269+
save them in memory. For the first iteration of the handler, we won't do
270+
anything further with them until we write them to a file. They'll become more
271+
important in later articles as we get into more of the special sauce of the
272+
handler.
273+
274+
The next thing the handler does is read all of the memory ranges for each
`PT_LOAD` segment in the coredump. Instead of storing this in memory we'll
stream it directly to the output file from `/proc/<pid>/mem`. This is done to
reduce the memory footprint of the handler, and to prevent any issues where we
would potentially need to seek backwards in the stream: `stdin` is a one-way
stream, and we can't seek backwards in it.
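
The streaming step can be sketched in Python as follows. This is an
illustration under simplifying assumptions, not the handler's actual code:
`stream_segment` is a hypothetical helper, and an in-memory buffer stands in for
`/proc/<pid>/mem`. With the real file, reads of unmapped or inaccessible
addresses can fail and would need error handling.

```python
import io


def stream_segment(mem, out, vaddr, size, chunk_size=64 * 1024):
    """Copy `size` bytes starting at virtual address `vaddr` from `mem` to `out`."""
    mem.seek(vaddr)  # /proc/<pid>/mem is addressed by virtual address
    remaining = size
    while remaining > 0:
        chunk = mem.read(min(chunk_size, remaining))
        if not chunk:
            break  # region ended early; stop rather than loop forever
        out.write(chunk)
        remaining -= len(chunk)
    return size - remaining


# An in-memory buffer stands in for open(f"/proc/{pid}/mem", "rb").
fake_mem = io.BytesIO(bytes(range(256)))
demo_out = io.BytesIO()
copied = stream_segment(fake_mem, demo_out, vaddr=16, size=32, chunk_size=10)
```

Only one chunk is held in memory at a time, so the cost of copying a segment
does not grow with the segment's size.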
280+
281+
After we've written all of the `PT_LOAD` segments to the output file we should
282+
have an ELF coredump that is largely the same as what the kernel would have
283+
written. The only difference is that we've added a note to the coredump, the
284+
contents of which we won't cover in this article, as it's not particularly
285+
interesting.
286+
287+
Let's take a quick visual look at everything we've accomplished by annotating
288+
our previous ELF layout diagram with the changes we've made.
289+
290+
![]({% img_url linux-coredump/elf-core-layout-annotated.png %})
291+
292+
And there we have it! We've copied our coredump over from `stdin` with a few
minor changes. Now you're probably wondering, why did we go through all of this
trouble to end up with a file that's largely the same as what the kernel would
have produced? Well, for one, it allows us to add metadata to the coredump, but
it also sets the stage for more advanced coredump handling in the future that
we'll cover in the next article.
298+
299+
## Conclusion
300+
301+
We've covered the basics of coredumps on Linux, and how they're used in the
Memfault SDK. You should now have a pretty good idea of how things look under
the hood. While the baseline coredumps are useful, and a known quantity, there
are a few things that aren't great about them. The biggest issue is that they
can be quite large for processes that have many threads, or do a large amount of
memory allocation. This can be a serious problem for embedded devices that may
not have a lot of room to store large files. In the next article we'll take a
look at the steps we've taken to reduce the size of coredumps.
309+
310+
In the meantime, if you'd like to poke around the source code for the coredump
311+
handler you can find it
312+
[here](https://github.com/memfault/memfaultd/tree/main/memfaultd/src/cli/memfault_core_handler).
313+
314+
<!-- Interrupt Keep START -->
315+
316+
{% include newsletter.html %}
317+
318+
{% include submit-pr.html %}
319+
320+
<!-- Interrupt Keep END -->
321+
322+
{:.no_toc}
323+
324+
## References
325+
326+
<!-- prettier-ignore-start -->
327+
[^elf_format]: [ELF File Format](https://refspecs.linuxfoundation.org/elf/elf.pdf)
328+
[^man_core]: [`man core`](https://man7.org/linux/man-pages/man5/core.5.html)
329+
[^man_proc]: [`man proc`](https://man7.org/linux/man-pages/man5/procfs.5.html)
330+
[^man_proc_pid_mem]: [`man proc_pid_mem`](https://man7.org/linux/man-pages/man5/proc_pid_mem.5.html)
331+
[^man_proc_pid_cmdline]: [`man proc_pid_cmdline`](https://man7.org/linux/man-pages/man5/proc_pid_cmdline.5.html)
332+
[^man_ulimit]: [`man ulimit`](https://man7.org/linux/man-pages/man3/ulimit.3.html)
333+
[^man_signal]: [`man signal`](https://www.man7.org/linux/man-pages/man7/signal.7.html)
334+
<!-- prettier-ignore-end -->