doc: Add initial USDT documentation

0xB10C · laanwj · 0xB10C · commit 84ace9aef116 · 2021-07-27T16:32:01.000+02:00
Both added files are extended in the following commits.

doc/usdt.md is based on earlier work by laanwj.

Co-authored-by: W. J. van der Laan <laanwj@protonmail.com>
diff --git a/contrib/tracing/README.md b/contrib/tracing/README.md
@@ -0,0 +1,45 @@
+Example scripts for User-space, Statically Defined Tracing (USDT)
+=================================================================
+
+This directory contains scripts showcasing User-space, Statically Defined
+Tracing (USDT) support for Bitcoin Core on Linux. For more information on
+USDT support in Bitcoin Core see the [USDT documentation].
+
+[USDT documentation]: ../../doc/tracing.md
+
+
+Examples for the two main eBPF front-ends, [bpftrace] and
+[BPF Compiler Collection (BCC)], with support for USDT, are listed. BCC is used
+for complex tools and daemons and `bpftrace` is preferred for one-liners and
+shorter scripts.
+
+[bpftrace]: https://github.com/iovisor/bpftrace
+[BPF Compiler Collection (BCC)]: https://github.com/iovisor/bcc
+
+
+To develop and run bpftrace and BCC scripts you need to install the
+corresponding packages. See [installing bpftrace] and [installing BCC] for more
+information. For development, see the [bpftrace Reference Guide], the
+[BCC Reference Guide], and the [bcc Python Developer Tutorial].
+
+[installing bpftrace]: https://github.com/iovisor/bpftrace/blob/master/INSTALL.md
+[installing BCC]: https://github.com/iovisor/bcc/blob/master/INSTALL.md
+[bpftrace Reference Guide]: https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
+[BCC Reference Guide]: https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
+[bcc Python Developer Tutorial]: https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
+
+## Examples
+
+The bpftrace examples contain a relative path to the `bitcoind` binary. By
+default, the scripts should be run from the repository root and assume a
+self-compiled `bitcoind` binary. The paths in the examples can be changed, for
+example, to point to release builds if needed. See the
+[Bitcoin Core USDT documentation] on how to list available tracepoints in your
+`bitcoind` binary.
+
+[Bitcoin Core USDT documentation]: ../../doc/tracing.md#listing-available-tracepoints
+
+**WARNING: eBPF programs require root privileges to be loaded into a Linux
+kernel VM. This means the bpftrace and BCC examples must be executed with root
+privileges. Make sure to carefully review any scripts that you run with root
+privileges first!**
diff --git a/doc/tracing.md b/doc/tracing.md
@@ -0,0 +1,204 @@
+# User-space, Statically Defined Tracing (USDT) for Bitcoin Core
+
+Bitcoin Core includes statically defined tracepoints to allow for more
+observability during development, debugging, code review, and production usage.
+These tracepoints make it possible to keep track of custom statistics and
+enable detailed monitoring of otherwise hidden internals. They have
+little to no performance impact when unused.
+
+```
+eBPF and USDT Overview
+======================
+
+                ┌──────────────────┐            ┌──────────────┐
+                │ tracing script   │            │ bitcoind     │
+                │==================│      2.    │==============│
+                │  eBPF  │ tracing │      hooks │              │
+                │  code  │ logic   │      into┌─┤►tracepoint 1─┼───┐ 3.
+                └────┬───┴──▲──────┘          ├─┤►tracepoint 2 │   │ pass args
+            1.       │      │ 4.              │ │ ...          │   │ to eBPF
+    User    compiles │      │ pass data to    │ └──────────────┘   │ program
+    Space    & loads │      │ tracing script  │                    │
+    ─────────────────┼──────┼─────────────────┼────────────────────┼───
+    Kernel           │      │                 │                    │
+    Space       ┌──┬─▼──────┴─────────────────┴────────────┐       │
+                │  │  eBPF program                         │◄──────┘
+                │  └───────────────────────────────────────┤
+                │ eBPF kernel Virtual Machine (sandboxed)  │
+                └──────────────────────────────────────────┘
+
+1. The tracing script compiles the eBPF code and loads the eBPF program into a kernel VM
+2. The eBPF program hooks into one or more tracepoints
+3. When the tracepoint is called, the arguments are passed to the eBPF program
+4. The eBPF program processes the arguments and returns data to the tracing script
+```
+
+The Linux kernel can hook into the tracepoints during runtime and pass data to
+sandboxed [eBPF] programs running in the kernel. These eBPF programs can, for
+example, collect statistics or pass data back to user-space scripts for further
+processing.
+
+[eBPF]: https://ebpf.io/
+
+The two main eBPF front-ends with support for USDT are [bpftrace] and
+[BPF Compiler Collection (BCC)]. BCC is used for complex tools and daemons and
+`bpftrace` is preferred for one-liners and shorter scripts. Examples for both can
+be found in [contrib/tracing].
+
+[bpftrace]: https://github.com/iovisor/bpftrace
+[BPF Compiler Collection (BCC)]: https://github.com/iovisor/bcc
+[contrib/tracing]: ../contrib/tracing/
+
+## Tracepoint documentation
+
+The currently available tracepoints are listed here.
+
+## Adding tracepoints to Bitcoin Core
+
+To add a new tracepoint, `#include <util/trace.h>` in the compilation unit where
+the tracepoint is inserted. Use one of the `TRACEx` macros listed below
+depending on the number of arguments passed to the tracepoint. Up to 12
+arguments can be provided. The `context` and `event` specify the names by which
+the tracepoint is referred to. Please use `snake_case` and try to make sure that
+the tracepoint names make sense even without detailed knowledge of the
+implementation details. Do not forget to update the tracepoint list in this
+document.
+
+```c
+#define TRACE(context, event)
+#define TRACE1(context, event, a)
+#define TRACE2(context, event, a, b)
+#define TRACE3(context, event, a, b, c)
+#define TRACE4(context, event, a, b, c, d)
+#define TRACE5(context, event, a, b, c, d, e)
+#define TRACE6(context, event, a, b, c, d, e, f)
+#define TRACE7(context, event, a, b, c, d, e, f, g)
+#define TRACE8(context, event, a, b, c, d, e, f, g, h)
+#define TRACE9(context, event, a, b, c, d, e, f, g, h, i)
+#define TRACE10(context, event, a, b, c, d, e, f, g, h, i, j)
+#define TRACE11(context, event, a, b, c, d, e, f, g, h, i, j, k)
+#define TRACE12(context, event, a, b, c, d, e, f, g, h, i, j, k, l)
+```
+
+For example:
+
+```C++
+TRACE6(net, inbound_message,
+    pnode->GetId(),
+    pnode->GetAddrName().c_str(),
+    pnode->ConnectionTypeAsString().c_str(),
+    sanitizedType.c_str(),
+    msg.data.size(),
+    msg.data.data()
+);
+```
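+
+As a minimal sketch of how such macros can be wired up, the following assumes
+that each `TRACEx` forwards to the matching `DTRACE_PROBEx` macro from
+`<sys/sdt.h>` (shipped with SystemTap) when tracing is compiled in, and expands
+to nothing otherwise. The `EXAMPLE_` names, the `EXAMPLE_ENABLE_TRACING` switch,
+and `ProcessExample` are hypothetical and are not taken from `util/trace.h`.
+
+```cpp
+#include <cassert>
+
+#ifdef EXAMPLE_ENABLE_TRACING
+#include <sys/sdt.h> // provides DTRACE_PROBE, DTRACE_PROBE1, DTRACE_PROBE2, ...
+#define EXAMPLE_TRACE(context, event) DTRACE_PROBE(context, event)
+#define EXAMPLE_TRACE1(context, event, a) DTRACE_PROBE1(context, event, a)
+#define EXAMPLE_TRACE2(context, event, a, b) DTRACE_PROBE2(context, event, a, b)
+#else
+// Tracing disabled: the macros expand to nothing, so their arguments are
+// never evaluated and no probe site is emitted.
+#define EXAMPLE_TRACE(context, event)
+#define EXAMPLE_TRACE1(context, event, a)
+#define EXAMPLE_TRACE2(context, event, a, b)
+#endif
+
+// Hypothetical function containing one tracepoint with a single argument.
+inline int ProcessExample(int height)
+{
+    assert(height >= 0);
+    EXAMPLE_TRACE1(example, height_seen, height);
+    return height + 1;
+}
+```
+
+Built without `EXAMPLE_ENABLE_TRACING`, the function compiles without the
+SystemTap headers and the probe site disappears from the binary.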
+
+### Guidelines and best practices
+
+#### Clear motivation and use-case
+Tracepoints need a clear motivation and use-case. The motivation should
+outweigh the impact on, for example, code readability. There is no point in
+adding tracepoints that don't end up being used.
+
+#### Provide an example
+When adding a new tracepoint, provide an example. Examples can show the use case
+and help reviewers test that the tracepoint works as intended. The examples
+can be kept simple but should give others a starting point when working with
+the tracepoint. See existing examples in [contrib/tracing/].
+
+[contrib/tracing/]: ../contrib/tracing/
+
+#### No expensive computations for tracepoints
+Data passed to the tracepoint should be inexpensive to compute. Although the
+tracepoint itself only has overhead when enabled, the code to compute arguments
+is always run, even if the tracepoint is not used. For example, avoid
+serialization and parsing.
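+
+The difference can be sketched with a stand-in probe macro that, like a
+compiled-in tracepoint with no tool attached, does nothing with its arguments
+but still evaluates them. `EXAMPLE_PROBE2`, `ExampleBlock`, `TraceExpensive`,
+and `TraceCheap` are hypothetical names used only for this illustration.
+
+```cpp
+#include <cassert>
+#include <cstdint>
+#include <string>
+#include <vector>
+
+// Hypothetical stand-in for a probe site with nothing attached: the
+// arguments are evaluated and then discarded.
+#define EXAMPLE_PROBE2(context, event, a, b) \
+    do { (void)(a); (void)(b); } while (0)
+
+struct ExampleBlock {
+    std::vector<uint8_t> raw; // bytes that are already held in memory
+    int hex_calls{0};         // counts how often the expensive path runs
+
+    // Expensive: allocates and formats a new string on every call.
+    std::string ToHex()
+    {
+        ++hex_calls;
+        static const char* digits = "0123456789abcdef";
+        std::string out;
+        out.reserve(raw.size() * 2);
+        for (uint8_t byte : raw) {
+            out.push_back(digits[byte >> 4]);
+            out.push_back(digits[byte & 0x0f]);
+        }
+        return out;
+    }
+};
+
+// Discouraged: the hex conversion runs on every call, traced or not.
+inline void TraceExpensive(ExampleBlock& block)
+{
+    EXAMPLE_PROBE2(example, block_seen, block.ToHex().c_str(), block.raw.size());
+}
+
+// Preferred: pass a pointer and a size that already exist.
+inline void TraceCheap(ExampleBlock& block)
+{
+    EXAMPLE_PROBE2(example, block_seen, block.raw.data(), block.raw.size());
+}
+```
+
+Calling `TraceCheap` leaves `hex_calls` at zero, while every call to
+`TraceExpensive` pays for a hex conversion even though nothing consumes it.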
+
+#### Semi-stable API
+Tracepoints should have a semi-stable API. Users should be able to rely on the
+tracepoints for scripting. This means tracepoints need to be documented, and the
+argument order ideally should not change. If there is an important reason to
+change argument order, make sure to document the change and update the examples
+using the tracepoint.
+
+#### eBPF Virtual Machine limits
+Keep the eBPF Virtual Machine limits in mind. eBPF programs receiving data from
+the tracepoints run in a sandboxed Linux kernel VM. This VM has a limited stack
+size of 512 bytes. Before passing larger amounts of data, check that doing so
+makes sense, for example, by testing with a tracing script that can handle it.
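+
+One way to keep the limit visible while designing a tracepoint is a
+compile-time check on the data a tracing script would have to stage. This is
+only a sketch: it assumes the script copies the arguments into a
+stack-allocated struct, and `ExampleEvent`, its field sizes, and
+`FitsOnEbpfStack` are hypothetical.
+
+```cpp
+#include <cstddef>
+#include <cstdint>
+
+// Stack size available to an eBPF program, as stated above.
+constexpr std::size_t EBPF_STACK_LIMIT_BYTES = 512;
+
+// Hypothetical event layout a tracing script might fill from the arguments.
+struct ExampleEvent {
+    int64_t peer_id;
+    char addr[68];
+    char msg_type[20];
+    uint64_t msg_size;
+};
+
+// True if a value of the given size could be staged on the eBPF stack.
+constexpr bool FitsOnEbpfStack(std::size_t bytes)
+{
+    return bytes <= EBPF_STACK_LIMIT_BYTES;
+}
+
+static_assert(FitsOnEbpfStack(sizeof(ExampleEvent)),
+              "event struct must fit on the eBPF stack");
+```
+
+A payload such as a full serialized block would fail this check, which is a
+hint that the script needs a different strategy for that argument.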
+
+#### `bpftrace` argument limit
+While tracepoints can have up to 12 arguments, bpftrace scripts currently only
+support reading from the first six arguments (`arg0` through `arg5`) on `x86_64`.
+bpftrace currently lacks real support for handling and printing binary data,
+like block header hashes and txids. When a tracepoint passes more than six
+arguments, string and integer arguments should preferably be placed in the
+first six argument fields. Binary data can be placed in later arguments. BCC
+supports reading from all 12 arguments.
+
+#### Strings as C-style strings
+Generally, strings should be passed into the `TRACEx` macros as pointers to
+C-style strings (a null-terminated sequence of characters). For C++
+`std::string`s, [`c_str()`] can be used. It's recommended to document the
+maximum expected string size if known.
+
+
+[`c_str()`]: https://www.cplusplus.com/reference/string/string/c_str/
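+
+The documented maximum matters because a consumer usually copies the string
+into a fixed-size buffer. The sketch below models such a reader in plain C++;
+`ReadTracedString` and `EXAMPLE_MAX_ADDR_LEN` are hypothetical, and the
+truncation behavior is an assumption about how a tracing script would read
+the pointer, not a description of a specific eBPF helper.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <string>
+
+// Hypothetical documented maximum for an example string argument,
+// including the terminating NUL.
+constexpr std::size_t EXAMPLE_MAX_ADDR_LEN = 16;
+
+// Models a reader with a fixed-size buffer: copies at most max_len - 1
+// characters and stops early at the NUL terminator.
+inline std::string ReadTracedString(const char* ptr, std::size_t max_len)
+{
+    assert(ptr != nullptr && max_len > 0);
+    std::string out;
+    for (std::size_t i = 0; i + 1 < max_len && ptr[i] != '\0'; ++i) {
+        out.push_back(ptr[i]);
+    }
+    return out;
+}
+```
+
+A string that fits within the documented size is read back unchanged, while a
+longer one is silently cut off at `max_len - 1` characters. The pointer from
+`c_str()` must also stay valid for as long as the tracepoint needs it.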
+
+
+## Listing available tracepoints
+
+Multiple tools can list the available tracepoints in a `bitcoind` binary with
+USDT support.
+
+### GDB - GNU Project Debugger
+
+To list probes in Bitcoin Core, use `info probes` in `gdb`:
+
+```
+$ gdb ./src/bitcoind
+…
+(gdb) info probes
+Type Provider   Name             Where              Semaphore Object
+stap net        inbound_message  0x000000000014419e /src/bitcoind
+stap net        outbound_message 0x0000000000107c05 /src/bitcoind
+stap validation block_connected  0x00000000002fb10c /src/bitcoind
+…
+```
+
+### With `readelf`
+
+The `readelf` tool can be used to display the USDT tracepoints in Bitcoin Core.
+Look for the notes with the description `NT_STAPSDT`.
+
+```
+$ readelf -n ./src/bitcoind | grep NT_STAPSDT -A 4 -B 2
+Displaying notes found in: .note.stapsdt
+  Owner                 Data size	Description
+  stapsdt              0x0000005d	NT_STAPSDT (SystemTap probe descriptors)
+    Provider: net
+    Name: outbound_message
+    Location: 0x0000000000107c05, Base: 0x0000000000579c90, Semaphore: 0x0000000000000000
+    Arguments: -8@%r12 8@%rbx 8@%rdi 8@192(%rsp) 8@%rax 8@%rdx
+…
+```
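+
+Each entry in the `Arguments` line has the form `size@operand`: the size is
+given in bytes, a negative size marks a signed value, and the operand names
+the register or memory location holding the argument. A minimal sketch of a
+parser for that line follows. `ProbeArg` and `ParseProbeArgs` are hypothetical
+helpers, and the format description follows the SystemTap probe note
+convention rather than anything specific to Bitcoin Core.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdlib>
+#include <sstream>
+#include <string>
+#include <vector>
+
+struct ProbeArg {
+    int size_bytes;      // width of the argument in bytes
+    bool is_signed;      // a leading '-' on the size marks a signed value
+    std::string operand; // location of the value, e.g. "%r12" or "192(%rsp)"
+};
+
+// Splits a line such as "-8@%r12 8@%rbx" into its size@operand entries.
+// Tokens without '@' (for example a leading "Arguments:") are skipped.
+inline std::vector<ProbeArg> ParseProbeArgs(const std::string& line)
+{
+    std::vector<ProbeArg> args;
+    std::istringstream tokens(line);
+    std::string token;
+    while (tokens >> token) {
+        const std::size_t at = token.find('@');
+        if (at == std::string::npos) continue;
+        // std::stoi throws if the size part is not a number.
+        const int size = std::stoi(token.substr(0, at));
+        args.push_back({std::abs(size), size < 0, token.substr(at + 1)});
+    }
+    return args;
+}
+```
+
+Applied to the `outbound_message` line above, this yields six entries, the
+first of which is a signed 8-byte value in `%r12`.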
+
+### With `tplist`
+
+The `tplist` tool is provided by BCC (see [Installing BCC]). It displays kernel
+tracepoints or USDT probes and their formats (for more information, see the
+[`tplist` usage demonstration]). There are slight binary naming differences
+between distributions. For example, on
+[Ubuntu the binary is called `tplist-bpfcc`][ubuntu binary].
+
+[Installing BCC]: https://github.com/iovisor/bcc/blob/master/INSTALL.md
+[`tplist` usage demonstration]: https://github.com/iovisor/bcc/blob/master/tools/tplist_example.txt
+[ubuntu binary]: https://github.com/iovisor/bcc/blob/master/INSTALL.md#ubuntu---binary
+
+```
+$ tplist -l ./src/bitcoind -v
+b'net':b'outbound_message' [sema 0x0]
+  1 location(s)
+  6 argument(s)
+…
+```