@@ -78,6 +78,13 @@ local breakpoints.
 Things get a bit more involved in the JIT. See `BeamAsm.md` for more
 details.
 
+### Trace sessions
+
+Since OTP 27, isolated trace sessions can be created dynamically. Each trace
+session is represented by an instance of the struct `ErtsTraceSession`. The
+legacy session (kept for backward compatibility) is represented by the static
+instance `erts_trace_session_0`.
+
 ## Setting breakpoints
 
 ### Introduction
@@ -88,29 +95,24 @@ was carried out in single threaded mode. Similar to code loading, this
 can impose a severe problem for availability that grows with the
 number of cores.
 
-In OTP R16, breakpoints are set in the code without blocking the VM.
+Since OTP R16, breakpoints are set in the code without blocking the VM.
 Erlang processes may continue executing undisturbed in parallel during the
 entire operation. The same base technique is used as for code loading. A
 staging area of breakpoints is prepared and then made active with a single
 atomic operation.
 
-### Redesign of Breakpoint Wheel
+### Breakpoints
 
-To make it easier to manage breakpoints without single threaded mode a
-redesign of the breakpoint mechanism has been made. The old
-"breakpoint wheel" data structure was a circular double-linked list of
-breakpoints for each instrumented function. It was invented before the
-SMP emulator. To support it in the SMP emulator, is was essentially
-expanded to one breakpoint wheel per scheduler. As more breakpoint
-types have been added, the implementation have become messy and hard
-to understand and maintain.
+For call tracing, breakpoints are created and inserted at the entry of each
+traced Erlang function. A pointer to an allocated struct `GenericBp` is
+inserted, holding the data for all types of breakpoints. A bit-flag field
+indicates which break actions are enabled. A `GenericBp` instance is session
+specific; if more than one trace session affects a function, one `GenericBp`
+instance is created for each session. They are linked together in a singly
+linked list that is traversed when the breakpoint is hit.
 
-In the new design the old wheel was dropped and instead replaced by
-one struct (`GenericBp`) to hold the data for all types of breakpoints
-for each instrumented function. A bit-flag field is used to indicate
-what different type of break actions that are enabled.
-
-### Same Same but Different
+### Similar to Code Loading but Different
 
 Even though `trace_pattern` use the same technique as the non-blocking
 code loading with replicated generations of data structures and an
@@ -290,6 +292,35 @@ tracing is that we insert the `op_i_generic_breakpoint` instruction
 (with its pointer at offset -4) in the export entry rather than in the
 code.
 
+### call_time and call_memory tracing
+
+For profiling, `call_time` and/or `call_memory` tracing can be set for a
+function. This measures the time or memory spent by the function. The
+measured time/memory is kept in individual counters for every call traced
+process calling that function. To ensure scalability, scheduler specific
+hash tables (`BpTimemTrace`) are used in the breakpoint to map the calling
+process pid to its time/memory counters.
+
+The function `trace:info` is used to collect stats for `call_time`,
+`call_memory` or both (`all`). It has to aggregate the counters from all the
+scheduler specific hash tables to build a list with one counter tuple per
+pid. This cannot be done safely while the hash tables may be concurrently
+updated by traced processes.
+
+Since OTP 29, `trace:info` collects `call_time` and `call_memory` stats
+without blocking all schedulers. This is done by using the active and
+staging halves of the breakpoint. During normal operation, both halves of
+the breakpoint refer to the same thread specific hash tables. To collect
+the stats safely, temporary hash tables are created to be used by traced
+calls that happen during the call to `trace:info`. The temporary hash
+tables are made active while the "real" hash tables are made inactive in
+the staging half. While the hash tables are inactive, they can be safely
+traversed. When done, the real tables are made active again. A final
+consolidation step collects any stats from the temporary tables, deletes
+them, and makes the two halves of the breakpoint identical again, using the
+same real hash tables. Scheduling with thread progress is done between the
+switches to make sure the traversed hash tables are not being concurrently
+updated.
+
 ### Future work
 
 We still go to single threaded mode when new code is loaded for a