runtime/pprof: add debug=3 goroutine profile with goid and labels
This adds a new goroutine profile debug mode, debug=3.
This mode emits in the same binary proto output format as debug=0, with
the only difference being that it does not aggregate matching
stack/label combinations into a single count, and instead emits a sample
per goroutine with additional synthesized labels to communicate some of
the details of each goroutine, specifically:
- go::goroutine: the goroutine's ID
- go::goroutine_created_by: ID of the goroutine's creator (if any)
- go::goroutine_state: current state of the goroutine (e.g. runnable)
- go::goroutine_wait_minutes: approximate minutes goroutine has spent
waiting (if applicable)
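As a rough sketch of how such a profile might be requested, the snippet below passes debug=3 to the existing pprof.Lookup("goroutine").WriteTo call while a user-set label is attached. It assumes a toolchain that includes this change; on toolchains without it, the output for debug=3 may be in a different format. The helper name collectGoroutineProfile and the label pair "worker"/"demo" are hypothetical and used only for illustration.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime/pprof"
)

// collectGoroutineProfile (hypothetical helper) writes the goroutine profile
// at the given debug level into memory and returns the raw bytes.
func collectGoroutineProfile(debug int) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, debug); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Attach a user-set label to the current goroutine for the duration of
	// the callback. The key and value here are illustrative only.
	labels := pprof.Labels("worker", "demo")
	pprof.Do(context.Background(), labels, func(ctx context.Context) {
		// Assumption: debug=3 is understood by the toolchain in use.
		data, err := collectGoroutineProfile(3)
		if err != nil {
			fmt.Println("collect failed:", err)
			return
		}
		fmt.Printf("collected %d bytes of goroutine profile\n", len(data))
	})
}
```

With the change applied, the bytes should be readable by any tool that understands the debug=0 proto format, with the go:: labels appearing next to the user-set one.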
Previously the debug=2 mode was the only way to get this kind of
per-goroutine information, which is sometimes vital to understanding the
state of a process. However, debug=2 has two major drawbacks:
1) its collection incurs a lengthy and disruptive stop-the-world pause and
2) it does not include user-set labels alongside per-goroutine details in the same profile.
This new debug=3 mode uses the same concurrent collection mechanism used
to produce debug=0 and debug=1 profiles, meaning it has the same minimal
stop-the-world penalty. At the same time, it includes the per-goroutine
details like status and wait time that make debug=2 so useful, providing
a "best-of-both-worlds" option.
A new mode is introduced, rather than changing the implementation of the
debug=2 format in-place, as it is not clear that debug=2 can utilize a
concurrent collection mechanism while maintaining the correctness of its
existing output, which includes argument values in its printed stacks.
The difference in STW latency observed by running goroutines during
profile collection is demonstrated by an included benchmark which spawns
a number of goroutines to be profiled and then measures the latency of a
short timer while collecting goroutine profiles.
BenchmarkGoroutineProfileLatencyImpact
                       │    debug=2     │                debug=3                │
                       │ max_latency_ns │ max_latency_ns  vs base               │
goroutines=100x3-14         422.2k ± 13%     190.3k ±   38%  -54.93% (p=0.002 n=6)
goroutines=100x10-14        619.7k ± 10%     171.1k ±   43%  -72.38% (p=0.002 n=6)
goroutines=100x50-14       1423.6k ±  7%     174.3k ±   44%  -87.76% (p=0.002 n=6)
goroutines=1000x3-14       2424.8k ±  8%     298.6k ±  106%  -87.68% (p=0.002 n=6)
goroutines=1000x10-14      7378.4k ±  2%     268.2k ±  146%  -96.36% (p=0.002 n=6)
goroutines=1000x50-14     23372.5k ± 10%     330.1k ±  173%  -98.59% (p=0.002 n=6)
goroutines=10000x3-14      42.802M ± 47%     1.991M ±  105%  -95.35% (p=0.002 n=6)
goroutines=10000x10-14    36668.2k ± 95%     743.1k ±   72%  -97.97% (p=0.002 n=6)
goroutines=10000x50-14   120639.1k ±  2%     188.2k ± 2582%  -99.84% (p=0.002 n=6)
geomean                    6.760M            326.2k          -95.18%
The per-goroutine details are included in the profile as labels,
alongside any user-set labels. While the pprof format allows for multi-valued
labels, so a collision with a user-set label would preserve both values,
it also discourages them, thus the 'go::' namespace prefix is used to
minimize collisions with user-set labels. The form 'go::' follows the
convention established in the pprof format, which reserves 'pprof::'.
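One practical consequence of the prefix is that a consumer can separate the synthesized labels from user-set ones by key alone. The sketch below illustrates this on a plain multi-valued label map; the function splitLabels, the constant name, and the sample keys and values are hypothetical, and no profile-parsing library is assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// runtimeLabelPrefix is the namespace used for the synthesized labels.
const runtimeLabelPrefix = "go::"

// splitLabels (hypothetical) partitions a multi-valued label map into the
// labels carrying the go:: prefix and everything else.
func splitLabels(labels map[string][]string) (synthesized, user map[string][]string) {
	synthesized = make(map[string][]string)
	user = make(map[string][]string)
	for key, values := range labels {
		if strings.HasPrefix(key, runtimeLabelPrefix) {
			synthesized[key] = values
		} else {
			user[key] = values
		}
	}
	return synthesized, user
}

func main() {
	// Illustrative sample only; the values are made up.
	sample := map[string][]string{
		"go::goroutine":       {"42"},
		"go::goroutine_state": {"runnable"},
		"tenant":              {"a"},
	}
	synthesized, user := splitLabels(sample)
	fmt.Println("synthesized labels:", len(synthesized))
	fmt.Println("user labels:", len(user))
}
```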
Fixes golang#74954.
Change-Id: If90eb01887ae3f35be8acc3d239b88dc29d338a8