runtime/pprof: add debug=26257 goroutine profile with labels and reduced STW
This adds a new goroutine profile mode (debug=26257) that emits one
entry per goroutine in a format nearly identical to debug=2, but
additionally includes any pprof labels in the header, following the
[state] segment, e.g.:
goroutine 123 [select, 2 minutes]{"svc": "users"}:
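A consumer of this output could split such a header into its parts. The
following is a minimal sketch, not part of this change: parseHeader and
goroutineHeader are hypothetical names, and the sketch assumes the label
segment is a JSON object with string values, as the example line
suggests.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
)

// goroutineHeader holds the fields of one profile entry header.
// (Hypothetical type, introduced only for this illustration.)
type goroutineHeader struct {
	ID     int64
	State  string            // text between the square brackets
	Labels map[string]string // nil when the header carries no labels
}

// headerRE matches `goroutine <id> [<state>]<optional {...}>:`.
var headerRE = regexp.MustCompile(`^goroutine (\d+) \[([^\]]*)\](\{.*\})?:$`)

// parseHeader splits a header line into ID, state, and labels.
func parseHeader(line string) (goroutineHeader, error) {
	m := headerRE.FindStringSubmatch(line)
	if m == nil {
		return goroutineHeader{}, fmt.Errorf("not a goroutine header: %q", line)
	}
	id, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return goroutineHeader{}, err
	}
	h := goroutineHeader{ID: id, State: m[2]}
	if m[3] != "" {
		// Assumption: the label segment is valid JSON with string values.
		if err := json.Unmarshal([]byte(m[3]), &h.Labels); err != nil {
			return goroutineHeader{}, err
		}
	}
	return h, nil
}

func main() {
	h, err := parseHeader(`goroutine 123 [select, 2 minutes]{"svc": "users"}:`)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.ID, h.State, h.Labels["svc"])
}
```

A header without a label segment, as debug=2 prints it, parses the same
way and leaves Labels nil.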
The implementation of this mode is, however, significantly different
from debug=2 and is instead based on the same underlying mechanism as
debug=1. Unlike the collection of debug=2 profiles, which walks all
stacks while the world is stopped, this mode uses the concurrent
collection integrated with the scheduler that backs debug=1.
As a result, the new mode reduces the duration of the stop-the-world
phase compared to debug=2, particularly when profiling processes with
many goroutines, as demonstrated by the included benchmark:
                        │ debug=2        │ debug=26257
                        │ max_latency_ns │ max_latency_ns  vs base
goroutines=100x3-14       1013.17k ± 47%     84.06k ± 27%  -91.70% (p=0.002 n=6)
goroutines=100x10-14        769.23k ± 7%     80.29k ± 22%  -89.56% (p=0.002 n=6)
goroutines=100x50-14        2172.4k ± 9%     181.8k ± 46%  -91.63% (p=0.002 n=6)
goroutines=1000x3-14        7133.9k ± 3%     195.7k ± 42%  -97.26% (p=0.002 n=6)
goroutines=1000x10-14     11787.6k ± 48%     494.4k ± 77%  -95.81% (p=0.002 n=6)
goroutines=1000x50-14     20234.0k ± 87%    174.8k ± 137%  -99.14% (p=0.002 n=6)
goroutines=10000x3-14     68611.0k ± 49%   168.5k ± 2768%  -99.75% (p=0.002 n=6)
goroutines=10000x10-14     60.261M ± 95%    3.460M ± 166%  -94.26% (p=0.002 n=6)
goroutines=10000x50-14    284.144M ± 40%     4.672M ± 89%  -98.36% (p=0.002 n=6)
goroutines=25000x3-14     171.290M ± 48%    4.287M ± 394%  -97.50% (p=0.002 n=6)
goroutines=25000x10-14    150.827M ± 92%    6.424M ± 158%  -95.74% (p=0.002 n=6)
goroutines=25000x50-14    708.238M ± 34%    2.249M ± 410%  -99.68% (p=0.002 n=6)
geomean                           25.08M           624.2k  -97.51%
This concurrent collection approach, with its relaxed consistency
compared to keeping the world stopped, does mean the behavior of this
new mode is not exactly identical to debug=2. Additionally, this mode
currently always elides argument values when printing stacks, whereas
debug=2 includes them most of the time. These behavior differences mean
that, despite the performance benefits, the new implementation is not
used for the existing debug=2 mode, but only for the new mode, making it
an opt-in alternative.