Description
Describe the problem
This experiment was discussed internally here. While trying to reproduce snapshot-induced latency hits using the roachtest added in #89191, we noticed that p99.9 latencies increase for read traffic over data that is not currently receiving snapshots. In the outlier traces, the time is spent entirely below pebble. There is little trace info from within pebble to explain why; this issue tracks investigating just that.
To Reproduce
Using roughly the setup from #89191:
The first red annotation marks leases for the foreground load being transferred to the node that is about to start receiving snapshots. The second red annotation marks when that node starts receiving snapshots, at which point service latencies go through the roof. A set of outlier traces can be found here: trace-snapshot-latency.tar.gz. They look roughly like the one below:
+cc @andrewbaptist, @sumeerbhola.
Jira issue: CRDB-20434

