Commit eabcae0

craig[bot] and yuzefovich committed
143857: explain: show "execution time" on EXPLAIN ANALYZE output r=yuzefovich a=yuzefovich

We've been tracking the "execution time" statistic for most operators (except for vectorized KV-reading ones, which get a separate "KV time" stat) for many years now, but for some reason we only reported it on the DistSQL diagram. This commit fixes that oversight so that the stat is now included in `EXPLAIN ANALYZE` output when available.

Fixes: #143443.

Release note (sql change): The new statistic `execution time` is now reported in EXPLAIN ANALYZE output for most operators. It was previously only available on the DistSQL diagrams (included in `EXPLAIN ANALYZE (DISTSQL)` output).

143859: explain: clarify "network usage" line of EXPLAIN ANALYZE r=yuzefovich a=yuzefovich

The "network statistics" (messages and bytes sent) only include network traffic between DistSQL components, completely ignoring any traffic from the KV layer to the SQL layer. To clarify this a bit, this commit renames the `network usage` top-level information to include a "DistSQL" prefix.

Epic: None

Release note (sql change): The `network usage` top-level statistic in EXPLAIN ANALYZE output has been renamed to `DistSQL network usage` to clarify that it only tracks network traffic between DistSQL components and completely ignores traffic from the KV layer to DistSQL components.

143865: kv/bulk: close bulk-adder-monitor on error r=yuzefovich a=yuzefovich

Previously, we would have left the `bulk-adder-monitor` unclosed if we hit an error in `MakeBulkAdder`; this is now fixed. (The only possible error is, in fact, not having enough memory budget to reserve the buffer size.)

Fixes: #143862.

Release note: None

143867: explain: harden RU reporting for read-only queries r=yuzefovich a=yuzefovich

In a recent change 365dfa4 we made it possible to no longer report the RU estimate in `planNodeToRowSource` if it wraps a non-mutation planNode. That change was a de-flaking fix that was later superseded by 66065df, so we can revert the former. I think it makes the code more bullet-proof, even though AFAICT we report consumed RUs only for mutation planNodes. This commit also makes some minor fixes to `TestEstimateQueryRUConsumption`, which is skipped unconditionally.

Epic: None

Release note: None

143868: memo: check `insert_fast_path` for staleness r=yuzefovich a=yuzefovich

Since we consult the value of the `insert_fast_path` session variable in `CanUseUniqueChecksForInsertFastPath` (which is called during exploration), we need to check the value for memo staleness; this check had been missing.

Epic: None

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
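Not part of the commit, but for illustration: a minimal sketch of observing the renamed `DistSQL network usage` line and the newly surfaced per-operator `execution time` lines from a Go client. The driver, connection string, and demo query are assumptions; any reachable CockroachDB cluster will do.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"strings"

	_ "github.com/lib/pq" // assumed Postgres-wire driver
)

func main() {
	// Assumed insecure local single-node cluster listening on the default port.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// EXPLAIN ANALYZE returns one text line per row.
	rows, err := db.Query("EXPLAIN ANALYZE SELECT count(*) FROM generate_series(1, 1000)")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var line string
		if err := rows.Scan(&line); err != nil {
			log.Fatal(err)
		}
		// After this commit, per-operator "execution time" lines appear when
		// available, and the top-level line reads "DistSQL network usage".
		if strings.Contains(line, "execution time") || strings.Contains(line, "DistSQL network usage") {
			fmt.Println(strings.TrimSpace(line))
		}
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```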
6 parents 0d17d84 + 1067c32 + 7795741 + 2a9065c + 60f2c4d + 62b7136 commit eabcae0

38 files changed: +353, -151 lines changed

docs/generated/eventlog.md

Lines changed: 4 additions & 4 deletions
@@ -3222,10 +3222,10 @@ Fields in this struct should be updated in sync with apps_stats.proto.
 
 | Field | Description | Sensitive |
 |--|--|--|
-| `NetworkBytes` | NetworkBytes collects the number of bytes sent over the network. | no |
+| `NetworkBytes` | NetworkBytes collects the number of bytes sent over the network by DistSQL components. | no |
 | `MaxMemUsage` | MaxMemUsage collects the maximum memory usage that occurred on a node. | no |
 | `ContentionTime` | ContentionTime collects the time in seconds statements in the transaction spent contending. | no |
-| `NetworkMessages` | NetworkMessages collects the number of messages that were sent over the network. | no |
+| `NetworkMessages` | NetworkMessages collects the number of messages that were sent over the network by DistSQL components. | no |
 | `MaxDiskUsage` | MaxDiskUsage collects the maximum temporary disk usage that occurred. This is set in cases where a query had to spill to disk, e.g. when performing a large sort where not all of the tuples fit in memory. | no |
 | `CPUSQLNanos` | CPUSQLNanos collects the CPU time spent executing SQL operations in nanoseconds. Currently, it is only collected for statements without mutations that have a vectorized plan. | no |
 | `MVCCIteratorStats` | Internal storage iteration statistics. | yes |
@@ -3273,13 +3273,13 @@ contains common SQL event/execution details.
 | `ZigZagJoinCount` | The number of zig zag joins in the query plan. | no |
 | `ContentionNanos` | The duration of time in nanoseconds that the query experienced contention. | no |
 | `Regions` | The regions of the nodes where SQL processors ran. | no |
-| `NetworkBytesSent` | The number of network bytes sent by nodes for this query. | no |
+| `NetworkBytesSent` | The number of network bytes sent by DistSQL components for this query. | no |
 | `MaxMemUsage` | The maximum amount of memory usage by nodes for this query. | no |
 | `MaxDiskUsage` | The maximum amount of disk usage by nodes for this query. | no |
 | `KVBytesRead` | The number of bytes read at the KV layer for this query. | no |
 | `KVPairsRead` | The number of key-value pairs read at the KV layer for this query. | no |
 | `KVRowsRead` | The number of rows read at the KV layer for this query. | no |
-| `NetworkMessages` | The number of network messages sent by nodes for this query. | no |
+| `NetworkMessages` | The number of network messages sent by DistSQL components for this query. | no |
 | `IndexRecommendations` | Generated index recommendations for this query. | no |
 | `ScanCount` | The number of scans in the query plan. | no |
 | `ScanWithStatsCount` | The number of scans using statistics (including forecasted statistics) in the query plan. | no |

pkg/ccl/multitenantccl/tenantcostclient/query_ru_estimate_test.go

Lines changed: 1 addition & 2 deletions
@@ -72,7 +72,6 @@ func TestEstimateQueryRUConsumption(t *testing.T) {
 	tenantID := serverutils.TestTenantID()
 	tenant1, tenantDB1 := serverutils.StartTenant(t, s, base.TestTenantArgs{
 		TenantID: tenantID,
-		Settings: st,
 		TestingKnobs: base.TestingKnobs{
 			SQLEvalContext: &eval.TestingKnobs{
 				// We disable the randomization of some batch sizes because with
@@ -196,7 +195,7 @@ func TestEstimateQueryRUConsumption(t *testing.T) {
 	const deltaFraction = 0.25
 	allowedDelta := tenantMeasuredRUs * deltaFraction
 	require.InDeltaf(t, tenantMeasuredRUs, tenantEstimatedRUs, allowedDelta,
-		"estimated RUs (%d) were not within %f RUs of the expected value (%f)",
+		"estimated RUs (%.2f) were not within %.2f RUs of the expected value (%.2f)",
 		tenantEstimatedRUs,
 		allowedDelta,
 		tenantMeasuredRUs,
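For context on the format-string fix above: `%d` is an integer verb, so handing it a float64 makes fmt emit a `%!d(...)` error marker instead of the value. A standalone illustration (not from the commit; the sample values are made up):

```go
package main

import "fmt"

func main() {
	measured, estimated, delta := 120.0, 118.5, 30.0

	// %d is an integer verb; a float64 argument produces a %!d(...) error marker.
	fmt.Printf("estimated RUs (%d) were not within %f RUs of the expected value (%f)\n",
		estimated, delta, measured)

	// The corrected verbs format the float64 values as intended.
	fmt.Printf("estimated RUs (%.2f) were not within %.2f RUs of the expected value (%.2f)\n",
		estimated, delta, measured)
}
```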

pkg/kv/bulk/buffering_adder.go

Lines changed: 14 additions & 6 deletions
@@ -79,6 +79,9 @@ var _ kvserverbase.BulkAdder = &BufferingAdder{}
 // passed to add into SSTs that are then ingested. rangeCache if set is
 // consulted to avoid generating an SST that will span a range boundary and thus
 // encounter an error and need to be split and retried to be applied.
+//
+// The BulkAdder takes ownership of the memory monitor which must be non-nil. In
+// case of an error, the monitor will be stopped.
 func MakeBulkAdder(
 	ctx context.Context,
 	db *kv.DB,
@@ -88,7 +91,15 @@ func MakeBulkAdder(
 	opts kvserverbase.BulkAdderOptions,
 	bulkMon *mon.BytesMonitor,
 	sendLimiter limit.ConcurrentRequestLimiter,
-) (*BufferingAdder, error) {
+) (_ *BufferingAdder, retErr error) {
+	if bulkMon == nil {
+		return nil, errors.New("bulkMon must be non-nil")
+	}
+	defer func() {
+		if retErr != nil {
+			bulkMon.Stop(ctx)
+		}
+	}()
 	if opts.MinBufferSize == 0 {
 		opts.MinBufferSize = 32 << 20
 	}
@@ -166,11 +177,8 @@ func (b *BufferingAdder) Close(ctx context.Context) {
 		b.sink.mu.Unlock()
 	}
 	b.sink.Close(ctx)
-
-	if b.bulkMon != nil {
-		b.memAcc.Close(ctx)
-		b.bulkMon.Stop(ctx)
-	}
+	b.memAcc.Close(ctx)
+	b.bulkMon.Stop(ctx)
 }
 
 // Add adds a key to the buffer and checks if it needs to flush.
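The `MakeBulkAdder` change above follows a common Go pattern: name the error return so a deferred function can release a resource the constructor has taken ownership of whenever construction fails. A self-contained sketch of that pattern, using hypothetical stand-in types rather than CockroachDB's actual `mon.BytesMonitor` API:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// monitor stands in for a memory monitor that the constructor takes ownership of.
type monitor struct{ stopped bool }

func (m *monitor) Stop(ctx context.Context) { m.stopped = true }

type adder struct{ mon *monitor }

// makeAdder takes ownership of mon; if construction fails, mon is stopped so
// the caller never has to clean up after an error.
func makeAdder(ctx context.Context, mon *monitor, budget int) (_ *adder, retErr error) {
	if mon == nil {
		return nil, errors.New("mon must be non-nil")
	}
	defer func() {
		// The named return lets the deferred cleanup observe the error result.
		if retErr != nil {
			mon.Stop(ctx)
		}
	}()
	if budget <= 0 {
		return nil, errors.New("not enough memory budget to reserve the buffer")
	}
	return &adder{mon: mon}, nil
}

func main() {
	ctx := context.Background()
	m := &monitor{}
	if _, err := makeAdder(ctx, m, 0); err != nil {
		fmt.Println("error:", err, "| monitor stopped:", m.stopped) // monitor stopped: true
	}
}
```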

pkg/sql/appstatspb/app_stats.proto

Lines changed: 5 additions & 2 deletions
@@ -328,7 +328,9 @@ message ExecStats {
   // run.
   optional int64 count = 1 [(gogoproto.nullable) = false];
 
-  // NetworkBytes collects the number of bytes sent over the network.
+  // NetworkBytes collects the number of bytes sent over the network by DistSQL
+  // components.
+  // TODO(yuzefovich): rename this all the way up to the DB Console.
   optional NumericStat network_bytes = 2 [(gogoproto.nullable) = false];
 
   // MaxMemUsage collects the maximum memory usage that occurred on a node.
@@ -338,7 +340,8 @@ message ExecStats {
   optional NumericStat contention_time = 4 [(gogoproto.nullable) = false];
 
   // NetworkMessages collects the number of messages that were sent over the
-  // network.
+  // network by DistSQL components.
+  // TODO(yuzefovich): rename this all the way up to the DB Console.
   optional NumericStat network_messages = 5 [(gogoproto.nullable) = false];
 
   // MaxDiskUsage collects the maximum temporary disk usage that occurred. This

pkg/sql/exec_log.go

Lines changed: 4 additions & 4 deletions
@@ -396,15 +396,15 @@ func (p *planner) maybeLogStatementInternal(
 			SQLInstanceIDs: queryLevelStats.SQLInstanceIDs,
 			KVNodeIDs: queryLevelStats.KVNodeIDs,
 			UsedFollowerRead: queryLevelStats.UsedFollowerRead,
-			NetworkBytesSent: queryLevelStats.NetworkBytesSent,
+			NetworkBytesSent: queryLevelStats.DistSQLNetworkBytesSent,
 			MaxMemUsage: queryLevelStats.MaxMemUsage,
 			MaxDiskUsage: queryLevelStats.MaxDiskUsage,
 			KVBytesRead: queryLevelStats.KVBytesRead,
 			KVPairsRead: queryLevelStats.KVPairsRead,
 			KVRowsRead: queryLevelStats.KVRowsRead,
 			KvTimeNanos: queryLevelStats.KVTime.Nanoseconds(),
 			KvGrpcCalls: queryLevelStats.KVBatchRequestsIssued,
-			NetworkMessages: queryLevelStats.NetworkMessages,
+			NetworkMessages: queryLevelStats.DistSQLNetworkMessages,
 			CpuTimeNanos: queryLevelStats.CPUTime.Nanoseconds(),
 			IndexRecommendations: indexRecs,
 			// TODO(mgartner): Use a slice of struct{uint64, uint64} instead of
@@ -498,10 +498,10 @@ func (p *planner) logTransaction(
 
 	if txnStats.CollectedExecStats {
 		sampledTxn.SampledExecStats = &eventpb.SampledExecStats{
-			NetworkBytes: txnStats.ExecStats.NetworkBytesSent,
+			NetworkBytes: txnStats.ExecStats.DistSQLNetworkBytesSent,
 			MaxMemUsage: txnStats.ExecStats.MaxMemUsage,
 			ContentionTime: int64(txnStats.ExecStats.ContentionTime.Seconds()),
-			NetworkMessages: txnStats.ExecStats.NetworkMessages,
+			NetworkMessages: txnStats.ExecStats.DistSQLNetworkMessages,
 			MaxDiskUsage: txnStats.ExecStats.MaxDiskUsage,
 			CPUSQLNanos: txnStats.ExecStats.CPUTime.Nanoseconds(),
 			MVCCIteratorStats: eventpb.MVCCIteratorStats{

pkg/sql/execstats/traceanalyzer.go

Lines changed: 7 additions & 7 deletions
@@ -95,7 +95,7 @@ func NewFlowsMetadata(flows map[base.SQLInstanceID]*execinfrapb.FlowSpec) *Flows
 // given traces and flow metadata.
 // NOTE: When adding fields to this struct, be sure to update Accumulate.
 type QueryLevelStats struct {
-	NetworkBytesSent int64
+	DistSQLNetworkBytesSent int64
 	MaxMemUsage int64
 	MaxDiskUsage int64
 	KVBytesRead int64
@@ -116,7 +116,7 @@ type QueryLevelStats struct {
 	MvccRangeKeyCount int64
 	MvccRangeKeyContainedPoints int64
 	MvccRangeKeySkippedPoints int64
-	NetworkMessages int64
+	DistSQLNetworkMessages int64
 	ContentionTime time.Duration
 	LockWaitTime time.Duration
 	LatchWaitTime time.Duration
@@ -156,7 +156,7 @@ func MakeQueryLevelStatsWithErr(stats QueryLevelStats, err error) QueryLevelStat
 
 // Accumulate accumulates other's stats into the receiver.
 func (s *QueryLevelStats) Accumulate(other QueryLevelStats) {
-	s.NetworkBytesSent += other.NetworkBytesSent
+	s.DistSQLNetworkBytesSent += other.DistSQLNetworkBytesSent
 	if other.MaxMemUsage > s.MaxMemUsage {
 		s.MaxMemUsage = other.MaxMemUsage
 	}
@@ -181,7 +181,7 @@ func (s *QueryLevelStats) Accumulate(other QueryLevelStats) {
 	s.MvccRangeKeyCount += other.MvccRangeKeyCount
 	s.MvccRangeKeyContainedPoints += other.MvccRangeKeyContainedPoints
 	s.MvccRangeKeySkippedPoints += other.MvccRangeKeySkippedPoints
-	s.NetworkMessages += other.NetworkMessages
+	s.DistSQLNetworkMessages += other.DistSQLNetworkMessages
 	s.ContentionTime += other.ContentionTime
 	s.LockWaitTime += other.LockWaitTime
 	s.LatchWaitTime += other.LatchWaitTime
@@ -295,8 +295,8 @@ func (a *TraceAnalyzer) ProcessStats() {
 		if stats.stats == nil {
 			continue
 		}
-		s.NetworkBytesSent += getNetworkBytesFromComponentStats(stats.stats)
-		s.NetworkMessages += getNumNetworkMessagesFromComponentsStats(stats.stats)
+		s.DistSQLNetworkBytesSent += getNetworkBytesFromComponentStats(stats.stats)
+		s.DistSQLNetworkMessages += getNumDistSQLNetworkMessagesFromComponentsStats(stats.stats)
 	}
 
 	// Process flowStats.
@@ -350,7 +350,7 @@ func getNetworkBytesFromComponentStats(v *execinfrapb.ComponentStats) int64 {
 	return 0
 }
 
-func getNumNetworkMessagesFromComponentsStats(v *execinfrapb.ComponentStats) int64 {
+func getNumDistSQLNetworkMessagesFromComponentsStats(v *execinfrapb.ComponentStats) int64 {
 	// We expect exactly one of MessagesReceived and MessagesSent to be set. It
 	// may seem like we are double-counting everything (from both the send and
 	// the receive side) but in practice only one side of each stream presents
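The renames above are mechanical, but `Accumulate` combines stats with mixed semantics: the DistSQL network bytes/messages (like most counters) are summed across flows, while `MaxMemUsage` keeps the maximum. A trimmed-down sketch (a hypothetical struct mirroring a few of the real fields; the sample values match the accumulation test below):

```go
package main

import (
	"fmt"
	"time"
)

// queryLevelStats is a trimmed-down stand-in for execstats.QueryLevelStats.
type queryLevelStats struct {
	DistSQLNetworkBytesSent int64
	DistSQLNetworkMessages  int64
	MaxMemUsage             int64
	KVTime                  time.Duration
}

// accumulate mirrors the summing/max behavior of QueryLevelStats.Accumulate.
func (s *queryLevelStats) accumulate(other queryLevelStats) {
	s.DistSQLNetworkBytesSent += other.DistSQLNetworkBytesSent
	s.DistSQLNetworkMessages += other.DistSQLNetworkMessages
	if other.MaxMemUsage > s.MaxMemUsage {
		s.MaxMemUsage = other.MaxMemUsage
	}
	s.KVTime += other.KVTime
}

func main() {
	a := queryLevelStats{DistSQLNetworkBytesSent: 1, DistSQLNetworkMessages: 6, MaxMemUsage: 2, KVTime: 5 * time.Second}
	b := queryLevelStats{DistSQLNetworkBytesSent: 8, DistSQLNetworkMessages: 13, MaxMemUsage: 9, KVTime: 12 * time.Second}
	a.accumulate(b)
	fmt.Printf("%+v\n", a) // bytes/messages/KVTime summed; MaxMemUsage is the max
}
```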

pkg/sql/execstats/traceanalyzer_test.go

Lines changed: 8 additions & 8 deletions
@@ -161,7 +161,7 @@ func TestTraceAnalyzer(t *testing.T) {
 			// The stats don't count the actual bytes, but they are a synthetic value
 			// based on the number of tuples. In this test 21 tuples flow over the
 			// network.
-			require.Equal(t, int64(21*8), queryLevelStats.NetworkBytesSent)
+			require.Equal(t, int64(21*8), queryLevelStats.DistSQLNetworkBytesSent)
 
 			// Soft check that MaxMemUsage is set to a non-zero value. The actual
 			// value differs between test runs due to metamorphic randomization.
@@ -176,7 +176,7 @@ func TestTraceAnalyzer(t *testing.T) {
 
 			// For tests, network messages is a synthetic value based on the number of
 			// network tuples. In this test 21 tuples flow over the network.
-			require.Equal(t, int64(21/2), queryLevelStats.NetworkMessages)
+			require.Equal(t, int64(21/2), queryLevelStats.DistSQLNetworkMessages)
 		})
 	}
 }
@@ -233,14 +233,14 @@ func TestTraceAnalyzerProcessStats(t *testing.T) {
 func TestQueryLevelStatsAccumulate(t *testing.T) {
 	aEvent := kvpb.ContentionEvent{Duration: 7 * time.Second}
 	a := execstats.QueryLevelStats{
-		NetworkBytesSent: 1,
+		DistSQLNetworkBytesSent: 1,
 		MaxMemUsage: 2,
 		KVBytesRead: 3,
 		KVPairsRead: 4,
 		KVRowsRead: 4,
 		KVBatchRequestsIssued: 4,
 		KVTime: 5 * time.Second,
-		NetworkMessages: 6,
+		DistSQLNetworkMessages: 6,
 		ContentionTime: 7 * time.Second,
 		LockWaitTime: 4 * time.Second,
 		LatchWaitTime: 3 * time.Second,
@@ -269,14 +269,14 @@ func TestQueryLevelStatsAccumulate(t *testing.T) {
 	}
 	bEvent := kvpb.ContentionEvent{Duration: 14 * time.Second}
 	b := execstats.QueryLevelStats{
-		NetworkBytesSent: 8,
+		DistSQLNetworkBytesSent: 8,
 		MaxMemUsage: 9,
 		KVBytesRead: 10,
 		KVPairsRead: 11,
 		KVRowsRead: 11,
 		KVBatchRequestsIssued: 11,
 		KVTime: 12 * time.Second,
-		NetworkMessages: 13,
+		DistSQLNetworkMessages: 13,
 		ContentionTime: 14 * time.Second,
 		LockWaitTime: 10 * time.Second,
 		LatchWaitTime: 4 * time.Second,
@@ -304,14 +304,14 @@ func TestQueryLevelStatsAccumulate(t *testing.T) {
 		ClientTime: 2 * time.Second,
 	}
 	expected := execstats.QueryLevelStats{
-		NetworkBytesSent: 9,
+		DistSQLNetworkBytesSent: 9,
 		MaxMemUsage: 9,
 		KVBytesRead: 13,
 		KVPairsRead: 15,
 		KVRowsRead: 15,
 		KVBatchRequestsIssued: 15,
 		KVTime: 17 * time.Second,
-		NetworkMessages: 19,
+		DistSQLNetworkMessages: 19,
 		ContentionTime: 21 * time.Second,
 		LockWaitTime: 14 * time.Second,
 		LatchWaitTime: 7 * time.Second,

pkg/sql/instrumentation.go

Lines changed: 5 additions & 1 deletion
@@ -839,7 +839,7 @@ func (ih *instrumentationHelper) emitExplainAnalyzePlanToOutputBuilder(
 	}
 
 	ob.AddMaxMemUsage(queryStats.MaxMemUsage)
-	ob.AddNetworkStats(queryStats.NetworkMessages, queryStats.NetworkBytesSent)
+	ob.AddDistSQLNetworkStats(queryStats.DistSQLNetworkMessages, queryStats.DistSQLNetworkBytesSent)
 	ob.AddMaxDiskUsage(queryStats.MaxDiskUsage)
 	if len(queryStats.Regions) > 0 {
 		ob.AddRegionsStats(queryStats.Regions)
@@ -1060,6 +1060,10 @@ func (m execNodeTraceMetadata) annotateExplain(
 				nodeStats.InternalStepCount.MaybeAdd(stats.KV.NumInternalSteps)
 				nodeStats.SeekCount.MaybeAdd(stats.KV.NumInterfaceSeeks)
 				nodeStats.InternalSeekCount.MaybeAdd(stats.KV.NumInternalSeeks)
+				// If multiple physical plan stages correspond to a single
+				// operator, we want to aggregate the execution time across
+				// all of them.
+				nodeStats.ExecTime.MaybeAdd(stats.Exec.ExecTime)
 				nodeStats.MaxAllocatedMem.MaybeAdd(stats.Exec.MaxAllocatedMem)
 				nodeStats.MaxAllocatedDisk.MaybeAdd(stats.Exec.MaxAllocatedDisk)
 				if noMutations && !makeDeterministic {
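The new `ExecTime` line above relies on the `MaybeAdd` pattern: a stat is only rendered when at least one physical plan stage actually reported it, and stages feeding the same operator are summed. A hedged sketch of that pattern with a hypothetical optional-duration type (not the real exec-stats types):

```go
package main

import (
	"fmt"
	"time"
)

// optionalDuration mimics an "only set if collected" stat.
type optionalDuration struct {
	set bool
	d   time.Duration
}

// MaybeAdd folds another optional value in: the result becomes set as soon as
// either side was set, and set values are summed.
func (o *optionalDuration) MaybeAdd(other optionalDuration) {
	if !other.set {
		return
	}
	o.d += other.d
	o.set = true
}

func main() {
	// Two physical plan stages feeding the same EXPLAIN ANALYZE operator.
	stage1 := optionalDuration{set: true, d: 120 * time.Millisecond}
	stage2 := optionalDuration{set: true, d: 30 * time.Millisecond}

	var execTime optionalDuration
	execTime.MaybeAdd(stage1)
	execTime.MaybeAdd(stage2)

	if execTime.set {
		fmt.Println("execution time:", execTime.d) // 150ms, aggregated across stages
	}
}
```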

pkg/sql/opt/exec/execbuilder/testdata/call

Lines changed: 4 additions & 2 deletions
@@ -139,7 +139,7 @@ distribution: <hidden>
 vectorized: <hidden>
 plan type: custom
 maximum memory usage: <hidden>
-network usage: <hidden>
+DistSQL network usage: <hidden>
 regions: <hidden>
 isolation level: serializable
 priority: normal
@@ -149,6 +149,7 @@ quality of service: regular
 sql nodes: <hidden>
 regions: <hidden>
 actual row count: 0
+execution time: 0µs
 estimated row count: 0
 procedure: foo(3)
 
@@ -161,7 +162,7 @@ distribution: <hidden>
 vectorized: <hidden>
 plan type: custom
 maximum memory usage: <hidden>
-network usage: <hidden>
+DistSQL network usage: <hidden>
 regions: <hidden>
 isolation level: serializable
 priority: normal
@@ -171,6 +172,7 @@ quality of service: regular
 sql nodes: <hidden>
 regions: <hidden>
 actual row count: 0
+execution time: 0µs
 estimated row count: 0
 procedure: foo(3)
 ·

pkg/sql/opt/exec/execbuilder/testdata/call_plpgsql

Lines changed: 4 additions & 2 deletions
@@ -145,7 +145,7 @@ distribution: <hidden>
 vectorized: <hidden>
 plan type: custom
 maximum memory usage: <hidden>
-network usage: <hidden>
+DistSQL network usage: <hidden>
 regions: <hidden>
 isolation level: serializable
 priority: normal
@@ -155,6 +155,7 @@ quality of service: regular
 sql nodes: <hidden>
 regions: <hidden>
 actual row count: 0
+execution time: 0µs
 estimated row count: 0
 procedure: foo(3)
 
@@ -167,7 +168,7 @@ distribution: <hidden>
 vectorized: <hidden>
 plan type: custom
 maximum memory usage: <hidden>
-network usage: <hidden>
+DistSQL network usage: <hidden>
 regions: <hidden>
 isolation level: serializable
 priority: normal
@@ -177,6 +178,7 @@ quality of service: regular
 sql nodes: <hidden>
 regions: <hidden>
 actual row count: 0
+execution time: 0µs
 estimated row count: 0
 procedure: foo(3)
 ·
