Skip to content

Commit 44563fe

Browse files
craig[bot]michae2
andcommitted
Merge #147682
147682: sql: increase observability of automatic statement retries r=yuzefovich a=michae2 In #107044 we added statement-level automatic retries to the conn executor. These retries are used under Read Committed isolation to try to prevent retryable errors from escaping to the application. This PR adds more observability to statement-level retries, in some cases bringing it up to par with transaction-level retries. See individual commits for details. Informs: #145377 Release note: None Co-authored-by: Michael Erickson <[email protected]>
2 parents 2b8a0ae + 0958bae commit 44563fe

File tree

39 files changed

+512
-63
lines changed

39 files changed

+512
-63
lines changed

docs/generated/metrics/metrics.yaml

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8823,6 +8823,22 @@ layers:
88238823
unit: COUNT
88248824
aggregation: AVG
88258825
derivative: NON_NEGATIVE_DERIVATIVE
8826+
- name: sql.statements.auto_retry.count
8827+
exported_name: sql_statements_auto_retry_count
8828+
description: Number of SQL statement automatic retries
8829+
y_axis_label: SQL Statements
8830+
type: COUNTER
8831+
unit: COUNT
8832+
aggregation: AVG
8833+
derivative: NON_NEGATIVE_DERIVATIVE
8834+
- name: sql.statements.auto_retry.count.internal
8835+
exported_name: sql_statements_auto_retry_count_internal
8836+
description: Number of SQL statement automatic retries (internal queries)
8837+
y_axis_label: SQL Internal Statements
8838+
type: COUNTER
8839+
unit: COUNT
8840+
aggregation: AVG
8841+
derivative: NON_NEGATIVE_DERIVATIVE
88268842
- name: sql.stats.activity.update.latency
88278843
exported_name: sql_stats_activity_update_latency
88288844
description: The latency of updates made by the SQL activity updater job. Includes failed update attempts
@@ -8991,6 +9007,22 @@ layers:
89919007
unit: COUNT
89929008
aggregation: AVG
89939009
derivative: NON_NEGATIVE_DERIVATIVE
9010+
- name: sql.txn.auto_retry.count
9011+
exported_name: sql_txn_auto_retry_count
9012+
description: Number of SQL transaction automatic retries
9013+
y_axis_label: SQL Transactions
9014+
type: COUNTER
9015+
unit: COUNT
9016+
aggregation: AVG
9017+
derivative: NON_NEGATIVE_DERIVATIVE
9018+
- name: sql.txn.auto_retry.count.internal
9019+
exported_name: sql_txn_auto_retry_count_internal
9020+
description: Number of SQL transaction automatic retries (internal queries)
9021+
y_axis_label: SQL Internal Statements
9022+
type: COUNTER
9023+
unit: COUNT
9024+
aggregation: AVG
9025+
derivative: NON_NEGATIVE_DERIVATIVE
89949026
- name: sql.txn.begin.started.count
89959027
exported_name: sql_txn_begin_started_count
89969028
description: Number of SQL transaction BEGIN statements started

pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -245,24 +245,26 @@ query idx=2
245245
SELECT * FROM t2 AS OF SYSTEM TIME with_min_timestamp(now() - '10s', true) WHERE pk = 2
246246
----
247247
pq: referenced descriptor ID 105: looking up ID 105: descriptor not found
248-
events (7 found):
248+
events (8 found):
249249
* event 1: colbatchscan trace on node_idx 2: local read
250250
* event 2: transaction retry on node_idx: 2
251251
* event 3: colbatchscan trace on node_idx 2: local read
252252
* event 4: transaction retry on node_idx: 2
253253
* event 5: colbatchscan trace on node_idx 2: local read
254254
* event 6: transaction retry on node_idx: 2
255255
* event 7: colbatchscan trace on node_idx 2: local read
256+
* event 8: transaction retry on node_idx: 2
256257

257258
query idx=2
258259
SELECT * FROM t2 AS OF SYSTEM TIME with_min_timestamp(now() - '10s', true) WHERE pk = 2
259260
----
260261
pq: referenced descriptor ID 105: looking up ID 105: descriptor not found
261-
events (7 found):
262+
events (8 found):
262263
* event 1: colbatchscan trace on node_idx 2: local read
263264
* event 2: transaction retry on node_idx: 2
264265
* event 3: colbatchscan trace on node_idx 2: local read
265266
* event 4: transaction retry on node_idx: 2
266267
* event 5: colbatchscan trace on node_idx 2: local read
267268
* event 6: transaction retry on node_idx: 2
268269
* event 7: colbatchscan trace on node_idx 2: local read
270+
* event 8: transaction retry on node_idx: 2
Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
# Test that automatic transaction retries and automatic statement retries are
2+
# counted separately.
3+
4+
statement ok
5+
SET max_retries_for_read_committed = 5
6+
7+
statement ok
8+
SET tracing = on
9+
10+
statement ok
11+
BEGIN ISOLATION LEVEL READ COMMITTED
12+
13+
# There's currently some quirky behavior for the first statement of a read-
14+
# committed transaction: it is subject to both transaction retries and statement
15+
# retries. We can see this if we force more than max_retries_for_read_committed
16+
# number of retries for the first statement.
17+
18+
query I
19+
SELECT crdb_internal.force_retry(7)
20+
----
21+
0
22+
23+
statement ok
24+
SET tracing = off
25+
26+
# We use transaction retry count + statement retry count to decide if we have
27+
# retried enough. We don't currently remember statement retry counts across
28+
# transaction retries, so we end up retrying 17 times here instead of 7.
29+
30+
query T nosort
31+
SELECT message FROM [SHOW TRACE FOR SESSION] WHERE message LIKE '%executing after % retries%'
32+
----
33+
executing after 1 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
34+
executing after 2 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
35+
executing after 3 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
36+
executing after 4 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
37+
executing after 5 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
38+
executing after 1 transaction retries, last retry reason: read committed retry limit exceeded; set by max_retries_for_read_committed=5: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
39+
executing after 1 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
40+
executing after 2 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
41+
executing after 3 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
42+
executing after 4 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
43+
executing after 5 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
44+
executing after 2 transaction retries, last retry reason: read committed retry limit exceeded; set by max_retries_for_read_committed=5: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
45+
executing after 1 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
46+
executing after 2 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
47+
executing after 3 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
48+
executing after 4 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
49+
executing after 5 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
50+
51+
statement ok
52+
SET tracing = on
53+
54+
query I
55+
SELECT crdb_internal.force_retry(3)
56+
----
57+
0
58+
59+
statement ok
60+
SET tracing = off
61+
62+
# We use transaction retry count + statement retry count to decide if we have
63+
# retried enough. We don't currently remember which statements participated in
64+
# previous transaction retries, so we end up retrying 1 time here instead of 3.
65+
66+
query T nosort
67+
SELECT message FROM [SHOW TRACE FOR SESSION] WHERE message LIKE '%executing after % retries%'
68+
----
69+
executing after 2 transaction retries, last retry reason: read committed retry limit exceeded; set by max_retries_for_read_committed=5: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
70+
executing after 1 statement retries, last retry reason: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
71+
72+
statement ok
73+
SET tracing = on
74+
75+
query I
76+
SELECT 0
77+
----
78+
0
79+
80+
statement ok
81+
SET tracing = off
82+
83+
# By the time we get to the last statement, the transaction retry count hasn't
84+
# been affected by any of the statement retries, showing that it is separate.
85+
86+
query T nosort
87+
SELECT message FROM [SHOW TRACE FOR SESSION] WHERE message LIKE '%executing after % retries%'
88+
----
89+
executing after 2 transaction retries, last retry reason: read committed retry limit exceeded; set by max_retries_for_read_committed=5: TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()
90+
91+
statement ok
92+
COMMIT

pkg/ccl/logictestccl/tests/3node-tenant/generated_test.go

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

pkg/ccl/logictestccl/tests/fakedist-disk/BUILD.bazel

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ go_test(
1212
"//build/toolchains:is_heavy": {"test.Pool": "heavy"},
1313
"//conditions:default": {"test.Pool": "large"},
1414
}),
15-
shard_count = 33,
15+
shard_count = 34,
1616
tags = ["cpu:2"],
1717
deps = [
1818
"//pkg/base",

pkg/ccl/logictestccl/tests/fakedist-disk/generated_test.go

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

pkg/ccl/logictestccl/tests/fakedist-vec-off/BUILD.bazel

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ go_test(
1212
"//build/toolchains:is_heavy": {"test.Pool": "heavy"},
1313
"//conditions:default": {"test.Pool": "large"},
1414
}),
15-
shard_count = 33,
15+
shard_count = 34,
1616
tags = ["cpu:2"],
1717
deps = [
1818
"//pkg/base",

pkg/ccl/logictestccl/tests/fakedist-vec-off/generated_test.go

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

pkg/ccl/logictestccl/tests/fakedist/BUILD.bazel

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ go_test(
1212
"//build/toolchains:is_heavy": {"test.Pool": "heavy"},
1313
"//conditions:default": {"test.Pool": "large"},
1414
}),
15-
shard_count = 34,
15+
shard_count = 35,
1616
tags = ["cpu:2"],
1717
deps = [
1818
"//pkg/base",

pkg/ccl/logictestccl/tests/fakedist/generated_test.go

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)