@miraradeva
Contributor

Currently, there is a single kvnemesis operation that executes via SQL: ToggleGlobalReads. We have seen this operation get stuck and cause the test to time out in safety mode. The expected behavior is that any stuck operation would time out on its own, but there appears to be a context cancellation propagation issue, most likely in lib/pq, though this is not confirmed.

This commit disables ToggleGlobalReads in safety mode to reduce test failures. It should also help confirm whether this is the only operation susceptible to the hang.

Fixes: #160293

Release note: None

@miraradeva miraradeva requested a review from stevendanna January 7, 2026 18:14
@miraradeva miraradeva added the backport-25.4.x Flags PRs that need to be backported to 25.4 label Jan 7, 2026
@miraradeva miraradeva requested a review from a team as a code owner January 7, 2026 18:14
@miraradeva miraradeva added the backport-26.1.x Flags PRs that need to be backported to 26.1 label Jan 7, 2026

@miraradeva
Contributor Author

CI is failing consistently for this test, in different ways. I'll take a look before it shows up in test triage.

@miraradeva
Contributor Author

The most recent CI failure is an instance of #160653. An AddSSTable request is applied twice; we can see it in the error:

committed txn overwritten key had write: [d][/Table/100/"bfdd633c954626b0",/Table/100/"c290817c6647bad6"):1767810116.153540362,0-><nil>@s33 [d][/Table/100/"bfdd633c954626b0",/Table/100/"c290817c6647bad6"):1767810116.127524762,0-><nil>@s33 [d][/Table/100/"d2fb31c34986afd6",/Table/100/"d6bc9d6c978bf5f7"):1767810116.153540362,0-><nil>@s33 [d][/Table/100/"d2fb31c34986afd6",/Table/100/"d6bc9d6c978bf5f7"):1767810116.127524762,0-><nil>@s33

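We're trying to validate this outcome as a single atomic operation with a single timestamp, but that's clearly not the case here: each of the two spans shows writes at two timestamps, 1767810116.153540362 and 1767810116.127524762.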
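
To make the failed check concrete, here is a rough sketch of a "single timestamp" validation applied to the four writes from the error above. The `write` type and both functions are hypothetical simplifications written for this comment; they are not the kvnemesis validator.

```go
package main

import (
	"fmt"
	"sort"
)

// write is a hypothetical, simplified record of one observed write:
// the span it covers and the timestamp it was written at.
type write struct {
	span      string
	timestamp string
}

// distinctTimestamps returns the sorted set of timestamps seen across ws.
func distinctTimestamps(ws []write) []string {
	seen := map[string]struct{}{}
	for _, w := range ws {
		seen[w.timestamp] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for ts := range seen {
		out = append(out, ts)
	}
	sort.Strings(out)
	return out
}

// isAtomic reports whether every write shares one timestamp, which is the
// property expected of a single atomic operation.
func isAtomic(ws []write) bool {
	return len(distinctTimestamps(ws)) <= 1
}

func main() {
	// Spans and timestamps copied from the error message above.
	spanA := `[/Table/100/"bfdd633c954626b0",/Table/100/"c290817c6647bad6")`
	spanB := `[/Table/100/"d2fb31c34986afd6",/Table/100/"d6bc9d6c978bf5f7")`
	ws := []write{
		{spanA, "1767810116.153540362,0"},
		{spanA, "1767810116.127524762,0"},
		{spanB, "1767810116.153540362,0"},
		{spanB, "1767810116.127524762,0"},
	}
	fmt.Println(isAtomic(ws), distinctTimestamps(ws))
}
```

With these inputs the check reports the operation as non-atomic and lists two distinct timestamps, which matches the double application seen in the failure.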

I think we should still merge this, as it addresses a different issue.

Development

Successfully merging this pull request may close these issues.

kv/kvnemesis: TestKVNemesisMultiNode_Partition_Safety failed