155176: log,logflags: allow seeding test log config from env r=tbg a=tbg
This makes it easier to apply a custom config to various test invocations without having to juggle flags.
Many KV-specific logs are not included by default, so they don't show up under `-show-logs`, which can be annoying.
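For illustration only (not code from this PR), here is a minimal sketch of the env-seeding pattern in Go; the environment variable, flag name, and default config string are hypothetical placeholders:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Hypothetical flag; the real test harness wires this up differently.
var testLogConfig = flag.String("test-log-config", "", "YAML log config used by tests")

// resolveTestLogConfig prefers an explicit flag, then falls back to the
// environment, then to a built-in default.
func resolveTestLogConfig() string {
	if *testLogConfig != "" {
		return *testLogConfig
	}
	// Hypothetical env var name, for illustration only.
	if env := os.Getenv("COCKROACH_TEST_LOG_CONFIG"); env != "" {
		return env
	}
	return "sinks: {stderr: {filter: INFO}}" // placeholder default
}

func main() {
	flag.Parse()
	fmt.Println("using test log config:", resolveTestLogConfig())
}
```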
Epic: none
156295: workload/schemachange: add INSPECT operation to random schema workload r=spilchen a=spilchen
Adds a new inspect operation to the schema change workload, enabling random generation of INSPECT TABLE and INSPECT DATABASE statements.
Features:
- Support for TABLE/DB targets, AS OF SYSTEM TIME
- Always runs in DETACHED mode so that it can run inside a transaction
- Results checked post-run via SHOW INSPECT ERRORS
- Errors reported in JSON, consistent with existing workload logs
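As a rough illustration of the post-run check (not code from this PR): the `SHOW INSPECT ERRORS` statement comes from the description above, the connection string is a placeholder, and the result columns are read generically rather than assumed:

```go
package main

import (
	"database/sql"
	"encoding/json"
	"log"
	"os"

	_ "github.com/lib/pq" // CockroachDB speaks the PostgreSQL wire protocol.
)

func main() {
	// Placeholder connection string.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query("SHOW INSPECT ERRORS")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		log.Fatal(err)
	}
	enc := json.NewEncoder(os.Stdout)
	for rows.Next() {
		vals := make([]any, len(cols))
		ptrs := make([]any, len(cols))
		for i := range vals {
			ptrs[i] = &vals[i]
		}
		if err := rows.Scan(ptrs...); err != nil {
			log.Fatal(err)
		}
		// Emit each inspect error as a JSON object, mirroring the workload's
		// JSON-style error reporting.
		rec := map[string]any{}
		for i, c := range cols {
			rec[c] = vals[i]
		}
		if err := enc.Encode(rec); err != nil {
			log.Fatal(err)
		}
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```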
Closes #155483
Epic: CRDB-55075
Release note: none
156445: asim: reproduce loss of quorum during zone cfg change with suspect nodes r=tbg a=tbg
I believe this newly added datadriven test reproduces #152604.
The test sets up five nodes with 5x replication, marks n4 and n5 as non-live, and drops the replication factor to 3. We see that the allocator merrily removes replicas from n1-n3 and loses quorum in the process. If it removes any replicas in this scenario, it really ought to remove them from n4 and n5.
> next replica action: remove voter
> removing voting replica n2,s2 due to over-replication: [1*:2, 2:2, 3:2, 4:2, 5:2]
Then:
> unable to take action - live voters [(n1,s1):1 (n3,s3):3] don't meet quorum of 3
Informs #152604.
Epic: none
156463: kvserver: remove within10s r=stevendanna a=tbg
Replace it with a standard invocation of `retry`. This is a follow-up to #156464, which fixed a bug in `within10s`.
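For context, a self-contained sketch of the retry-for-duration pattern that replaces the bespoke helper; the actual change uses the repo's own retry utility rather than this standalone function:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryForDuration re-invokes fn until it succeeds or the deadline passes.
// It illustrates the general shape of a retry helper, nothing more.
func retryForDuration(d time.Duration, fn func() error) error {
	deadline := time.Now().Add(d)
	var lastErr error
	for time.Now().Before(deadline) {
		if lastErr = fn(); lastErr == nil {
			return nil
		}
		time.Sleep(100 * time.Millisecond)
	}
	return fmt.Errorf("condition not met within %s: %w", d, lastErr)
}

func main() {
	start := time.Now()
	err := retryForDuration(10*time.Second, func() error {
		if time.Since(start) < time.Second {
			return errors.New("not ready yet")
		}
		return nil
	})
	fmt.Println("result:", err)
}
```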
Epic: none
156516: roachtest: adjust failover tests to changed liveness rangeID r=tbg a=tbg
As of #155554, the liveness range has rangeID three, not two.
The tests are updated to avoid relying on the particular ID.
Instead, we check for the pretty-printed start key of the
liveness range, /System/NodeLiveness.
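For illustration, a hedged sketch of looking the range up by its start key instead of a hard-coded ID; it assumes `crdb_internal.ranges` exposes `range_id` and `start_pretty` columns (as in recent versions) and uses a placeholder connection string:

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // pgwire driver
)

// livenessRangeID finds the liveness range by its pretty-printed start key
// rather than assuming a fixed range ID.
func livenessRangeID(db *sql.DB) (int64, error) {
	const q = `SELECT range_id FROM crdb_internal.ranges WHERE start_pretty = '/System/NodeLiveness'`
	var id int64
	if err := db.QueryRow(q).Scan(&id); err != nil {
		return 0, err
	}
	return id, nil
}

func main() {
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257?sslmode=disable") // placeholder
	if err != nil {
		panic(err)
	}
	defer db.Close()
	fmt.Println(livenessRangeID(db))
}
```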
Closes #156450.
Closes #156449.
Closes #156448.
Epic: none
156519: asim: rewrite high_cpu_25nodes.txt r=tbg a=tbg
The random numbers changed due to the Go upgrade. This didn't fail CI since we run most asim tests nightly only. No big deal.
Closes #156411.
Epic: none
156563: storage_api: fix TestDecommissionSelf flake r=tbg a=tbg
TestDecommissionSelf was flaking with timeouts waiting for decommissioned nodes to observe their own DECOMMISSIONED status. The test would fail when node 4 (one of the decommissioned nodes) never saw its liveness record update to DECOMMISSIONED within the 5-second timeout.
The likely root cause is a race condition in how decommission status propagates: when a node is marked as decommissioned, the updated liveness record is gossiped to all nodes. However, if other nodes receive this gossip update before the decommissioned node does, they will write a tombstone to local storage and subsequently reject all RPCs from the decommissioned node, including gossip messages. This can prevent the decommissioned node from ever learning about its own status change.
This commit fixes the test by only verifying the cluster state from the perspective of non-decommissioned nodes. We now assert that:
1. Active nodes see themselves as ACTIVE
2. Active nodes see the decommissioned nodes as DECOMMISSIONED
We no longer attempt to verify that decommissioned nodes observe their own status, since this is not guaranteed due to the gossip/tombstone race.
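To make the new assertion shape concrete, here is a toy sketch; the `cluster` interface and `statusSeenBy` helper are hypothetical stand-ins, not the test's real API:

```go
package main

import "fmt"

// cluster is a hypothetical stand-in for the test harness: statusSeenBy
// reports how node `by` currently sees node `of` ("ACTIVE", "DECOMMISSIONED", ...).
type cluster interface {
	statusSeenBy(by, of int) string
}

// checkFromActiveNodes verifies cluster state only from the perspective of the
// nodes that remain active, mirroring the two assertions above. It deliberately
// asserts nothing about what decommissioned nodes see about themselves, since
// that is the racy check the fix removes.
func checkFromActiveNodes(c cluster, activeNodes, decomNodes []int) error {
	for _, by := range activeNodes {
		for _, of := range activeNodes {
			if got := c.statusSeenBy(by, of); got != "ACTIVE" {
				return fmt.Errorf("n%d sees n%d as %s, want ACTIVE", by, of, got)
			}
		}
		for _, of := range decomNodes {
			if got := c.statusSeenBy(by, of); got != "DECOMMISSIONED" {
				return fmt.Errorf("n%d sees n%d as %s, want DECOMMISSIONED", by, of, got)
			}
		}
	}
	return nil
}

// fakeCluster is a toy implementation for demonstration.
type fakeCluster map[[2]int]string

func (f fakeCluster) statusSeenBy(by, of int) string { return f[[2]int{by, of}] }

func main() {
	c := fakeCluster{{1, 1}: "ACTIVE", {1, 4}: "DECOMMISSIONED"}
	fmt.Println(checkFromActiveNodes(c, []int{1}, []int{4}))
}
```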
Fixes #156402.
Fixes #156104.
Fixes #154474.
Release note: None
Epic: None
156616: kvserver: deflake TestFlowControlSendQueueRangeFeed r=tbg a=tbg
The recently added logging showed that n2 and n3 just weren't even considered
for starting the rangefeed. Likely this was because the descriptors weren't
gossiped yet (this was corroborated by gossip logging).
Rather than waiting for specific preconditions that "likely" fix
the problem for the particular version of the code, widen the scope
of the retry loop to re-establish the rangefeed until it does get
scheduled where it needs to be for the test to succeed.
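Schematically, the widened loop re-establishes the rangefeed on every attempt rather than retrying only the check; `establishRangeFeed` and `isScheduledOnExpectedNodes` are hypothetical stand-ins for the test's helpers:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical stand-ins; only the control flow matters here.
func establishRangeFeed() (cancel func(), err error) { return func() {}, nil }
func isScheduledOnExpectedNodes() bool               { return time.Now().Unix()%2 == 0 }

// waitForRangeFeedPlacement retries the whole establish-and-check sequence
// instead of waiting on a precondition (e.g. gossiped descriptors) that merely
// makes success likely on the current code.
func waitForRangeFeedPlacement(timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		cancel, err := establishRangeFeed()
		if err == nil && isScheduledOnExpectedNodes() {
			return nil // keep this rangefeed; it landed where the test needs it
		}
		if err == nil {
			cancel() // tear down the misplaced rangefeed and try again
		}
		time.Sleep(250 * time.Millisecond)
	}
	return errors.New("rangefeed never scheduled on the expected nodes")
}

func main() {
	fmt.Println(waitForRangeFeedPlacement(5 * time.Second))
}
```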
Epic: none
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Matt Spilchen <[email protected]>