150743: contention: Increase RetryBudgetForMissingResult r=alyshanjahani-crl a=alyshanjahani-crl
Previously the retry budget was set to 1; however, this budget led to
a significant number of failed resolutions.
To see why a retry budget of 1 is not sufficient, consider the case where an
in-progress transaction is in the writer buffer when resolution is attempted.
The in-progress txn is then ingested into the cache after the txn resolution
endpoint drains the write buffer - i.e. it is stored in the cache with the
appstatspb.InvalidTransactionFingerprintID value.
Now the transaction finishes and its respective fingerprint ID is recorded.
However, it is still in the writer buffer of the TxnID cache. When resolution
is attempted again, the lookup gets the invalid / in-progress value that is
stored in the cache. The subsequent flush then gets the cache to ingest the
actual fingerprint ID value for the txn. But we've run out of budget, and
don't retry resolution.
This commit increases the budget to 2. In addition to handling the case above,
it has been shown experimentally to lower the number of failed resolutions
(see issue linked).
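The sequence above can be reproduced with a toy model. This is only a sketch,
not the actual contention or txnidcache code: the txnIDCache type, its methods,
resolve, and simulate are hypothetical names, and the model assumes each
resolution attempt does a cache lookup followed by a flush of the buffer.

```go
package main

import "fmt"

// Stand-in for appstatspb.InvalidTransactionFingerprintID (assumed to be 0 here).
const invalidFingerprintID uint64 = 0

type entry struct {
	txnID       string
	fingerprint uint64
}

// txnIDCache is a hypothetical model: writes land in buffer first and only
// become visible to lookups after flush moves them into cache.
type txnIDCache struct {
	buffer []entry
	cache  map[string]uint64
}

func newTxnIDCache() *txnIDCache {
	return &txnIDCache{cache: map[string]uint64{}}
}

func (c *txnIDCache) record(txnID string, fp uint64) {
	c.buffer = append(c.buffer, entry{txnID, fp})
}

func (c *txnIDCache) flush() {
	for _, e := range c.buffer {
		c.cache[e.txnID] = e.fingerprint
	}
	c.buffer = c.buffer[:0]
}

// lookupThenFlush models one resolution attempt: read the cache, then flush.
func (c *txnIDCache) lookupThenFlush(txnID string) (uint64, bool) {
	fp, found := c.cache[txnID]
	c.flush()
	return fp, found && fp != invalidFingerprintID
}

// resolve makes one initial attempt plus up to retryBudget retries.
// afterAttempt lets the caller inject events between attempts.
func resolve(c *txnIDCache, txnID string, retryBudget int, afterAttempt func(int)) (uint64, bool) {
	for attempt := 0; attempt <= retryBudget; attempt++ {
		if fp, ok := c.lookupThenFlush(txnID); ok {
			return fp, true
		}
		afterAttempt(attempt)
	}
	return invalidFingerprintID, false
}

// simulate replays the scenario: the txn is in progress (and buffered) at the
// first attempt, and finishes right after it.
func simulate(retryBudget int) bool {
	c := newTxnIDCache()
	c.record("txn-1", invalidFingerprintID)
	_, ok := resolve(c, "txn-1", retryBudget, func(attempt int) {
		if attempt == 0 {
			c.record("txn-1", 42) // fingerprint ID recorded on completion
		}
	})
	return ok
}

func main() {
	fmt.Println("budget 1 resolved:", simulate(1)) // budget 1 resolved: false
	fmt.Println("budget 2 resolved:", simulate(2)) // budget 2 resolved: true
}
```

In this model the first retry only ever sees the invalid value, so the second
retry is the first one that can observe the real fingerprint ID.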
Lastly, this commit removes dead code in the TxnID resolution endpoint. A map
was being created and never added to. The logic resulted in the RPC flushing
the TxnID Cache on every invocation; that behaviour is preserved and made
more explicit.
Fixes: #148686
Release note: None
151063: roachtest/ttl: fix TTL restart test flakiness r=spilchen a=spilchen
The TTL restart test was experiencing flakiness due to the default stability window causing delays in replanning when nodes changed. The test would wait for TTL progress across all nodes but the replanning logic wouldn't trigger immediately when nodes were restarted. This change disables the stability window.
This also fixes a bug in the logic that checks whether the TTL job is progressing. The check looks for key removal across all ranges over time. The existing check repeatedly changed the baseline. We now save the baseline once and compare against it on each check.
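A minimal sketch of the baseline fix, assuming the check compares per-range key counts between samples. This is not the roachtest code: progressChecker, check, copyCounts, and run are hypothetical names, and the resetBaseline flag exists only to contrast the old behaviour with the new one.

```go
package main

import "fmt"

// progressChecker is a hypothetical helper that tracks key counts per range.
type progressChecker struct {
	baseline      map[int]int // range ID -> key count at the first check
	resetBaseline bool        // true models the old behaviour (moving baseline)
}

func copyCounts(m map[int]int) map[int]int {
	out := make(map[int]int, len(m))
	for id, n := range m {
		out[id] = n
	}
	return out
}

// check reports whether every range holds fewer keys than in the baseline.
func (p *progressChecker) check(current map[int]int) bool {
	if p.baseline == nil {
		p.baseline = copyCounts(current) // first sample becomes the baseline
		return false
	}
	progressed := true
	for id, base := range p.baseline {
		if current[id] >= base {
			progressed = false
			break
		}
	}
	if p.resetBaseline {
		p.baseline = copyCounts(current) // old behaviour: baseline keeps moving
	}
	return progressed
}

// run feeds three samples and returns the result of the last check.
func run(resetBaseline bool) bool {
	p := &progressChecker{resetBaseline: resetBaseline}
	samples := []map[int]int{
		{1: 100, 2: 100}, // baseline
		{1: 90, 2: 100},  // only range 1 has progressed
		{1: 90, 2: 80},   // range 2 has now progressed too
	}
	ok := false
	for _, s := range samples {
		ok = p.check(s)
	}
	return ok
}

func main() {
	fmt.Println("saved baseline: ", run(false)) // saved baseline:  true
	fmt.Println("moving baseline:", run(true))  // moving baseline: false
}
```

With a moving baseline, ranges that make progress at different times are never all below the previous sample at once, so the check can keep failing even though every range has lost keys since the start.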
Release note: None
Epic: None
Closes #151011
Co-authored-by: Alyshan Jahani <[email protected]>
Co-authored-by: Matt Spilchen <[email protected]>