Skip to content

Conversation

@brfrn169
Copy link
Collaborator

@brfrn169 brfrn169 commented Jun 19, 2025

Description

Currently, we always update the read set when the same record is read multiple times from storage. However, this behavior can lead to anomalies like lost updates.

You can reproduce the issue with the following test:
https://github.com/scalar-labs/scalardb/pull/2797/files#diff-7d8b7163e906546d45d9a7b662f97eb671e39030217ab77ef22325b0f74c2b5eR5898-R5981

In the test, two slightly different Get operations are used to force reading the same record multiple times from storage and bypass snapshot reads.

Related issues and/or PRs

N/A

Changes made

  • Updated the code to prevent overwriting read set.

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

N/A

Release notes

Fixed an issue in Consensus Commit where reading the same record multiple times from storage could cause anomalies like lost updates. The read set is no longer updated on repeated reads to resolve this issue.

@brfrn169 brfrn169 requested a review from Copilot June 19, 2025 00:03
@brfrn169 brfrn169 self-assigned this Jun 19, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR aims to prevent the read set from being overwritten during repeated reads within a transaction to avoid lost updates. Key changes include:

  • Adding new tests in both integration-test and core test suites to verify the read set behavior.
  • Updating the CrudHandler to conditionally write to the read set only if it is not already present.
  • Adjusting documentation comments to reflect the new intended behavior.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitSpecificIntegrationTestBase.java Adds a new integration test to validate that conflicting external updates are correctly detected and that the read set remains unaltered.
core/src/test/java/com/scalar/db/transaction/consensuscommit/CrudHandlerTest.java Introduces a new test to ensure that performing different GET operations on the same record does not overwrite the read set.
core/src/main/java/com/scalar/db/transaction/consensuscommit/CrudHandler.java Modifies the logic so that the read set is only updated when it hasn’t been populated already, with updated comments to clarify behavior.
Comments suppressed due to low confidence (2)

core/src/main/java/com/scalar/db/transaction/consensuscommit/CrudHandler.java:318

  • The added check for snapshot.containsKeyInReadSet(key) prevents overwriting the read set. Consider adding a brief comment explaining why this condition is critical to avoid anomalies from multiple reads.
    if (!readOnly && !snapshot.containsKeyInReadSet(key)) {

integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitSpecificIntegrationTestBase.java:5900

  • [nitpick] The test method name is quite long and may impact readability. Consider simplifying or refactoring it in accordance with your project's naming conventions.
      commit_ConflictingExternalUpdate_DifferentGetButSameRecordReturned_ShouldThrowCommitConflictExceptionAndPreserveExternalChanges()

Copy link
Contributor

@jnmt jnmt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for finding a bug. LGTM, thank you!

Comment on lines -167 to -169
// Keep the read set latest to create before image by using the latest record (result)
// because another conflicting transaction might have updated the record after this
// transaction read it first. However, we update it only if a get operation has no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my bad. This part was a completely wrong idea. Probably, I was trying to avoid aborts as much as possible, but paid little attention to the case for reading a record multiple times. I need to think about how we can avoid these kinds of concurrency bugs…

Copy link
Contributor

@Torch3333 Torch3333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

.intValue(BALANCE, 200)
.build());

// Retrieve the record again, but use a different Get object (with a where clause)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

private void putIntoReadSetInSnapshot(Snapshot.Key key, Optional<TransactionResult> result) {
// In read-only mode, we don't need to put the result into the read set
if (!readOnly) {
if (!readOnly && !snapshot.containsKeyInReadSet(key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, we can use the latest result if read-modify-write hasn't occurred on the key, like as follows?

    if (!readOnly && !(snapshot.containsKeyInReadSet(key) && snapshot.containsKeyInWriteOrDeleteSet(key)) {

This question is mainly for confirmation and the current code looks good to me, though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what do you mean by read-modify-write? You mean CAS operations in the prepare-records phase, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the unclear comment. My question was as follows:

  • With this PR, the read-set isn't updated if the key is read
  • I think we can update the read-set even if the key is read, as long as the key isn't written or deleted in the transaction
  • For instance
    • Record A has the key = 100 and value = "one"
    • Transaction 1 reads the key and the value is "one", and puts it into the read-set
    • Transaction 2 updates the value of the record to "two" and commits it
    • Transaction 1 reads the key with different GET and the value is "two", and puts it into the read-set since there is no write operation on the key in Transaction 1
    • Transaction 1 updates the value of the record to "three" based on the read value "two"
    • In this case, no anomaly doesn't occur even if we use the latest result as a read-set
      • If Transaction 1 updates the record before the second GET, the second GET's result shouldn't be put into the read-set to avoid lost update anomaly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We might be able to rescue such a transaction, but I'm really not sure if it's good for our snapshot definition. One approach is to make it a safer side (like the current PR) for now, and then improve it after the deep discussion and confirmation.

Copy link
Collaborator Author

@brfrn169 brfrn169 Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@komamitsu Indeed, we might be able to rescue such a transaction. However, since we don’t have enough time to test it, let’s make it the safer side for now as @jnmt suggested. Thanks!

@brfrn169 brfrn169 requested a review from komamitsu June 19, 2025 01:43
Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

@brfrn169 brfrn169 merged commit 88914f1 into master Jun 19, 2025
57 checks passed
@brfrn169 brfrn169 deleted the prevent-overwriting-read-set-in-consensus-commit branch June 19, 2025 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants