Skip to content

Conversation

@brfrn169
Copy link
Collaborator

@brfrn169 brfrn169 commented Jun 11, 2025

Description

This PR introduces StorageInfo, which provides metadata about the storage.

Currently, it includes the following information:

  • Storage name
  • Mutation atomicity unit: Defines the scope of atomicity for mutations.
  • Maximum atomic mutation count: The maximum number of mutations that can be performed atomically in the storage.

For more details, refer to the StorageInfo interface.

You can get a StorageInfo instance via the DistributedStorageAdmin.getStorageInfo() method.

This PR also extends the behavior of the DistributedStorage.mutate() method based on the storage’s mutation atomicity unit and maximum atomic mutation count. For details, see the Javadoc for the DistributedStorage.mutate() method.

Related issues and/or PRs

N/A

Changes made

Added some inline comments. Please take a look for the details.

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

N/A

Release notes

Introduced StorageInfo, which provides metadata about the storage. You can obtain a StorageInfo instance via the DistributedStorageAdmin.getStorageInfo() method.

@brfrn169 brfrn169 requested a review from Copilot June 11, 2025 14:11
@brfrn169 brfrn169 self-assigned this Jun 11, 2025
@brfrn169 brfrn169 added the enhancement New feature or request label Jun 11, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces the StorageInfo concept to provide storage metadata, along with integrating a StorageInfoProvider that supplies this information to various storage implementations and operation checkers. Additionally, new error messages and integration tests for mutation atomicity units across different storages have been added.

  • Introduces a new StorageInfo API and its implementation
  • Refactors storage admin and operation checker implementations (Dynamo, Cosmos, Cassandra, etc.) to use StorageInfo and StorageInfoProvider
  • Adds integration tests to validate mutation atomicity behavior

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
core/src/main/java/com/scalar/db/storage/dynamo/Dynamo.java Integrates StorageInfoProvider and updates constructor usage
core/src/main/java/com/scalar/db/storage/cosmos/CosmosOperationChecker.java Updates constructor to use StorageInfoProvider for Cosmos storage
core/src/main/java/com/scalar/db/storage/cosmos/CosmosAdmin.java Implements getStorageInfo returning predefined Cosmos StorageInfo
core/src/main/java/com/scalar/db/storage/cosmos/Cosmos.java Instantiates CosmosAdmin and passes it to StorageInfoProvider
core/src/main/java/com/scalar/db/storage/cassandra/CassandraAdmin.java Adds StorageInfo and its implementation for Cassandra storage
core/src/main/java/com/scalar/db/storage/cassandra/Cassandra.java Updates to instantiate CassandraAdmin and use StorageInfoProvider
core/src/main/java/com/scalar/db/service/AdminService.java Delegates getStorageInfo calls to the underlying admin
core/src/main/java/com/scalar/db/common/error/CoreError.java Adds new error messages for multi-record, multi-table, multi-namespace, and multi-storage mutations
core/src/main/java/com/scalar/db/common/checker/OperationChecker.java Enhances mutation atomicity checks using StorageInfo and custom error messages
core/src/main/java/com/scalar/db/common/StorageInfoProvider.java Introduces a caching provider for StorageInfo retrieval
core/src/main/java/com/scalar/db/common/StorageInfoImpl.java Provides an immutable implementation of the StorageInfo interface
core/src/main/java/com/scalar/db/common/CheckedDistributedStorageAdmin.java Implements getStorageInfo with proper error wrapping
core/src/main/java/com/scalar/db/api/StorageInfo.java Defines the new StorageInfo API interface and its atomicity unit enum
core/src/main/java/com/scalar/db/api/DistributedStorageAdmin.java Updates the admin interface to include getStorageInfo
core/src/main/java/com/scalar/db/api/DistributedStorage.java Updates documentation to reference the StorageInfo related behavior
Integration Test Files Added new tests across multi-storage, JDBC, Dynamo, Cosmos, and Cassandra for atomicity unit validation

}

Mutation first = mutations.get(0);
assert first.forNamespace().isPresent();
Copy link

Copilot AI Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider replacing the assert with an explicit runtime check and throwing an appropriate exception to ensure the precondition is always enforced in production, as assertions may be disabled in runtime.

Suggested change
assert first.forNamespace().isPresent();
if (!first.forNamespace().isPresent()) {
throw new IllegalArgumentException(
CoreError.OPERATION_CHECK_ERROR_NAMESPACE_NOT_PRESENT.buildMessage(first));
}

Copilot uses AI. Check for mistakes.
Comment on lines 184 to 190
* <p>Note that this method only supports mutations within the atomicity unit specified by {@link
* StorageInfo#getMutationAtomicityUnit()}. For example, if the atomicity unit of the storage is
* {@link StorageInfo.MutationAtomicityUnit#PARTITION}, the mutations must occur within the same
* partition. Also note that the maximum number of mutations that can be performed atomically is
* defined by {@link StorageInfo#getMaxAtomicMutationCount()}.
*
* <p>To retrieve storage information, use {@link DistributedStorageAdmin#getStorageInfo(String)}.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended the behavior of the DistributedStorage.mutate(). See this Javadoc for the details.

Comment on lines +91 to +98
/**
* Returns the storage information.
*
* @param namespace the namespace to get the storage information for
* @return the storage information
* @throws ExecutionException if the operation fails
*/
StorageInfo getStorageInfo(String namespace) throws ExecutionException;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get the StorageInfo instance of the storage through this method.

import javax.annotation.concurrent.ThreadSafe;

@ThreadSafe
public class StorageInfoProvider {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StorageInfoProvider provides caching functionality for StorageInfo instances.

Comment on lines -341 to +355
// Check if all mutations are for the same partition
if (!mutation.forNamespace().equals(first.forNamespace())
|| !mutation.forTable().equals(first.forTable())
|| !mutation.getPartitionKey().equals(first.getPartitionKey())) {
// Check if the mutations are within the atomicity unit of the storage
if (isOutOfAtomicityUnit(first, storageInfoForFirst, mutation)) {
throw new IllegalArgumentException(
CoreError.OPERATION_CHECK_ERROR_MULTI_PARTITION_MUTATION.buildMessage(mutations));
getErrorMessageForOutOfAtomicityUnit(storageInfoForFirst, mutations));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the check logic based on the mutation atomicity unit of the storage.

Comment on lines +52 to +57
private static final StorageInfo STORAGE_INFO =
new StorageInfoImpl(
"cassandra",
StorageInfo.MutationAtomicityUnit.PARTITION,
// No limit on the number of mutations
Integer.MAX_VALUE);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the StorageInfo instance for Cassandra.

Comment on lines +69 to +74
private static final StorageInfo STORAGE_INFO =
new StorageInfoImpl(
"cosmos",
StorageInfo.MutationAtomicityUnit.PARTITION,
// No limit on the number of mutations
Integer.MAX_VALUE);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the StorageInfo instance for Cosmos DB.

Comment on lines +158 to +163
private static final StorageInfo STORAGE_INFO =
new StorageInfoImpl(
"dynamo",
StorageInfo.MutationAtomicityUnit.STORAGE,
// DynamoDB has a limit of 100 items per transactional batch write operation
100);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the StorageInfo instance for DynamoDB.

Comment on lines +64 to +69
private static final StorageInfo STORAGE_INFO =
new StorageInfoImpl(
"jdbc",
StorageInfo.MutationAtomicityUnit.STORAGE,
// No limit on the number of mutations
Integer.MAX_VALUE);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the StorageInfo instance for JDBC databases.

@brfrn169 brfrn169 requested review from Torch3333, feeblefakie and komamitsu and removed request for Torch3333, feeblefakie and komamitsu June 11, 2025 14:33
@brfrn169 brfrn169 force-pushed the introduce-storage-info branch from e5466c1 to a52030e Compare June 11, 2025 15:08
@brfrn169 brfrn169 force-pushed the introduce-storage-info branch from a52030e to 8ca4fad Compare June 11, 2025 16:33
Copy link
Contributor

@Torch3333 Torch3333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looking good! Thank you!
Left naming suggestions. PTAL!

*
* @return the mutation atomicity unit of the storage
*/
MutationAtomicityUnit getMutationAtomicityUnit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MutationAtomicityUnit getMutationAtomicityUnit();
MutationAtomicityGranularity getMutationAtomicityGranularity();

After some thought, I feel it's probably more proper to call it granularity, although it's a bit long.
What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gemini says MutationAtomicityUnit is more appropriate, but I'm not sure.

In the context of transactions, when referring to the scope within which updates can be performed atomically, **"MutationAtomicityUnit"** is the better choice.

Here's why:

* **Unit**: "Unit" directly implies a distinct, defined **scope** or **entity**. The example you provided with `RECORD`, `PARTITION`, `TABLE`, etc., clearly defines the atomic "unit" of operation. It naturally conveys the idea that mutations are performed atomically for *each* of these specified units.
* **Granularity**: "Granularity" refers to the **fineness or coarseness** of something. While related, it describes the *degree* of detail or division rather than the specific scope itself. For instance, you might discuss whether the atomicity has "fine-grained" or "coarse-grained" granularity. When you want to name the actual *thing* that is atomic, "Unit" is more precise and less ambiguous.

Therefore, "MutationAtomicityUnit" more accurately and clearly communicates the intended meaning of the atomic **scope** for mutations.

What do you think? 🙇

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also asked Gemini. It suggests Granularity.... I think it depends on how we ask.

Question:
I'm considering how to name a method, and I would appreciate your advice.

We provide a transaction manager for various kinds of databases. The transaction manager pushes down write operations into the underlying databases based on the size that the databases can atomically write.

To make each database provide the size information, I created a StorageInfo interface, created a class that implements the interface for each database, and created a method that returns the size information.
The size information is an Enum class that has the following values: STORAGE, NAMESPACE, TABLE, PARTITION, and RECORD.

I have two candidate names for the method.
One is getMutationAtomicityUnit() and the other is getMutationAtomicityGranularity().
Which do you think is better?
Do you have any other better ideas?

Answer:
I think getMutationAtomicityGranularity() is the better choice.

Here’s a breakdown of why, along with some alternative ideas.

Comparison of Your Candidates
Both of your proposed names are very good because they are descriptive and unambiguous. However, getMutationAtomicityGranularity() has a slight edge.

getMutationAtomicityGranularity(): Slightly better. 👍
In database and systems design, granularity is the standard term for referring to the scale or level at which an operation occurs (e.g., fine-grained vs. coarse-grained). Since your Enum defines a scale from RECORD (fine-grained) to STORAGE (coarse-grained), this name is a perfect technical fit.

getMutationAtomicityUnit(): Also very good.
The word unit is also accurate. It correctly describes that the method returns the single, indivisible "chunk" for atomic writes. It's simple and clear. The only minor drawback is that "granularity" is more idiomatic in this specific technical domain.

Other Ideas
If you're open to other suggestions, here are a couple of alternatives that aim for simplicity while remaining clear.

getAtomicityLevel()
This is a strong contender. It's shorter and replaces the more technical "Granularity" with the simpler "Level," which is easily understood by any developer. The context of your transaction manager implies that the atomicity relates to mutations.

getAtomicWriteScope()
Using Scope is another excellent option. It clearly communicates that the Enum return value defines the boundary or scope (e.g., the scope of a table, the scope of a partition) within which writes are atomic.

Final Recommendation
Go with getMutationAtomicityGranularity(). It's the most precise and idiomatic term for your use case.

If you feel it's too much jargon, getAtomicityLevel() is a fantastic, simpler alternative that conveys the same meaning effectively.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share the question as well?

Anyway, Unit seems also correct, so I'm fine with Unit, though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was as follows:

In the context of transactions, which term is more appropriate to express the range in which updates can be performed atomically: MutationAtomicityUnit or MutationAtomicityGranularity?

Below is an example using MutationAtomicityUnit:

  /**
   * The mutation atomicity unit of the storage.
   *
   * <p>This enum defines the atomicity unit for mutations in the storage. It determines the scope
   * of atomicity for mutations such as put and delete.
   */
  enum MutationAtomicityUnit {
    /**
     * The atomicity unit is at the record level, meaning that mutations are performed atomically
     * for each record.
     */
    RECORD,

    /**
     * The atomicity unit is at the partition level, meaning that mutations are performed atomically
     * for each partition.
     */
    PARTITION,

    /**
     * The atomicity unit is at the table level, meaning that mutations are performed atomically for
     * each table.
     */
    TABLE,

    /**
     * The atomicity unit is at the namespace level, meaning that mutations are performed atomically
     * for each namespace.
     */
    NAMESPACE,

    /**
     * The atomicity unit is at the storage level, meaning that mutations are performed atomically
     * for the entire storage.
     */
    STORAGE
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion, since Unit is also correct and simpler, let's go with as is.

*
* @return the maximum number of mutations that can be performed atomically in the storage
*/
int getMaxAtomicMutationCount();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int getMaxAtomicMutationCount();
int getMaxAtomicMutationsCount();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0a32de8. Thanks!

@brfrn169 brfrn169 requested a review from feeblefakie June 12, 2025 05:41

@VisibleForTesting
static class AdminHolder {
public final String storageName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] These don't need to be public.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the following right?

@VisibleForTesting
  static class AdminHolder {
    final String storageName;
    final DistributedStorageAdmin admin;

    AdminHolder(String storageName, DistributedStorageAdmin admin) {
      this.storageName = storageName;
      this.admin = admin;
    }
  }

BTW, considering that the class is already package-private, is there any benefit to removing public from the fields or constructor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I'm a bit paranoia on this kind of minor things, and you don't need to follow the comment 😁

I was thinking of making the fields private:

  static class AdminHolder {
    private final String storageName;
    private final DistributedStorageAdmin admin;

    AdminHolder(String storageName, DistributedStorageAdmin admin) {
      this.storageName = storageName;
      this.admin = admin;
    }
  }

This should be stronger encapsulation. If the fields are package-private, I guess that these fields are intended to be accessed from other class in the same package and it requires more attention during code review. (Also, I basically follow Principle of Least Privilege if possible.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically follow Principle of Least Privilege if possible.

Yeah, I totally agree with you. The downside I’ve noticed recently is that, in such cases, we sometimes have to change the access level when we need to use the class outside the package or in a different context, which can be a bit of a hassle. But I guess it’s a trade-off we have to accept.

Fixed in bd542b7. Thanks!

@brfrn169 brfrn169 requested a review from komamitsu June 12, 2025 09:14
Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@brfrn169 brfrn169 merged commit ff9a2e2 into master Jun 13, 2025
55 checks passed
@brfrn169 brfrn169 deleted the introduce-storage-info branch June 13, 2025 04:38
brfrn169 added a commit that referenced this pull request Jun 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants