-
Notifications
You must be signed in to change notification settings - Fork 40
Introduce StorageInfo #2756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce StorageInfo #2756
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces the StorageInfo concept to provide storage metadata, along with integrating a StorageInfoProvider that supplies this information to various storage implementations and operation checkers. Additionally, new error messages and integration tests for mutation atomicity units across different storages have been added.
- Introduces a new StorageInfo API and its implementation
- Refactors storage admin and operation checker implementations (Dynamo, Cosmos, Cassandra, etc.) to use StorageInfo and StorageInfoProvider
- Adds integration tests to validate mutation atomicity behavior
Reviewed Changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| core/src/main/java/com/scalar/db/storage/dynamo/Dynamo.java | Integrates StorageInfoProvider and updates constructor usage |
| core/src/main/java/com/scalar/db/storage/cosmos/CosmosOperationChecker.java | Updates constructor to use StorageInfoProvider for Cosmos storage |
| core/src/main/java/com/scalar/db/storage/cosmos/CosmosAdmin.java | Implements getStorageInfo returning predefined Cosmos StorageInfo |
| core/src/main/java/com/scalar/db/storage/cosmos/Cosmos.java | Instantiates CosmosAdmin and passes it to StorageInfoProvider |
| core/src/main/java/com/scalar/db/storage/cassandra/CassandraAdmin.java | Adds StorageInfo and its implementation for Cassandra storage |
| core/src/main/java/com/scalar/db/storage/cassandra/Cassandra.java | Updates to instantiate CassandraAdmin and use StorageInfoProvider |
| core/src/main/java/com/scalar/db/service/AdminService.java | Delegates getStorageInfo calls to the underlying admin |
| core/src/main/java/com/scalar/db/common/error/CoreError.java | Adds new error messages for multi-record, multi-table, multi-namespace, and multi-storage mutations |
| core/src/main/java/com/scalar/db/common/checker/OperationChecker.java | Enhances mutation atomicity checks using StorageInfo and custom error messages |
| core/src/main/java/com/scalar/db/common/StorageInfoProvider.java | Introduces a caching provider for StorageInfo retrieval |
| core/src/main/java/com/scalar/db/common/StorageInfoImpl.java | Provides an immutable implementation of the StorageInfo interface |
| core/src/main/java/com/scalar/db/common/CheckedDistributedStorageAdmin.java | Implements getStorageInfo with proper error wrapping |
| core/src/main/java/com/scalar/db/api/StorageInfo.java | Defines the new StorageInfo API interface and its atomicity unit enum |
| core/src/main/java/com/scalar/db/api/DistributedStorageAdmin.java | Updates the admin interface to include getStorageInfo |
| core/src/main/java/com/scalar/db/api/DistributedStorage.java | Updates documentation to reference the StorageInfo related behavior |
| Integration Test Files | Added new tests across multi-storage, JDBC, Dynamo, Cosmos, and Cassandra for atomicity unit validation |
| } | ||
|
|
||
| Mutation first = mutations.get(0); | ||
| assert first.forNamespace().isPresent(); |
Copilot
AI
Jun 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider replacing the assert with an explicit runtime check and throwing an appropriate exception to ensure the precondition is always enforced in production, as assertions may be disabled in runtime.
| assert first.forNamespace().isPresent(); | |
| if (!first.forNamespace().isPresent()) { | |
| throw new IllegalArgumentException( | |
| CoreError.OPERATION_CHECK_ERROR_NAMESPACE_NOT_PRESENT.buildMessage(first)); | |
| } |
| * <p>Note that this method only supports mutations within the atomicity unit specified by {@link | ||
| * StorageInfo#getMutationAtomicityUnit()}. For example, if the atomicity unit of the storage is | ||
| * {@link StorageInfo.MutationAtomicityUnit#PARTITION}, the mutations must occur within the same | ||
| * partition. Also note that the maximum number of mutations that can be performed atomically is | ||
| * defined by {@link StorageInfo#getMaxAtomicMutationCount()}. | ||
| * | ||
| * <p>To retrieve storage information, use {@link DistributedStorageAdmin#getStorageInfo(String)}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended the behavior of the DistributedStorage.mutate(). See this Javadoc for the details.
| /** | ||
| * Returns the storage information. | ||
| * | ||
| * @param namespace the namespace to get the storage information for | ||
| * @return the storage information | ||
| * @throws ExecutionException if the operation fails | ||
| */ | ||
| StorageInfo getStorageInfo(String namespace) throws ExecutionException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get the StorageInfo instance of the storage through this method.
| import javax.annotation.concurrent.ThreadSafe; | ||
|
|
||
| @ThreadSafe | ||
| public class StorageInfoProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StorageInfoProvider provides caching functionality for StorageInfo instances.
| // Check if all mutations are for the same partition | ||
| if (!mutation.forNamespace().equals(first.forNamespace()) | ||
| || !mutation.forTable().equals(first.forTable()) | ||
| || !mutation.getPartitionKey().equals(first.getPartitionKey())) { | ||
| // Check if the mutations are within the atomicity unit of the storage | ||
| if (isOutOfAtomicityUnit(first, storageInfoForFirst, mutation)) { | ||
| throw new IllegalArgumentException( | ||
| CoreError.OPERATION_CHECK_ERROR_MULTI_PARTITION_MUTATION.buildMessage(mutations)); | ||
| getErrorMessageForOutOfAtomicityUnit(storageInfoForFirst, mutations)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the check logic based on the mutation atomicity unit of the storage.
| private static final StorageInfo STORAGE_INFO = | ||
| new StorageInfoImpl( | ||
| "cassandra", | ||
| StorageInfo.MutationAtomicityUnit.PARTITION, | ||
| // No limit on the number of mutations | ||
| Integer.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the StorageInfo instance for Cassandra.
| private static final StorageInfo STORAGE_INFO = | ||
| new StorageInfoImpl( | ||
| "cosmos", | ||
| StorageInfo.MutationAtomicityUnit.PARTITION, | ||
| // No limit on the number of mutations | ||
| Integer.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the StorageInfo instance for Cosmos DB.
| private static final StorageInfo STORAGE_INFO = | ||
| new StorageInfoImpl( | ||
| "dynamo", | ||
| StorageInfo.MutationAtomicityUnit.STORAGE, | ||
| // DynamoDB has a limit of 100 items per transactional batch write operation | ||
| 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the StorageInfo instance for DynamoDB.
| private static final StorageInfo STORAGE_INFO = | ||
| new StorageInfoImpl( | ||
| "jdbc", | ||
| StorageInfo.MutationAtomicityUnit.STORAGE, | ||
| // No limit on the number of mutations | ||
| Integer.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the StorageInfo instance for JDBC databases.
e5466c1 to
a52030e
Compare
a52030e to
8ca4fad
Compare
Torch3333
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
feeblefakie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looking good! Thank you!
Left naming suggestions. PTAL!
| * | ||
| * @return the mutation atomicity unit of the storage | ||
| */ | ||
| MutationAtomicityUnit getMutationAtomicityUnit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| MutationAtomicityUnit getMutationAtomicityUnit(); | |
| MutationAtomicityGranularity getMutationAtomicityGranularity(); |
After some thought, I feel it's probably more proper to call it granularity, although it's a bit long.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gemini says MutationAtomicityUnit is more appropriate, but I'm not sure.
In the context of transactions, when referring to the scope within which updates can be performed atomically, **"MutationAtomicityUnit"** is the better choice.
Here's why:
* **Unit**: "Unit" directly implies a distinct, defined **scope** or **entity**. The example you provided with `RECORD`, `PARTITION`, `TABLE`, etc., clearly defines the atomic "unit" of operation. It naturally conveys the idea that mutations are performed atomically for *each* of these specified units.
* **Granularity**: "Granularity" refers to the **fineness or coarseness** of something. While related, it describes the *degree* of detail or division rather than the specific scope itself. For instance, you might discuss whether the atomicity has "fine-grained" or "coarse-grained" granularity. When you want to name the actual *thing* that is atomic, "Unit" is more precise and less ambiguous.
Therefore, "MutationAtomicityUnit" more accurately and clearly communicates the intended meaning of the atomic **scope** for mutations.
What do you think? 🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also asked Gemini. It suggests Granularity.... I think it depends on how we ask.
Question:
I'm considering how to name a method, and I would appreciate your advice.
We provide a transaction manager for various kinds of databases. The transaction manager pushes down write operations into the underlying databases based on the size that the databases can atomically write.
To make each database provide the size information, I created a StorageInfo interface, created a class that implements the interface for each database, and created a method that returns the size information.
The size information is an Enum class that has the following values: STORAGE, NAMESPACE, TABLE, PARTITION, and RECORD.
I have two candidate names for the method.
One is getMutationAtomicityUnit() and the other is getMutationAtomicityGranularity().
Which do you think is better?
Do you have any other better ideas?
Answer:
I think getMutationAtomicityGranularity() is the better choice.
Here’s a breakdown of why, along with some alternative ideas.
Comparison of Your Candidates
Both of your proposed names are very good because they are descriptive and unambiguous. However, getMutationAtomicityGranularity() has a slight edge.
getMutationAtomicityGranularity(): Slightly better. 👍
In database and systems design, granularity is the standard term for referring to the scale or level at which an operation occurs (e.g., fine-grained vs. coarse-grained). Since your Enum defines a scale from RECORD (fine-grained) to STORAGE (coarse-grained), this name is a perfect technical fit.
getMutationAtomicityUnit(): Also very good.
The word unit is also accurate. It correctly describes that the method returns the single, indivisible "chunk" for atomic writes. It's simple and clear. The only minor drawback is that "granularity" is more idiomatic in this specific technical domain.
Other Ideas
If you're open to other suggestions, here are a couple of alternatives that aim for simplicity while remaining clear.
getAtomicityLevel()
This is a strong contender. It's shorter and replaces the more technical "Granularity" with the simpler "Level," which is easily understood by any developer. The context of your transaction manager implies that the atomicity relates to mutations.
getAtomicWriteScope()
Using Scope is another excellent option. It clearly communicates that the Enum return value defines the boundary or scope (e.g., the scope of a table, the scope of a partition) within which writes are atomic.
Final Recommendation
Go with getMutationAtomicityGranularity(). It's the most precise and idiomatic term for your use case.
If you feel it's too much jargon, getAtomicityLevel() is a fantastic, simpler alternative that conveys the same meaning effectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share the question as well?
Anyway, Unit seems also correct, so I'm fine with Unit, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was as follows:
In the context of transactions, which term is more appropriate to express the range in which updates can be performed atomically: MutationAtomicityUnit or MutationAtomicityGranularity?
Below is an example using MutationAtomicityUnit:
/**
* The mutation atomicity unit of the storage.
*
* <p>This enum defines the atomicity unit for mutations in the storage. It determines the scope
* of atomicity for mutations such as put and delete.
*/
enum MutationAtomicityUnit {
/**
* The atomicity unit is at the record level, meaning that mutations are performed atomically
* for each record.
*/
RECORD,
/**
* The atomicity unit is at the partition level, meaning that mutations are performed atomically
* for each partition.
*/
PARTITION,
/**
* The atomicity unit is at the table level, meaning that mutations are performed atomically for
* each table.
*/
TABLE,
/**
* The atomicity unit is at the namespace level, meaning that mutations are performed atomically
* for each namespace.
*/
NAMESPACE,
/**
* The atomicity unit is at the storage level, meaning that mutations are performed atomically
* for the entire storage.
*/
STORAGE
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion, since Unit is also correct and simpler, let's go with as is.
| * | ||
| * @return the maximum number of mutations that can be performed atomically in the storage | ||
| */ | ||
| int getMaxAtomicMutationCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| int getMaxAtomicMutationCount(); | |
| int getMaxAtomicMutationsCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0a32de8. Thanks!
|
|
||
| @VisibleForTesting | ||
| static class AdminHolder { | ||
| public final String storageName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] These don't need to be public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the following right?
@VisibleForTesting
static class AdminHolder {
final String storageName;
final DistributedStorageAdmin admin;
AdminHolder(String storageName, DistributedStorageAdmin admin) {
this.storageName = storageName;
this.admin = admin;
}
}BTW, considering that the class is already package-private, is there any benefit to removing public from the fields or constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: I'm a bit paranoia on this kind of minor things, and you don't need to follow the comment 😁
I was thinking of making the fields private:
static class AdminHolder {
private final String storageName;
private final DistributedStorageAdmin admin;
AdminHolder(String storageName, DistributedStorageAdmin admin) {
this.storageName = storageName;
this.admin = admin;
}
}This should be stronger encapsulation. If the fields are package-private, I guess that these fields are intended to be accessed from other class in the same package and it requires more attention during code review. (Also, I basically follow Principle of Least Privilege if possible.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I basically follow Principle of Least Privilege if possible.
Yeah, I totally agree with you. The downside I’ve noticed recently is that, in such cases, we sometimes have to change the access level when we need to use the class outside the package or in a different context, which can be a bit of a hassle. But I guess it’s a trade-off we have to accept.
Fixed in bd542b7. Thanks!
komamitsu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
feeblefakie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
Description
This PR introduces
StorageInfo, which provides metadata about the storage.Currently, it includes the following information:
For more details, refer to the
StorageInfointerface.You can get a
StorageInfoinstance via theDistributedStorageAdmin.getStorageInfo()method.This PR also extends the behavior of the
DistributedStorage.mutate()method based on the storage’s mutation atomicity unit and maximum atomic mutation count. For details, see the Javadoc for theDistributedStorage.mutate()method.Related issues and/or PRs
N/A
Changes made
Added some inline comments. Please take a look for the details.
Checklist
Additional notes (optional)
N/A
Release notes
Introduced
StorageInfo, which provides metadata about the storage. You can obtain aStorageInfoinstance via theDistributedStorageAdmin.getStorageInfo()method.