-
Notifications
You must be signed in to change notification settings - Fork 40
Add support for scan fetch size in storage adapters #2731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for scan fetch size in storage adapters #2731
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a configurable scanner fetch size across all storage adapters, allowing users to control how many records are fetched per scan/query operation.
- DatabaseConfig now reads
scalar.db.scanner_fetch_size(default 10). - All storage handlers (JDBC, Cassandra, DynamoDB, Cosmos) accept and propagate the fetch size into their underlying clients/requests.
- QueryScanner and JDBC scanner implementations have been updated to use the fetch size and updated tests accordingly.
Reviewed Changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| core/src/main/java/com/scalar/db/config/DatabaseConfig.java | Load and expose scannerFetchSize from properties |
| core/src/main/java/com/scalar/db/storage/jdbc/JdbcService.java | Pass scannerFetchSize into PreparedStatement.setFetchSize in 3-param scanner, but missing in 2-param version |
| core/src/main/java/com/scalar/db/storage/jdbc/ScannerImpl.java | Rename close flag, commit on close, updated exception details |
| core/src/main/java/com/scalar/db/storage/dynamo/QueryScanner.java | Apply fetchSize to paginated requests |
| core/src/main/java/com/scalar/db/storage/cosmos/SelectStatementHandler.java | Propagate fetchSize into Cosmos query options |
| core/src/main/java/com/scalar/db/storage/cassandra/SelectStatementHandler.java | Set driver fetch size on bound statements |
| core/src/test/java/com/scalar/db/config/DatabaseConfigTest.java | Add tests for default/explicit scanner fetch size |
| core/src/test/java/com/scalar/db/storage/** | Update tests in all adapters to assert correct use of fetch size |
Comments suppressed due to low confidence (2)
core/src/main/java/com/scalar/db/storage/jdbc/JdbcService.java:100
- The two-parameter getScanner method does not set the fetch size on the PreparedStatement, so the configured scannerFetchSize is ignored when calling getScanner(scan, connection). Consider adding preparedStatement.setFetchSize(scannerFetchSize) before executing the query.
PreparedStatement preparedStatement = connection.prepareStatement(selectQuery.sql());
core/src/test/java/com/scalar/db/storage/jdbc/SelectStatementHandlerTest.java:610
- No test covers the two-parameter getScanner method in JdbcService to verify that the fetchSize is applied correctly. Consider adding a unit test to simulate getScanner(scan, connection) and assert that PreparedStatement.setFetchSize is called with the configured scannerFetchSize.
}
|
|
||
| systemNamespaceName = getSystemNamespaceName(getProperties()); | ||
|
|
||
| scannerFetchSize = getInt(getProperties(), SCANNER_FETCH_SIZE, DEFAULT_SCANNER_FETCH_SIZE); |
Copilot
AI
Jun 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing validation for scannerFetchSize from configuration. Negative or zero values could lead to unexpected behavior. Consider enforcing scannerFetchSize > 0 and throwing an exception on invalid values.
| } | ||
| } | ||
|
|
||
| @Override |
Copilot
AI
Jun 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Add @OverRide annotation to the close() method to clarify that it implements the Closeable/AutoCloseable interface and to catch mismatches in the method signature.
| @Override | |
| @Override | |
| @Override |
| public static final String CROSS_PARTITION_SCAN_FILTERING = SCAN_PREFIX + "filtering.enabled"; | ||
| public static final String CROSS_PARTITION_SCAN_ORDERING = SCAN_PREFIX + "ordering.enabled"; | ||
| public static final String SYSTEM_NAMESPACE_NAME = PREFIX + "system_namespace_name"; | ||
| public static final String SCANNER_FETCH_SIZE = PREFIX + "scanner_fetch_size"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new property scalar.db.scan_fetch_size.
| @Override | ||
| @Nonnull | ||
| protected ResultSet execute(BoundStatement bound, Operation operation) { | ||
| bound.setFetchSize(fetchSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Cassandra, call Statement.setFetchSize().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main change for Dynamo.
| Connection connection = null; | ||
| try { | ||
| connection = dataSource.getConnection(); | ||
| connection.setAutoCommit(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For JDBC, call Connection.setAutoCommit(false).
| SelectQuery selectQuery = buildSelectQuery(scan, tableMetadata); | ||
| PreparedStatement preparedStatement = connection.prepareStatement(selectQuery.sql()); | ||
| selectQuery.bind(preparedStatement); | ||
| preparedStatement.setFetchSize(scannerFetchSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, call Statement.setFetchSize() as well for JDBC.
|
|
||
| @Override | ||
| public Map<String, String> getConnectionProperties() { | ||
| return Collections.singletonMap("useCursorFetch", "true"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for MySQL, we need to set useCursorFetch=true in the connection properties.
| .queryItems(query, queryOptions.setMaxBufferedItemCount(fetchSize), Record.class) | ||
| .iterableByPage(fetchSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main change for Cosmos.
feeblefakie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looking good to me!
Left a minor naming suggestion. PTAL!
| public static final String CROSS_PARTITION_SCAN_FILTERING = SCAN_PREFIX + "filtering.enabled"; | ||
| public static final String CROSS_PARTITION_SCAN_ORDERING = SCAN_PREFIX + "ordering.enabled"; | ||
| public static final String SYSTEM_NAMESPACE_NAME = PREFIX + "system_namespace_name"; | ||
| public static final String SCANNER_FETCH_SIZE = PREFIX + "scanner_fetch_size"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, it should be scan_fetch_size since a scanner is more like an iterator for results that are scanned with the specified fetch size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fixed in 877ddbf.
feeblefakie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
|
@brfrn169 It seems no integration test case is added that scans records more than the fetch size. How about adding such a test case? Sorry in advance if existing test cases already cover that. |
@komamitsu We have the following tests for large scans: scalardb/integration-test/src/main/java/com/scalar/db/api/DistributedStorageIntegrationTestBase.java Lines 1880 to 2015 in 877ddbf
|
komamitsu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
Torch3333
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
thongdk8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for the PR. I have also checked and confirmed that the fix works well with the default configuration for all the supported databases.
@Torch3333 Thank you for your confirmation! |
Description
This PR adds support for scan fetch size in the storage adapters. It introduces a property,
scalar.db.scan_fetch_size, to control the number of records fetched by the scan API in the storage layer. It includes support for all storage adapters.Related issues and/or PRs
N/A
Changes made
scalar.db.scan_fetch_size.Checklist
Additional notes (optional)
N/A
Release notes
Added support for scan fetch size in the storage adapters. You can control the number of records fetched by the scan API in the storage layer by configuring the
scalar.db.scan_fetch_sizeproperty.