-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Add coordinator implementation of CLP connector. #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ataProvider and ClpMySqlSplitProvider
…mizer and unit tests for a future PR
WalkthroughA new Presto connector module named Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant PrestoEngine
participant ClpPlugin
participant ClpConnectorFactory
participant ClpConnector
participant ClpMetadata
participant ClpSplitManager
participant ClpRecordSetProvider
participant ClpMetadataProvider (MySQL)
participant ClpSplitProvider (MySQL)
User->>PrestoEngine: Submit SQL query
PrestoEngine->>ClpPlugin: Load connector factories
ClpPlugin->>ClpConnectorFactory: getConnectorFactories()
PrestoEngine->>ClpConnectorFactory: create()
ClpConnectorFactory->>ClpConnector: instantiate with dependencies
PrestoEngine->>ClpConnector: beginTransaction()
PrestoEngine->>ClpConnector: getMetadata()
PrestoEngine->>ClpMetadata: listSchemaNames()/listTables()/getTableMetadata()
ClpMetadata->>ClpMetadataProvider (MySQL): listTableHandles()/listColumnHandles()
PrestoEngine->>ClpConnector: getSplitManager()
PrestoEngine->>ClpSplitManager: getSplits()
ClpSplitManager->>ClpSplitProvider (MySQL): listSplits()
PrestoEngine->>ClpConnector: getRecordSetProvider()
PrestoEngine->>ClpRecordSetProvider: getRecordSet() (not supported)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 34
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (30)
pom.xml(1 hunks)presto-clp/pom.xml(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(1 hunks)presto-docs/src/main/sphinx/connector.rst(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst(1 hunks)presto-server/src/main/provisio/presto.xml(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
ClpSplit(28-62)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
ClpTableLayoutHandle(25-79)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle(26-117)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
ClpTableHandle(25-96)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
ColumnMetadata(29-205)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle(26-117)
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (9)
presto-docs/src/main/sphinx/connector.rst (1)
17-17: Add CLP connector entry to documentation indexThe CLP connector has been included in the toctree. Please verify that
connector/clp.rstexists and that the reference resolves correctly in the generated documentation.presto-server/src/main/provisio/presto.xml (1)
236-240: Provision new CLP connector pluginAn
artifactSetforpresto-clphas been added. Confirm that the Maven coordinates (${project.groupId}:presto-clp:zip:${project.version}) align with your module’s POM and that the plugin directory is populated as expected during provisioning.pom.xml (1)
218-218: Register the CLP connector module.
The new<module>presto-clp</module>is correctly added to the root POM. Ensure this aligns with provisioning and documentation updates.Please verify that the connector is also declared in
presto-server/src/main/provisio/presto.xml.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java (1)
23-27: Expose connector factory.
getConnectorFactories()correctly returns an immutable list containingClpConnectorFactory, conforming to Presto SPI expectations for a single‐factory plugin.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)
25-35:❌ Incorrect review comment
Advertised provider throws unconditionally – verify connector wiring
Registering a
ConnectorRecordSetProviderthat always throws risks runtime failures if the engine ever routes scan tasks here (e.g., during fallback when noPageSourceProvideris present). If record sets are intentionally unsupported, omit this class from the Guice bindings, or implement the properPageSourceProviderand let this class disappear.-@Singleton -bind(ConnectorRecordSetProvider.class).to(ClpRecordSetProvider.class); +// Remove binding – record sets not supported +// bind(ConnectorPageSourceProvider.class).to(ClpPageSourceProvider.class);Please double-check
ClpModuleto ensure only the supported scan path is exposed.
To verify how
ClpModuleis wiringConnectorRecordSetProvider, let’s inspect its bindings:
🏁 Script executed:
#!/bin/bash # Locate ClpModule and show its Guice bindings rg "class ClpModule" -n -A200 # Check for any binding references to ConnectorRecordSetProvider rg "bind.*ConnectorRecordSetProvider" -nLength of output: 3767
Ignore record-set provider removal suggestion
The
ClpModuleonly binds the implementation class (ClpRecordSetProvider) to itself—it does not bindConnectorRecordSetProvidertoClpRecordSetProvider. As a result, this provider is never registered or invoked by the engine, and there’s no risk of unconditionally thrown exceptions at runtime.Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)
25-30: Error-code block looks goodThe enum follows Presto conventions (unique high-order bits,
EXTERNALtype). No issues spotted.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
39-46:❓ Verification inconclusive
Configuration read occurs before injector is fully initialised
buildConfigObject(ClpConfig.class)pulls configuration during module bootstrap.
If a required property is missing or malformed, the thrownPrestoExceptionwill abort the entire server startup before Guice has a chance to give a helpful diagnostics trace.Consider delaying provider-type validation to
@PostConstructinClpConnector(or dedicated validator) so that bootstrap can succeed and the connector reports a clearer error in the query failure path.This also removes hard coupling between module wiring and provider enumeration values.
🏁 Script executed:
#!/bin/bash set -eux # Locate ClpModule.java CLP_MODULE_PATH=$(fd ClpModule.java -t f | head -n1) echo "ClpModule file path: $CLP_MODULE_PATH" # Display the first 200 lines of ClpModule.java for context sed -n '1,200p' "$CLP_MODULE_PATH" # Find occurrences of buildConfigObject in the repository grep -R "buildConfigObject" -n . # Locate AbstractConfigurationAwareModule to inspect buildConfigObject implementation ACAM_PATH=$(grep -R "class AbstractConfigurationAwareModule" -n . | cut -d: -f1 | head -n1) echo "AbstractConfigurationAwareModule file path: $ACAM_PATH" sed -n '1,200p' "$ACAM_PATH"Length of output: 13425
Delay provider-type validation until connector startup
Currently,
ClpModule.setupcallsbuildConfigObject(ClpConfig.class)during Guice module configuration, which will throw aPrestoExceptionand abort server startup ifmetadataProviderType(orsplitProviderType) is missing or invalid—before the injector can provide full diagnostics. Move these checks into a@PostConstructmethod inClpConnector(or a dedicated validator) so that the server can start normally and the connector reports clear errors at runtime. This also decouples module wiring from enum values.
- File:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
Lines: 39–46presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)
70-80: Add null/blank checks for mandatory DB connection properties
metadataDbUrl,metadataDbName, andmetadataDbUserare essential for the MySQL provider but are accepted unchecked. A misconfigured cluster would fail much later at runtime. Consider validating them eagerly (e.g., in a@PostConstructmethod or individual setters) and throwingPrestoExceptionwithCLP_UNSUPPORTED_CONFIG_OPTIONif a required value is missing or blank.Also applies to: 82-92, 94-104
24-32: Enum extensibility looks fineUsing enums for provider types with a single
MYSQLoption is reasonable for the initial drop and leaves room for future back-ends. No action required.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java
Show resolved
Hide resolved
| public interface ClpSplitProvider | ||
| { | ||
| /** | ||
| * Returns a list of splits for the given table layout handle. | ||
| */ | ||
| List<ClpSplit> listSplits(ClpTableLayoutHandle clpTableLayoutHandle); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Return type limits scalability
For very large tables, materialising all splits eagerly into a List<ClpSplit> can exhaust memory and block planning. Presto’s SPI usually streams splits via ConnectorSplitSource. Consider changing the contract to return a ConnectorSplitSource (or at least Iterable<ClpSplit>) so implementations can lazily yield splits.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
around lines 21 to 27, the method listSplits currently returns a List<ClpSplit>,
which eagerly materializes all splits and can cause memory issues for large
tables. Change the return type to ConnectorSplitSource or at least
Iterable<ClpSplit> to enable lazy streaming of splits. Update the method
signature accordingly and adjust implementations to yield splits lazily instead
of returning a fully populated list.
| /** | ||
| * Returns a list of splits for the given table layout handle. | ||
| */ | ||
| List<ClpSplit> listSplits(ClpTableLayoutHandle clpTableLayoutHandle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Missing nullability / immutability hints
Annotating the parameter and return value with @Nonnull (and returning an ImmutableList) clarifies expectations and prevents accidental nulls.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
around lines 23 to 26, the method listSplits lacks nullability annotations and
does not specify immutability for its return type. Add @Nonnull annotations to
the parameter clpTableLayoutHandle and the method return type to indicate they
should never be null. Change the return type to ImmutableList<ClpSplit> to
ensure the returned list is immutable, improving safety and clarity.
| public interface ClpMetadataProvider | ||
| { | ||
| /** | ||
| * Returns the list of column handles for the given table. | ||
| */ | ||
| List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); | ||
|
|
||
| /** | ||
| * Returns the list of table handles in the given schema. | ||
| */ | ||
| List<ClpTableHandle> listTableHandles(String schema); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Interface should specify caching / concurrency guarantees
Metadata calls may be invoked concurrently by Presto workers. The interface should document whether implementations must be thread-safe and whether returned collections may be cached / mutated. Adding @ThreadSafe (if applicable) and returning immutable collections would make the contract explicit.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
around lines 22 to 33, the interface lacks documentation on thread-safety and
caching guarantees. Update the interface to explicitly state whether
implementations must be thread-safe, add the @ThreadSafe annotation if
applicable, and modify the methods to return immutable collections to prevent
mutation and clarify caching behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wraymo Understood—if only a single instance lives in the coordinator process, concurrency concerns are moot, so the extra thread-safety annotations and immutability constraints aren’t required. I’ll keep that context in mind for future reviews. Thanks for clarifying!
✏️ Learnings added
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.302Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (8)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)
57-65: 🧹 Nitpick (assertive)Still relying on Apache Math3
Pair; consider a lighter alternativePrior feedback suggested replacing
org.apache.commons.math3.util.Pairwith a smaller dependency (e.g.,ImmutablePair, a Java record, orMap.entry). Keeping Math3 just for tests pulls in ~1 MB of transitive jars.- new Pair<>("a", ClpNodeType.Integer), - new Pair<>("a", ClpNodeType.VarString), + Map.entry("a", ClpNodeType.Integer), + Map.entry("a", ClpNodeType.VarString),You’d then adapt the helper to accept
Map.Entry<String, ClpNodeType>.
Same logic, leaner class-path.
83-86: 🧹 Nitpick (assertive)Replace mutable
HashSetwith immutable collection for clarityA mutable
HashSetis built only to be compared immediately. UseImmutableSetto reduce noise:- HashSet<SchemaTableName> tables = new HashSet<>(); - tables.add(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName)); - assertEquals(new HashSet<>(metadata.listTables(SESSION, Optional.empty())), tables); + ImmutableSet<SchemaTableName> expected = + ImmutableSet.of(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName)); + assertEquals(ImmutableSet.copyOf( + metadata.listTables(SESSION, Optional.empty())), expected);presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
64-76: 🧹 Nitpick (assertive)Hard-coded
/tmpclean-up path is brittleThe tear-down assumes H2 always writes to
/tmp. This breaks on Windows or when the JVM’s temp dir is re-configured. Capture the actual file location fromClpMetadataDbSetUp(or useFiles.createTempDirectory) and delete via that handle instead of hard-coding.No functional risk today, but the test will fail in CI environments with non-standard temp dirs.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
58-62: 🧹 Nitpick (assertive)Add
toString()for easier debugging of splitsA small helper greatly simplifies log inspection and test diffs.
@@ public Object getInfo() { return this; } + + @Override + public String toString() + { + return "ClpSplit{path='" + path + "'}"; + } }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
38-46: 🧹 Nitpick (assertive)Fail fast on null handle components
schemaTableName,tablePath, andstorageTypeare required for identity but accepted unchecked. AddrequireNonNullto surface mis-wired injections early.- this.schemaTableName = schemaTableName; - this.tablePath = tablePath; - this.storageType = storageType; + this.schemaTableName = java.util.Objects.requireNonNull(schemaTableName, "schemaTableName is null"); + this.tablePath = java.util.Objects.requireNonNull(tablePath, "tablePath is null"); + this.storageType = java.util.Objects.requireNonNull(storageType, "storageType is null");presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
35-45: 🧹 Nitpick (assertive)Enforce non-null invariants for column handle fields
For consistency with other Presto handle classes, guard all reference fields:
- this.columnName = columnName; - this.originalColumnName = originalColumnName; - this.columnType = columnType; + this.columnName = java.util.Objects.requireNonNull(columnName, "columnName is null"); + this.originalColumnName = java.util.Objects.requireNonNull(originalColumnName, "originalColumnName is null"); + this.columnType = java.util.Objects.requireNonNull(columnType, "columnType is null");presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
137-147: 🧹 Nitpick (assertive)Consider sorting root-level columns for deterministic order
Nested
ROWfields are now sorted, but top-level columns are still emitted inHashMapiteration order. While functional correctness is unaffected, stable ordering helps with plan-cache keys, golden-file tests, and diff-based reviews.-for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) { +for (String name : new java.util.TreeSet<>(root.children.keySet())) { + ClpNode child = root.children.get(name);presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
140-142: ReplaceSystem.out.printlnwith a logger
Same concern raised earlier; direct stdout still appears here.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestocpp-linux-build-for-test
| if (prefix.getTableName() == null) { | ||
| schemaTableNames = listTables(session, Optional.ofNullable(prefix.getSchemaName())); | ||
| } | ||
| else { | ||
| schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName())); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Null schema triggers NPE in listTableColumns
When prefix.getSchemaName() is null and prefix.getTableName() is provided, you build
new SchemaTableName(null, …), which violates Presto’s non-null contract and blows up at runtime.
-else {
- schemaTableNames = ImmutableList.of(
- new SchemaTableName(prefix.getSchemaName(), prefix.getTableName()));
+else {
+ String schema = Optional.ofNullable(prefix.getSchemaName())
+ .orElse(DEFAULT_SCHEMA_NAME);
+ schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
around lines 162 to 167, the code creates a SchemaTableName with a null schema
name when prefix.getSchemaName() is null but prefix.getTableName() is not,
causing a NullPointerException. To fix this, add a null check for
prefix.getSchemaName() before constructing SchemaTableName and handle the null
case appropriately, ensuring that SchemaTableName is never created with a null
schema name.
| // The remaining (non-translatable) portion of the RowExpression, if any. | ||
| private final Optional<RowExpression> remainingExpression; | ||
|
|
||
| public ClpExpression(Optional<String> definition, Optional<RowExpression> remainingExpression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some IDEA warnings about that the Optional should only be used in the resturn type not the argument type. I think it make sense. Maybe in the caller we can call something like new ClpExpression(definition.orElse(null), remainingExpression.orElse(null)) instead of definiing it as Opational here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think it's pretty common in Presto code. e.g. BasicQueryInfo, QueryStateInfo, SessionRepresentation, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. But I think we may want to make the code style consistent within our connector. In my metadata filter PR, I also need to add properties to this class, shall I also make it Optional? Maybe the higer level question is when to use Optional and when not use. WDYT @kirkrodrigues ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read a bit about it. I don't think there's a good reason to use Optional for parameters.
However, Presto's style guide does require it for public method parameters if the method isn't performance critical.
Personally, I would just use overloading in this case (i.e., remove this constructor and get rid of the Optional in the other constructors). This case doesn't really need Optional though since we have overloads, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this class is delayed to the next PR, I think this constructor can be considered as performance critical because it is in the critical path (to generate the pushdown and metadata filter SQL). So I think we can use @Nullable to annotate the nullable parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not in the critical path, as there are over 100 plan optimizers, and in most real-world cases, the coordinator accounts for less than 10%—or even under 1%—of the total query time. Even if all of the coordinator's time were spent on plan optimization, its impact on the overall query latency would be below 0.1%. The true critical path lies in data scanning and expression evaluation. As discussed before, our plan was to pass raw types but store them as Optional
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (2)
165-173: Still constructsSchemaTableNamewithnullschema – runtime NPE persists
schemaNamecan benullwhen the caller supplied only a table name (prefix.getSchemaName()==null).
new SchemaTableName(null, …)violates Presto’s non-null contract and blows up at runtime (same issue raised in the previous review).-else { - SchemaTableName table = new SchemaTableName(schemaName, prefix.getTableName()); +else { + // Default the schema when not provided to avoid NPEs + String effectiveSchema = Optional.ofNullable(schemaName).orElse(DEFAULT_SCHEMA_NAME); + SchemaTableName table = new SchemaTableName(effectiveSchema, prefix.getTableName());
175-179:getTableHandle()may returnnull⇒ guaranteed NPEThe stream unconditionally passes the result of
getTableHandle()togetTableMetadata().
When the table is missing, this evaluates togetTableMetadata(session, null)and dereferencesnullon the first cast insidegetTableMetadata().A minimal Java-8-compatible fix:
return schemaTableNames.stream() - .collect(ImmutableMap.toImmutableMap( - Function.identity(), - tableName -> getTableMetadata(session, getTableHandle(session, tableName)).getColumns())); + .map(tableName -> { + ConnectorTableHandle handle = getTableHandle(session, tableName); + return handle == null ? null : ImmutableMap.of( + tableName, + getTableMetadata(session, handle).getColumns()); + }) + .filter(Objects::nonNull) + .flatMap(map -> map.entrySet().stream()) + .collect(ImmutableMap.toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));(or switch to an
ImmutableMap.Builderloop for clarity).Leaving this unresolved will crash any metadata listing that references a non-existent table.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
ClpMySqlMetadataProvider(33-135)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
ClpMySqlSplitProvider(32-97)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
84-92: LGTM – deterministic comparison fixes earlier flakinessSwitching to
ImmutableSetremoves the ordering assumption and makes the test reliable. Nice catch.
| private final String databaseName = "split_testdb"; | ||
|
|
||
| @BeforeMethod | ||
| public void setUp() | ||
| { | ||
| clpMetadataDbSetUp = new ClpMetadataDbSetUp(); | ||
| tableSplits = new HashMap<>(); | ||
|
|
||
| int numKeys = 3; | ||
| int numValuesPerKey = 10; | ||
|
|
||
| for (int i = 0; i < numKeys; i++) { | ||
| String key = "test_" + i; | ||
| List<String> values = new ArrayList<>(); | ||
|
|
||
| for (int j = 0; j < numValuesPerKey; j++) { | ||
| values.add("id_" + j); | ||
| } | ||
|
|
||
| tableSplits.put(key, values); | ||
| } | ||
| clpSplitProvider = clpMetadataDbSetUp.setupSplit(databaseName, tableSplits); | ||
| } | ||
|
|
||
| @AfterMethod | ||
| public void tearDown() | ||
| { | ||
| clpMetadataDbSetUp.tearDown(databaseName); | ||
| } | ||
|
|
||
| @Test | ||
| public void testListSplits() | ||
| { | ||
| for (Map.Entry<String, List<String>> entry : tableSplits.entrySet()) { | ||
| String tableName = entry.getKey(); | ||
| String tablePath = "/tmp/archives/" + tableName; | ||
| List<String> expectedSplits = entry.getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Hard-coded DB name & /tmp path risk collisions across parallel runs
Using the constant split_testdb and a fixed on-disk directory means two test invocations (CI shards, IDE re-runs, etc.) will trample each other’s files or fail to start H2 because the DB is already open.
Consider:
- private final String databaseName = "split_testdb";
+ private final String databaseName = "split_testdb_" + System.nanoTime();and parameterising tablePath via Files.createTempDirectory("archives_") propagated through ClpMetadataDbSetUp.
This keeps the test hermetic and OS-portable.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private final String databaseName = "split_testdb"; | |
| @BeforeMethod | |
| public void setUp() | |
| { | |
| clpMetadataDbSetUp = new ClpMetadataDbSetUp(); | |
| tableSplits = new HashMap<>(); | |
| int numKeys = 3; | |
| int numValuesPerKey = 10; | |
| for (int i = 0; i < numKeys; i++) { | |
| String key = "test_" + i; | |
| List<String> values = new ArrayList<>(); | |
| for (int j = 0; j < numValuesPerKey; j++) { | |
| values.add("id_" + j); | |
| } | |
| tableSplits.put(key, values); | |
| } | |
| clpSplitProvider = clpMetadataDbSetUp.setupSplit(databaseName, tableSplits); | |
| } | |
| @AfterMethod | |
| public void tearDown() | |
| { | |
| clpMetadataDbSetUp.tearDown(databaseName); | |
| } | |
| @Test | |
| public void testListSplits() | |
| { | |
| for (Map.Entry<String, List<String>> entry : tableSplits.entrySet()) { | |
| String tableName = entry.getKey(); | |
| String tablePath = "/tmp/archives/" + tableName; | |
| List<String> expectedSplits = entry.getValue(); | |
| private final String databaseName = "split_testdb_" + System.nanoTime(); | |
| @BeforeMethod | |
| public void setUp() | |
| { | |
| clpMetadataDbSetUp = new ClpMetadataDbSetUp(); | |
| tableSplits = new HashMap<>(); | |
| int numKeys = 3; | |
| int numValuesPerKey = 10; | |
| for (int i = 0; i < numKeys; i++) { | |
| String key = "test_" + i; | |
| List<String> values = new ArrayList<>(); | |
| for (int j = 0; j < numValuesPerKey; j++) { | |
| values.add("id_" + j); | |
| } | |
| tableSplits.put(key, values); | |
| } | |
| clpSplitProvider = clpMetadataDbSetUp.setupSplit(databaseName, tableSplits); | |
| } | |
| @AfterMethod | |
| public void tearDown() | |
| { | |
| clpMetadataDbSetUp.tearDown(databaseName); | |
| } | |
| @Test | |
| public void testListSplits() | |
| { | |
| for (Map.Entry<String, List<String>> entry : tableSplits.entrySet()) { | |
| String tableName = entry.getKey(); | |
| String tablePath = "/tmp/archives/" + tableName; | |
| List<String> expectedSplits = entry.getValue(); |
🤖 Prompt for AI Agents
In presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
between lines 40 and 76, the test uses a hard-coded database name "split_testdb"
and a fixed directory path "/tmp/archives/", which can cause conflicts when
tests run in parallel. To fix this, modify the setup to generate a unique
temporary directory for the tablePath using
Files.createTempDirectory("archives_") and update ClpMetadataDbSetUp to accept
and use this dynamic path. Also, parameterize the database name if needed to
avoid collisions, ensuring tests run hermetically and are OS-portable.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
Outdated
Show resolved
Hide resolved
| public static final String metadataDbUrlTemplate = | ||
| "jdbc:h2:file:/tmp/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; | ||
| public static final String metadataDbTablePrefix = "clp_"; | ||
| public static final String metadataDbUser = "sa"; | ||
| public static final String metadataDbPassword = ""; | ||
| private final String datasetsTableName = metadataDbTablePrefix + "datasets"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
/tmp hard-coding limits portability
metadataDbUrlTemplate writes directly to /tmp, which doesn’t exist (or isn’t writable) on all OSes.
Generate a temp directory instead:
- public static final String metadataDbUrlTemplate =
- "jdbc:h2:file:/tmp/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
+ private static final String TMP_DIR =
+ System.getProperty("java.io.tmpdir").replace(File.separatorChar, '/');
+ public static final String metadataDbUrlTemplate =
+ "jdbc:h2:file:" + TMP_DIR + "/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE";This preserves behaviour while remaining cross-platform.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public static final String metadataDbUrlTemplate = | |
| "jdbc:h2:file:/tmp/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; | |
| public static final String metadataDbTablePrefix = "clp_"; | |
| public static final String metadataDbUser = "sa"; | |
| public static final String metadataDbPassword = ""; | |
| private final String datasetsTableName = metadataDbTablePrefix + "datasets"; | |
| private static final String TMP_DIR = | |
| System.getProperty("java.io.tmpdir").replace(File.separatorChar, '/'); | |
| public static final String metadataDbUrlTemplate = | |
| "jdbc:h2:file:" + TMP_DIR + "/%s;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; | |
| public static final String metadataDbTablePrefix = "clp_"; | |
| public static final String metadataDbUser = "sa"; | |
| public static final String metadataDbPassword = ""; | |
| private final String datasetsTableName = metadataDbTablePrefix + "datasets"; |
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
around lines 38 to 43, the metadataDbUrlTemplate hardcodes the /tmp directory,
which is not portable across all operating systems. Modify the code to generate
a temporary directory dynamically using Java's standard library (e.g.,
Files.createTempDirectory or System.getProperty("java.io.tmpdir")) and use that
path in the JDBC URL template instead of /tmp. This change will preserve the
intended behavior while ensuring cross-platform compatibility.
04ded2d to
996d67c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
155-178:listTableColumnscan still throw NPEs for missing tables
The earlier review comment about filtering outnullhandles and avoidingnullschema names is still unresolved.Please apply the previously suggested Java-8 compatible loop or guard the stream chain by:
return schemaTableNames.stream() .map(name -> Optional.ofNullable(getTableHandle(session, name)) .map(handle -> Map.entry(name, getTableMetadata(session, handle).getColumns()))) .filter(Optional::isPresent) .map(Optional::get) .collect(ImmutableMap.toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));and ensure
schemaNamedefaults toDEFAULT_SCHEMA_NAMEwhenprefix.getSchemaName()isnull.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
39-75: Hard-coded DB name and/tmppath still risk collisions and break Windows
The previous feedback about generating a unique database name and usingjava.io.tmpdir(orFiles.createTempDirectory) hasn’t been addressed.Adopt something like:
private final Path tempDir = Files.createTempDirectory("clp_split_"); private final String databaseName = "split_testdb_" + System.nanoTime(); ... String tablePath = tempDir.resolve(tableName).toString();presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)
38-43:/tmpis still hard-coded – not portable
ReuseSystem.getProperty("java.io.tmpdir")or ajava.nio.file.Pathcreated viaFiles.createTempDirectoryso the tests run on Windows and sandboxed CI environments.
140-148: CheckFile.delete()return value
Ignoring the boolean result hides failed clean-ups and can leave artefacts behind. Log a warning when deletion fails, as previously suggested.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
ClpMySqlMetadataProvider(33-135)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
ClpMySqlSplitProvider(32-97)
⏰ Context from checks skipped due to timeout of 90000ms (18)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
| this.columnHandleCache = CacheBuilder.newBuilder() | ||
| .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | ||
| .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | ||
| .build(CacheLoader.from(this::loadColumnHandles)); | ||
| this.tableHandleCache = CacheBuilder.newBuilder() | ||
| .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | ||
| .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | ||
| .build(CacheLoader.from(this::loadTableHandles)); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add an upper bound to the caches and assign the provider first
Creating unbounded LoadingCaches can lead to unrestrained memory growth in long-lived coordinator JVMs.
Also, initialise clpMetadataProvider before building caches so that a future code change (or constructor re-ordering) cannot inadvertently trigger a load while the provider is still null.
- this.columnHandleCache = CacheBuilder.newBuilder()
+ this.clpMetadataProvider = requireNonNull(clpMetadataProvider, "ClpMetadataProvider is null");
+
+ this.columnHandleCache = CacheBuilder.newBuilder()
+ .maximumSize(10_000) // tune appropriately
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadColumnHandles));
- this.tableHandleCache = CacheBuilder.newBuilder()
+ this.tableHandleCache = CacheBuilder.newBuilder()
+ .maximumSize(1_000) // tune appropriately
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadTableHandles));
- this.clpMetadataProvider = requireNonNull(clpMetadataProvider, "ClpMetadataProvider is null");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| this.columnHandleCache = CacheBuilder.newBuilder() | |
| .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | |
| .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | |
| .build(CacheLoader.from(this::loadColumnHandles)); | |
| this.tableHandleCache = CacheBuilder.newBuilder() | |
| .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | |
| .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | |
| .build(CacheLoader.from(this::loadTableHandles)); | |
| // assign provider first to avoid any loader invocation with a null field | |
| this.clpMetadataProvider = requireNonNull(clpMetadataProvider, "ClpMetadataProvider is null"); | |
| this.columnHandleCache = CacheBuilder.newBuilder() | |
| .maximumSize(10_000) // tune appropriately | |
| .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | |
| .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | |
| .build(CacheLoader.from(this::loadColumnHandles)); | |
| this.tableHandleCache = CacheBuilder.newBuilder() | |
| .maximumSize(1_000) // tune appropriately | |
| .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | |
| .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | |
| .build(CacheLoader.from(this::loadTableHandles)); |
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
around lines 57 to 65, the caches are created without an upper bound which risks
unbounded memory growth. Add a maximum size limit to both columnHandleCache and
tableHandleCache using CacheBuilder's maximumSize method. Also, ensure that
clpMetadataProvider is assigned before building these caches to prevent any
accidental cache loading when the provider might still be null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (5)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)
29-29: Replace heavyweightPairwith lighter/idiomatic alternativeThe test still uses
org.apache.commons.math3.util.Pair, bringing in an otherwise-unused Math3 dependency. Switching toImmutablePair, a Java 17 record, orMap.entrywould eliminate the extra JAR and reduce class-path surface.
This was raised in a previous review and remains unaddressed.Also applies to: 58-64
34-34: Use immutable collections instead ofHashSetfor expectations
HashSetis verbose and mutable;ImmutableSet.of(...)(orImmutableSet.builder()) is the convention used elsewhere in the codebase and avoids accidental mutation.Also applies to: 82-85
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
42-57: Hard-coded DB name risks collisions; generate a unique one
getDbHandle("split_testdb")still uses a constant name. Parallel CI shards can trample each other’s files. AppendSystem.nanoTime()or reuse the UUID technique employed inClpMetadataDbSetUp.getDbHandle.presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)
39-57: Avoid hard-coding/tmp; use the platform temp dirThe JDBC URL template and database path rely on
/tmp, which is not guaranteed to exist or be writable on all OSes (e.g., Windows). Build the path fromSystem.getProperty("java.io.tmpdir")orFiles.createTempDirectory(...).
154-162: CheckFile.delete()return value to surface clean-up failures
delete()is invoked without verifying its boolean result, silently ignoring failures (file locks, permissions, etc.). Log a warning when deletion returnsfalseso leftover artefacts aren’t missed.-if (dbFile.exists()) { - dbFile.delete(); - log.info("Deleted database file: " + dbFile.getAbsolutePath()); -} +if (dbFile.exists() && !dbFile.delete()) { + log.warn("Failed to delete database file: %s", dbFile.getAbsolutePath()); +}
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
| ImmutableSet<ColumnMetadata> columnMetadata = ImmutableSet.<ColumnMetadata>builder() | ||
| .add(ColumnMetadata.builder() | ||
| .setName("a_bigint") | ||
| .setType(BigintType.BIGINT) | ||
| .setNullable(true) | ||
| .build()) | ||
| .add(ColumnMetadata.builder() | ||
| .setName("a_varchar") | ||
| .setType(VarcharType.VARCHAR) | ||
| .setNullable(true) | ||
| .build()) | ||
| .add(ColumnMetadata.builder() | ||
| .setName("b_double") | ||
| .setType(DoubleType.DOUBLE) | ||
| .setNullable(true) | ||
| .build()) | ||
| .add(ColumnMetadata.builder() | ||
| .setName("b_varchar") | ||
| .setType(VarcharType.VARCHAR) | ||
| .setNullable(true) | ||
| .build()) | ||
| .add(ColumnMetadata.builder() | ||
| .setName("c") | ||
| .setType(RowType.from(ImmutableList.of( | ||
| RowType.field("d", BooleanType.BOOLEAN), | ||
| RowType.field("e", VarcharType.VARCHAR)))) | ||
| .setNullable(true) | ||
| .build()) | ||
| .add(ColumnMetadata.builder() | ||
| .setName("f") | ||
| .setType(RowType.from(ImmutableList.of( | ||
| RowType.field("g", | ||
| RowType.from(ImmutableList.of( | ||
| RowType.field("h", new ArrayType(VarcharType.VARCHAR)))))))) | ||
| .setNullable(true) | ||
| .build()) | ||
| .build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider extracting expected column metadata into a helper for readability
The 35-line builder chain obscures the intent of the test. A small helper such as expectedColumns("a_bigint", BIGINT, …) or an ImmutableSet<ColumnMetadata> returned from a dedicated method would make the assertion easier to read and maintain.
🤖 Prompt for AI Agents
In presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
around lines 93 to 129, the long builder chain for creating the expected column
metadata reduces readability. Extract this column metadata construction into a
separate helper method that returns the ImmutableSet<ColumnMetadata> or use a
concise helper like expectedColumns(...) to build the set. This refactor will
simplify the test method and improve maintainability by isolating the setup
logic.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Show resolved
Hide resolved
| private static void updateDatasetsTable(Connection conn, String tableName) throws SQLException | ||
| { | ||
| final String insertDatasetTableSQL = String.format( | ||
| "INSERT INTO %s (name, archive_storage_type, archive_storage_directory) VALUES (?, ?, ?)", datasetsTableName); | ||
| try (PreparedStatement pstmt = conn.prepareStatement(insertDatasetTableSQL)) { | ||
| pstmt.setString(1, tableName); | ||
| pstmt.setString(2, "fs"); | ||
| pstmt.setString(3, "/tmp/archives/" + tableName); | ||
| pstmt.executeUpdate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Hard-coded archive path leaks /tmp usage into metadata
The helper inserts /tmp/archives/ directly into the datasets table. Consider parameterising the root directory (or using the same temp dir as the DB) so tests stay hermetic and portable.
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
around lines 175 to 183, the archive storage directory path is hard-coded to
"/tmp/archives/", which leaks the use of the /tmp directory and reduces test
portability. Modify the method to accept the root archive directory as a
parameter or retrieve it from a configurable source, then use this parameter to
construct the archive_storage_directory value instead of the hard-coded path.
This will make the tests more hermetic and portable by avoiding fixed filesystem
paths.
d5e3ccc to
6859026
Compare
6859026 to
75fea65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)
82-84: Use ImmutableSet for cleaner codeReplace HashSet construction with ImmutableSet for better immutability and brevity.
- HashSet<SchemaTableName> tables = new HashSet<>(); - tables.add(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName)); - assertEquals(new HashSet<>(metadata.listTables(SESSION, Optional.empty())), tables); + ImmutableSet<SchemaTableName> expectedTables = ImmutableSet.of( + new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName)); + assertEquals(ImmutableSet.copyOf(metadata.listTables(SESSION, Optional.empty())), expectedTables);
93-129: Extract column metadata creation for better readabilityThe long builder chain makes the test harder to read. Consider extracting to a helper method.
+ private static ImmutableSet<ColumnMetadata> buildExpectedColumnMetadata() + { + return ImmutableSet.<ColumnMetadata>builder() + .add(ColumnMetadata.builder() + .setName("a_bigint") + .setType(BigintType.BIGINT) + .setNullable(true) + .build()) + .add(ColumnMetadata.builder() + .setName("a_varchar") + .setType(VarcharType.VARCHAR) + .setNullable(true) + .build()) + .add(ColumnMetadata.builder() + .setName("b_double") + .setType(DoubleType.DOUBLE) + .setNullable(true) + .build()) + .add(ColumnMetadata.builder() + .setName("b_varchar") + .setType(VarcharType.VARCHAR) + .setNullable(true) + .build()) + .add(ColumnMetadata.builder() + .setName("c") + .setType(RowType.from(ImmutableList.of( + RowType.field("d", BooleanType.BOOLEAN), + RowType.field("e", VarcharType.VARCHAR)))) + .setNullable(true) + .build()) + .add(ColumnMetadata.builder() + .setName("f") + .setType(RowType.from(ImmutableList.of( + RowType.field("g", + RowType.from(ImmutableList.of( + RowType.field("h", new ArrayType(VarcharType.VARCHAR)))))))) + .setNullable(true) + .build()) + .build(); + } + @Test public void testGetTableMetadata() { ClpTableHandle clpTableHandle = (ClpTableHandle) metadata.getTableHandle(SESSION, new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName)); ConnectorTableMetadata tableMetadata = metadata.getTableMetadata(SESSION, clpTableHandle); - ImmutableSet<ColumnMetadata> columnMetadata = ImmutableSet.<ColumnMetadata>builder() - .add(ColumnMetadata.builder() - .setName("a_bigint") - .setType(BigintType.BIGINT) - .setNullable(true) - .build()) - // ... rest of the builder chain ... - .build(); + ImmutableSet<ColumnMetadata> columnMetadata = buildExpectedColumnMetadata(); assertEquals(columnMetadata, ImmutableSet.copyOf(tableMetadata.getColumns())); }presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (2)
162-168: Check file deletion results to detect cleanup failuresThe return value of
File.delete()is ignored, which can hide cleanup failures.if (dbFile.exists()) { - dbFile.delete(); - log.info("Deleted database file: " + dbFile.getAbsolutePath()); + if (dbFile.delete()) { + log.info("Deleted database file: " + dbFile.getAbsolutePath()); + } + else { + log.warn("Failed to delete database file: " + dbFile.getAbsolutePath()); + } } if (lockFile.exists()) { - lockFile.delete(); + if (!lockFile.delete()) { + log.warn("Failed to delete lock file: " + lockFile.getAbsolutePath()); + } }
181-190: Parameterize archive directory path for better test isolationThe archive storage directory is hard-coded to
/tmp/archives/, reducing portability and test isolation.Consider adding a parameter to accept the archive root directory:
-private static void updateDatasetsTable(Connection conn, String tableName) throws SQLException +private static void updateDatasetsTable(Connection conn, String tableName, String archiveRootDir) throws SQLException { final String insertDatasetTableSQL = String.format( "INSERT INTO %s (name, archive_storage_type, archive_storage_directory) VALUES (?, ?, ?)", datasetsTableName); try (PreparedStatement pstmt = conn.prepareStatement(insertDatasetTableSQL)) { pstmt.setString(1, tableName); pstmt.setString(2, "fs"); - pstmt.setString(3, "/tmp/archives/" + tableName); + pstmt.setString(3, archiveRootDir + "/" + tableName); pstmt.executeUpdate(); } }Then update the callers to pass a configurable archive root directory.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
81-89: Good fix for the ordering issue!Using
ImmutableSetto compare splits as sets rather than relying on order is the correct approach. This eliminates the test flakiness that could occur from non-deterministic ordering.
|
|
||
| public static DbHandle getDbHandle(String dbName) | ||
| { | ||
| return new DbHandle(String.format("/tmp/presto-clp-test/%s-%s", dbName, UUID.randomUUID())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use system temp directory for cross-platform compatibility
While the path now includes UUID for uniqueness, it still hard-codes /tmp which doesn't exist on all operating systems (e.g., Windows).
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
public static DbHandle getDbHandle(String dbName)
{
- return new DbHandle(String.format("/tmp/presto-clp-test/%s-%s", dbName, UUID.randomUUID()));
+ try {
+ Path tempDir = Files.createTempDirectory("presto-clp-test");
+ return new DbHandle(tempDir.resolve(dbName + "-" + UUID.randomUUID()).toString());
+ }
+ catch (IOException e) {
+ throw new RuntimeException("Failed to create temporary directory", e);
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return new DbHandle(String.format("/tmp/presto-clp-test/%s-%s", dbName, UUID.randomUUID())); | |
| // add at top of file | |
| import java.io.IOException; | |
| import java.nio.file.Files; | |
| import java.nio.file.Path; | |
| public static DbHandle getDbHandle(String dbName) | |
| { | |
| try { | |
| Path tempDir = Files.createTempDirectory("presto-clp-test"); | |
| return new DbHandle(tempDir.resolve(dbName + "-" + UUID.randomUUID()).toString()); | |
| } | |
| catch (IOException e) { | |
| throw new RuntimeException("Failed to create temporary directory", e); | |
| } | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
at line 62, the file path is hard-coded to "/tmp", which is not cross-platform
compatible. Replace the hard-coded "/tmp" with a call to the system's temporary
directory, such as using Java's System.getProperty("java.io.tmpdir"), to ensure
the path works on all operating systems. Keep the UUID for uniqueness but
construct the full path using the system temp directory dynamically.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Outdated
Show resolved
Hide resolved
|
@coderabbitai please learn from my comments, especially for the static import and Guava: Instead, static import this method and use “format()” |
|
Tip For best results, initiate chat on the files or code changes. @anlowee, thank you for the clear coding style guidance! I've noted both preferences and will apply them in future reviews of this repository. 🐰 ✏️ Learnings added
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (22)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (3)
16-28: Consider alphabetizing imports for consistency.The import statements could be organized alphabetically for better maintainability, as suggested in previous reviews.
54-63: Missing strictConfig() in Bootstrap initialization.The Bootstrap is initialized without
strictConfig(), which means configuration typos or deprecated keys will be silently ignored, potentially leading to runtime issues.Apply this fix to enable strict configuration:
- Bootstrap app = new Bootstrap(new JsonModule(), new ClpModule(), binder -> { + Bootstrap app = new Bootstrap(new JsonModule(), new ClpModule(), binder -> { binder.bind(FunctionMetadataManager.class).toInstance(context.getFunctionMetadataManager()); binder.bind(NodeManager.class).toInstance(context.getNodeManager()); binder.bind(RowExpressionService.class).toInstance(context.getRowExpressionService()); binder.bind(StandardFunctionResolution.class).toInstance(context.getStandardFunctionResolution()); binder.bind(TypeManager.class).toInstance(context.getTypeManager()); - }); + }).strictConfig();
55-59: Consider alphabetizing the binding statements.The service bindings could be organized alphabetically for consistency, as suggested in previous reviews.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)
89-91: Use ImmutableSet instead of HashSet for better immutability.Replace HashSet construction with ImmutableSet for consistency with the codebase preferences.
- HashSet<SchemaTableName> tables = new HashSet<>(); - tables.add(new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME)); - assertEquals(new HashSet<>(metadata.listTables(SESSION, Optional.empty())), tables); + ImmutableSet<SchemaTableName> expectedTables = ImmutableSet.of( + new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME)); + assertEquals(ImmutableSet.copyOf(metadata.listTables(SESSION, Optional.empty())), expectedTables);
99-135: Extract expected column metadata into a helper method.The 35-line builder chain obscures the test intent. Consider extracting this into a helper method for better readability and maintainability.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
44-56: Consider using a strategy map to simplify provider binding.The repeated if/else blocks for metadata and split provider binding could be simplified using a strategy map pattern as suggested in previous reviews.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
61-64: Implement getInfoMap instead of deprecated getInfo method.The
getInfo()method is deprecated. Consider implementinggetInfoMap()instead, which provides better type safety and flexibility for split information.Based on the HiveSplit example from previous reviews:
- @Override - public Map<String, String> getInfo() - { - return ImmutableMap.of("path", path); - } + @Override + public Map<String, String> getInfoMap() + { + return ImmutableMap.of("path", path); + }presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
82-84: Critical: Swallowing SQL exceptions hides split failuresThe catch block logs at WARN level and returns an empty list, causing queries to proceed with zero splits and produce confusing "table is empty" results instead of surfacing the real metadata issue.
Apply this fix to propagate the error properly:
- catch (SQLException e) { - log.warn("Database error while processing splits for %s: %s", tableName, e); - } + catch (SQLException e) { + throw new PrestoException(StandardErrorCode.CONNECTOR_ERROR, + format("Unable to list splits for table %s: %s", tableName, e.getMessage()), e); + }presto-docs/src/main/sphinx/connector/clp.rst (1)
54-56: Add missing target label for Data Types cross-referenceThe link
See \Data Types`_ for details` won't resolve because there's no label for the Data Types heading.Add a target label before the Data Types heading:
+.. _Data Types: + Data Types ----------presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (2)
62-64: Add null check for input parameterThe method should validate that
fullNameis not null to prevent a potentialNullPointerExceptionat line 65 when callingsplit().Add a null check at the beginning:
public void addColumn(String fullName, byte type) { + if (fullName == null) { + throw new PrestoException(ClpErrorCode.CLP_INVALID_INPUT, "Column name cannot be null"); + } Type prestoType = mapColumnType(type);
167-170: Sanitise polymorphic type suffixes
type.getDisplayName()can contain spaces, commas, and parentheses (e.g.,row(col bigint)), producing invalid column identifiers likeid_row(col bigint).Replace non-alphanumeric characters with underscores:
private String getTypeSuffix(Type type) { - return (type instanceof ArrayType) ? "array" : type.getDisplayName(); + String suffix = (type instanceof ArrayType) ? "array" : type.getDisplayName(); + return suffix.replaceAll("[^a-zA-Z0-9_]", "_"); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
78-78: Validate table name before SQL embedding
schemaTableName.getTableName()is directly embedded into the SQL query without validation, risking SQL injection or broken queries if the name contains backticks.Add validation before constructing the query:
+String tableName = schemaTableName.getTableName(); +if (!isValidIdentifier(tableName)) { + throw new PrestoException(ClpErrorCode.CLP_INVALID_TABLE_NAME, "Invalid table name: " + tableName); +} -String query = format(SQL_SELECT_COLUMN_METADATA_TEMPLATE, config.getMetadataTablePrefix(), schemaTableName.getTableName()); +String query = format(SQL_SELECT_COLUMN_METADATA_TEMPLATE, config.getMetadataTablePrefix(), tableName);presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (2)
69-77: Add upper bounds to caches for memory safetyCreating unbounded
LoadingCaches can lead to uncontrolled memory growth in long-lived coordinator JVMs.Add reasonable maximum size limits:
this.columnHandleCache = CacheBuilder.newBuilder() + .maximumSize(10_000) .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) .build(CacheLoader.from(this::loadColumnHandles)); this.tableHandleCache = CacheBuilder.newBuilder() + .maximumSize(1_000) .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) .build(CacheLoader.from(this::loadTableHandles));
167-170: Potential NPE from null table handle
getTableHandle()can returnnull, butlistTableColumns()unconditionally passes its result togetTableMetadata(), which will throw aNullPointerException.Filter out null handles before processing:
return schemaTableNames.stream() + .map(tableName -> { + ConnectorTableHandle handle = getTableHandle(session, tableName); + return handle != null ? Map.entry(tableName, getTableMetadata(session, handle).getColumns()) : null; + }) + .filter(Objects::nonNull) .collect(toImmutableMap( - Function.identity(), - tableName -> getTableMetadata(session, getTableHandle(session, tableName)).getColumns())); + Map.Entry::getKey, + Map.Entry::getValue));Note: Add
import java.util.Objects;if not already present.presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (3)
52-52: Replace hardcoded/tmpwith system temp directory.The archive storage directory still uses a hardcoded
/tmppath, which isn't portable across operating systems.Consider using
System.getProperty("java.io.tmpdir")for cross-platform compatibility.
65-65: Use system temp directory instead of hardcoded/tmp.The database handle creation still uses a hardcoded
/tmppath which was flagged in previous reviews for portability issues.Replace with a system-agnostic temporary directory approach.
175-175: Use parameterized logging instead of string concatenation.String concatenation creates unnecessary object allocation.
- log.info("Deleted database dir" + dir.getAbsolutePath()); + log.info("Deleted database dir: %s", dir.getAbsolutePath());presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (5)
33-34: Consider using Duration objects for time intervals.Storing time intervals as raw
longvalues lacks unit safety and validation. Presto's configuration framework supportsDurationobjects which provide better type safety.This would allow human-friendly configuration values like "60s" or "10m" and prevent invalid negative values.
103-108: Add security annotation for password configuration.The password setter lacks
@ConfigSecuritySensitiveannotation which prevents accidental logging of sensitive values.@Config("clp.metadata-db-password") +@ConfigSecuritySensitive public ClpConfig setMetadataDbPassword(String metadataDbPassword)
128-150: Time interval methods should use Duration type.As noted in previous reviews, these methods should use
com.facebook.airlift.units.Durationinstead oflongfor better unit safety and validation.Duration objects provide human-friendly parsing and prevent unit ambiguities.
103-108: Duplicate comment: Password annotation missingThis setter should be annotated with
@ConfigSecuritySensitiveto prevent accidental logging of the password, as previously mentioned in past review comments.
133-138: Duplicate comment: Time intervals should use DurationAs noted in past comments, these time intervals should use
com.facebook.airlift.units.Durationinstead of rawlongvalues for better unit safety and validation.Also applies to: 145-150
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (12)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.280Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (:presto-docs)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (13)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
33-70: Well-structured connector factory implementation.The factory correctly implements the ConnectorFactory interface, properly handles null validation, and appropriately wraps exceptions. The dependency injection setup is sound.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1)
80-137: Comprehensive test coverage with good validation.The tests effectively validate schema listing, table listing, and detailed table metadata retrieval including complex nested row and array types. The test setup and assertions are thorough.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (2)
86-94: Good fix for order-dependent test issues.The test now properly compares splits as sets rather than relying on order, which addresses the potential flakiness identified in previous reviews. This is a solid improvement.
72-96: Well-structured split validation test.The test effectively validates that the split provider returns the expected splits with correct paths. The setup with multiple tables and the set-based comparison ensures robust testing.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
31-58: Solid dependency injection configuration.The module properly configures all necessary bindings for the CLP connector, including conditional provider binding based on configuration. The error handling for unsupported provider types is appropriate.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
31-65: Clean and well-implemented split class.The split implementation correctly handles immutability, JSON serialization, and null validation. The structure follows Presto connector patterns appropriately.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (3)
68-120: Well-structured metadata setup implementation.The method properly uses try-with-resources for database connections and prepared statements, follows the configuration builder pattern, and handles batch operations efficiently.
122-167: Clean split provider setup implementation.Good use of try-with-resources, proper batch processing, and consistent configuration pattern.
42-43: Apply static import preference from coding guidelinesBased on the PR coding style guidance, consider using static import for
String.format:+import static java.lang.String.format; -import static java.lang.String.format;The static import is already present and correctly applied throughout the codebase.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (4)
115-126: Excellent validation implementation for metadata table prefix.The validation logic properly checks for null values and uses regex pattern matching to ensure safe SQL identifiers, with clear error messaging.
164-172: Clean enum definitions for provider types.The enum structure provides a solid foundation for future extensibility while currently supporting only MySQL providers as intended.
23-23: ```shell
#!/bin/bashLocate all long-typed configuration fields in ClpConfig.java
rg -n "^\s*private\s+long" presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
--- `118-122`: **Consider validating non-empty prefix** The current validation allows empty strings but not null. Consider whether an empty prefix should be valid or if additional validation is needed: ```diff - if (metadataTablePrefix == null || !SAFE_SQL_TABLE_NAME_PATTERN.matcher(metadataTablePrefix).matches()) { + if (metadataTablePrefix == null || metadataTablePrefix.trim().isEmpty() || + !SAFE_SQL_TABLE_NAME_PATTERN.matcher(metadataTablePrefix).matches()) {This would prevent accidental empty prefixes that might cause issues with table naming.
Likely an incorrect or invalid review comment.
| ClpTableHandle clpTableHandle = clpTableLayoutHandle.getTable(); | ||
| String tablePath = clpTableHandle.getTablePath(); | ||
| String tableName = clpTableHandle.getSchemaTableName().getTableName(); | ||
| String archivePathQuery = format(SQL_SELECT_ARCHIVES_TEMPLATE, config.getMetadataTablePrefix(), tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate table name before SQL embedding
The table name from tableName is directly embedded into the SQL query without validation, risking SQL injection or broken queries if the name contains backticks or other special characters.
Add validation before constructing the query:
String tableName = clpTableHandle.getSchemaTableName().getTableName();
+if (!isValidTableName(tableName)) {
+ throw new PrestoException(ClpErrorCode.CLP_INVALID_TABLE_NAME, "Invalid table name: " + tableName);
+}
String archivePathQuery = format(SQL_SELECT_ARCHIVES_TEMPLATE, config.getMetadataTablePrefix(), tableName);Add this validation method:
private boolean isValidTableName(String tableName) {
return tableName != null && ClpConfig.SAFE_SQL_TABLE_NAME_PATTERN.matcher(tableName).matches();
}🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
at line 70, the tableName is directly embedded into the SQL query without
validation, which risks SQL injection or malformed queries. To fix this, add a
validation method isValidTableName that checks the tableName against a safe
pattern, and call this method before constructing the query. If the tableName is
invalid, handle it appropriately (e.g., throw an exception) to prevent unsafe
SQL execution.
| public static void tearDown(DbHandle dbHandle) | ||
| { | ||
| File dir = new File(dbHandle.dbPath).getParentFile(); | ||
| if (dir.exists()) { | ||
| try { | ||
| FileUtils.deleteDirectory(dir); | ||
| log.info("Deleted database dir" + dir.getAbsolutePath()); | ||
| } | ||
| catch (IOException e) { | ||
| log.warn("Failed to delete directory " + dir + ": " + e.getMessage()); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Effective cleanup implementation with minor logging improvement.
The teardown logic properly handles directory removal and error cases.
Consider using parameterized logging on line 178 as well:
- log.warn("Failed to delete directory " + dir + ": " + e.getMessage());
+ log.warn("Failed to delete directory %s: %s", dir, e.getMessage());📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public static void tearDown(DbHandle dbHandle) | |
| { | |
| File dir = new File(dbHandle.dbPath).getParentFile(); | |
| if (dir.exists()) { | |
| try { | |
| FileUtils.deleteDirectory(dir); | |
| log.info("Deleted database dir" + dir.getAbsolutePath()); | |
| } | |
| catch (IOException e) { | |
| log.warn("Failed to delete directory " + dir + ": " + e.getMessage()); | |
| } | |
| } | |
| } | |
| public static void tearDown(DbHandle dbHandle) | |
| { | |
| File dir = new File(dbHandle.dbPath).getParentFile(); | |
| if (dir.exists()) { | |
| try { | |
| FileUtils.deleteDirectory(dir); | |
| log.info("Deleted database dir" + dir.getAbsolutePath()); | |
| } | |
| catch (IOException e) { | |
| - log.warn("Failed to delete directory " + dir + ": " + e.getMessage()); | |
| + log.warn("Failed to delete directory %s: %s", dir, e.getMessage()); | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
around lines 169 to 181, the log statement on line 178 concatenates strings
directly, which is less efficient and not recommended. Change the log.info call
to use parameterized logging by passing the directory path as a parameter
instead of concatenating it into the message string.
| try (PreparedStatement pstmt = conn.prepareStatement(insertDatasetsTableSql)) { | ||
| pstmt.setString(1, tableName); | ||
| pstmt.setString(2, "fs"); | ||
| pstmt.setString(3, ARCHIVE_STORAGE_DIRECTORY_BASE + tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider parameterizing the archive storage directory for consistency
The hardcoded concatenation of ARCHIVE_STORAGE_DIRECTORY_BASE + tableName could benefit from using the same format pattern used elsewhere in the class:
- pstmt.setString(3, ARCHIVE_STORAGE_DIRECTORY_BASE + tableName);
+ pstmt.setString(3, format("%s%s", ARCHIVE_STORAGE_DIRECTORY_BASE, tableName));Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
at line 210, replace the hardcoded string concatenation of
ARCHIVE_STORAGE_DIRECTORY_BASE + tableName with the class's existing format
pattern for archive storage directory to maintain consistency. Identify the
format pattern used elsewhere in the class for this purpose and apply it here
when setting the string parameter in pstmt.
|
@anlowee I’ve updated the PR. There are a few cases I intentionally left unchanged:
|
Description
This PR introduces a CLP connector. The background and proposed implementation details are outlined in the associated RFC.
This PR implements one part of phase 1 of the proposed implementation, namely the Java implementation for the coordinator. The worker implementation will leverage Velox as the default query engine, so once the Velox PR is merged, we will submit another PR to this repo to add the necessary changes to
presto-native-execution.Like other connectors, we have created a
presto-clpmodule and implemented all required connector interfaces. The plan optimizer will be a future PR.The important classes in the connector are described below.
Core Classes in Java
ClpConfigThe configuration class for CLP. Currently, we support the following properties:
clp.metadata-expire-interval: Defines the time interval after which metadata entries are considered expired and removed from the cache.clp.metadata-refresh-interval: Specifies how frequently metadata should be refreshed from the source to ensure up-to-date information.clp.polymorphic-type-enabled: Enables or disables support for polymorphic types within CLP. This determines whether dynamic type resolution is allowed.clp.metadata-provider-type: Defines the type of the metadata provider. It could be a database, a file-based store, or another external system. By default, we use MySQL.clp.metadata-db-url: The connection URL for the metadata database, used whenclp.metadata-provider-typeis configured to use a database.clp.metadata-db-name: The name of the metadata database.clp.metadata-db-user: The database user with access to the metadata database.clp.metadata-db-password: The password for the metadata database user.clp.metadata-table-prefix: A prefix applied to table names in the metadata database.clp.split-provider-type: Defines the type of split provider for query execution. By default, we use MySQL, and the connection parameters are the same as those for the metadata database.ClpSchemaTreeA helper class for constructing a nested schema representation from CLP’s column definitions. It supports hierarchical column names (e.g.,
a.b.c), handles name/type conflicts when theclp.polymorphic-type-enabledoption is enabled, and maps serialized CLP types to Presto types. The schema tree produces a flat list ofClpColumnHandleinstances, includingRowTypefor nested structures, making it suitable for dynamic or semi-structured data formats.When polymorphic types are enabled, conflicting fields are given unique names by appending a type-specific suffix to the column name. For instance, if an integer field named "a" and a
Varstring(CLP type) field named "a" coexist in CLP’s schema tree, they are represented asa_bigintanda_varcharin Presto. This approach ensures that such fields remain queryable while adhering to Presto’s constraints.ClpMetadataProviderAn interface responsible for retrieving metadata from a specified source.
We provide a default implementation called
ClpMySqlMetadataProvider, which uses two MySQL tables. One of these is the datasets table, defined with the schema shown below. Currently, we support only a single Presto schema nameddefault, and this metadata table stores all table names, paths, and storage types associated with that Presto schema.nameVARCHAR(255)PRIMARY KEYarchive_storage_typeVARCHAR(4096)NOT NULLarchive_storage_directoryVARCHAR(4096)NOT NULLThe second MySQL table contains column metadata, defined by the schema shown below. Each Presto table is associated with a corresponding MySQL table that stores metadata about its columns.
nameVARCHAR(512)NOT NULLtypeTINYINTNOT NULLname,type)ClpSplitProviderIn CLP, an archive is the fundamental unit for searching, and we treat each archive as a Presto Split. This allows independent parallel searches across archives. The
ClpSplitProviderinterface, shown below, defines how to retrieve split information from a specified source:We provide a default implementation called
ClpMySqlSplitProvider. It uses an archive table to store archive IDs associated with each table. The table below shows part of the schema (some irrelevant fields are omitted).pagination_idBIGINTAUTO_INCREMENT PRIMARY KRYidVARCHAR(128)NOT NULLBy concatenating the table path (
archive_storage_directory) and the archive ID (id), we can retrieve all split paths for a table.ClpMetadataThis interface enables Presto to access various metadata. All requests are delegated to
ClpMetadataProviderFor metadata management, it also maintains two caches and periodically refreshes the metadata.
columnHandleCache: ALoadingCache<SchemaTableName, List<ClpColumnHandle>>that maps aSchemaTableNameto its corresponding list ofClpColumnHandleobjects.tableHandleCache: ALoadingCache<String, List<ClpTableHandle>>that maps a schema name (String) to a list ofClpTableHandleobjectsChecklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Tests
Documentation
Chores