-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Add CLP connector #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
presto-clp/src/test/java/com/yscope/presto/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
| public List<ClpColumnHandle> loadColumnHandles(SchemaTableName schemaTableName) | ||
| { | ||
| List<ClpColumnHandle> columnHandles = clpMetadataProvider.listColumnHandles(schemaTableName); | ||
| if (!config.isPolymorphicTypeEnabled()) { | ||
| return columnHandles; | ||
| } | ||
| return handlePolymorphicType(columnHandles); | ||
| } | ||
|
|
||
| public List<String> loadTableNames(String schemaName) | ||
| { | ||
| return clpMetadataProvider.listTableNames(schemaName); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (11)
presto-clp/src/main/java/com/yscope/presto/ClpExpression.java (1)
46-46: Minor spacing issue in constructor call.There's an extra space between
thisand the opening parenthesis.- this (Optional.empty(), Optional.empty()); + this(Optional.empty(), Optional.empty());presto-clp/src/main/java/com/yscope/presto/ClpSplitManager.java (3)
26-28: Add Javadoc documentation to the classAdding class-level documentation would improve code maintainability by clarifying the purpose and usage of this connector component.
+/** + * Split manager for the CLP connector. + * Provides the implementation for retrieving the splits for CLP tables. + */ public class ClpSplitManager implements ConnectorSplitManager {
29-29: Make the field declaration finalSince the field is only assigned in the constructor and not modified elsewhere, marking it as final would make the immutability explicit and improve code safety.
- private final ClpSplitProvider clpSplitProvider; + private final ClpSplitProvider clpSplitProvider;
44-44: Add error handling for listSplits operationThe call to clpSplitProvider.listSplits() lacks error handling. Consider wrapping this call in a try-catch block to handle potential exceptions from the split provider and provide more informative error messages.
- return new FixedSplitSource(clpSplitProvider.listSplits(layoutHandle)); + try { + return new FixedSplitSource(clpSplitProvider.listSplits(layoutHandle)); + } + catch (RuntimeException e) { + throw new RuntimeException("Failed to get splits for " + layoutHandle.getTable().getSchemaTableName(), e); + }presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java (4)
51-58: Consider using thevariableAllocatorparameter or remove it if unnecessary.The
optimizemethod signature requires aVariableAllocatorparameter, yet it is not used within the method. If it is not needed, consider removing it (if permissible by the interface) or explicitly document why it is unused.
73-73: Usefalse == ...instead of negation.According to the coding guidelines, prefer checking
false == (node.getSource() instanceof TableScanNode)rather than using!(node.getSource() instanceof TableScanNode).Apply this diff to line 73:
-if (!(node.getSource() instanceof TableScanNode)) { +if (false == (node.getSource() instanceof TableScanNode)) {
86-86: Replace negation withfalse == ....In keeping with the guidelines, prefer
false == kqlQuery.isPresent()to!kqlQuery.isPresent().Apply this diff to line 86:
-if (!kqlQuery.isPresent()) { +if (false == kqlQuery.isPresent()) {
105-105: Usefalse == remainingPredicate.isPresent().Same reasoning applies here. Change
!remainingPredicate.isPresent()tofalse == remainingPredicate.isPresent().Apply this diff to line 105:
-if (!remainingPredicate.isPresent()) { +if (false == remainingPredicate.isPresent()) {presto-clp/src/main/java/com/yscope/presto/ClpConfig.java (1)
133-133: Follow coding guidelines for negation checks.Instead of using
!SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches(), consider usingfalse == SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()to align with the specified coding standard.- if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) { + if (metadataTablePrefix == null || false == SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java (2)
94-94: Adhere to coding guidelines for boolean negation.
Within these lines, local style guidelines recommend usingfalse == <expression>instead of!<expression>. Consider updating for consistency.- if (expression.getRemainingExpression().isPresent() || !expression.getDefinition().isPresent()) { + if (expression.getRemainingExpression().isPresent() || false == expression.getDefinition().isPresent()) { - else if (!remainingExpressions.isEmpty()) { + else if (false == remainingExpressions.isEmpty()) { - if (expression.getRemainingExpression().isPresent() || !expression.getDefinition().isPresent()) { + if (expression.getRemainingExpression().isPresent() || false == expression.getDefinition().isPresent()) {Also applies to: 120-120, 144-144
100-135: Refactor repetitive string-building in handleAnd/handleOr.
Both methods perform very similar string operations and then remove trailing operators. Refactoring to a shared helper method would improve readability and remove code duplication.- // Repetitive building, appending " AND " or " OR ", then slicing off the last operator + // Possible shared utility method e.g., buildCompoundExpression(List<ClpExpression> expressions, String operator)Also applies to: 137-152
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
presto-clp/src/main/java/com/yscope/presto/ClpConfig.java(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpExpression.java(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpMetadata.java(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpModule.java(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpSplitManager.java(1 hunks)presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java(1 hunks)presto-clp/src/test/java/com/yscope/presto/TestClpMetadata.java(1 hunks)presto-clp/src/test/java/com/yscope/presto/TestClpPlanOptimizer.java(1 hunks)presto-clp/src/test/java/com/yscope/presto/TestClpSplit.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- presto-clp/src/test/java/com/yscope/presto/TestClpSplit.java
- presto-clp/src/test/java/com/yscope/presto/TestClpMetadata.java
- presto-clp/src/test/java/com/yscope/presto/TestClpPlanOptimizer.java
- presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java
- presto-clp/src/main/java/com/yscope/presto/ClpMetadata.java
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
presto-clp/src/main/java/com/yscope/presto/ClpSplitManager.javapresto-clp/src/main/java/com/yscope/presto/ClpExpression.javapresto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.javapresto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.javapresto-clp/src/main/java/com/yscope/presto/ClpConfig.javapresto-clp/src/main/java/com/yscope/presto/ClpModule.java
🧬 Code Definitions (2)
presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/VariableAllocator.java (1)
VariableAllocator(33-167)
presto-clp/src/main/java/com/yscope/presto/ClpModule.java (3)
presto-clp/src/main/java/com/yscope/presto/metadata/ClpMySQLMetadataProvider.java (1)
ClpMySQLMetadataProvider(30-112)presto-clp/src/main/java/com/yscope/presto/split/ClpMySQLSplitProvider.java (1)
ClpMySQLSplitProvider(31-108)presto-clp/src/main/java/com/yscope/presto/ClpConfig.java (1)
ClpConfig(21-189)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: kudu (8)
- GitHub Check: spark-integration (17.0.13)
- GitHub Check: spark-integration (8)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: singlestore-dockerized-tests (17.0.13)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: hive-tests (8)
- GitHub Check: hive-dockerized-tests (8)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: product-tests-specific-environment2 (17.0.13)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: product-tests-specific-environment1 (17.0.13)
- GitHub Check: product-tests-specific-environment1 (8)
- GitHub Check: test-other-modules (17.0.13)
- GitHub Check: test-other-modules (8)
- GitHub Check: test (:presto-base-arrow-flight)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8)
🔇 Additional comments (18)
presto-clp/src/main/java/com/yscope/presto/ClpExpression.java (3)
31-35: Good documentation for class variables.The comments for
definitionandremainingExpressionprovide clear explanations of what these fields represent, addressing the previous review comment about adding introductory comments for these variables.
20-28: Well-documented class purpose.The Javadoc clearly explains the purpose of the class and the three different cases it handles when converting Presto RowExpression to KQL, which makes the code more maintainable and understandable.
43-47: Good use of convenience constructors.The additional constructors provide convenient ways to create ClpExpression objects for each of the three cases described in the class documentation, making the code more readable where this class is used.
Also applies to: 49-53, 55-59
presto-clp/src/main/java/com/yscope/presto/ClpSplitManager.java (2)
16-22: Missing import for SplitSchedulingContextThe SplitSchedulingContext class is used as a parameter type in the getSplits method (line 41), but there's no import statement for it. This will cause a compilation error.
Add the following import:
import com.facebook.presto.spi.connector.ConnectorSplitManager; import com.facebook.presto.spi.connector.ConnectorTransactionHandle; +import com.facebook.presto.spi.connector.SplitSchedulingContext;
37-45: Add type safety check before castingThe layout parameter is cast to ClpTableLayoutHandle without verifying its type first. This could lead to ClassCastException if an incorrect handle type is passed.
- ClpTableLayoutHandle layoutHandle = (ClpTableLayoutHandle) layout; + if (!(layout instanceof ClpTableLayoutHandle)) { + throw new IllegalArgumentException("Expected ClpTableLayoutHandle but got " + layout.getClass().getName()); + } + ClpTableLayoutHandle layoutHandle = (ClpTableLayoutHandle) layout;presto-clp/src/main/java/com/yscope/presto/ClpModule.java (6)
1-13: Licence header looks correct
No concerns with the Apache 2.0 licence block.
14-26: Imports are consistent
The imported classes and packages appear to align with the Presto framework and your use-cases.
27-29: Class definition is well-structured
TheClpModuleclass correctly extendsAbstractConfigurationAwareModule.
30-36: Configuration binding is straightforward
The binder usage and config binding withClpConfigis aligned with Presto’s typical patterns.
38-45: Conditional binding for metadata provider
This logic cleanly handles the single supported provider type (MySQL) and raises an exception if unsupported. Future expansions could be handled via a switch statement if more providers are introduced.
47-53: Conditional binding for split provider
Similar to the metadata provider logic, this is well-handled. The explicit exception for unsupported providers ensures clear error reporting.presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java (2)
89-89: Be mindful of potential sensitive data in logs.Logging the raw KQL query might present a privacy or security concern if user-supplied values end up in logs. Confirm that these queries do not contain user PII or confidential information.
60-114: Well-structured rewrite logic.The Rewriter class cleanly handles table scans and updates the plan accordingly. The approach of returning the original node if conditions do not match ensures minimal disruption.
presto-clp/src/main/java/com/yscope/presto/ClpConfig.java (1)
23-27: Confirm usage ofArchiveSourceandgetInputSource().It appears this functionality may not be used within this file. A previous reviewer recommended verifying whether we genuinely require
ArchiveSourceandgetInputSource()or if they should be consolidated, since each archive may be treated as a split.Also applies to: 166-176
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java (4)
52-54: Looks good – well-structured use of RowExpressionVisitor.
The class implementation and interface usage are clear and provide a cohesive structure for converting filter expressions to KQL.
80-83: Reintroduce null checks on the column handle.
Castingassignments.get(variable)toClpColumnHandlewithout verifying it can lead toNullPointerExceptionorClassCastException. A prior review flagged this issue as well, so kindly reintroduce or confirm the null checks to ensure safety.
254-293: Validate edge cases for the LIKE operator conversion.
The current path translates%to*and_to?, yet it may not handle multiple wildcards or boundary cases (e.g. leading/trailing%). Ensure end-to-end test coverage for patterns such as'%abc%','_abc', and'abc%'.
342-368: Verify number format handling in parseIntValue.
The code attempts to parse integers and silently ignoresNumberFormatException. Confirm that silently ignoring the exception is desired behaviour, and ensure test coverage for erroneous or out-of-bounds values (e.g. large integers).
| StringBuilder queryBuilder = new StringBuilder(); | ||
| queryBuilder.append("("); | ||
| for (RowExpression argument : node.getArguments().subList(1, node.getArguments().size())) { | ||
| if (!(argument instanceof ConstantExpression)) { | ||
| return new ClpExpression(node); | ||
| } | ||
| ConstantExpression literal = (ConstantExpression) argument; | ||
| String literalString = getLiteralString(literal); | ||
| queryBuilder.append(variableName).append(": "); | ||
| if (literal.getType() instanceof VarcharType) { | ||
| queryBuilder.append("\"").append(literalString).append("\""); | ||
| } | ||
| else { | ||
| queryBuilder.append(literalString); | ||
| } | ||
| queryBuilder.append(" OR "); | ||
| } | ||
| // Remove the last " OR " from the query | ||
| return new ClpExpression(queryBuilder.substring(0, queryBuilder.length() - 4) + ")"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure secure handling of user input when constructing KQL queries.
Using raw string concatenation for "variableName": "literalString" can expose the system to injection or parsing issues if the literal contains KQL metacharacters. Consider adding proper escaping or validation to avoid malicious or malformed inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java (3)
37-48: Add a constructor forClpNodeto enforce clearer node usage.
It might be beneficial to provide a constructor (for example, acceptingtypeandoriginalName) and explicitly prevent children from being populated in leaf nodes. This can help avert confusion regarding leaf vs. non-leaf usage.
58-77: Consider supporting more diverse array element types.
Currently,UnstructuredArrayis mapped tonew ArrayType(VarcharType.VARCHAR). If there is a likelihood of arrays containing numeric or boolean elements, you could introduce logic to differentiate the inner types.
105-108: Adhere to the code style preference for boolean expressions.
Per the coding guidelines, preferfalse == existing.type.equals(prestoType)rather than using!existing.type.equals(prestoType).- if (existing.type != null && !existing.type.equals(prestoType)) { + if (existing.type != null && false == existing.type.equals(prestoType)) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: product-tests-specific-environment2 (17.0.13)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: singlestore-dockerized-tests (17.0.13)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: spark-integration (8)
- GitHub Check: kudu (17.0.13)
- GitHub Check: kudu (8)
- GitHub Check: test (:presto-docs)
- GitHub Check: test (:presto-base-arrow-flight)
- GitHub Check: maven-checks (8)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (1)
presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java (1)
159-169: Add a brief method-level comment explaining the DFS approach.
This method implements a depth-first traversal to constructRowTypefrom nested child nodes. Documenting this will improve clarity.
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/split/ClpMySQLSplitProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpColumnHandle.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (37)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (1)
21-24: Consider adding Javadoc documentation.The interface and its method lack Javadoc documentation. Adding documentation would improve clarity about the purpose of this interface and the expected behavior of the
listSplitsmethod, especially regarding any potential side effects or exceptions that implementations might throw.public interface ClpSplitProvider { + /** + * Lists the splits for a given table layout handle. + * + * @param clpTableLayoutHandle the table layout handle + * @return a list of splits + */ List<ClpSplit> listSplits(ClpTableLayoutHandle clpTableLayoutHandle); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (2)
43-47: Fix spacing in constructor delegationThere's an extra space between "this" and the opening parenthesis on line 46.
- this (Optional.empty(), Optional.empty()); + this(Optional.empty(), Optional.empty());
61-69: Consider adding additional utility methodsThe getter methods are appropriately implemented. Consider adding helper methods to improve usability:
+ /** + * Returns true if the entire expression was converted to KQL. + */ + public boolean isFullyTranslatable() + { + return definition.isPresent() && !remainingExpression.isPresent(); + } + + /** + * Returns true if part of the expression was converted to KQL. + */ + public boolean isPartiallyTranslatable() + { + return definition.isPresent() && remainingExpression.isPresent(); + } + + /** + * Returns true if none of the expression could be converted to KQL. + */ + public boolean isNonTranslatable() + { + return !definition.isPresent() && remainingExpression.isPresent(); + } + + @Override + public String toString() + { + return "ClpExpression{" + + "definition=" + definition + + ", remainingExpression=" + remainingExpression + + '}'; + } + + @Override + public boolean equals(Object o) + { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + ClpExpression that = (ClpExpression) o; + return Objects.equals(definition, that.definition) && + Objects.equals(remainingExpression, that.remainingExpression); + } + + @Override + public int hashCode() + { + return Objects.hash(definition, remainingExpression); + }Don't forget to add the import for java.util.Objects if you implement these methods.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (3)
26-28: Enhance with class documentationConsider adding Javadoc to the class describing its purpose and responsibilities within the CLP connector ecosystem. This would help future maintainers understand how this class fits into the overall architecture.
+/** + * Provides plan optimizers for the CLP connector. + * This class is responsible for creating and providing logical and physical + * plan optimizers that are specific to CLP query execution. + */ public class ClpPlanOptimizerProvider implements ConnectorPlanOptimizerProvider {
40-44: Add method documentationConsider adding Javadoc to document that this method intentionally returns an empty set of logical plan optimizers.
+/** + * Returns an empty set since no logical plan optimizations are needed for the CLP connector. + */ @Override public Set<ConnectorPlanOptimizer> getLogicalPlanOptimizers() { return ImmutableSet.of(); }
46-50: Add method documentationConsider adding Javadoc to describe the purpose of the physical plan optimizers being provided.
+/** + * Returns a set of physical plan optimizers for the CLP connector. + * Currently provides a ClpPlanOptimizer that handles operations + * such as filter pushdown or conversion to KQL expressions. + */ @Override public Set<ConnectorPlanOptimizer> getPhysicalPlanOptimizers() { return ImmutableSet.of(new ClpPlanOptimizer(functionManager, functionResolution)); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (5)
41-42: Potential null return value not properly documented in getter methodThe
archivePathparameter is marked as@Nullablein the constructor, but the corresponding getter methodgetArchivePath()is missing the@Nullableannotation. This inconsistency could lead to NullPointerExceptions.@JsonProperty +@Nullable public String getArchivePath() { return archivePath; }
39-47: Add validation to constructor parametersThe constructor accepts nullable parameters but doesn't perform any validation. Consider adding validation checks for the parameters to ensure they meet your requirements, especially if certain formats are expected for
archivePath.@JsonCreator public ClpSplit(@JsonProperty("schemaTableName") @Nullable SchemaTableName schemaTableName, @JsonProperty("archivePath") @Nullable String archivePath, @JsonProperty("query") Optional<String> query) { + // Optional validation logic here + // Example: if (archivePath != null && !isValidArchivePath(archivePath)) { + // throw new IllegalArgumentException("Invalid archive path format"); + // } this.schemaTableName = schemaTableName; this.archivePath = archivePath; this.query = query; }
32-85: Add equals, hashCode, and toString methods for better object handlingThe class lacks
equals(),hashCode(), andtoString()methods, which are important for properly handling instances in collections and for debugging purposes.Consider adding these methods to improve usability:
@Override public boolean equals(Object o) { if (this == o) { return true; } if (o == null || getClass() != o.getClass()) { return false; } ClpSplit that = (ClpSplit) o; return Objects.equals(schemaTableName, that.schemaTableName) && Objects.equals(archivePath, that.archivePath) && Objects.equals(query, that.query); } @Override public int hashCode() { return Objects.hash(schemaTableName, archivePath, query); } @Override public String toString() { return "ClpSplit{" + "schemaTableName=" + schemaTableName + ", archivePath='" + archivePath + '\'' + ", query=" + query + '}'; }Remember to add
import java.util.Objects;to your imports.
32-34: Add Javadoc comments for better code documentationThe class lacks Javadoc comments that would help users understand its purpose and usage. Consider adding class-level and method-level documentation.
+/** + * Represents a split in the CLP connector. + * A split corresponds to an archive path that can be processed by a Presto worker. + */ public class ClpSplit implements ConnectorSplit {
80-84: Consider adding type safety to getInfo() return valueWhile returning
thisfromgetInfo()is technically correct, it might be clearer to explicitly document that this method returns theClpSplitinstance itself by adding a comment explaining this behavior.@Override public Object getInfo() { + // Return this instance to provide information about the split return this; }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java (2)
43-43: Add type safety check before casting.Direct casting without type checking can lead to ClassCastException at runtime. Consider adding an instanceof check before casting.
- ClpTableLayoutHandle layoutHandle = (ClpTableLayoutHandle) layout; + if (!(layout instanceof ClpTableLayoutHandle)) { + throw new IllegalArgumentException("Expected ClpTableLayoutHandle but got " + layout.getClass().getName()); + } + ClpTableLayoutHandle layoutHandle = (ClpTableLayoutHandle) layout;
44-44: Consider adding exception handling.The
listSplitsmethod might throw exceptions that aren't being caught or handled here. This could potentially cause runtime failures without proper error context.- return new FixedSplitSource(clpSplitProvider.listSplits(layoutHandle)); + try { + return new FixedSplitSource(clpSplitProvider.listSplits(layoutHandle)); + } + catch (Exception e) { + throw new RuntimeException("Failed to get splits for " + layoutHandle, e); + }pom.xml (1)
215-215: Module Addition: New presto-clp Module Registered
The new<module>presto-clp</module>is added as expected in the<modules>list. This aligns with the PR objectives to introduce the CLP connector module. Please verify that its ordering is consistent with project conventions and that any associated build or integration configurations (such as plugin registration) are updated accordingly.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
21-26: Interface methods are implicitly publicThe
publicmodifier on interface methods is redundant since methods declared in an interface are implicitly public. Consider removing thepublicmodifier for both methods to follow Java best practices.public interface ClpMetadataProvider { - public List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); + List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); - public List<String> listTableNames(String schema); + List<String> listTableNames(String schema); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
48-57: Consider using Objects.equals for null safetyIn the
equalsmethod, you're directly callingequalsonschemaTableName. While the class check protects againstnullforobj, it's generally safer to useObjects.equals()for comparing object fields to handle potential null values.@Override public boolean equals(Object obj) { if (this == obj) { return true; } if (obj == null || getClass() != obj.getClass()) { return false; } ClpTableHandle other = (ClpTableHandle) obj; - return this.schemaTableName.equals(other.schemaTableName); + return Objects.equals(this.schemaTableName, other.schemaTableName); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
94-94: Adhere to coding guideline for negation.Your code uses
!expressionin these sections, but the coding guidelines for this repository specify usingfalse == expressioninstead of!expression. Please consider updating for consistency.Also applies to: 145-145, 190-190, 244-244, 262-262, 276-276
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (2)
35-45: Consider validating constructor arguments for null values.It's a good practice to enforce non-null arguments for critical fields like columnName and columnType to help prevent unexpected runtime errors.
Below is a possible change to add argument checks:
public ClpColumnHandle( @JsonProperty("columnName") String columnName, @JsonProperty("originalColumnName") String originalColumnName, @JsonProperty("columnType") Type columnType, @JsonProperty("nullable") boolean nullable) { - this.columnName = columnName; - this.originalColumnName = originalColumnName; - this.columnType = columnType; + this.columnName = java.util.Objects.requireNonNull(columnName, "columnName is null"); + this.originalColumnName = java.util.Objects.requireNonNull(originalColumnName, "originalColumnName is null"); + this.columnType = java.util.Objects.requireNonNull(columnType, "columnType is null"); this.nullable = nullable; }
47-50: Validate constructor arguments in the convenience constructor as well.For consistency with the primary constructor, consider adding non-null checks.
public ClpColumnHandle(String columnName, Type columnType, boolean nullable) { - this(columnName, columnName, columnType, nullable); + this( + java.util.Objects.requireNonNull(columnName, "columnName is null"), + columnName, + java.util.Objects.requireNonNull(columnType, "columnType is null"), + nullable + ); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (4)
73-73: Use 'false == expression' instead of '!expression' per coding guidelines.Replace
!(node.getSource() instanceof TableScanNode)withfalse == (node.getSource() instanceof TableScanNode).-if (!(node.getSource() instanceof TableScanNode)) { +if (false == (node.getSource() instanceof TableScanNode)) { return node; }
86-86: Use 'false == expression' instead of '!expression' per coding guidelines.Replace
if (!kqlQuery.isPresent()) { ... }withif (false == kqlQuery.isPresent()) { ... }.-if (!kqlQuery.isPresent()) { +if (false == kqlQuery.isPresent()) { return node; }
105-105: Use 'false == expression' instead of '!expression' per coding guidelines.Replace
if (!remainingPredicate.isPresent()) { ... }withif (false == remainingPredicate.isPresent()) { ... }.-if (!remainingPredicate.isPresent()) { +if (false == remainingPredicate.isPresent()) { return newTableScanNode; }
89-89: Potentially sanitize logged content.Printing the entire KQL query to the logs may reveal sensitive application data. Consider masking or sanitizing the query if it might contain private user information.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
38-54: Conditional provider binding is robust.
Handling unsupported provider types by throwing aPrestoExceptionis appropriate here. However, if you foresee adding more providers in the future, consider using a pluggable registry-like approach to avoid frequent modifications in this module.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)
57-74: Database setup is well-organized.
Creating H2 tables in the setup routine is a good approach for test isolation. Consider using an in-memory database for faster tests, unless file-based persistence is required for your test scenario.
126-202: Integration tests for table metadata are comprehensive.
You’re validating the returned columns, types, and nullability. Consider adding a negative test case that verifies behaviour when the table does not exist or the metadata is partially missing.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
126-139: Split listing test coverage.
Validating the number of splits and ensuring archive paths align with expectations is excellent. Consider an additional test verifying the scenario where no archives exist for a table.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
171-174: Consider extending tests for deeper edge cases
Nested negations (line 171) and more intricate filtering expressions are well covered, but you might consider additional scenarios such as handling empty strings, overly long substrings, or special characters in string expressions for added robustness.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
63-66: Clarify read-only transaction handling
All transactions default to the same handle (ClpTransactionHandle.INSTANCE), ignoring thereadOnlyflag. If different behaviour is eventually needed for read-write vs. read-only, consider adding appropriate logic.presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
59-60: Apply style guideline for negation
Per the coding guidelines, prefer usingfalse == expressionrather than!expression. You can rewrite:-if (dbName != null && !dbName.isEmpty()) { +if (dbName != null && false == dbName.isEmpty()) {presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (2)
42-52: Consider updating to the current MySQL driver class name.Using
Class.forName("com.mysql.jdbc.Driver")may lead to warnings in modern MySQL driver versions. It is often recommended to use"com.mysql.cj.jdbc.Driver"instead for newer MySQL distributions.
69-86: Minor concern about the error log statement.At line 83, concatenating log messages (
"Failed to load table schema for %s: %s" + schemaTableName.getTableName()) could lead to confusing output. Consider using a standard logger format with placeholders instead, for example:log.error("Failed to load table schema for {}: {}", schemaTableName.getTableName(), e);presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
48-69: Initial bootstrap logic looks correct.Dependency injection with the
Bootstrapapproach should integrate well. Make sure to log or track errors for easier debugging if the plugin fails to initialize.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
79-133: Align with the code style guideline by avoiding negation with “!”.At line 108, consider using
false == existing.type.equals(prestoType)rather than!existing.type.equals(prestoType), in keeping with the stated preferences for Java code. This is only a minor style point and not functionally problematic.Otherwise, polymorphic conflict resolution logic is well-designed.
Suffixing with the type display name resolves naming collisions neatly, especially when polymorphic types are enabled.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (2)
99-101: Adhere to coding guidelines for negation checks.
In Java code, prefer usingfalse == <expression>instead of!<expression>. Consider the following changes for clarity:- if (!listSchemaNames(session).contains(schemaNameValue)) { + if (false == listSchemaNames(session).contains(schemaNameValue)) {- if (!listSchemaNames(session).contains(schemaName)) { + if (false == listSchemaNames(session).contains(schemaName)) {- if (!listTables(schemaName).contains(tableName.getTableName())) { + if (false == listTables(schemaName).contains(tableName.getTableName())) {Also applies to: 112-114, 116-118
158-160: Apply negation style rule for schema comparison.
The condition uses!schemaName.equals(...). As per the Java coding guidelines, prefer:- if (schemaName != null && !schemaName.equals(DEFAULT_SCHEMA_NAME)) { + if (schemaName != null && false == schemaName.equals(DEFAULT_SCHEMA_NAME)) {presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
133-135: Negation style improvement for metadata table prefix check.
Usefalse == SAFE_SQL_IDENTIFIER.matcher(...)over!SAFE_SQL_IDENTIFIER.matcher(...)for consistency:- if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) { + if (metadataTablePrefix == null || false == SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
pom.xml(2 hunks)presto-clp/pom.xml(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(1 hunks)presto-server/src/main/provisio/presto.xml(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java
🚧 Files skipped from review as they are similar to previous changes (2)
- presto-server/src/main/provisio/presto.xml
- presto-clp/pom.xml
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.javapresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.javapresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.javapresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.javapresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
🧬 Code Definitions (8)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
ClpSplit(32-85)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
ClpTableLayoutHandle(23-73)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle(26-114)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
ClpMySqlMetadataProvider(30-112)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
ClpMySqlSplitProvider(31-108)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-189)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-189)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
ClpSplit(32-85)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
ClpTableLayoutHandle(23-73)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (2)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1)
Test(49-203)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
Test(36-140)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle(26-114)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig(21-189)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle(26-114)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle(26-114)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: spark-integration (8)
- GitHub Check: product-tests-basic-environment (17.0.13)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8, :presto-iceberg)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: product-tests-specific-environment1 (8)
- GitHub Check: test (17.0.13, :presto-redis -P test-redis-integration-smoke-test)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: singlestore-dockerized-tests (17.0.13)
- GitHub Check: kudu (8)
- GitHub Check: kudu (17.0.13)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: hive-dockerized-tests (8)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: hive-tests (8)
- GitHub Check: test (:presto-docs)
- GitHub Check: test (:presto-base-arrow-flight)
- GitHub Check: maven-checks (8)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (50)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (1)
21-24: Well-defined interface with clear responsibility.The
ClpSplitProviderinterface has a clear and focused responsibility, following good design principles. The method signature is appropriate for its purpose of retrieving splits based on a table layout handle.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (4)
1-28: License and documentation look goodThe class has a clear and detailed Javadoc that explains the purpose and the three possible scenarios for the conversion results. This is excellent documentation that will help other developers understand how to use this class.
29-35: Clear class definition with well-explained fieldsThe class structure is clean with immutable fields, which is good for thread safety. The comments for each field clearly explain their purpose.
37-41: Primary constructor looks goodThis constructor properly initializes both fields and follows an immutable design pattern.
49-59: Specialized constructors are well implementedThe specialized constructors for fully translatable and non-translatable expressions make the API more convenient to use. They delegate to the primary constructor correctly.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
29-38: Implementation looks goodThe dependency injection pattern is correctly implemented with final fields and the @Inject annotation. The constructor properly initializes all fields.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
68-72: LGTM: Node selection strategy implementationThe implementation properly returns NO_PREFERENCE for node selection strategy, which is appropriate for connectors that don't have specific node placement requirements.
pom.xml (1)
765-770: Dependency Addition: presto-clp Dependency Introduced
This new dependency block forpresto-clpis correctly added under<dependencyManagement>, with its version set as${project.version}. Ensure that this internal dependency is properly integrated with the module build and that documentation (if any) reflects its addition.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)
22-43: Error handling implementation looks goodThe
ClpErrorCodeenum properly implements theErrorCodeSupplierinterface with well-defined error codes and follows standard Presto error handling patterns. The error code offset of0x0400_0000helps avoid conflicts with other error codes in the system.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java (1)
23-55: Handle resolver implementation is appropriateThe
ClpHandleResolvercorrectly implements all required methods from theConnectorHandleResolverinterface, returning the appropriate CLP-specific handle classes. This class is essential for Presto to correctly work with the CLP connector's custom types.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
50-60: Potential oversight inequalsandhashCodemethods.The
equalsandhashCodemethods only compare thetablefield and exclude thequeryfield. Ifqueryis expected to participate in equality checks, consider updating these methods to include it.Could you confirm if this is an intentional design choice?
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java (1)
16-66: All good here!This enumeration is straightforward, and the lookup table logic for byte-based indexing is well implemented.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
169-169: Confirm potential KQL injection risk.Literal strings are appended directly to KQL expressions. If any user-controlled input can reach these code paths, confirm whether escaping or sanitization is required to prevent a possible injection vulnerability.
Also applies to: 292-292, 366-366, 403-403
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java (1)
20-28: Straightforward plugin implementation looks good.No issues spotted. Implementation is concise and aligns with Presto plugin guidelines.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (4)
1-13: Licence header is correct and follows standard practice.
No changes required.
14-26: Imports and package statement are well-organized.
No issues found with naming or usage.
27-29: Class declaration clarity.
The extension ofAbstractConfigurationAwareModulealigns well with Presto’s connector module design.
30-36: Configuration binding is implemented correctly.
UsingconfigBinder(binder).bindConfig(ClpConfig.class)ensures that properties are injected from the server configuration seamlessly.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)
1-13: Licence header is in compliance.
No changes needed.
14-33: Imports and test annotations.
Imports appear correct, and usage of TestNG’s@BeforeMethodand@AfterMethodis appropriate for test lifecycle management.
78-105: Batch insertion logic.
Batch inserts reduce overhead and improve test setup performance. This is efficient.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4)
1-13: Licence header verified.
No changes required.
14-35: Imports and test structure.
Your usage of TestNG annotations is consistent, and the test structure is clean.
47-62: Configuration creation in setup.
Using a dedicated configuration with test-specific settings for the database is good practice.
66-106: Table creation and batch insertion.
Same as withTestClpMetadata, this design is effective for a consistent test environment.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
25-28: Comprehensive coverage of pushdown tests
These test cases thoroughly validate various scenarios (substring matching, LIKE patterns, numeric comparisons, AND/OR/NOT logic, etc.), providing strong confidence in pushdown correctness. Good job!presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
43-54: Constructor injection and null checks are well implemented
UsingrequireNonNullin the constructor ensures dependencies are valid, which helps avoid runtime null pointer issues.presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
102-104: Re-examine partial or empty results handling
When an SQLException occurs, the method logs the error but still returns any splits collected so far. If partial results are not desired, consider rethrowing an exception or returning an empty list for clarity.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (3)
54-62: Connection usage looks good.The connection management and database selection logic are straightforward. Ensure any external usage of
ClpConfigis well-documented so users configure valid credentials.
64-67: Identifier validation is concise.This method cleanly enforces the safe identifier check, minimizing the risk of SQL injection.
88-111: Overall logic is sound.The approach to filtering invalid table names aligns well with the safe identifier pattern. No major concerns here.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (2)
36-40: Naming convention is clear.Returning
"clp"for the connector name is consistent and self-explanatory.
42-46: Handle resolver implementation is straightforward.Delegating to
ClpHandleResolveris correct.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (5)
37-48: Internal node structure is well-defined.Defining a nested
ClpNodeclass for the schema tree clarifies the approach to schema nesting.
50-56: Constructor logic is minimal and effective.Instantiating the root node and setting the polymorphic type flag is clear.
58-77: Mapping of custom type codes to Presto types is robust.The switch statement covers expected cases. The fallback to an exception on unsupported types is good, ensuring unknown cases are captured.
135-157: Column collection for leaves and nested rows is sensible.Building
RowTypefor children ensures proper representation of nested structures.
159-169: Recursive row type construction is well-structured.The method correctly handles nested nodes, generating fields with optional names. This aligns well with Presto’s row type model.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (6)
1-13: Licence header and package declaration look good.
No issues found with the Apache license header.
14-55: Static fields and initial declarations are well-defined.
The usage of nestedRowTypefields forcityis a good approach for testing complex types.
57-96: SessionHolder encapsulation is clear.
TheSessionHolderclass appropriately bundles both theConnectorSessionandSession, facilitating simpler testing contexts.
98-101: SQL expression parsing is correct.
Theexpression(String sql)method properly rewrites identifiers to symbol references usingExpressionUtils.
103-114: RowExpression translation is properly implemented.
The usage ofgetExpressionTypesandSqlToRowExpressionTranslatoraligns with Presto’s recommended practices.
116-119: Helper method getRowExpression is concise and coherent.
No issues found.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (4)
1-13: License block validated.
No concerns found with the licensing header.
140-150: Table metadata retrieval logic is consistent.
Building aConnectorTableMetadatainstance from the cached column handles is appropriate.
176-184: Column handle mapping is straightforward.
No further issues detected.
186-193: Column metadata retrieval is correct.
MappingClpColumnHandletoColumnMetadatais appropriate.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)
1-13: License block validated.
No concerns found here.
19-190: General configuration approach is solid.
Configuration setters and validation logic appear correct and adhere to Presto’s best practices.
| public ConnectorSplitSource getSplits(ConnectorTransactionHandle transactionHandle, | ||
| ConnectorSession session, | ||
| ConnectorTableLayoutHandle layout, | ||
| SplitSchedulingContext splitSchedulingContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Add missing import for SplitSchedulingContext.
The SplitSchedulingContext parameter is used in the method signature, but there's no import statement for this class.
🏁 Script executed:
#!/bin/bash
# Verify if SplitSchedulingContext is a top-level class that needs to be imported
rg -g "**/*.java" "class SplitSchedulingContext" --type javaLength of output: 181
Action Required: Add missing import for SplitSchedulingContext
Based on our search, the class SplitSchedulingContext is defined as an inner class in the file
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorSplitManager.java.
To properly reference this class in the ClpSplitManager.java file, please insert the following import:
import com.facebook.presto.spi.connector.ConnectorSplitManager.SplitSchedulingContext;This change will resolve the missing import issue noted in the method signature.
| @Override | ||
| public boolean equals(Object obj) | ||
| { | ||
| if (this == obj) { | ||
| return true; | ||
| } | ||
| if (obj == null || getClass() != obj.getClass()) { | ||
| return false; | ||
| } | ||
| ClpColumnHandle other = (ClpColumnHandle) obj; | ||
| return Objects.equals(this.columnName, other.columnName) && | ||
| Objects.equals(this.columnType, other.columnType); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Revisit equality checks to include all relevant fields.
Only columnName and columnType are considered in equality, but originalColumnName and nullable are omitted. This can lead to inconsistent behaviour.
return Objects.equals(this.columnName, other.columnName)
- && Objects.equals(this.columnType, other.columnType);
+ && Objects.equals(this.originalColumnName, other.originalColumnName)
+ && Objects.equals(this.columnType, other.columnType)
+ && this.nullable == other.nullable;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Override | |
| public boolean equals(Object obj) | |
| { | |
| if (this == obj) { | |
| return true; | |
| } | |
| if (obj == null || getClass() != obj.getClass()) { | |
| return false; | |
| } | |
| ClpColumnHandle other = (ClpColumnHandle) obj; | |
| return Objects.equals(this.columnName, other.columnName) && | |
| Objects.equals(this.columnType, other.columnType); | |
| } | |
| @Override | |
| public boolean equals(Object obj) | |
| { | |
| if (this == obj) { | |
| return true; | |
| } | |
| if (obj == null || getClass() != obj.getClass()) { | |
| return false; | |
| } | |
| ClpColumnHandle other = (ClpColumnHandle) obj; | |
| return Objects.equals(this.columnName, other.columnName) | |
| && Objects.equals(this.originalColumnName, other.originalColumnName) | |
| && Objects.equals(this.columnType, other.columnType) | |
| && this.nullable == other.nullable; | |
| } |
| String tablePathQuery = String.format(QUERY_SELECT_TABLE_METADATA, config.getMetadataTablePrefix(), tableName); | ||
| String archivePathQuery = String.format(QUERY_SELECT_ARCHIVE_IDS, config.getMetadataTablePrefix(), tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Mitigate potential SQL injection
Concatenating tableName directly into the SQL string can pose an injection risk if tableName is not guaranteed to be sanitized. Consider validating or restricting tableName to mitigate this risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)
178-179: Update Negation According to Coding Guidelines
The coding guidelines for Java code require usingfalse == conditioninstead of!condition. The relevant lines use!<expression>. Consider applying this change to maintain style consistency:- if (!expression.getRemainingExpression().isPresent() || !expression.getDefinition().isPresent()) { + if (false == expression.getRemainingExpression().isPresent() || false == expression.getDefinition().isPresent()) { - if (!expression.getRemainingExpression().isPresent() || !expression.getDefinition().isPresent()) { + if (false == expression.getRemainingExpression().isPresent() || false == expression.getDefinition().isPresent()) { ... // Apply similarly to all occurrences of !<expression> in lines shown.Also applies to: 244-245, 303-304, 362-363, 383-384, 450-451, 584-585, 658-659
592-628: Consolidate Comparison Logic
The logic inbuildClpExpressionfor building query strings is clear. However, you may benefit from extracting repeated string formatting into helper methods, which can improve readability and maintainability when new comparison types are introduced.
630-687: Efficient Substring Handling
ThetryInterpretSubstringEqualityandhandleLogicalBinarymethods handle substring equality thoroughly. For large queries, you might need to validate performance. An alternative approach could cache repeated substring computations or pre-check if start/length directly match short-circuit conditions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java(1 hunks)presto-server/src/main/provisio/presto.xml(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- presto-server/src/main/provisio/presto.xml
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: product-tests-basic-environment (17.0.13)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: kudu (17.0.13)
- GitHub Check: kudu (8)
- GitHub Check: spark-integration (8)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: test (8, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-hive -P test-hive-materialized-aggregations)
- GitHub Check: test (17.0.13, :presto-cassandra -P test-cassandra-integration-smoke-test)
- GitHub Check: product-tests-specific-environment1 (17.0.13)
- GitHub Check: product-tests-specific-environment1 (8)
- GitHub Check: product-tests-specific-environment2 (17.0.13)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: test-other-modules (8)
- GitHub Check: test-other-modules (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8)
- GitHub Check: test (:presto-base-arrow-flight)
🔇 Additional comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)
1-13: License Header Is Properly Included
No issues observed with the license header.
52-67: Well-Structured Class Documentation
The class-level JavaDoc is clear and thoroughly explains how expressions are converted to KQL.
262-286: Potential Unescaped String Injection
Literal strings (e.g. user inputs) are concatenated into KQL queries without apparent sanitization or escaping. Malicious characters might break queries or produce unintended behavior. Consider adding an escaping mechanism or verifying KQL handles all inputs securely.Please run a verification script or add automated tests to ensure that special or control characters inside these literal strings cannot cause KQL injection issues.
Also applies to: 368-414
Description
This PR introduces a CLP connector. The background and proposed implementation details are outlined in the associated RFC.
This PR implements one part of phase 1 of the proposed implementation, namely the Java implementation for the coordinator. The worker implementation will leverage Velox as the default query engine, so once the Velox PR is merged, we will submit another PR to this repo to add the necessary changes to
presto-native-execution.Like other connectors, we have created a
presto-clpmodule and implemented all required connector interfaces as well as a few extras to support query push downs.The important classes in the connector are described below.
Core Classes in Java
ClpConfigThe configuration class for CLP. Currently, we support the following properties:
clp.archive-source: Specifies the source of the CLP archive. Supported values includelocal(local storage) ands3(Amazon S3).clp.metadata-expire-interval: Defines the time interval after which metadata entries are considered expired and removed from the cache.clp.metadata-refresh-interval: Specifies how frequently metadata should be refreshed from the source to ensure up-to-date information.clp.polymorphic-type-enabled: Enables or disables support for polymorphic types within CLP. This determines whether dynamic type resolution is allowed.clp.metadata-provider-type: Defines the type of the metadata provider. It could be a database, a file-based store, or another external system. By default, we use MySQL.clp.metadata-db-url: The connection URL for the metadata database, used whenclp.metadata-provider-typeis configured to use a database.clp.metadata-db-name: The name of the metadata database.clp.metadata-db-user: The database user with access to the metadata database.clp.metadata-db-password: The password for the metadata database user.clp.metadata-table-prefix: A prefix applied to table names in the metadata database.clp.split-provider-type: Defines the type of split provider for query execution. By default, we use MySQL, and the connection parameters are the same as those for the metadata database.ClpSchemaTreeA helper class for constructing a nested schema representation from CLP’s column definitions. It supports hierarchical column names (e.g.,
a.b.c), handles name/type conflicts when theclp.polymorphic-type-enabledoption is enabled, and maps serialized CLP types to Presto types. The schema tree produces a flat list ofClpColumnHandleinstances, includingRowTypefor nested structures, making it suitable for dynamic or semi-structured data formats.When polymorphic types are enabled, conflicting fields are given unique names by appending a type-specific suffix to the column name. For instance, if an integer field named "a" and a
Varstring(CLP type) field named "a" coexist in CLP’s schema tree, they are represented asa_bigintanda_varcharin Presto. This approach ensures that such fields remain queryable while adhering to Presto’s constraints.ClpMetadataProviderAn interface responsible for retrieving metadata from a specified source.
We provide a default implementation called
ClpMySqlMetadataProvider, which uses two MySQL tables. One of these is the table metadata table, defined with the schema shown below. Currently, we support only a single Presto schema nameddefault, and this metadata table stores all table names and paths associated with that Presto schema.table_nameVARCHAR(512)PRIMARY KEYtable_pathVARCHAR(1024)NOT NULLThe second MySQL table contains column metadata, defined by the schema shown below. Each Presto table is associated with a corresponding MySQL table that stores metadata about its columns.
nameVARCHAR(512)NOT NULLtypeTINYINTNOT NULLname,type)ClpSplitProviderIn CLP, an archive is the fundamental unit for searching, and we treat each archive as a Presto Split. This allows independent parallel searches across archives. The
ClpSplitProviderinterface, shown below, defines how to retrieve split information from a specified source:We provide a default implementation called
ClpMySqlSplitProvider. It also uses the metadata table mentioned above as well as another set of tables to store archive IDs associated with each tableidBIGINTAUTO_INCREMENT PRIMARY KRYarchive_idVARCHAR(128)NOT NULLBy combining these two sets of tables, we can retrieve all archive paths for a table by concatenating the table’s path with the archive ID (e.g.,
table_path + "/" + archive_id).ClpMetadataThis interface enables Presto to access various metadata. All requests are delegated to
ClpMetadataProviderFor metadata management, it also maintains two caches and periodically refreshes the metadata.
columnHandleCache: ALoadingCache<SchemaTableName, Set<ClpColumnHandle>>that maps aSchemaTableNameto its corresponding set ofClpColumnHandleobjects.tableNameCache: ALoadingCache<String, Set<String>>that maps a schema name (String) to a set of table names (Set<String>).ClpConnectorPlanOptimizerandClpFilterToKqlConverterPresto exposes
PlanNodeto the connector, allowing the connector to push down relevant filters to CLP for improved query performance.There are three main steps
ClpTableLayoutHandleand constructing a newTableScanNodeFilterNodewith any remaining predicates and the newTableScanNodeClpFilterToKqlConverterimplementsRowExpressionVisitor<ClpExpression, Void>and handles expression transformation and pushdown. Since KQL is not SQL-compatible, only certain types of filters can be converted, including:LIKEpredicate (the"^%[^%_]*%$"pattern is not pushed down)AND,OR,NOT)IS NULLsubstr(a, start, length) = 'abc'andsubstr(a, start) = 'abc'formsFor comparison or match expressions, one side must contain a
VariableReferenceExpression, and the other must be aConstantExpression--specifically a string or numeric literal. Expressions likea > corsubstr(a, 2) = lower(c)are not eligible for pushdown. In simple cases without logical operators, the SQL plan can be directly translated into a KQL query. However, for expressions involving logical operators, it's critical to ensure that all conditions are handled correctly.A naive approach would be to convert only the pushdown-eligible parts of the SQL query into KQL, letting Presto or Prestissimo handle the rest. But this can lead to incorrect execution plans and unintended behavior.
Consider the following query:
Since CLP doesn’t currently support
regexp_like, if we simply push downb = 2to CLP, Presto will only receive results whereb = 2, effectively changing the query semantics toregexp_like(a, '\d+b') AND b = 2.To prevent such issues, the pushdown logic follows these rules:
ORpushdown: AnORcondition is pushable if and only if all child expressions can be pushed down. In this case, all child expressions are pushed down together with the OR operator.ANDpushdown: AnANDcondition is not pushable if and only if none of its child expressions are pushable. Otherwise, pushable expressions are pushed down with theANDoperator, while non-pushable expressions remain in the original plan.CallExpressionand can be transformed into KQL syntax.Example transformations:
SQL WHERE Clause:
a = 2 AND b = '3'a: 2 AND b: "3"FilterNodeis removed.SQL WHERE Clause:
a.b LIKE '%string another_string' OR "c.d" = TRUEa.b: "*string another_string" OR c.d: trueFilterNodeis removed.SQL WHERE Clause:
a > 2 OR regexp_like(b, '\d+b')"*"a > 2 OR regexp_like(b, '\d+b')remains in theFilterNode.SQL WHERE Clause:
a > 2 AND regexp_like(b, '\d+b')a > 2regexp_like(b, '\d+b')remains in theFilterNode.SQL WHERE Clause:
NOT substr(a, 2, 5) = 'Hello' and b IS NULLNOT a: "?Hello*" AND NOT b: *FilterNodeis removed.Motivation and Context
See the associated RFC.
Impact
This module is independent from other modules and will not affect any existing functionality.
Test Plan
Unit tests are included in this PR, and we have also done end-to-end tests.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Summary by CodeRabbit
Summary by CodeRabbit
New Features
clpconnector option.Documentation
Tests