-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Add support for defining and using specific columns for metadata filtering. #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for defining and using specific columns for metadata filtering. #19
Conversation
WalkthroughThis change introduces support for metadata filter pushdown in the CLP Presto connector. It adds a new configuration property for specifying a JSON metadata filter config, extends the filter conversion and planning logic to generate and validate metadata SQL filters, and updates related classes, tests, and documentation to support and verify this feature. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Presto
participant ClpPlanOptimizer
participant ClpFilterToKqlConverter
participant ClpMetadataFilterProvider
participant ClpTableLayoutHandle
participant ClpMySqlSplitProvider
User->>Presto: Submit query with filters
Presto->>ClpPlanOptimizer: Plan query
ClpPlanOptimizer->>ClpFilterToKqlConverter: Convert RowExpression to KQL and metadata SQL
ClpPlanOptimizer->>ClpMetadataFilterProvider: Validate required filters in metadata SQL
ClpPlanOptimizer->>ClpMetadataFilterProvider: Remap metadata SQL (range mapping)
ClpPlanOptimizer->>ClpTableLayoutHandle: Create with KQL and metadata SQL
Presto->>ClpMySqlSplitProvider: Request splits (with layout handle)
ClpMySqlSplitProvider->>ClpTableLayoutHandle: Retrieve metadata SQL
ClpMySqlSplitProvider->>ClpMySqlSplitProvider: Append metadata SQL to archive query
ClpMySqlSplitProvider->>Presto: Return filtered splits
Possibly related PRs
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (2)
🧰 Additional context used🧠 Learnings (3)📓 Common learningspresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)🧬 Code Graph Analysis (2)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
🔇 Additional comments (3)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 15
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (20)
presto-clp/pom.xml(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java(2 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java(2 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(4 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(8 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(3 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java(2 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java(4 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java(2 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java(3 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(4 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java(2 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (25)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
40-40: LGTM! Proper dependency injection setup.The singleton binding for
ClpMetadataFilterProviderfollows the established pattern and is appropriately placed among other core component bindings.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (2)
18-18: Good addition for timestamp support.The
TimestampTypeimport is properly added to support the enhanced type mapping.
125-126: Excellent semantic improvement for DateString mapping.Changing
DateStringfromVARCHARtoTimestampType.TIMESTAMPis semantically correct and aligns with the metadata filtering functionality that requires proper timestamp handling.presto-clp/pom.xml (1)
68-76: Appropriate dependencies for JSON processing.The addition of
jackson-coreandjackson-databindcomplements the existingjackson-annotationsdependency and properly supports the JSON configuration processing needs of the new metadata filtering functionality.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (2)
36-36: Good practice: Extract magic string to named constant.Creating the
CONNECTOR_NAMEconstant improves maintainability and provides consistent access to the connector identifier throughout the codebase.
41-41: Consistent usage of the extracted constant.Using
CONNECTOR_NAMEinstead of the hardcoded string follows good practices and ensures consistency.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
122-122: Proper adaptation to enhanced constructor signature.The addition of
Optional.empty()as the third parameter aligns with the enhancedClpTableLayoutHandleconstructor that now supports metadata SQL. UsingOptional.empty()maintains existing behaviour while enabling the new functionality in other parts of the system.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)
21-21: LGTM! Well-designed error codes for metadata filtering.The new error codes are appropriately named, use the correct USER_ERROR type for configuration-related errors, and follow consistent numbering. The required USER_ERROR import has been added correctly.
Also applies to: 32-33
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
44-44: LGTM! Improved SQL template for dynamic condition appending.The "WHERE 1 = 1" base condition is a good practice that allows easy appending of additional conditions without complex conditional logic.
92-94: LGTM! Clear variable naming and appropriate debug logging.The variable renaming to
filteredSplitsimproves code clarity, and debug-level logging for split count is appropriate for troubleshooting.
72-77: LGTM! Proper metadata SQL integration with appropriate logging.The implementation correctly:
- Checks for optional metadata SQL using
isPresent()- Safely appends the condition with AND clause and proper parentheses grouping
- Adds info-level logging for query debugging
Verify that the metadata SQL from
ClpMetadataFilterProvideris properly sanitized to prevent SQL injection attacks:#!/bin/bash # Description: Check if ClpMetadataFilterProvider implements SQL injection protection # Expected: Find SQL sanitization or parameterized query usage # Search for ClpMetadataFilterProvider implementation ast-grep --pattern 'class ClpMetadataFilterProvider { $$$ }' # Look for SQL sanitization methods rg -A 10 -B 5 "sanitize|escape|quote|parameterized|prepared" --type javapresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)
78-78: LGTM! Constructor simplification aligns with refactoring.The removal of the
StorageTypeparameter from theClpTableHandleconstructor simplifies the test setup while maintaining the same test functionality.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
31-31: LGTM! Proper dependency injection implementation.The
ClpMetadataFilterProviderdependency is correctly:
- Declared as a private final field
- Injected via constructor with @Inject annotation
- Stored in the constructor
- Passed to the
ClpPlanOptimizerconstructorThis follows standard dependency injection patterns and integrates well with the metadata filtering feature.
Also applies to: 34-34, 38-38, 50-50
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
30-85: LGTM! Clean implementation of metadata SQL support.The addition of the
metadataSqlfield is well-implemented with proper immutability, JSON serialization, and consistent updates to equals/hashCode/toString methods.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
35-89: Well-structured implementation of metadata SQL support.The constructor updates maintain backward compatibility while adding support for metadata SQL. The new two-parameter constructor provides a clean API for cases where both pushdown expression and metadata SQL are needed.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (2)
78-118: Comprehensive test coverage for metadata filter functionality.The tests effectively validate hierarchical filter resolution and SQL remapping with various operators including range queries and equality checks.
120-126: Make tearDown more defensive.The null check should come before accessing tempFile.exists().
@AfterMethod public void tearDown() { - if (null != tempFile && tempFile.exists()) { + if (tempFile != null && tempFile.exists()) { tempFile.delete(); } }Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
81-106: Well-integrated metadata filter support in plan optimization.The implementation correctly:
- Constructs scope for hierarchical filter resolution
- Validates mandatory filters before proceeding
- Remaps metadata SQL according to configuration
- Provides appropriate logging for debugging
The comment on lines 92-93 correctly explains why validation must happen before the KQL check.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
231-261: Good implementation of the ArchiveTableRow classThe
ArchiveTableRowclass is well-structured with appropriate getters for all fields. The constructor properly initializes all final fields.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (2)
79-138: Comprehensive test coverage for timestamp filteringThe test method thoroughly covers various timestamp filtering scenarios including:
- No filters
- Overlapping ranges
- Non-overlapping ranges
- Exact matches
- Edge cases
This provides excellent coverage for the metadata filtering functionality.
140-167: Well-structured helper method for result comparisonThe
compareListSplitsResultmethod effectively validates split filtering results by:
- Creating the appropriate layout handle with metadata SQL
- Comparing the count of splits
- Verifying the actual split paths match expected paths
The use of immutable sets for comparison ensures order-independent validation.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (2)
238-243: Ensure consistent NULL handling in metadata SQL propagationThe method correctly propagates metadata SQL through NOT operations. The null-safe handling with
Optionalis appropriate.
624-667: Good implementation of metadata SQL propagation in AND operationsThe implementation correctly handles metadata SQL propagation for AND operations:
- Generates metadata SQL only when at least one child has it
- Properly concatenates multiple metadata SQL expressions
- Maintains the structure parallel to KQL generation
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2)
194-204: Good refactoring with the tryPushDown helper methodThe helper method encapsulates the conversion logic nicely and makes tests more readable. The explicit specification of metadata filter columns (
ImmutableSet.of("fare")) is clear.
227-240: Well-structured helper for metadata SQL validationThe
testFilterWithMetadataSqlhelper method properly validates both the KQL expression and the optional metadata SQL, with appropriate null handling.
| private String metadataDbName; | ||
| private String metadataDbUser; | ||
| private String metadataDbPassword; | ||
| private String metadataFilterConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
LGTM! Consistent configuration property implementation.
The new metadataFilterConfig property follows the established pattern with proper getter/setter methods and @config annotation.
Consider adding validation for the metadata filter config path, similar to how metadataTablePrefix is validated:
@Config("clp.metadata-filter-config")
public ClpConfig setMetadataFilterConfig(String metadataFilterConfig)
{
+ if (metadataFilterConfig != null && !metadataFilterConfig.trim().isEmpty()) {
+ // Validate file path format or existence if needed
+ }
this.metadataFilterConfig = metadataFilterConfig;
return this;
}Also applies to: 111-121
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java around
lines 32 and 111-121, the metadataFilterConfig property lacks validation similar
to metadataTablePrefix. Add validation logic in the setter method for
metadataFilterConfig to ensure the config path meets expected criteria, throwing
an appropriate exception if invalid. This will maintain consistency and prevent
misconfiguration.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
| Pattern stringFilterPattern = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*):\\s*\"([^\"]+)\""); | ||
| Matcher stringFilterMatcher = stringFilterPattern.matcher(sqlQuery); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Optimize regex pattern compilation for better performance.
Compiling regex patterns on every method call is inefficient. Consider making them static constants.
public static class KqlUtils
{
private static final Logger log = Logger.get(KqlUtils.class);
+ private static final Pattern STRING_FILTER_PATTERN = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*):\\s*\"([^\"]+)\"");
+ private static final Pattern NUMERIC_FILTER_PATTERN = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)\\s*([><]=?|:)\\s*(-?\\d+(?:\\.\\d+)?)");
public static String toSql(String kqlQuery)
{
String sqlQuery = kqlQuery;
// Translate string filters
- Pattern stringFilterPattern = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*):\\s*\"([^\"]+)\"");
- Matcher stringFilterMatcher = stringFilterPattern.matcher(sqlQuery);
+ Matcher stringFilterMatcher = STRING_FILTER_PATTERN.matcher(sqlQuery);And similarly for the numeric filter pattern.
Also applies to: 65-66
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java around
lines 32-33 and 65-66, the regex patterns are compiled inside methods causing
repeated compilation and performance overhead. To fix this, declare these
Pattern objects as static final constants at the class level so they are
compiled once and reused across method calls, improving efficiency.
| public static String toSql(String kqlQuery) | ||
| { | ||
| String sqlQuery = kqlQuery; | ||
|
|
||
| // Translate string filters | ||
| Pattern stringFilterPattern = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*):\\s*\"([^\"]+)\""); | ||
| Matcher stringFilterMatcher = stringFilterPattern.matcher(sqlQuery); | ||
| StringBuilder translateStringFiltersSb = new StringBuilder(); | ||
|
|
||
| int lastStringFilterEnd = 0; | ||
| while (stringFilterMatcher.find()) { | ||
| String columnName = stringFilterMatcher.group(1); | ||
| String value = stringFilterMatcher.group(2); | ||
|
|
||
| // Append everything between last match and this one | ||
| translateStringFiltersSb.append(sqlQuery, lastStringFilterEnd, stringFilterMatcher.start()); | ||
|
|
||
| if (value.contains("*")) { | ||
| translateStringFiltersSb.append("\"") | ||
| .append(columnName) | ||
| .append("\" LIKE '") | ||
| .append(value.replace('*', '%')) | ||
| .append("'"); | ||
| } | ||
| else { | ||
| translateStringFiltersSb.append("\"") | ||
| .append(columnName) | ||
| .append("\" = '") | ||
| .append(value) | ||
| .append("'"); | ||
| } | ||
|
|
||
| lastStringFilterEnd = stringFilterMatcher.end(); | ||
| } | ||
| translateStringFiltersSb.append(sqlQuery.substring(lastStringFilterEnd)); | ||
| sqlQuery = translateStringFiltersSb.toString(); | ||
|
|
||
| // Translate numeric filters | ||
| Pattern numericFilterPattern = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)\\s*([><]=?|:)\\s*(-?\\d+(?:\\.\\d+)?)"); | ||
| Matcher numericFilterMatcher = numericFilterPattern.matcher(sqlQuery); | ||
| StringBuilder translateNumericFilterSb = new StringBuilder(); | ||
|
|
||
| int lastNumericFilterEnd = 0; | ||
| while (numericFilterMatcher.find()) { | ||
| String columnName = numericFilterMatcher.group(1); | ||
| String operator = numericFilterMatcher.group(2); | ||
| if (":".equals(operator)) { | ||
| operator = "="; | ||
| } | ||
| String value = numericFilterMatcher.group(3); | ||
|
|
||
| // Append everything between last match and this one | ||
| translateNumericFilterSb.append(sqlQuery, lastNumericFilterEnd, numericFilterMatcher.start()); | ||
|
|
||
| translateNumericFilterSb.append("\"") | ||
| .append(columnName) | ||
| .append("\" ") | ||
| .append(operator) | ||
| .append(" ") | ||
| .append(value); | ||
|
|
||
| lastNumericFilterEnd = numericFilterMatcher.end(); | ||
| } | ||
| translateNumericFilterSb.append(sqlQuery.substring(lastNumericFilterEnd)); | ||
| sqlQuery = translateNumericFilterSb.toString(); | ||
|
|
||
| return sqlQuery; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider refactoring for improved maintainability.
The toSql method is quite long and handles multiple responsibilities. Consider extracting the string and numeric filter translation logic into separate methods for better readability and testability.
Would you like me to help refactor this into smaller, more focused methods with proper error handling for malformed input?
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java between
lines 27 and 94, the toSql method is lengthy and mixes string and numeric filter
translation logic. Refactor by extracting the string filter translation and
numeric filter translation into two separate private methods. Each method should
take the current query string and return the transformed string. Then, update
toSql to call these methods sequentially. This will improve readability,
maintainability, and allow easier unit testing of each translation step.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
| @Inject | ||
| public ClpMetadataFilterProvider(ClpConfig config) | ||
| { | ||
| requireNonNull(config, "config is null"); | ||
|
|
||
| ObjectMapper mapper = new ObjectMapper(); | ||
| try { | ||
| filterMap = mapper.readValue( | ||
| new File(config.getMetadataFilterConfig()), | ||
| new TypeReference<Map<String, TableConfig>>() {}); | ||
| } | ||
| catch (IOException e) { | ||
| throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open."); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve exception handling and add null check
The constructor should validate that the metadata filter config path is not null before attempting to read the file. Also, the exception message has a grammatical error.
@Inject
public ClpMetadataFilterProvider(ClpConfig config)
{
requireNonNull(config, "config is null");
+ requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null");
ObjectMapper mapper = new ObjectMapper();
try {
filterMap = mapper.readValue(
new File(config.getMetadataFilterConfig()),
new TypeReference<Map<String, TableConfig>>() {});
}
catch (IOException e) {
- throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open.");
+ throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file: " + e.getMessage(), e);
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Inject | |
| public ClpMetadataFilterProvider(ClpConfig config) | |
| { | |
| requireNonNull(config, "config is null"); | |
| ObjectMapper mapper = new ObjectMapper(); | |
| try { | |
| filterMap = mapper.readValue( | |
| new File(config.getMetadataFilterConfig()), | |
| new TypeReference<Map<String, TableConfig>>() {}); | |
| } | |
| catch (IOException e) { | |
| throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open."); | |
| } | |
| } | |
| @Inject | |
| public ClpMetadataFilterProvider(ClpConfig config) | |
| { | |
| requireNonNull(config, "config is null"); | |
| requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null"); | |
| ObjectMapper mapper = new ObjectMapper(); | |
| try { | |
| filterMap = mapper.readValue( | |
| new File(config.getMetadataFilterConfig()), | |
| new TypeReference<Map<String, TableConfig>>() {}); | |
| } | |
| catch (IOException e) { | |
| throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, | |
| "Failed to open metadata filter config file: " + e.getMessage(), e); | |
| } | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
around lines 46 to 60, add a null check to ensure the metadata filter config
path from config.getMetadataFilterConfig() is not null before reading the file.
Also, improve the exception message in the catch block to correct the grammar
and make it clearer, for example by changing it to "Failed to open metadata
filter config file."
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
| private static class TableConfig | ||
| { | ||
| public List<Filter> filters; | ||
| } | ||
|
|
||
| private static class RangeMapping | ||
| { | ||
| @JsonProperty("lowerBound") | ||
| public String lowerBound; | ||
|
|
||
| @JsonProperty("upperBound") | ||
| public String upperBound; | ||
|
|
||
| @Override | ||
| public boolean equals(Object o) | ||
| { | ||
| if (this == o) { | ||
| return true; | ||
| } | ||
| if (!(o instanceof RangeMapping)) { | ||
| return false; | ||
| } | ||
| RangeMapping that = (RangeMapping) o; | ||
| return Objects.equals(lowerBound, that.lowerBound) && | ||
| Objects.equals(upperBound, that.upperBound); | ||
| } | ||
|
|
||
| @Override | ||
| public int hashCode() | ||
| { | ||
| return Objects.hash(lowerBound, upperBound); | ||
| } | ||
| } | ||
|
|
||
| private static class Filter | ||
| { | ||
| @JsonProperty("filterName") | ||
| public String filterName; | ||
|
|
||
| @JsonProperty("rangeMapping") | ||
| public RangeMapping rangeMapping; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add proper access modifiers and validation to nested classes
The nested classes use package-private fields which could lead to unintended modifications. Consider making them private with proper accessors or using immutable patterns.
private static class TableConfig
{
- public List<Filter> filters;
+ @JsonProperty("filters")
+ private List<Filter> filters;
+
+ public List<Filter> getFilters()
+ {
+ return filters != null ? ImmutableList.copyOf(filters) : ImmutableList.of();
+ }
}
private static class RangeMapping
{
@JsonProperty("lowerBound")
- public String lowerBound;
+ private String lowerBound;
@JsonProperty("upperBound")
- public String upperBound;
+ private String upperBound;
+
+ public String getLowerBound()
+ {
+ return lowerBound;
+ }
+
+ public String getUpperBound()
+ {
+ return upperBound;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private static class TableConfig | |
| { | |
| public List<Filter> filters; | |
| } | |
| private static class RangeMapping | |
| { | |
| @JsonProperty("lowerBound") | |
| public String lowerBound; | |
| @JsonProperty("upperBound") | |
| public String upperBound; | |
| @Override | |
| public boolean equals(Object o) | |
| { | |
| if (this == o) { | |
| return true; | |
| } | |
| if (!(o instanceof RangeMapping)) { | |
| return false; | |
| } | |
| RangeMapping that = (RangeMapping) o; | |
| return Objects.equals(lowerBound, that.lowerBound) && | |
| Objects.equals(upperBound, that.upperBound); | |
| } | |
| @Override | |
| public int hashCode() | |
| { | |
| return Objects.hash(lowerBound, upperBound); | |
| } | |
| } | |
| private static class Filter | |
| { | |
| @JsonProperty("filterName") | |
| public String filterName; | |
| @JsonProperty("rangeMapping") | |
| public RangeMapping rangeMapping; | |
| } | |
| } | |
| private static class TableConfig | |
| { | |
| @JsonProperty("filters") | |
| private List<Filter> filters; | |
| public List<Filter> getFilters() | |
| { | |
| return filters != null | |
| ? ImmutableList.copyOf(filters) | |
| : ImmutableList.of(); | |
| } | |
| } | |
| private static class RangeMapping | |
| { | |
| @JsonProperty("lowerBound") | |
| private String lowerBound; | |
| @JsonProperty("upperBound") | |
| private String upperBound; | |
| @Override | |
| public boolean equals(Object o) | |
| { | |
| if (this == o) { | |
| return true; | |
| } | |
| if (!(o instanceof RangeMapping)) { | |
| return false; | |
| } | |
| RangeMapping that = (RangeMapping) o; | |
| return Objects.equals(lowerBound, that.lowerBound) && | |
| Objects.equals(upperBound, that.upperBound); | |
| } | |
| @Override | |
| public int hashCode() | |
| { | |
| return Objects.hash(lowerBound, upperBound); | |
| } | |
| public String getLowerBound() | |
| { | |
| return lowerBound; | |
| } | |
| public String getUpperBound() | |
| { | |
| return upperBound; | |
| } | |
| } | |
| private static class Filter | |
| { | |
| @JsonProperty("filterName") | |
| public String filterName; | |
| @JsonProperty("rangeMapping") | |
| public RangeMapping rangeMapping; | |
| } | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
between lines 150 and 192, the nested classes TableConfig, RangeMapping, and
Filter have package-private fields that risk unintended external modification.
To fix this, make the fields private and add appropriate getter methods to
provide controlled access. Additionally, consider making the classes immutable
by removing setters or making fields final where possible to enforce
immutability and encapsulation.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Show resolved
Hide resolved
wraymo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed most of the files except ClpMetadataFilterProvider and the unit test. The main confusing part is introducing a new tryPushDown method in TestClpFilterToKql
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
♻️ Duplicate comments (9)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
30-31: Fix typo in comment.- // Optinal SQL string extracted from the definition, which is only made of given metadata columns. + // Optional SQL string extracted from the definition, which is only made of given metadata columns.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
150-153: Fix indentation in constructor call.Apply this diff to fix the indentation:
ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle( - new ClpTableHandle(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName), tablePath), + new ClpTableHandle(new SchemaTableName(DEFAULT_SCHEMA_NAME, tableName), tablePath), Optional.empty(), metadataSql);presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
221-229: DbHandle should have an accessor for dbPath.The
DbHandleclass has a privatedbPathfield but no getter method. Since the field is used in other methods (e.g., line 71, 125), consider adding a getter method for better encapsulation.static final class DbHandle { private final String dbPath; DbHandle(String dbPath) { this.dbPath = dbPath; } + + String getDbPath() + { + return dbPath; + } }Then update usages:
- final String metadataDbUrl = format(METADATA_DB_URL_TEMPLATE, dbHandle.dbPath); + final String metadataDbUrl = format(METADATA_DB_URL_TEMPLATE, dbHandle.getDbPath());presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)
73-87: Fix exception handling and add null check.The constructor should validate that the metadata filter config path is not null before attempting to read the file. Also, the exception message has a grammatical error.
@Inject public ClpMetadataFilterProvider(ClpConfig config) { requireNonNull(config, "config is null"); + requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null"); ObjectMapper mapper = new ObjectMapper(); try { filterMap = mapper.readValue( new File(config.getMetadataFilterConfig()), new TypeReference<Map<String, TableConfig>>() {}); } catch (IOException e) { - throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open."); + throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file: " + e.getMessage(), e); } }
110-140: Security concern: SQL injection risk in remapFilterSql.The
remapFilterSqlmethod uses regex-based string replacement to modify SQL expressions, which is inherently risky and could lead to SQL injection vulnerabilities. The regex patterns don't properly validate the SQL structure and could match unintended parts of the query.Consider using a proper SQL parser to safely manipulate SQL expressions instead of regex replacements. If regex must be used, ensure proper escaping and validation:
- remappedSql = remappedSql.replaceAll( - format("\"(%s)\"\\s(>=?)\\s([0-9]*)", entry.getKey()), - format("%s $2 $3", entry.getValue().upperBound)); + // Escape special regex characters in column names + String escapedColumnName = Pattern.quote(entry.getKey()); + // Use word boundaries to ensure exact column matches + remappedSql = remappedSql.replaceAll( + "\\b\"" + escapedColumnName + "\"\\s+(>=?)\\s+([0-9]+)\\b", + entry.getValue().upperBound + " $1 $2");
161-166: Fix typo in method name.The method name has a typo: "Filer" should be "Filter".
- private Set<String> getAllFilerNamesFromTableConfig(TableConfig tableConfig) + private Set<String> getAllFilterNamesFromTableConfig(TableConfig tableConfig)Also update all references to this method.
179-221: Add proper access modifiers and validation to nested classes.The nested classes use package-private fields which could lead to unintended modifications. Consider making them private with proper accessors or using immutable patterns.
private static class TableConfig { - public List<Filter> filters; + @JsonProperty("filters") + private List<Filter> filters; + + public List<Filter> getFilters() + { + return filters != null ? ImmutableList.copyOf(filters) : ImmutableList.of(); + } } private static class RangeMapping { @JsonProperty("lowerBound") - public String lowerBound; + private String lowerBound; @JsonProperty("upperBound") - public String upperBound; + private String upperBound; + + public String getLowerBound() + { + return lowerBound; + } + + public String getUpperBound() + { + return upperBound; + }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
415-449: Consider extracting metadata SQL generation logic.The pattern of checking if a variable is in
metadataFilterColumnsand generating metadata SQL is repeated throughout the class. Consider extracting this into a helper method to reduce duplication.private Optional<String> generateMetadataSqlIfApplicable(String variableName, String sqlExpression) { if (metadataFilterColumns.contains(variableName)) { return Optional.of(sqlExpression); } return Optional.empty(); }Then use it like:
- String metadataSql = null; String kql = String.format( "\"%s\" >= %s AND \"%s\" <= %s", variableReferenceDefinition.get(), lowerBoundConstantDefinition.get(), variableReferenceDefinition.get(), upperBoundConstantDefinition.get()); - if (metadataFilterColumns.contains(variableReferenceDefinition.get())) { - metadataSql = kql; - } - return new ClpExpression(kql, metadataSql); + Optional<String> metadataSql = generateMetadataSqlIfApplicable(variableReferenceDefinition.get(), kql); + return new ClpExpression(kql, metadataSql.orElse(null));presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
281-309: Comprehensive metadata SQL generation testing.The new test method provides good coverage of metadata SQL generation scenarios:
- AND operations that can generate metadata SQL
- OR operations that cannot generate metadata SQL (correctly returns null)
- Mixed logical operations
However, consider adding the missing
@Testannotation as noted in past review comments.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (11)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(4 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(9 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(3 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java(3 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(4 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(4 hunks)presto-clp/src/test/resources/test-metadata-filter.json(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (20)
presto-clp/src/test/resources/test-metadata-filter.json (1)
1-30: Configuration structure looks well-designed for hierarchical filtering.The JSON configuration properly defines a three-tier hierarchy (catalog → schema → table) for metadata filters. The range mapping configuration for timestamp handling is a thoughtful approach to translate point-in-time queries to archive-level timestamp ranges.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
44-44: Good use of "WHERE 1 = 1" pattern for dynamic query building.The SQL template modification to include "WHERE 1 = 1" is a standard approach that simplifies appending additional conditions with "AND" clauses.
76-76: Appropriate debug logging level.The debug logging aligns with the previous review feedback and provides useful troubleshooting information without being too verbose in production.
72-76: Verify SQL injection protection for metadata filters.The metadata SQL is appended directly to the query without parameterization. Ensure that the metadata SQL generation upstream properly sanitizes inputs to prevent SQL injection vulnerabilities.
#!/bin/bash # Search for where metadataSql is constructed to verify input sanitization ast-grep --pattern 'metadataSql = $_' rg -A 5 -B 5 "metadataSql.*=" --type javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
24-32: Escaping logic is correct and properly ordered.The escape sequence ordering is appropriate - backslashes are escaped first to prevent double-escaping of subsequent characters. The method covers the essential KQL special characters that need escaping in string literals.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (3)
37-43: Good resource loading pattern with proper error handling.The resource loading logic correctly handles the case where the resource file is not found and provides a clear error message.
45-57: Comprehensive test coverage for hierarchical filter aggregation.The test properly validates that filters are correctly aggregated across the hierarchy (catalog → schema → table levels), ensuring that child levels inherit parent filters while adding their own.
59-85: Thorough validation of SQL remapping logic.The test cases cover all essential comparison operators and validate the correct transformation from point-in-time queries to range-based queries. The equality case properly handles the overlap condition with both begin and end timestamp bounds.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (3)
35-40: Constructor implementation is correct.The primary constructor properly initializes all fields using Optional.ofNullable, maintaining consistency with the existing pattern for handling nullable values.
45-69: Well-designed constructor overloading for backward compatibility.The constructor overloads properly delegate to the primary constructor and maintain backward compatibility while enabling the new metadata SQL functionality. The progression from no parameters to full parameters is logical.
86-90: Consistent getter implementation.The getMetadataSql() method follows the same pattern as other getters in the class, returning an Optional to handle the nullable field appropriately.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
45-51: LGTM!The dependency injection of
ClpMetadataFilterProvideris properly implemented with null checks.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
61-67: Well-documented timestamp values!The synthetic timestamp generation is clearly explained with comments, making the test logic easy to understand.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
51-57: Consistent naming convention applied.The constant names now consistently use the
ARCHIVESprefix, improving code clarity.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (6)
17-18: Import additions look appropriate for enhanced testing capabilities.The new imports support the enhanced testing functionality:
SemanticExceptionfor exception testing,ImmutableSetfor metadata filter columns, and additional assertion methods.Also applies to: 22-22, 26-27
40-56: Consistent refactoring to usetryPushDownhelper method.Good refactoring that provides a consistent pattern for converting SQL expressions to
ClpExpressionobjects. The empty set parameterImmutableSet.of()is appropriate for tests that don't require metadata filtering.
100-117: Enhanced BETWEEN test coverage with proper exception handling.Excellent addition of semantic exception testing for invalid BETWEEN expressions. The test properly verifies that incompatible types (string range on numeric field) throw a
SemanticException.
311-320: Well-structured helper method for consistent test setup.The
tryPushDownmethod provides a clean abstraction for creatingClpExpressionobjects with configurable metadata filter columns. This promotes consistency across tests and makes the intent clearer.
322-351: Improved test method signature with better parameter handling.The updated
testFiltermethod signature acceptingClpExpressiondirectly is more appropriate than the previous string-based approach. The addition of theremainingToRowExpressionparameter provides flexibility for different assertion strategies.
353-366: Focused helper method for metadata SQL validation.The
testFilterWithMetadataSqlmethod provides a clean way to test both KQL generation and metadata SQL extraction. The null-safe assertion logic properly handles cases where metadata SQL is not expected.
| ImmutableList<ClpSplit> filteredSplits = splits.build(); | ||
| log.debug("Number of splits: %s", filteredSplits.size()); | ||
| return filteredSplits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider using more descriptive variable names.
While functionally correct, consider renaming filteredSplits to simply splits since no actual filtering occurs at this level - the filtering happens at the SQL query level.
- ImmutableList<ClpSplit> filteredSplits = splits.build();
- log.debug("Number of splits: %s", filteredSplits.size());
- return filteredSplits;
+ ImmutableList<ClpSplit> result = splits.build();
+ log.debug("Number of splits: %s", result.size());
+ return result;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ImmutableList<ClpSplit> filteredSplits = splits.build(); | |
| log.debug("Number of splits: %s", filteredSplits.size()); | |
| return filteredSplits; | |
| ImmutableList<ClpSplit> result = splits.build(); | |
| log.debug("Number of splits: %s", result.size()); | |
| return result; |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
around lines 92 to 94, rename the variable filteredSplits to splits to better
reflect that no filtering is done at this stage and the filtering happens at the
SQL query level. Update all references accordingly to maintain clarity and
consistency.
| public static String escapeKqlSpecialCharsForStringValue(String literalString) | ||
| { | ||
| String escaped = literalString; | ||
| escaped = escaped.replace("\\", "\\\\"); | ||
| escaped = escaped.replace("\"", "\\\""); | ||
| escaped = escaped.replace("?", "\\?"); | ||
| escaped = escaped.replace("*", "\\*"); | ||
| return escaped; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider adding input validation.
The method doesn't handle null input which could cause a NullPointerException. Consider adding a null check or documenting the precondition.
public static String escapeKqlSpecialCharsForStringValue(String literalString)
{
+ if (literalString == null) {
+ return null;
+ }
String escaped = literalString;
escaped = escaped.replace("\\", "\\\\");
escaped = escaped.replace("\"", "\\\"");
escaped = escaped.replace("?", "\\?");
escaped = escaped.replace("*", "\\*");
return escaped;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public static String escapeKqlSpecialCharsForStringValue(String literalString) | |
| { | |
| String escaped = literalString; | |
| escaped = escaped.replace("\\", "\\\\"); | |
| escaped = escaped.replace("\"", "\\\""); | |
| escaped = escaped.replace("?", "\\?"); | |
| escaped = escaped.replace("*", "\\*"); | |
| return escaped; | |
| } | |
| public static String escapeKqlSpecialCharsForStringValue(String literalString) | |
| { | |
| if (literalString == null) { | |
| return null; | |
| } | |
| String escaped = literalString; | |
| escaped = escaped.replace("\\", "\\\\"); | |
| escaped = escaped.replace("\"", "\\\""); | |
| escaped = escaped.replace("?", "\\?"); | |
| escaped = escaped.replace("*", "\\*"); | |
| return escaped; | |
| } |
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java around
lines 24 to 32, the method escapeKqlSpecialCharsForStringValue does not handle
null input, which can cause a NullPointerException. Add a null check at the
start of the method to either return null or throw an IllegalArgumentException
with a clear message, or document that the input must not be null as a
precondition.
| String metadataFilterSql5 = "(\"msg.timestamp\" = 1234)"; | ||
| String remappedSql5 = filterProvider.remapFilterSql("clp.default.table_1", metadataFilterSql5); | ||
| assertEquals(remappedSql5, "((begin_timestamp <= 1234 AND end_timestamp >= 1234))"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider adding edge case tests.
While the current test coverage is good, consider adding tests for:
- Invalid SQL expressions
- Filters without range mapping
- Empty or malformed configuration files
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java
around lines 82 to 84, add new test cases to cover edge scenarios such as
invalid SQL expressions, filters that do not require range mapping, and cases
with empty or malformed configuration files. Create separate test methods or
extend existing ones to include these inputs and verify that the remapFilterSql
method handles them gracefully without errors or unexpected behavior.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
51b16e6 to
ee181b8
Compare
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (5)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
39-47: Add null input validation.The method doesn't handle null input which could cause a NullPointerException. This is consistent with past review feedback requesting null checks.
public static String escapeKqlSpecialCharsForStringValue(String literalString) { + if (literalString == null) { + return null; + } String escaped = literalString; escaped = escaped.replace("\\", "\\\\"); escaped = escaped.replace("\"", "\\\""); escaped = escaped.replace("?", "\\?"); escaped = escaped.replace("*", "\\*"); return escaped; }presto-clp/src/test/resources/test-metadata-filter.json (1)
31-31: Add missing newline at end of file.Consistent with past review feedback, the file should end with a newline.
} +presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
61-67: Improve timestamp documentation with meaningful values.The timestamp generation logic is well-commented now, but the base timestamp values are still arbitrary. Consider using more readable values as suggested in past reviews.
- // We generate synthetic begin_timestamp and end_timestamp values for each split - // by offsetting two base timestamps (1700000000000L and 1705000000000L) with a - // fixed increment per split (10^10 * j). + // Generate synthetic timestamps with a clear pattern: + // Begin: 1700000000000L (2023-11-15 06:13:20 UTC) + 10 billion * j + // End: 1705000000000L (2024-01-12 07:06:40 UTC) + 10 billion * j values.add(new ArchivesTableRow( "id_" + j, 1700000000000L + 10000000000L * j, 1705000000000L + 10000000000L * j));presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
221-229: Consider adding a getter method for dbPath.The DbHandle class has a private dbPath field that's accessed directly in other methods. Adding a getter would improve encapsulation, as suggested in past reviews.
static final class DbHandle { private final String dbPath; DbHandle(String dbPath) { this.dbPath = dbPath; } + + String getDbPath() + { + return dbPath; + } }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (1)
182-182: Fix typo in method nameThe method name has a typo: "Filer" should be "Filter".
- private Set<String> getAllFilerNamesFromTableConfig(TableConfig tableConfig) + private Set<String> getAllFilterNamesFromTableConfig(TableConfig tableConfig)Also update the references to this method on lines 170, 173, and 177.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(9 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java(4 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(4 hunks)presto-clp/src/test/resources/test-metadata-filter.json(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst(3 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-docs/src/main/sphinx/connector/clp.rst (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (2)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (3)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
🧬 Code Graph Analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
ClpSchemaTree(44-211)presto-common/src/main/java/com/facebook/presto/common/type/DoubleType.java (1)
DoubleType(31-173)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
ClpUtils(23-60)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (3)
presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/SemanticException.java (1)
SemanticException(24-81)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
Test(40-170)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (1)
Test(29-86)
🪛 GitHub Actions: maven checks
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
[error] 155-156: Checkstyle indentation errors: '||' has incorrect indentation level 12, expected level should be 16.
⏰ Context from checks skipped due to timeout of 90000ms (15)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (:presto-docs)
🔇 Additional comments (16)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
56-59: LGTM - type checking logic is correct.The method correctly identifies the numeric types supported by CLP as documented in the Javadoc reference to
ClpSchemaTree.presto-clp/src/test/resources/test-metadata-filter.json (1)
1-30: Well-structured metadata filter configuration.The JSON configuration correctly demonstrates the hierarchical filter scopes (catalog, schema, table) and includes a good example of range mapping for timestamp filtering.
presto-docs/src/main/sphinx/connector/clp.rst (2)
32-32: Good addition of the new configuration property.The metadata filter configuration property is properly documented in the configuration table.
Also applies to: 72-72
99-185: Excellent comprehensive documentation.The metadata filter configuration documentation is thorough and well-structured. It clearly explains:
- Hierarchical scopes (catalog, schema, table)
- Filter definitions and range mappings
- Practical examples with detailed explanations
- How metadata filtering enables split pruning
This addresses previous feedback about adding examples and provides users with clear guidance.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (2)
82-141: Comprehensive test coverage for timestamp filtering scenarios.The test method now covers a wide range of timestamp filtering scenarios including edge cases, boundary conditions, and non-overlapping ranges. This provides thorough validation of the metadata filtering logic.
143-169: Well-structured helper method for result comparison.The helper method appropriately encapsulates the test logic for comparing expected vs actual split results, making the tests more maintainable and readable.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (3)
51-51: Good consistency improvement.Renaming to
ARCHIVES_STORAGE_DIRECTORY_BASEmaintains consistency with the plural naming pattern used elsewhere.
123-123: Well-implemented timestamp metadata support.The changes properly extend the archive table schema to include
begin_timestampandend_timestampcolumns, updating both the table creation and data insertion logic to handle the newArchivesTableRowstructure.Also applies to: 140-147, 151-162
231-258: Well-designed ArchivesTableRow class.The new class properly encapsulates split metadata with immutable fields and appropriate getter methods. The design supports the timestamp-based filtering requirements effectively.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (7)
17-18: Import additions support new functionality well.The new imports for
SemanticException,ImmutableSet,Set,assertNull, andassertThrowsare appropriate for the metadata SQL testing functionality being added.Also applies to: 22-22, 26-27
35-56: Method refactoring improves maintainability.The consistent update of all test methods to use the new
tryPushDownhelper method with metadata filter columns parameter is a good refactoring that centralizes the conversion logic and makes the tests more maintainable.
95-111: Comprehensive BETWEEN operator testing.The new test cases for the BETWEEN operator cover both normal cases and edge cases (NULL values), and properly test semantic exceptions for invalid operations using
assertThrows. This provides excellent coverage for the new BETWEEN support.
275-309: Thorough metadata SQL generation testing.The new test method comprehensively tests metadata SQL generation across various scenarios including AND/OR combinations, BETWEEN operations, and cases where metadata SQL should not be generated. The test cases properly verify both KQL and metadata SQL outputs.
311-321: Well-implemented helper method.The
tryPushDownmethod effectively centralizes the conversion logic by creating aClpFilterToKqlConverterwith the appropriate metadata filter columns and returning aClpExpression. This reduces code duplication and improves test maintainability.
323-352: Flexible test method design.The modified
testFiltermethod now accepts aClpExpressiondirectly and includes aremainingToRowExpressionparameter to handle both RowExpression and string comparisons. This flexible design accommodates different test scenarios while maintaining clear assertions.
354-367: Effective metadata SQL testing helper.The
testFilterWithMetadataSqlmethod provides a clean way to test both KQL expression generation and metadata SQL generation in a single call, with proper null handling for cases where metadata SQL should not be generated.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
|
Please also run mvn to fix format issues. |
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Show resolved
Hide resolved
6ef78f9 to
616e889
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (2)
30-30: Remove unused logger declaration.The logger is declared but never used in the implementation.
- private static final Logger log = Logger.get(ClpUtils.class);
44-52: Add null check for input validation.The method doesn't handle null input which could cause a NullPointerException.
public static String escapeKqlSpecialCharsForStringValue(String literalString) { + if (literalString == null) { + return null; + } String escaped = literalString; escaped = escaped.replace("\\", "\\\\"); escaped = escaped.replace("\"", "\\\""); escaped = escaped.replace("?", "\\?"); escaped = escaped.replace("*", "\\*"); return escaped; }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
92-97: Optimize mandatory filter validation logic.The
checkContainsAllFiltersmethod is called even whenmetadataSqlis empty, which could throw an exception unnecessarily.- // Check metadata filters before the KQL query - metadataFilterProvider.checkContainsAllFilters(clpTableHandle.getSchemaTableName(), metadataSql.orElse("")); if (metadataSql.isPresent()) { + metadataFilterProvider.checkContainsAllFilters(clpTableHandle.getSchemaTableName(), metadataSql.get()); metadataSql = Optional.of(metadataFilterProvider.remapFilterSql(scope, metadataSql.get())); log.debug("Metadata filter SQL query: %s", metadataSql.get()); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)
75-89: Improve exception handling and add null check.The constructor should validate that the metadata filter config path is not null before attempting to read the file. Also, the exception message has a grammatical error.
@Inject public ClpMetadataFilterProvider(ClpConfig config) { requireNonNull(config, "config is null"); + requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null"); ObjectMapper mapper = new ObjectMapper(); try { filterMap = mapper.readValue( new File(config.getMetadataFilterConfig()), new TypeReference<Map<String, TableConfig>>() {}); } catch (IOException e) { - throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open."); + throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file: " + e.getMessage(), e); } }
91-106: Improve filter validation logic.The current implementation uses string
containsto check for filter columns, which is fragile and could match substrings incorrectly. For example, if checking for column "id", it would incorrectly match "user_id" or "identifier".Consider parsing the metadata SQL properly or using more precise matching:
- for (String columnName : getFilterNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) { - if (!metadataFilterSql.contains(columnName)) { + Set<String> requiredFilters = getFilterNames(format("%s.%s", CONNECTOR_NAME, schemaTableName)); + for (String columnName : requiredFilters) { + // Use word boundaries to ensure exact column name match + String pattern = "\\b\"?" + Pattern.quote(columnName) + "\"?\\b"; + if (!Pattern.compile(pattern).matcher(metadataFilterSql).find()) { hasAllMetadataFilterColumns = false; notFoundFilterColumnNameListBuilder.add(columnName); } }
133-162: Security concern: SQL injection risk in remapFilterSql.The
remapFilterSqlmethod uses regex-based string replacement to modify SQL expressions, which is inherently risky and could lead to SQL injection vulnerabilities. The regex patterns don't properly validate the SQL structure and could match unintended parts of the query.Consider using a proper SQL parser to safely manipulate SQL expressions instead of regex replacements. If regex must be used, ensure proper escaping and validation:
for (Map.Entry<String, RangeMapping> entry : mappings.entrySet()) { - String key = entry.getKey(); + String key = Pattern.quote(entry.getKey()); RangeMapping value = entry.getValue(); remappedSql = remappedSql.replaceAll( - format("\"(%s)\"\\s(>=?)\\s([0-9]*)", key), + format("\\b\"%s\"\\s+(>=?)\\s+([0-9]+)\\b", key), format("%s $1 $2", value.upperBound));
201-243: Add proper access modifiers and validation to nested classes.The nested classes use package-private fields which could lead to unintended modifications. Consider making them private with proper accessors or using immutable patterns.
private static class TableConfig { - public List<Filter> filters; + @JsonProperty("filters") + private List<Filter> filters; + + public List<Filter> getFilters() + { + return filters != null ? ImmutableList.copyOf(filters) : ImmutableList.of(); + } } private static class RangeMapping { @JsonProperty("lowerBound") - public String lowerBound; + private String lowerBound; @JsonProperty("upperBound") - public String upperBound; + private String upperBound; + + public String getLowerBound() + { + return lowerBound; + } + + public String getUpperBound() + { + return upperBound; + }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(4 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(9 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(3 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java(2 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (2)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
ClpUtils(28-76)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: test (:presto-docs)
🔇 Additional comments (12)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
61-75: LGTM!The numeric type checking logic correctly handles both CLP-specific types (BIGINT, DOUBLE) and broader numeric types based on the flag.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (2)
30-40: LGTM!The addition of the
metadataSqlfield and updated constructor properly extends the class to support metadata SQL alongside KQL expressions.
60-69: LGTM!The new constructor for creating ClpExpression with both pushDownExpression and metadataSql is well-designed and follows the existing pattern.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)
131-184: LGTM!The
handleBetweenmethod correctly:
- Validates argument count and types
- Ensures all arguments are numeric (with appropriate handling for CLP vs general numeric types)
- Generates both KQL and metadata SQL expressions
269-277: LGTM!The NOT handler correctly propagates metadata SQL by prefixing it with "NOT" when present.
655-700: LGTM!The logical operator handlers correctly propagate metadata SQL:
- AND includes metadata SQL when at least one child has it (appropriate for conjunction)
- OR includes metadata SQL only when all children have it (appropriate for disjunction)
Also applies to: 714-740
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (6)
17-17: Import changes look appropriate.The new imports (
ImmutableSet,Set,Pattern) are necessary for the enhanced functionality and align with the learning preference for Guava's Immutable collections.Also applies to: 21-22
33-55: Method naming is consistent and clear.The standardized naming with "PushDown" suffix improves clarity and consistency across all test methods.
241-284: Comprehensive metadata SQL generation testing.The new test method covers various scenarios including normal cases, BETWEEN operations, and complex expressions. The test logic correctly validates both the presence/absence of metadata SQL and its expected content.
286-309: Well-designed helper method overloads.The multiple
testPushDownoverloads provide clean interfaces for different testing scenarios while maintaining backward compatibility. The design allows for flexible assertion of remaining expressions and metadata SQL validation.
311-321: Enhanced tryPushDown method supports new functionality.The addition of the
metadataFilterColumnsparameter enables testing of the new metadata filter capability while maintaining a clean interface.
323-352: Flexible assertion logic handles different test scenarios.The
remainingToRowExpressionparameter allows the method to handle both RowExpression comparison and string comparison, which is necessary for cases where expressions cannot be converted back to RowExpression (like the BETWEEN test with pattern matching).
| public void testMetadataSqlGeneration() | ||
| { | ||
| SessionHolder sessionHolder = new SessionHolder(); | ||
| Set<String> testMetadataFilterColumns = ImmutableSet.of("fare"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Extract repeated constant to improve maintainability.
The ImmutableSet.of("fare") is repeated multiple times throughout the test method. Consider extracting it to a class-level constant for better maintainability.
+ private static final Set<String> TEST_METADATA_FILTER_COLUMNS = ImmutableSet.of("fare");
+
@Test
public void testMetadataSqlGeneration()
{
SessionHolder sessionHolder = new SessionHolder();
- Set<String> testMetadataFilterColumns = ImmutableSet.of("fare");
// Normal case
testPushDown(
sessionHolder,
"(fare > 0 AND city.Name like 'b%')",
"(fare > 0 AND city.Name: \"b*\")",
"(\"fare\" > 0)",
- testMetadataFilterColumns);
+ TEST_METADATA_FILTER_COLUMNS);Also applies to: 253-253, 261-261, 269-269, 275-275, 283-283
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
around lines 245, 253, 261, 269, 275, and 283, the ImmutableSet.of("fare") is
repeatedly used. Extract this repeated ImmutableSet.of("fare") into a private
static final class-level constant and replace all occurrences with this constant
to improve maintainability and reduce duplication.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (2)
30-30: Remove unused logger declaration.The logger is declared but never used in the current implementation.
44-52: Consider adding input validation.The method doesn't handle null input which could cause a NullPointerException. Consider adding a null check or documenting the precondition.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
92-97: Optimize mandatory filter validation logic.The
checkContainsAllFiltersmethod is called even whenmetadataSqlis empty, which could throw an exception unnecessarily. Consider checking only when metadata SQL is present.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
226-226: Consider extracting metadata filter columns to a constant.The
ImmutableSet.of("fare")is used multiple times in this test method. Consider extracting it to improve maintainability:+ private static final Set<String> FARE_METADATA_FILTER_COLUMNS = ImmutableSet.of("fare"); + @Test public void testMetadataSqlGeneration() { SessionHolder sessionHolder = new SessionHolder(); - Set<String> testMetadataFilterColumns = ImmutableSet.of("fare"); // Normal case testPushDown( sessionHolder, "(fare > 0 AND city.Name like 'b%')", "(fare > 0 AND city.Name: \"b*\")", "(\"fare\" > 0)", - testMetadataFilterColumns); + FARE_METADATA_FILTER_COLUMNS);
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java(4 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java(9 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(3 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java(2 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java(4 hunks)presto-docs/src/main/sphinx/connector/clp.rst(3 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
presto-docs/src/main/sphinx/connector/clp.rst (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (2)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
ClpUtils(28-71)
🪛 GitHub Actions: maven checks
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
[error] 109-109: Checkstyle violation: WhitespaceAfter: ',' is not followed by whitespace.
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (:presto-docs)
🔇 Additional comments (13)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpUtils.java (1)
61-70: LGTM!The numeric type checking is comprehensive and correctly covers all numeric types supported by the CLP connector.
presto-docs/src/main/sphinx/connector/clp.rst (1)
99-189: Well-documented metadata filter configuration.The documentation clearly explains the hierarchical metadata filtering feature with a comprehensive example. The note about all filters being required and pushdown-eligible is particularly important for users to understand.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1)
30-89: Clean integration of metadata SQL support.The new
metadataSqlfield is properly integrated with appropriate constructors and getter method. The backward compatibility is maintained through constructor overloading.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
61-107: Comprehensive timestamp-based metadata filtering tests.The test enhancements effectively validate split filtering with various timestamp conditions. The synthetic timestamp generation is well-documented and the test cases cover important boundary conditions including edge cases.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (7)
90-101: Constructor changes look good!The new
metadataFilterColumnsparameter is properly validated and stored.
123-125: BETWEEN operator handling properly integrated.The check and delegation to
handleBetweenis correctly placed.
131-177: Well-implemented BETWEEN handler!The method properly validates all inputs, correctly generates KQL expressions, and conditionally creates metadata SQL with proper quoting.
263-269: Metadata SQL propagation through NOT is correct.Good use of a variable to avoid duplicating the "NOT " prefix.
416-443: Metadata SQL generation logic is well-structured.Correctly handles different operators and properly quotes variable names in metadata SQL while only escaping special characters in KQL expressions.
644-687: AND operator correctly propagates metadata SQL.The logic to include metadata SQL when at least one child has it is appropriate for AND operations.
703-727: OR operator metadata SQL logic is correct.Requiring all children to have metadata SQL for OR operations is the right approach, contrasting appropriately with the AND logic.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2)
32-221: Test method naming is now consistent.Good refactoring to use the
testPushDownhelper method throughout.
267-321: Helper methods are well-designed.The overloaded
testPushDownmethods and refactoredtestFiltermethod provide good separation of concerns.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (7)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (2)
77-103: Consider adding edge case tests.While the current test coverage is good, consider adding tests for:
- Invalid SQL expressions
- Filters without range mapping
- Empty or malformed configuration files
77-103: Add edge case tests for SQL remapping.The current SQL remapping tests cover basic comparison operators well, but consider adding tests for edge cases as mentioned in previous reviews.
Add the following test cases to improve coverage:
@Test public void remapSqlEdgeCases() { ClpConfig config = new ClpConfig(); config.setMetadataFilterConfig(filterConfigPath); ClpMetadataFilterProvider filterProvider = new ClpMetadataFilterProvider(config); // Test filters without range mapping String metadataFilterSql = "(\"level\" >= 1 AND \"level\" <= 3)"; String remappedSql = filterProvider.remapFilterSql("clp.default.table_1", metadataFilterSql); assertEquals(remappedSql, metadataFilterSql); // Should remain unchanged // Test invalid SQL expressions assertThrows(Exception.class, () -> filterProvider.remapFilterSql("clp.default.table_1", "INVALID SQL")); // Test empty filter expression String emptyFilter = ""; String remappedEmpty = filterProvider.remapFilterSql("clp.default.table_1", emptyFilter); assertEquals(remappedEmpty, emptyFilter); } @Test public void testMalformedConfig() { // Test with non-existent config file ClpConfig config = new ClpConfig(); config.setMetadataFilterConfig("/non/existent/path.json"); assertThrows(Exception.class, () -> new ClpMetadataFilterProvider(config)); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
92-97: Optimize required filter validation logic.The
checkContainsRequiredFiltersmethod is called even whenmetadataSqlQueryis empty, which could throw an exception unnecessarily. Consider checking only when metadata SQL is present.Apply this diff to optimize the logic:
- // Check required metadata filters before the KQL query - metadataFilterProvider.checkContainsRequiredFilters(clpTableHandle.getSchemaTableName(), metadataSqlQuery.orElse("")); if (metadataSqlQuery.isPresent()) { + metadataFilterProvider.checkContainsRequiredFilters(clpTableHandle.getSchemaTableName(), metadataSqlQuery.get()); metadataSqlQuery = Optional.of(metadataFilterProvider.remapFilterSql(scope, metadataSqlQuery.get())); log.debug("Metadata SQL query: %s", metadataSqlQuery); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)
81-95: Improve exception handling and add null check.The constructor should validate that the metadata filter config path is not null before attempting to read the file. Also, the exception message has a grammatical error.
Apply this diff:
@Inject public ClpMetadataFilterProvider(ClpConfig config) { requireNonNull(config, "config is null"); + requireNonNull(config.getMetadataFilterConfig(), "metadata filter config path is null"); ObjectMapper mapper = new ObjectMapper(); try { filterMap = mapper.readValue( new File(config.getMetadataFilterConfig()), new TypeReference<Map<String, List<Filter>>>() {}); } catch (IOException e) { - throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to metadata filter config file open."); + throw new PrestoException(CLP_METADATA_FILTER_CONFIG_NOT_FOUND, "Failed to open metadata filter config file: " + e.getMessage(), e); } }
97-112: Improve filter validation logic.The current implementation uses string
containsto check for filter columns, which is fragile and could match substrings incorrectly. For example, if checking for column "id", it would incorrectly match "user_id" or "identifier".Consider parsing the metadata SQL properly or using more precise matching:
+ Set<String> requiredFilters = getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName)); for (String columnName : getRequiredColumnNames(format("%s.%s", CONNECTOR_NAME, schemaTableName))) { - if (!metadataFilterSql.contains(columnName)) { + // Use word boundaries to ensure exact column name match + String pattern = "\\b\"?" + Pattern.quote(columnName) + "\"?\\b"; + if (!Pattern.compile(pattern).matcher(metadataFilterSql).find()) { hasRequiredMetadataFilterColumns = false; notFoundListBuilder.add(columnName); } }
139-168: Security concern: SQL injection risk in remapFilterSql.The method uses regex-based string replacement to modify SQL expressions, which is inherently risky and could lead to SQL injection vulnerabilities. The regex patterns don't properly validate the SQL structure and could match unintended parts of the query.
Consider using a proper SQL parser to safely manipulate SQL expressions instead of regex replacements. If regex must be used, ensure proper escaping and validation:
for (Map.Entry<String, RangeMapping> entry : mappings.entrySet()) { - String key = entry.getKey(); + String key = Pattern.quote(entry.getKey()); RangeMapping value = entry.getValue(); remappedSql = remappedSql.replaceAll( - format("\"(%s)\"\\s(>=?)\\s([0-9]*)", key), + format("\\b\"%s\"\\s+(>=?)\\s+([0-9]+)\\b", key), format("%s $2 $3", value.upperBound));
224-251: Add proper access modifiers to RangeMapping class.The nested class uses package-private fields which could lead to unintended modifications. Consider making them private with proper accessors.
Apply this diff:
private static class RangeMapping { @JsonProperty("lowerBound") - public String lowerBound; + private String lowerBound; @JsonProperty("upperBound") - public String upperBound; + private String upperBound; + + public String getLowerBound() + { + return lowerBound; + } + + public String getUpperBound() + { + return upperBound; + }Also update the remapFilterSql method to use the getters:
- format("%s $2 $3", value.upperBound)); + format("%s $2 $3", value.getUpperBound()));
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java(3 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (4)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer static imports for commonly used methods like String.format(). Instead of writing `String.format("%s", value)`, use static import `import static java.lang.String.format;` and call `format("%s", value)` directly.
Learnt from: anlowee
PR: y-scope/presto#0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (2)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
ClpConnectorFactory(33-72)
⏰ Context from checks skipped due to timeout of 90000ms (15)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: test (:presto-docs)
🔇 Additional comments (11)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java (5)
37-46: LGTM!The setUp method correctly loads the test resource file from the classpath with proper null checking and exception handling.
48-61: Good test coverage for required filter validation.The test properly validates both the failure case (missing required filter) and success case (required filter present).
63-75: Well-structured test for hierarchical filter retrieval.The test effectively validates filter name aggregation across catalog, schema, and table scopes, properly using ImmutableSet for assertions.
32-46: LGTM: Proper resource loading with exception handling.The setup method correctly handles resource loading with appropriate exception handling for missing files and URI conversion issues.
63-75: LGTM: Comprehensive hierarchical filter name validation.The test properly validates filter name retrieval at catalog, schema, and table levels with expected hierarchical inheritance.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (3)
35-35: Appropriate additions for metadata filter support.The static import and new field are correctly added to support metadata filtering functionality.
Also applies to: 45-45
47-52: Constructor properly updated with null check.The constructor correctly accepts the new dependency and validates it with requireNonNull, following the existing pattern.
81-90: Clean integration of metadata filter columns.The code properly constructs the scope and passes metadata filter column names to the converter, enabling metadata SQL generation alongside KQL.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java (3)
43-77: Excellent documentation!The class-level JavaDoc provides comprehensive information about metadata filter configuration, including clear examples and field descriptions.
180-196: Well-designed abstraction for scope-based collection.The
collectColumnNamesFromScopesmethod effectively reduces code duplication and properly uses ImmutableSet.Builder as preferred in the codebase.
198-222: Clean implementation of filter processing helpers.All helper methods properly handle null cases and correctly use Guava's immutable collections as preferred in the codebase.
| private static class Filter | ||
| { | ||
| @JsonProperty("columnName") | ||
| public String columnName; | ||
|
|
||
| @JsonProperty("rangeMapping") | ||
| public RangeMapping rangeMapping; | ||
|
|
||
| @JsonProperty("required") | ||
| public boolean required; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve encapsulation for Filter class.
Similar to RangeMapping, the Filter class should use private fields with proper accessors.
Apply this diff:
private static class Filter
{
@JsonProperty("columnName")
- public String columnName;
+ private String columnName;
@JsonProperty("rangeMapping")
- public RangeMapping rangeMapping;
+ private RangeMapping rangeMapping;
@JsonProperty("required")
- public boolean required;
+ private boolean required;
+
+ public String getColumnName()
+ {
+ return columnName;
+ }
+
+ public RangeMapping getRangeMapping()
+ {
+ return rangeMapping;
+ }
+
+ public boolean isRequired()
+ {
+ return required;
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private static class Filter | |
| { | |
| @JsonProperty("columnName") | |
| public String columnName; | |
| @JsonProperty("rangeMapping") | |
| public RangeMapping rangeMapping; | |
| @JsonProperty("required") | |
| public boolean required; | |
| } | |
| private static class Filter | |
| { | |
| @JsonProperty("columnName") | |
| private String columnName; | |
| @JsonProperty("rangeMapping") | |
| private RangeMapping rangeMapping; | |
| @JsonProperty("required") | |
| private boolean required; | |
| public String getColumnName() | |
| { | |
| return columnName; | |
| } | |
| public RangeMapping getRangeMapping() | |
| { | |
| return rangeMapping; | |
| } | |
| public boolean isRequired() | |
| { | |
| return required; | |
| } | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
around lines 253 to 263, the Filter class currently uses public fields which
breaks encapsulation. Change the fields columnName, rangeMapping, and required
to private and add corresponding public getter and setter methods for each field
to properly encapsulate the data, similar to the RangeMapping class.
| @Test | ||
| public void checkRequiredFilters() | ||
| { | ||
| ClpConfig config = new ClpConfig(); | ||
| config.setMetadataFilterConfig(filterConfigPath); | ||
| ClpMetadataFilterProvider filterProvider = new ClpMetadataFilterProvider(config); | ||
| SchemaTableName testTableSchemaTableName = new SchemaTableName("default", "table_1"); | ||
| assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters( | ||
| testTableSchemaTableName, | ||
| "(\"level\" >= 1 AND \"level\" <= 3)")); | ||
| filterProvider.checkContainsRequiredFilters( | ||
| testTableSchemaTableName, | ||
| "(\"msg.timestamp\" > 1234 AND \"msg.timestamp\" < 5678)"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Enhance exception validation specificity.
The test correctly validates required filter enforcement, but consider making the exception assertion more specific by checking the exception message or cause.
- assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters(
- testTableSchemaTableName,
- "(\"level\" >= 1 AND \"level\" <= 3)"));
+ PrestoException exception = assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters(
+ testTableSchemaTableName,
+ "(\"level\" >= 1 AND \"level\" <= 3)"));
+ // Optionally verify exception message contains expected content
+ // assertTrue(exception.getMessage().contains("required filter"));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Test | |
| public void checkRequiredFilters() | |
| { | |
| ClpConfig config = new ClpConfig(); | |
| config.setMetadataFilterConfig(filterConfigPath); | |
| ClpMetadataFilterProvider filterProvider = new ClpMetadataFilterProvider(config); | |
| SchemaTableName testTableSchemaTableName = new SchemaTableName("default", "table_1"); | |
| assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters( | |
| testTableSchemaTableName, | |
| "(\"level\" >= 1 AND \"level\" <= 3)")); | |
| filterProvider.checkContainsRequiredFilters( | |
| testTableSchemaTableName, | |
| "(\"msg.timestamp\" > 1234 AND \"msg.timestamp\" < 5678)"); | |
| } | |
| @Test | |
| public void checkRequiredFilters() | |
| { | |
| ClpConfig config = new ClpConfig(); | |
| config.setMetadataFilterConfig(filterConfigPath); | |
| ClpMetadataFilterProvider filterProvider = new ClpMetadataFilterProvider(config); | |
| SchemaTableName testTableSchemaTableName = new SchemaTableName("default", "table_1"); | |
| - assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters( | |
| - testTableSchemaTableName, | |
| - "(\"level\" >= 1 AND \"level\" <= 3)")); | |
| + PrestoException exception = assertThrows(PrestoException.class, () -> filterProvider.checkContainsRequiredFilters( | |
| + testTableSchemaTableName, | |
| + "(\"level\" >= 1 AND \"level\" <= 3)")); | |
| + // Optionally verify exception message contains expected content | |
| + // assertTrue(exception.getMessage().contains("required filter")); | |
| filterProvider.checkContainsRequiredFilters( | |
| testTableSchemaTableName, | |
| "(\"msg.timestamp\" > 1234 AND \"msg.timestamp\" < 5678)"); | |
| } |
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadataFilterConfig.java
around lines 48 to 61, the test asserts that a PrestoException is thrown but
does not verify the exception message or cause. Modify the assertThrows call to
capture the thrown exception and add assertions to check that the exception
message or cause matches the expected error details, ensuring more precise
validation of the failure condition.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of clp.rst.
Co-authored-by: kirkrodrigues <[email protected]>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
Outdated
Show resolved
Hide resolved
| * A translator to translate Presto RowExpressions into KQL (Kibana Query Language) filters used as | ||
| * CLP queries. This is used primarily for pushing down supported filters to the CLP engine. This | ||
| * class implements the RowExpressionVisitor interface and recursively walks Presto filter | ||
| * expressions, attempting to convert supported expressions into corresponding KQL filter strings. | ||
| * Any part of the expression that cannot be translated is preserved as a "remaining expression" for | ||
| * potential fallback processing. | ||
| * CLP queries and SQL filters used for filtering splits by metadata database. This is used | ||
| * primarily for pushing down supported filters to the CLP engine. This class implements the | ||
| * {@link RowExpressionVisitor} interface and recursively walks Presto filter expressions, | ||
| * attempting to convert supported expressions into corresponding KQL filter strings and SQL | ||
| * filter strings for metadata filtering. Any part of the expression that cannot be translated to | ||
| * KQL is preserved as a "remaining expression" for potential fallback processing. | ||
| * <p></p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't suggest but:
* A translator to translate Presto RowExpressions into:
* <ul>
* <li>KQL (Kibana Query Language) filters used to push down supported filters to the CLP
* engine.</li>
* <li>SQL filters used for filtering splits in CLP's metadata database.</li>
* </ul>
* This class implements the {@link RowExpressionVisitor} interface and recursively walks Presto
* filter expressions, attempting to convert supported expressions into corresponding KQL filter
* strings and SQL filter strings for metadata filtering. Any part of the expression that cannot be
* translated to KQL is preserved as a "remaining expression" for potential fallback processing.
* <p></p>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the first line I changed to:
A translator to translate Presto {@link RowExpression}s into:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
| * <ul>All arguments must have numeric types.</ul> | ||
| * <ul>The first argument is a variable reference expression.</ul> | ||
| * <ul>The second and third arguments are constant expressions.</ul> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think you meant to do
<li>? - If the first sentence before the list is incomplete, then each bullet should start with a small letter as if it's continuing the sentence before the list.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadataFilterProvider.java
Outdated
Show resolved
Hide resolved
| Optional<String> metadataSqlQuery = clpExpression.getMetadataSqlQuery(); | ||
| Optional<RowExpression> remainingPredicate = clpExpression.getRemainingExpression(); | ||
|
|
||
| // Check required metadata filters before the KQL query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Becuase the kqlQuery.isPresent() might be false, and if that case it will directly return the node (see L99~101) and won't execute the rest code. So we have to put the required fields check before it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some of that detail to the comment? Otherwise it won't be clear to a future developer unless they read the code in detail, in which case the current comment wouldn't help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
Outdated
Show resolved
Hide resolved
| List<ClpSplit> splits = clpSplitProvider.listSplits(layoutHandle); | ||
| assertEquals(splits.size(), expectedSplits.size()); | ||
| for (Map.Entry<String, List<ArchivesTableRow>> entry : tableSplits.entrySet()) { | ||
| // Without metadata filters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have an empty line before each comment in this method.
a75d694 to
a6341a3
Compare
Co-authored-by: kirkrodrigues <[email protected]>
a6341a3 to
3d3eda4
Compare
| * <p></p> | ||
| * The configuration file is specified by the {@code clp.metadata-filter-config} property | ||
| * and defines metadata filters used to optimize query execution through split pruning. | ||
| * <p></p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * <p></p> | |
| * <p></p> | |
| * Each filter config indicates how a data column--a column in the Presto table--should be mapped to | |
| * a metadata column--a column in CLP’s metadata database. | |
| * <p></p> |
| * <li><b>{@code columnName}</b>: the name of a column in the table's logical schema. Currently, | ||
| * only numeric-type columns can be used as metadata filters.</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * <li><b>{@code columnName}</b>: the name of a column in the table's logical schema. Currently, | |
| * only numeric-type columns can be used as metadata filters.</li> | |
| * <li><b>{@code columnName}</b>: the data column's name. Currently, only numeric-type columns can | |
| * be used as metadata filters.</li> |
| * <p> | ||
| * For example, a condition such as: | ||
| * </p> | ||
| * <pre>{@code | ||
| * "msg.timestamp" > 1234 AND "msg.timestamp" < 5678 | ||
| * }</pre> | ||
| * | ||
| * <p> | ||
| * will be rewritten as: | ||
| * </p> | ||
| * <pre>{@code | ||
| * "end_timestamp" > 1234 AND "begin_timestamp" < 5678 | ||
| * }</pre> | ||
| * | ||
| * <p> | ||
| * This ensures the filter applies to a superset of the actual result set, enabling safe | ||
| * pruning. | ||
| * </p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do we need this example?
- I think the last sentence might confuse readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, removed the example and the last sentence
| // Perform required metadata filter checks before handling the KQL query. If kqlQuery | ||
| // is not present, the method returns early skipping all subsequent logic. Therefore, | ||
| // the check must happen before that return. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Perform required metadata filter checks before handling the KQL query. If kqlQuery | |
| // is not present, the method returns early skipping all subsequent logic. Therefore, | |
| // the check must happen before that return. | |
| // Perform required metadata filter checks before handling the KQL query (if kqlQuery | |
| // isn't present, we'll return early, skipping subsequent checks). |
…nto xwei/metadata-filter-v2
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
feat: Add support for defining and using specific columns for metadata filtering.
Description
The CLP metadata MySQL database already supported some metadata filtering like timestamp. It is similar to what we did before to use the date-sorted directory structure to filter archives at top level, which can significantly reduce the number of splits to scan.
Benefits:
Trade Offs:
To implement this, we added a new config property and designed a JSON format config file for metadata filter configuration. For example:
{ "clp": [ { "columnName": "level" } ], "clp.default": [ { "columnName": "author" } ], "clp.default.table_1": [ { "columnName": "msg.timestamp", "rangeMapping": { "lowerBound": "begin_timestamp", "upperBound": "end_timestamp" }, "required": true }, { "columnName": "file_name" } ] }Explanation:
"clp": Adds a filter on the columnlevelfor all schemas and tables under theclpcatalog."clp.default": Adds a filter onauthorfor all tables under theclp.defaultschema."clp.default.table_1": Adds two filters for the tableclp.default.table_1:msg.timestampis remapped viarangeMappingand is marked as required.file_nameis used as-is without remapping.For more details about this config file, please refer to the modified
clp.rstdoc.We also modified the
ClpExpressionandClpFilterToKqlConverterto extract the metadata filter SQL query when generating KQL pushdown.Checklist
breaking change.
Validation performed
E2E test with sample log with format like:
A new unit test class
TestClpMetadataFilterConfigand a few unit tests added into existing classesTestClpFilterToKqlandTestClpSplit.Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests
Chores