Pipe: support path exclusion under tree model#16632
Pipe: support path exclusion under tree model#16632Caideyipi merged 23 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for path exclusion patterns in the tree model for pipe data filtering. The implementation introduces exclusion pattern functionality across TsFile and tablet parsers, allowing users to exclude specific devices or measurements from pipe operations while maintaining existing inclusion patterns.
Key Changes
- Added
exclusionPatternsfield to parser classes to support filtering based on exclusion criteria - Implemented exclusion checking methods (
isDeviceExcluded,isMeasurementExcluded) across all parsers - Added comprehensive test coverage for both TsFile and tablet insertion event parsers
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| TsFileInsertionEventParserTest.java | Added test cases validating exclusion pattern functionality for TsFile parsers |
| PipeTabletInsertionEventTest.java | Added test validating exclusion filtering in tablet parsers |
| TsFileInsertionEventScanParser.java | Implemented exclusion pattern support with device and measurement filtering |
| TsFileInsertionEventQueryParser.java | Added exclusion pattern logic to query-based TsFile parsing |
| TabletInsertionEventTreePatternParser.java | Integrated exclusion pattern checking into tablet event parsing |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| if (ex.coversDevice(device)) { | ||
| return true; | ||
| } | ||
| if (ex.mayOverlapWithDevice(device) && ex.matchesMeasurement(device, measurement)) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
The logic at lines 591-593 is duplicated from isDeviceExcluded(). If the exclusion pattern covers the device, both methods should return true. This check is redundant in isMeasurementExcluded() since if a device is fully excluded, all its measurements should be excluded. Consider removing lines 591-593 or calling isDeviceExcluded(device) first to avoid duplicating this logic.
| for (final TreePattern ex : exclusionPatterns) { | ||
| if (Objects.isNull(ex)) { | ||
| continue; | ||
| } | ||
| if (ex.coversDevice(device)) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
The logic at lines 440-442 duplicates the check from isDeviceExcluded(). If an exclusion pattern covers the entire device, all measurements should be excluded. This redundant check in isMeasurementExcluded() can be eliminated by calling isDeviceExcluded(device) first or removing these lines to avoid code duplication.
| for (final TreePattern ex : exclusionPatterns) { | |
| if (Objects.isNull(ex)) { | |
| continue; | |
| } | |
| if (ex.coversDevice(device)) { | |
| return true; | |
| } | |
| if (isDeviceExcluded(device)) { | |
| return true; | |
| } | |
| for (final TreePattern ex : exclusionPatterns) { | |
| if (Objects.isNull(ex)) { | |
| continue; | |
| } |
| if (ex.coversDevice(device)) { | ||
| return true; | ||
| } | ||
| if (ex.mayOverlapWithDevice(device) && ex.matchesMeasurement(device, measurement)) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Lines 148-150 duplicate the device coverage check that would be in isDeviceExcluded(). This creates redundant logic across the codebase. Consider extracting a separate isDeviceExcluded() method and calling it first, or removing the device coverage check here to maintain consistency with other parsers and reduce duplication.
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public static TreePattern parsePatternFromString( | ||
| final String patternString, | ||
| final boolean isTreeModelDataAllowedToBeCaptured, | ||
| final Function<String, TreePattern> basePatternSupplier) { |
There was a problem hiding this comment.
The method parsePatternFromString lacks documentation explaining its purpose, parameters, and the expected format of the pattern string (e.g., 'INCLUSION(...), EXCLUSION(...)'). Add a comprehensive JavaDoc comment describing the method's behavior, parameter expectations, and return value.
| } | ||
| } | ||
|
|
||
| /** Helper method to find the matching closing parenthesis, respecting backticks. */ |
There was a problem hiding this comment.
The findMatchingParenthesis helper method should include JavaDoc documentation explaining its algorithm for handling nested parentheses and backtick-escaped content. This is particularly important given its role in parsing the INCLUSION/EXCLUSION syntax.
| /** Helper method to find the matching closing parenthesis, respecting backticks. */ | |
| /** | |
| * Finds the index of the matching closing parenthesis for a given opening parenthesis in the input string, | |
| * correctly handling nested parentheses and ignoring parentheses that appear within backtick-escaped content. | |
| * <p> | |
| * This method is used when parsing INCLUSION/EXCLUSION syntax, where patterns may contain nested groups | |
| * and segments that are escaped using backticks. The algorithm works as follows: | |
| * <ul> | |
| * <li>Starts at the character immediately after the given opening parenthesis index.</li> | |
| * <li>Keeps track of the current parenthesis nesting depth (initially 1).</li> | |
| * <li>Toggles an {@code inBackticks} flag whenever a backtick character ({@code `}) is encountered.</li> | |
| * <li>When not inside backticks: | |
| * <ul> | |
| * <li>Increments depth for each opening parenthesis ({@code (}).</li> | |
| * <li>Decrements depth for each closing parenthesis ({@code )}).</li> | |
| * </ul> | |
| * </li> | |
| * <li>When depth reaches zero, returns the current index as the matching closing parenthesis.</li> | |
| * <li>If the end of the string is reached without finding a match, returns -1.</li> | |
| * </ul> | |
| * <p> | |
| * Parentheses inside backtick-escaped segments are ignored for the purposes of matching. | |
| * | |
| * @param text the input string containing parentheses and possibly backtick-escaped segments | |
| * @param openParenIndex the index of the opening parenthesis to match | |
| * @return the index of the matching closing parenthesis, or -1 if not found | |
| */ |
| // A true set difference (A.intersect(B) - C.intersect(B)) | ||
| // would require a PathPatternTree.subtract() method, which does not exist. | ||
| // This operation is unsupported. | ||
| // TODO |
There was a problem hiding this comment.
The TODO comment at line 144 lacks context about what needs to be implemented and why. Consider replacing this with a more descriptive comment explaining what would be needed to support this operation (e.g., 'TODO: Implement PathPatternTree.subtract() method to enable set difference operations') or creating a tracking issue and referencing it here.
| // TODO | |
| // TODO: Implement PathPatternTree.subtract() to support set difference operations in getIntersection(PathPatternTree). |
| public List<PartialPath> getIntersection(final PartialPath partialPath) { | ||
| // NOTE: This is a simple set-difference, which is semantically correct | ||
| // ONLY IF partialPath does NOT contain wildcards. | ||
| // A true intersection of (A AND NOT B) with C (where C has wildcards) | ||
| // is far more complex and may not be representable as a List<PartialPath>. |
There was a problem hiding this comment.
The comment warns about limitations when partialPath contains wildcards but the method does not validate or throw an exception in such cases. Consider either adding runtime validation to reject wildcard patterns with a clear error message, or document the undefined behavior more explicitly in the method's JavaDoc.
| public List<PartialPath> getIntersection(final PartialPath partialPath) { | |
| // NOTE: This is a simple set-difference, which is semantically correct | |
| // ONLY IF partialPath does NOT contain wildcards. | |
| // A true intersection of (A AND NOT B) with C (where C has wildcards) | |
| // is far more complex and may not be representable as a List<PartialPath>. | |
| /** | |
| * Returns the intersection of this pattern with the given {@link PartialPath}. | |
| * <p> | |
| * <b>Note:</b> This method only supports {@code partialPath} values that do <i>not</i> contain wildcards. | |
| * If wildcards are present, an {@link IllegalArgumentException} is thrown. | |
| * A true intersection of (A AND NOT B) with C (where C has wildcards) | |
| * is far more complex and may not be representable as a List<PartialPath>. | |
| * | |
| * @param partialPath the path to intersect with | |
| * @return the intersection as a list of {@link PartialPath} | |
| * @throws IllegalArgumentException if {@code partialPath} contains wildcards | |
| */ | |
| public List<PartialPath> getIntersection(final PartialPath partialPath) { | |
| if (partialPath.containsWildcard()) { | |
| throw new IllegalArgumentException( | |
| "getIntersection(PartialPath) does not support PartialPath containing wildcards: " + partialPath); | |
| } |
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (7)
iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:99
- This method overrides IoTDBPatternOperations.mayMatchMultipleTimeSeriesInOneDevice; it is advisable to add an Override annotation.
public boolean mayMatchMultipleTimeSeriesInOneDevice() {
iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:95
- This method overrides IoTDBPatternOperations.isPrefixOrFullPath; it is advisable to add an Override annotation.
public boolean isPrefixOrFullPath() {
iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:82
- This method overrides IoTDBPatternOperations.getIntersection; it is advisable to add an Override annotation.
public PathPatternTree getIntersection(final PathPatternTree patternTree) {
iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:74
- This method overrides IoTDBPatternOperations.getIntersection; it is advisable to add an Override annotation.
public List<PartialPath> getIntersection(final PartialPath partialPath) {
iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:70
- This method overrides IoTDBPatternOperations.matchTailNode; it is advisable to add an Override annotation.
public boolean matchTailNode(final String tailNode) {
iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:66
- This method overrides IoTDBPatternOperations.matchDevice; it is advisable to add an Override annotation.
public boolean matchDevice(final String devicePath) {
iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:62
- This method overrides IoTDBPatternOperations.matchPrefixPath; it is advisable to add an Override annotation.
public boolean matchPrefixPath(final String path) {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| // 1. Check if it's an Exclusion pattern | ||
| if (trimmedPattern.startsWith("INCLUSION(") && trimmedPattern.endsWith(")")) { |
There was a problem hiding this comment.
The parsing logic for exclusion patterns uses hardcoded string literals "INCLUSION(" and ", EXCLUSION(". These should be extracted as named constants to improve maintainability and reduce the risk of typos.
| public ExclusionIoTDBTreePattern( | ||
| final IoTDBPatternOperations inclusionPattern, | ||
| final IoTDBPatternOperations exclusionPattern) { | ||
| super(true); |
There was a problem hiding this comment.
The constructor hardcodes isTreeModelDataAllowedToBeCaptured to true without explanation. This should be documented in a JavaDoc comment explaining why this default is used and when this constructor should be preferred over the three-parameter version.
.../java/org/apache/iotdb/commons/pipe/datastructure/pattern/WithExclusionIoTDBTreePattern.java
Show resolved
Hide resolved
| for (final PartialPath incPath : inclusionPaths) { | ||
| // Check if the current inclusion path is covered by *any* exclusion path pattern | ||
| boolean excluded = exclusionPaths.stream().anyMatch(excPath -> excPath.include(incPath)); | ||
|
|
||
| if (!excluded) { | ||
| finalResultTree.appendPathPattern(incPath); // Add non-excluded path to the result tree | ||
| } | ||
| } |
There was a problem hiding this comment.
Similar to the previous getIntersection method, this has O(n*m) complexity. The stream is recreated for each inclusion path. Consider optimizing by converting exclusionPaths to a more efficient lookup structure if the list is large.
.../src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/ExclusionTreePattern.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/ExclusionTreePattern.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/iotdb/commons/pipe/datastructure/pattern/WithExclusionIoTDBTreePattern.java
Outdated
Show resolved
Hide resolved
...de/src/test/java/org/apache/iotdb/db/pipe/sink/PipeStatementTreePatternParseVisitorTest.java
Show resolved
Hide resolved
* backup * refact * add comments * minor improve * add ITs * improve IT * basic impl for meta exclusion * impl getIntersection for ExclusionIoTDBTreePattern & add tests for PipeStatementTreePatternParseVisitorTest * pending at testCommitSetSchemaTemplate * add IT * fixup * apply review * apply review * add TreePattern.checkAndLogPatternCoverage * fixup * throws PipeException If the inclusion pattern is fully covered by the exclusion pattern (cherry picked from commit d48347c)
This PR adds support for path exclusion patterns in the tree model for pipe data filtering.