
Pipe: support path exclusion under tree model #16632

Merged
Caideyipi merged 23 commits into apache:master from VGalaxies:path-exclusion on Nov 12, 2025

Conversation

Contributor

@VGalaxies VGalaxies commented Oct 21, 2025

This PR adds support for path exclusion patterns in the tree model for pipe data filtering.

@VGalaxies VGalaxies requested a review from Copilot October 21, 2025 02:42
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for path exclusion patterns in the tree model for pipe data filtering. The implementation introduces exclusion pattern functionality across TsFile and tablet parsers, allowing users to exclude specific devices or measurements from pipe operations while maintaining existing inclusion patterns.

Key Changes

  • Added exclusionPatterns field to parser classes to support filtering based on exclusion criteria
  • Implemented exclusion checking methods (isDeviceExcluded, isMeasurementExcluded) across all parsers
  • Added comprehensive test coverage for both TsFile and tablet insertion event parsers

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| TsFileInsertionEventParserTest.java | Added test cases validating exclusion pattern functionality for TsFile parsers |
| PipeTabletInsertionEventTest.java | Added test validating exclusion filtering in tablet parsers |
| TsFileInsertionEventScanParser.java | Implemented exclusion pattern support with device and measurement filtering |
| TsFileInsertionEventQueryParser.java | Added exclusion pattern logic to query-based TsFile parsing |
| TabletInsertionEventTreePatternParser.java | Integrated exclusion pattern checking into tablet event parsing |


Comment on lines 591 to 596
if (ex.coversDevice(device)) {
return true;
}
if (ex.mayOverlapWithDevice(device) && ex.matchesMeasurement(device, measurement)) {
return true;
}
Copilot AI Oct 21, 2025

The logic at lines 591-593 is duplicated from isDeviceExcluded(). If the exclusion pattern covers the device, both methods should return true. This check is redundant in isMeasurementExcluded() since if a device is fully excluded, all its measurements should be excluded. Consider removing lines 591-593 or calling isDeviceExcluded(device) first to avoid duplicating this logic.
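The delegation suggested above could look roughly like the following. This is only a sketch: `SketchTreePattern` is a hypothetical stand-in that declares just the three calls quoted in the review (`coversDevice`, `mayOverlapWithDevice`, `matchesMeasurement`), and `exact(...)` is a toy pattern invented for illustration, not part of the PR.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for TreePattern; only the calls quoted in the review. */
interface SketchTreePattern {
  boolean coversDevice(String device);

  boolean mayOverlapWithDevice(String device);

  boolean matchesMeasurement(String device, String measurement);
}

final class ExclusionSketch {
  private final List<SketchTreePattern> exclusionPatterns;

  ExclusionSketch(final List<SketchTreePattern> exclusionPatterns) {
    this.exclusionPatterns = exclusionPatterns;
  }

  /** True if any non-null exclusion pattern covers the whole device. */
  boolean isDeviceExcluded(final String device) {
    for (final SketchTreePattern ex : exclusionPatterns) {
      if (Objects.nonNull(ex) && ex.coversDevice(device)) {
        return true;
      }
    }
    return false;
  }

  /** Delegates the device-level check instead of repeating it inside the loop. */
  boolean isMeasurementExcluded(final String device, final String measurement) {
    if (isDeviceExcluded(device)) {
      return true;
    }
    for (final SketchTreePattern ex : exclusionPatterns) {
      if (Objects.isNull(ex)) {
        continue;
      }
      if (ex.mayOverlapWithDevice(device) && ex.matchesMeasurement(device, measurement)) {
        return true;
      }
    }
    return false;
  }

  /** Toy pattern (assumption): one exact device; a null measurement means the whole device. */
  static SketchTreePattern exact(final String dev, final String meas) {
    return new SketchTreePattern() {
      public boolean coversDevice(final String d) {
        return meas == null && dev.equals(d);
      }

      public boolean mayOverlapWithDevice(final String d) {
        return dev.equals(d);
      }

      public boolean matchesMeasurement(final String d, final String m) {
        return dev.equals(d) && (meas == null || meas.equals(m));
      }
    };
  }
}
```

One trade-off of this shape is that the pattern list is walked twice when the device is not fully excluded, in exchange for keeping the device-level rule in a single place.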

Comment on lines 436 to 442
for (final TreePattern ex : exclusionPatterns) {
if (Objects.isNull(ex)) {
continue;
}
if (ex.coversDevice(device)) {
return true;
}
Copilot AI Oct 21, 2025

The logic at lines 440-442 duplicates the check from isDeviceExcluded(). If an exclusion pattern covers the entire device, all measurements should be excluded. This redundant check in isMeasurementExcluded() can be eliminated by calling isDeviceExcluded(device) first or removing these lines to avoid code duplication.

Suggested change
- for (final TreePattern ex : exclusionPatterns) {
-   if (Objects.isNull(ex)) {
-     continue;
-   }
-   if (ex.coversDevice(device)) {
-     return true;
-   }
+ if (isDeviceExcluded(device)) {
+   return true;
+ }
+ for (final TreePattern ex : exclusionPatterns) {
+   if (Objects.isNull(ex)) {
+     continue;
+   }

Comment on lines 148 to 153
if (ex.coversDevice(device)) {
return true;
}
if (ex.mayOverlapWithDevice(device) && ex.matchesMeasurement(device, measurement)) {
return true;
}
Copilot AI Oct 21, 2025

Lines 148-150 duplicate the device coverage check that would be in isDeviceExcluded(). This creates redundant logic across the codebase. Consider extracting a separate isDeviceExcluded() method and calling it first, or removing the device coverage check here to maintain consistency with other parsers and reduce duplication.

@VGalaxies VGalaxies requested a review from Copilot October 28, 2025 16:01
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.



Comment on lines +185 to +188
public static TreePattern parsePatternFromString(
final String patternString,
final boolean isTreeModelDataAllowedToBeCaptured,
final Function<String, TreePattern> basePatternSupplier) {
Copilot AI Oct 28, 2025

The method parsePatternFromString lacks documentation explaining its purpose, parameters, and the expected format of the pattern string (e.g., 'INCLUSION(...), EXCLUSION(...)'). Add a comprehensive JavaDoc comment describing the method's behavior, parameter expectations, and return value.

}
}

/** Helper method to find the matching closing parenthesis, respecting backticks. */
Copilot AI Oct 28, 2025
The reason will be displayed to describe this comment to others. Learn more.

The findMatchingParenthesis helper method should include JavaDoc documentation explaining its algorithm for handling nested parentheses and backtick-escaped content. This is particularly important given its role in parsing the INCLUSION/EXCLUSION syntax.

Suggested change
- /** Helper method to find the matching closing parenthesis, respecting backticks. */
+ /**
+  * Finds the index of the matching closing parenthesis for a given opening parenthesis in the input string,
+  * correctly handling nested parentheses and ignoring parentheses that appear within backtick-escaped content.
+  * <p>
+  * This method is used when parsing INCLUSION/EXCLUSION syntax, where patterns may contain nested groups
+  * and segments that are escaped using backticks. The algorithm works as follows:
+  * <ul>
+  * <li>Starts at the character immediately after the given opening parenthesis index.</li>
+  * <li>Keeps track of the current parenthesis nesting depth (initially 1).</li>
+  * <li>Toggles an {@code inBackticks} flag whenever a backtick character ({@code `}) is encountered.</li>
+  * <li>When not inside backticks:
+  * <ul>
+  * <li>Increments depth for each opening parenthesis ({@code (}).</li>
+  * <li>Decrements depth for each closing parenthesis ({@code )}).</li>
+  * </ul>
+  * </li>
+  * <li>When depth reaches zero, returns the current index as the matching closing parenthesis.</li>
+  * <li>If the end of the string is reached without finding a match, returns -1.</li>
+  * </ul>
+  * <p>
+  * Parentheses inside backtick-escaped segments are ignored for the purposes of matching.
+  *
+  * @param text the input string containing parentheses and possibly backtick-escaped segments
+  * @param openParenIndex the index of the opening parenthesis to match
+  * @return the index of the matching closing parenthesis, or -1 if not found
+  */
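For reference, the algorithm described in that proposed JavaDoc can be sketched as below. The PR's actual method body is not shown in this conversation, so this is an illustration written from the bullet points only; the class name `ParenMatcherSketch` is hypothetical.

```java
final class ParenMatcherSketch {
  /**
   * Returns the index of the parenthesis that closes the one at openParenIndex,
   * or -1 if none is found. Parentheses inside backtick-quoted segments are skipped.
   */
  static int findMatchingParenthesis(final String text, final int openParenIndex) {
    int depth = 1; // the opening parenthesis at openParenIndex is already counted
    boolean inBackticks = false;
    for (int i = openParenIndex + 1; i < text.length(); i++) {
      final char c = text.charAt(i);
      if (c == '`') {
        inBackticks = !inBackticks; // toggle on every backtick
      } else if (!inBackticks) {
        if (c == '(') {
          depth++;
        } else if (c == ')' && --depth == 0) {
          return i; // depth back to zero: this is the match
        }
      }
    }
    return -1; // reached the end of the string without closing
  }
}
```

For an input such as ``(root.`a)b`.c)``, the parenthesis inside the backticks is ignored and the last character is reported as the match.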

// A true set difference (A.intersect(B) - C.intersect(B))
// would require a PathPatternTree.subtract() method, which does not exist.
// This operation is unsupported.
// TODO
Copilot AI Oct 28, 2025

The TODO comment at line 144 lacks context about what needs to be implemented and why. Consider replacing this with a more descriptive comment explaining what would be needed to support this operation (e.g., 'TODO: Implement PathPatternTree.subtract() method to enable set difference operations') or creating a tracking issue and referencing it here.

Suggested change
- // TODO
+ // TODO: Implement PathPatternTree.subtract() to support set difference operations in getIntersection(PathPatternTree).

Comment on lines 119 to 123
public List<PartialPath> getIntersection(final PartialPath partialPath) {
// NOTE: This is a simple set-difference, which is semantically correct
// ONLY IF partialPath does NOT contain wildcards.
// A true intersection of (A AND NOT B) with C (where C has wildcards)
// is far more complex and may not be representable as a List<PartialPath>.
Copilot AI Oct 28, 2025

The comment warns about limitations when partialPath contains wildcards but the method does not validate or throw an exception in such cases. Consider either adding runtime validation to reject wildcard patterns with a clear error message, or document the undefined behavior more explicitly in the method's JavaDoc.

Suggested change
- public List<PartialPath> getIntersection(final PartialPath partialPath) {
-   // NOTE: This is a simple set-difference, which is semantically correct
-   // ONLY IF partialPath does NOT contain wildcards.
-   // A true intersection of (A AND NOT B) with C (where C has wildcards)
-   // is far more complex and may not be representable as a List<PartialPath>.
+ /**
+  * Returns the intersection of this pattern with the given {@link PartialPath}.
+  * <p>
+  * <b>Note:</b> This method only supports {@code partialPath} values that do <i>not</i> contain wildcards.
+  * If wildcards are present, an {@link IllegalArgumentException} is thrown.
+  * A true intersection of (A AND NOT B) with C (where C has wildcards)
+  * is far more complex and may not be representable as a List&lt;PartialPath&gt;.
+  *
+  * @param partialPath the path to intersect with
+  * @return the intersection as a list of {@link PartialPath}
+  * @throws IllegalArgumentException if {@code partialPath} contains wildcards
+  */
+ public List<PartialPath> getIntersection(final PartialPath partialPath) {
+   if (partialPath.containsWildcard()) {
+     throw new IllegalArgumentException(
+         "getIntersection(PartialPath) does not support PartialPath containing wildcards: " + partialPath);
+   }

@VGalaxies VGalaxies changed the title from "[DNM] Pipe: support path exclusion under tree model" to "Pipe: support path exclusion under tree model" Nov 2, 2025
@VGalaxies VGalaxies marked this pull request as ready for review November 2, 2025 15:51
@VGalaxies VGalaxies requested a review from Copilot November 2, 2025 15:51
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (7)

iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:99

  public boolean mayMatchMultipleTimeSeriesInOneDevice() {

iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:95

  public boolean isPrefixOrFullPath() {

iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:82

  public PathPatternTree getIntersection(final PathPatternTree patternTree) {

iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:74

  public List<PartialPath> getIntersection(final PartialPath partialPath) {

iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:70

  public boolean matchTailNode(final String tailNode) {

iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:66

  public boolean matchDevice(final String devicePath) {

iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/pipe/datastructure/pattern/UnionIoTDBTreePattern.java:62

  public boolean matchPrefixPath(final String path) {


}

// 1. Check if it's an Exclusion pattern
if (trimmedPattern.startsWith("INCLUSION(") && trimmedPattern.endsWith(")")) {
Copilot AI Nov 2, 2025

The parsing logic for exclusion patterns uses hardcoded string literals "INCLUSION(" and ", EXCLUSION(". These should be extracted as named constants to improve maintainability and reduce the risk of typos.
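A minimal sketch of what extracting those literals might look like follows. The constant names, the `PatternSyntaxSketch` class, and the `split` helper are all hypothetical. The helper also ignores backtick-quoted segments, which the real parser handles through its parenthesis-matching helper, so treat it as an illustration of the constants rather than of the full parsing logic.

```java
final class PatternSyntaxSketch {
  // Named constants replacing the string literals mentioned in the review (names are assumptions).
  static final String INCLUSION_PREFIX = "INCLUSION(";
  static final String EXCLUSION_INFIX = ", EXCLUSION(";
  static final String PATTERN_SUFFIX = ")";

  /**
   * Splits "INCLUSION(a), EXCLUSION(b)" into {a, b}.
   * Returns null when the input does not use this syntax.
   * Simplification: backtick-quoted content is not treated specially here.
   */
  static String[] split(final String pattern) {
    final String s = pattern.trim();
    if (!s.startsWith(INCLUSION_PREFIX) || !s.endsWith(PATTERN_SUFFIX)) {
      return null;
    }
    // The inclusion part ends at the ")" that directly precedes ", EXCLUSION(".
    final int sep = s.indexOf(PATTERN_SUFFIX + EXCLUSION_INFIX, INCLUSION_PREFIX.length());
    if (sep < 0) {
      return null;
    }
    final String inclusion = s.substring(INCLUSION_PREFIX.length(), sep);
    final String exclusion =
        s.substring(sep + PATTERN_SUFFIX.length() + EXCLUSION_INFIX.length(), s.length() - 1);
    return new String[] {inclusion.trim(), exclusion.trim()};
  }
}
```

Keeping the three tokens in one place means a later syntax change, such as a different separator, touches only the constants.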

Comment on lines 51 to 54
public ExclusionIoTDBTreePattern(
final IoTDBPatternOperations inclusionPattern,
final IoTDBPatternOperations exclusionPattern) {
super(true);
Copilot AI Nov 2, 2025

The constructor hardcodes isTreeModelDataAllowedToBeCaptured to true without explanation. This should be documented in a JavaDoc comment explaining why this default is used and when this constructor should be preferred over the three-parameter version.

Comment on lines +172 to +179
for (final PartialPath incPath : inclusionPaths) {
// Check if the current inclusion path is covered by *any* exclusion path pattern
boolean excluded = exclusionPaths.stream().anyMatch(excPath -> excPath.include(incPath));

if (!excluded) {
finalResultTree.appendPathPattern(incPath); // Add non-excluded path to the result tree
}
}
Copilot AI Nov 2, 2025

Similar to the previous getIntersection method, this has O(n*m) complexity. The stream is recreated for each inclusion path. Consider optimizing by converting exclusionPaths to a more efficient lookup structure if the list is large.
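One possible shape for such a lookup structure is sketched below. It rests on several assumptions: paths are plain dot-separated strings rather than `PartialPath` objects, inclusion paths are concrete (no wildcards), and the wildcard matcher is a simplified one written for this example (`*` matches exactly one node, a trailing `**` matches one or more remaining nodes). It does not reproduce the semantics of `PartialPath.include`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ExclusionLookupSketch {
  private final Set<String> exact = new HashSet<>(); // wildcard-free exclusions, O(1) lookup
  private final List<String[]> wildcard = new ArrayList<>(); // still scanned linearly

  ExclusionLookupSketch(final List<String> exclusionPaths) {
    for (final String p : exclusionPaths) {
      if (p.contains("*")) {
        wildcard.add(p.split("\\."));
      } else {
        exact.add(p);
      }
    }
  }

  boolean isExcluded(final String path) {
    if (exact.contains(path)) {
      return true;
    }
    final String[] nodes = path.split("\\.");
    for (final String[] pat : wildcard) {
      if (covers(pat, nodes)) {
        return true;
      }
    }
    return false;
  }

  /** Simplified matcher (assumption): "*" = one node, trailing "**" = one or more nodes. */
  static boolean covers(final String[] pat, final String[] nodes) {
    for (int i = 0; i < pat.length; i++) {
      if (pat[i].equals("**") && i == pat.length - 1) {
        return nodes.length > i;
      }
      if (i >= nodes.length) {
        return false;
      }
      if (!pat[i].equals("*") && !pat[i].equals(nodes[i])) {
        return false;
      }
    }
    return pat.length == nodes.length;
  }

  /** Keeps every inclusion path that no exclusion path covers. */
  static List<String> subtract(final List<String> inclusionPaths, final List<String> exclusionPaths) {
    final ExclusionLookupSketch lookup = new ExclusionLookupSketch(exclusionPaths);
    final List<String> result = new ArrayList<>();
    for (final String inc : inclusionPaths) {
      if (!lookup.isExcluded(inc)) {
        result.add(inc);
      }
    }
    return result;
  }
}
```

This only removes the quadratic cost for wildcard-free exclusions. Wildcard exclusions are still checked one by one, so a trie keyed on path nodes would be the next step if those lists grow large.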

Collaborator

@Caideyipi Caideyipi left a comment

PTAL

@VGalaxies VGalaxies closed this Nov 10, 2025
@VGalaxies VGalaxies requested a review from Caideyipi November 10, 2025 03:58
@VGalaxies VGalaxies reopened this Nov 10, 2025
@Caideyipi Caideyipi merged commit d48347c into apache:master Nov 12, 2025
38 of 39 checks passed
@VGalaxies VGalaxies deleted the path-exclusion branch November 15, 2025 07:55
JackieTien97 pushed a commit that referenced this pull request Nov 26, 2025
* backup

* refact

* add comments

* minor improve

* add ITs

* improve IT

* basic impl for meta exclusion

* impl getIntersection for ExclusionIoTDBTreePattern & add tests for PipeStatementTreePatternParseVisitorTest

* pending at testCommitSetSchemaTemplate

* add IT

* fixup

* apply review

* apply review

* add TreePattern.checkAndLogPatternCoverage

* fixup

* throws PipeException If the inclusion pattern is fully covered by the exclusion pattern

(cherry picked from commit d48347c)
4 participants