Skip to content

Conversation

normen662
Copy link
Contributor

@normen662 normen662 commented Mar 21, 2025

This PR adds the ability to plan arbitrary aggregation queries using aggregation indexes as well as intersections of aggregation indexes. It also corrects/adds the infrastructure to deal with complicated order-by requirements.

  • refactor data access rules into a data access rule for value indexes and one for aggregation indexes
  • have both of these data access rule implementations share the maximum code (only intersection planning is different)
  • intersection planning for aggregation indexes
    • because of the same reasoning as regular index intersections AND in order to answer the query as these indexes carry data that is not part of the underlying table (the aggregation)
    • rollups of finer-granularity index scans if such a roll-up can be useful in answering the query
  • order by requirements are treated in the same framework as all other kinds of expressions (using M3 maps, translations, etc.)

@normen662 normen662 force-pushed the composite-aggregates branch from a25ae95 to 4c9ec49 Compare March 24, 2025 08:24
@normen662 normen662 force-pushed the composite-aggregates branch 2 times, most recently from 72f7a46 to 2f5e9b8 Compare April 2, 2025 19:26
@normen662 normen662 force-pushed the composite-aggregates branch from 8f97639 to 55cd005 Compare April 10, 2025 12:47
@normen662 normen662 requested review from hatyo, alecgrieser and MMcM April 15, 2025 08:20
@normen662 normen662 added the enhancement New feature or request label Apr 15, 2025
@normen662 normen662 force-pushed the composite-aggregates branch 2 times, most recently from adec3d2 to 91ed1cf Compare May 27, 2025 11:02
@normen662 normen662 force-pushed the composite-aggregates branch from 91ed1cf to 6fa2c04 Compare July 9, 2025 16:48
@normen662 normen662 force-pushed the composite-aggregates branch from 6fa2c04 to 0aeab9c Compare July 9, 2025 16:52
@hatyo hatyo self-requested a review July 31, 2025 13:16
@@ -44,6 +44,10 @@ static <T> EnumeratingIterable<T> singleIterable(@Nonnull final T singleElement)
return new SingleIterable<>(singleElement);
}

static <T> EnumeratingIterable<T> emptyOnEmptyIterable() {
return new SingleIterable<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the reason for using SingleIterator (with null argument) instead of EmptyIterator is related to returning empty list, compared to returning null in EmptyIterator when calling computeNext, is that so, why not just use EmptyIterator instead and let the customer deal with the null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EmptyIterator does not return any item. It is endOfData() immediately. This one does return the empty list. Some combinatorics algorithms want nothing on empty while some others like the topological sort want an empty result on empty input. We thus need both I think.

- unorderedResult: [{BITMAP: xStartsWith_1250'0200004', 'CATEGORY': 'hello', 'OFFSET':0},
{BITMAP: xStartsWith_1250'02', 'CATEGORY': 'hello', 'OFFSET':10000},
{BITMAP: xStartsWith_1250'0400008', 'CATEGORY': 'world', 'OFFSET':0}]
#-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these tests commented out?

return IntersectionResult.noViableIntersection();
}

final var compensation =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can immediately return IntersectionResult.noViableIntersection() if the resulting intersection compensation is impossible, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It may be that we are currently impossible but that there is a common ordering (in fact there must be one for the follow up to work). If there is no common ordering the result will be discarded. If there is a common ordering but it is currently impossible to (think 2 out of 3 aggregate functions are satisfied), we want that intersection access to survive because otherwise we would never get to the intersection that creates the one where we get 3 out of 3.

}
isCompensationImpossible |= resultCompensationFunction.isImpossible();

groupByMappings =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add some comments explaining the handling of matched group by aggregations and groups here.

}
}

return IntersectionResult.of(hasCommonOrdering ? intersectionOrdering : null, compensation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is possible to retrieve an IntersectionResult which non-empty list of RelationalExpressions and null-common ordering, if we examine the code flow above.

Having said that, we could probably cleanup the code, such that we can remove the hasCommonFlag maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, You can have a common ordering but no expressions due to the compensation being impossible. You still want to signal the caller to keep this intersection (even though you didn't produce a single expression) in the sieve so you can potentially find a bigger intersection that is not impossible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants