-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Add "emitEmptyBuckets" parameter to the "Bucket" function. #131609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c150e9c to
38b19d3
Compare
|
💚 CLA has been signed |
6ef0feb to
3cae3af
Compare
ivancea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, it looks quite good!
Mixed some high level comments with minor things. But I would forget later otherwise, so feel free to ignore if this will evolve!
I would like somebody else from the team check it, as they may have ideas on the critical topics here:
- Doing this in the coord node
- Finding the Bucket
- Maybe other validations and things to do
| assert toArg.optional(); | ||
| var emitEmptyBucketsArg = args.get(++i); | ||
| assert emitEmptyBucketsArg.optional(); | ||
| // TODO: Should it be possible to be able to specify "emitEmptyBuckets" but not "from" and "to"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as a continuation? After the coordinator reduced all the datanode aggs, we have the min and max in theory.
Yet, I'm not sure if that would be clear to the users, or if there's any use case. Maybe worth a discussion
| description = "End of the range. Can be a number, a date or a date expressed as a string." | ||
| ) Expression to | ||
| ) Expression to, | ||
| @Param( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be a MapParam instead. Basically, a named parameter
|
|
||
| private TypeResolution checkArgsCount(int expectedCount) { | ||
| String expected = null; | ||
| if (expectedCount == 2 && (from != null || to != null)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this case removed?
| @Override | ||
| public void addInput(Page page) { | ||
| if (isInitialPage.compareAndSet(true, false) | ||
| && (aggregators.size() == 0 || AggregatorMode.INITIAL.equals(aggregators.get(0).getMode()))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For later: Would be nice if we could do this on output() on the coordinator, on FINAL aggs only.
I remember the reason to not do this was that Bucket wouldn't be on the coord physical plan, as it's moved to an early eval
| // TODO: DocBlock doesn't allow appending nulls | ||
| ((DocBlock.Builder) blockBuilders[channel]).appendShard(0).appendSegment(0).appendDoc(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels dangerous. Adding a comment to not forget it
|
|
||
| private Page createInitialPage(Page page) { | ||
| // If no groups are generating bucket keys, move on | ||
| if (groups.stream().allMatch(g -> g.emptyBucketGenerator() == null)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an assert on the constructor (probably?) that either all have a generator, or none have. If there's a mix, I guess we'll have a problem
| @Override | ||
| public void generate(Block.Builder blockBuilder) { | ||
| for (double bucket = round(Math.floor(from / roundTo) * roundTo, 2); bucket < to; bucket = round(bucket + roundTo, 2)) { | ||
| ((DoubleBlock.Builder) blockBuilder).appendDouble(bucket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this cast outside the for. Maybe an:
if (blockBuilder instanceof DoubleBlock.Builder doubleBlockBuilder) {
// Code...
} else {
throw ...;
}
Same for the other generator.
| if (ne.id().equals(bucketId)) { | ||
| if (ne.children().size() > 0 && ne.children().get(0) instanceof Bucket bucket) { | ||
| foundBucket.set(bucket); | ||
| // TODO: Hack used when BUCKET is wrapped with ROUND. How to generalize it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bucket technically can be used within any expression I think? Is this only for Round?
3cae3af to
224d95c
Compare
bdf724b to
9a5d6e3
Compare
9a5d6e3 to
cb98c7c
Compare
|
Is it possible to get an estimate of when this could land? It is needed in order to complete the UX of Log Pattern Analysis in Discover, which is a very popular feature. cc @dimkots |
-- DRAFT --
This PR extends the
BUCKETfunction by adding the ability to emit empty buckets (i.e. buckets with no data).Currently, the
BUCKETfunction can work in two modes and can have 2 or 4 arguments (see docs).With this PR, the 5th parameter (
emitEmptyBuckets, name to be finalized) is introduced. TheemitEmptyBucketsparameter is ofbooleantype and has the following semantics:false: empty buckets will not be emitted (this is the current behavior)true: empty buckets will be emitted. In this case,fromandtoparameters are required, even when the desired bucket size is provided explicitly.The default value for
emitEmptyBucketsisfalsefor BWC.Relates #111483