@przemekwitek przemekwitek commented Jul 21, 2025

-- DRAFT --

This PR extends the BUCKET function by adding the ability to emit empty buckets (i.e. buckets with no data).

Currently, the BUCKET function works in two modes and takes either 2 or 4 arguments (see docs).
This PR introduces a 5th parameter, emitEmptyBuckets (name to be finalized). It is a boolean with the following semantics:

  • false: empty buckets are not emitted (this is the current behavior).
  • true: empty buckets are emitted. In this case, the from and to parameters are required, even when the desired bucket size is provided explicitly.

The default value for emitEmptyBuckets is false, for backward compatibility (BWC).
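
To illustrate the intended semantics, here is a minimal standalone sketch of what emitting empty buckets means for a numeric field. It is not the implementation in this PR: the class and method names (`EmptyBucketFill`, `fill`) are hypothetical, and it assumes bucket keys are aligned to multiples of the bucket size. It also suggests why from and to would be needed when the flag is true: without them, the fill loop has no bounds.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from the PR.
final class EmptyBucketFill {
    /**
     * Returns a sorted copy of the given counts. When emitEmptyBuckets is true, every bucket
     * key in [from, to) that has no data is added with a count of 0; otherwise the copy is
     * returned unchanged. Assumes bucketSize > 0 and keys that are multiples of bucketSize.
     */
    static Map<Long, Long> fill(Map<Long, Long> counts, long from, long to, long bucketSize, boolean emitEmptyBuckets) {
        Map<Long, Long> result = new TreeMap<>(counts);
        if (emitEmptyBuckets == false) {
            return result;
        }
        // Align the first key down to a bucket boundary, then walk the range.
        long first = Math.floorDiv(from, bucketSize) * bucketSize;
        for (long key = first; key < to; key += bucketSize) {
            result.putIfAbsent(key, 0L);
        }
        return result;
    }
}
```

With the flag set to false, the result contains only the buckets that already had data, which matches the current behavior described above.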

Relates #111483

@przemekwitek przemekwitek force-pushed the emit_empty_buckets branch 5 times, most recently from c150e9c to 38b19d3 Compare July 21, 2025 14:58
cla-checker-service bot commented Jul 21, 2025

💚 CLA has been signed

@ivancea ivancea left a comment

Hey, it looks quite good!
I mixed some high-level comments with minor things, since I would forget them later otherwise; feel free to ignore them if this is going to evolve!

I would like somebody else from the team to check it, as they may have ideas on the critical topics here:

  • Doing this in the coord node
  • Finding the Bucket
  • Maybe other validations and things to do

assert toArg.optional();
var emitEmptyBucketsArg = args.get(++i);
assert emitEmptyBucketsArg.optional();
// TODO: Should it be possible to be able to specify "emitEmptyBuckets" but not "from" and "to"?

Maybe as a continuation? After the coordinator has reduced all the data node aggs, we have the min and max, in theory.
Yet, I'm not sure if that would be clear to the users, or if there's any use case. Maybe worth a discussion.
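
A minimal sketch of that idea, assuming the reduced result is available as a map from bucket key to count. The names (`InferredRangeFill`, `fillBetweenObservedKeys`) are hypothetical and this is not code from the PR. The range is taken from the smallest and largest observed keys instead of from and to:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from the PR.
final class InferredRangeFill {
    /**
     * Fills the gaps between the smallest and largest observed bucket keys with a count of 0.
     * Assumes bucketSize > 0 and keys that are multiples of bucketSize.
     */
    static TreeMap<Long, Long> fillBetweenObservedKeys(Map<Long, Long> reducedCounts, long bucketSize) {
        TreeMap<Long, Long> result = new TreeMap<>(reducedCounts);
        if (result.isEmpty()) {
            // No data at all: there is no range to infer, so nothing can be emitted.
            return result;
        }
        long min = result.firstKey();
        long max = result.lastKey();
        for (long key = min + bucketSize; key < max; key += bucketSize) {
            result.putIfAbsent(key, 0L);
        }
        return result;
    }
}
```

The sketch also shows the limitation that may make this unclear to users: empty buckets before the first or after the last observed key are never emitted, and a query that matches no data returns no buckets at all.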

description = "End of the range. Can be a number, a date or a date expressed as a string."
) Expression to
) Expression to,
@Param(

I wonder if this should be a MapParam instead, i.e. basically a named parameter.


private TypeResolution checkArgsCount(int expectedCount) {
String expected = null;
if (expectedCount == 2 && (from != null || to != null)) {

Why was this case removed?

@Override
public void addInput(Page page) {
if (isInitialPage.compareAndSet(true, false)
&& (aggregators.size() == 0 || AggregatorMode.INITIAL.equals(aggregators.get(0).getMode()))) {

For later: it would be nice if we could do this in output() on the coordinator, for FINAL aggs only.
I remember the reason not to do this was that Bucket wouldn't be in the coordinator's physical plan, as it's moved to an early eval.

Comment on lines +333 to +334
// TODO: DocBlock doesn't allow appending nulls
((DocBlock.Builder) blockBuilders[channel]).appendShard(0).appendSegment(0).appendDoc(0);

This feels dangerous. Adding a comment so we don't forget it.


private Page createInitialPage(Page page) {
// If no groups are generating bucket keys, move on
if (groups.stream().allMatch(g -> g.emptyBucketGenerator() == null)) {

I would add an assert in the constructor (probably?) that either all groups have a generator or none do. If there's a mix, I guess we'll have a problem.
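
A minimal sketch of such a check, written as a generic helper so it compiles on its own. The helper name (`AllOrNone.allOrNone`) is hypothetical and not part of the PR:

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper; the name is not taken from the PR.
final class AllOrNone {
    /** Returns true when the predicate holds for every item or for no item (including an empty list). */
    static <T> boolean allOrNone(List<T> items, Predicate<T> predicate) {
        return items.stream().allMatch(predicate) || items.stream().noneMatch(predicate);
    }
}
```

In the constructor this could then be used along the lines of `assert AllOrNone.allOrNone(groups, g -> g.emptyBucketGenerator() != null) : "either all groups or no groups must have an empty bucket generator";`, assuming `groups` is the same list that `createInitialPage` streams over.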

@Override
public void generate(Block.Builder blockBuilder) {
for (double bucket = round(Math.floor(from / roundTo) * roundTo, 2); bucket < to; bucket = round(bucket + roundTo, 2)) {
((DoubleBlock.Builder) blockBuilder).appendDouble(bucket);

Let's move this cast outside the for. Maybe an:

if (blockBuilder instanceof DoubleBlock.Builder doubleBlockBuilder) {
    // Code...
} else {
    throw ...;
}

Same for the other generator.
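
For reference, a self-contained sketch of how the double generator could look with the type check hoisted out of the loop. `BlockBuilder` and `DoubleBlockBuilder` are minimal stand-ins for the real `Block.Builder` and `DoubleBlock.Builder`, and `round2` is only an assumption about what the PR's `round(value, 2)` does (rounding to two decimal places):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the real builder types, so the sketch compiles on its own.
interface BlockBuilder {}

final class DoubleBlockBuilder implements BlockBuilder {
    final List<Double> values = new ArrayList<>();

    DoubleBlockBuilder appendDouble(double value) {
        values.add(value);
        return this;
    }
}

final class DoubleBucketGenerator {
    private final double from;
    private final double to;
    private final double roundTo;

    DoubleBucketGenerator(double from, double to, double roundTo) {
        this.from = from;
        this.to = to;
        this.roundTo = roundTo;
    }

    // Assumption: mirrors the PR's round(value, 2) by rounding to two decimal places.
    private static double round2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    void generate(BlockBuilder blockBuilder) {
        // Check the builder type once, before the loop, instead of casting on every iteration.
        if (blockBuilder instanceof DoubleBlockBuilder doubleBlockBuilder) {
            for (double bucket = round2(Math.floor(from / roundTo) * roundTo); bucket < to; bucket = round2(bucket + roundTo)) {
                doubleBlockBuilder.appendDouble(bucket);
            }
        } else {
            throw new IllegalArgumentException("expected a double block builder but got: " + blockBuilder);
        }
    }
}
```

Failing with an explicit exception also gives a clearer error than the ClassCastException that an unchecked cast would produce.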

if (ne.id().equals(bucketId)) {
if (ne.children().size() > 0 && ne.children().get(0) instanceof Bucket bucket) {
foundBucket.set(bucket);
// TODO: Hack used when BUCKET is wrapped with ROUND. How to generalize it?

Bucket can technically be used within any expression, I think? Is this only for Round?
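
One way to generalize this would be to search the whole expression tree rather than only the first child. The sketch below uses hypothetical stand-in types (`Expr`, `BucketExpr`, `OtherExpr`, `BucketFinder`), not the real expression classes, which may already offer traversal helpers that would be preferable:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for expression tree nodes; not the real classes.
interface Expr {
    List<Expr> children();
}

record BucketExpr(List<Expr> children) implements Expr {}

record OtherExpr(String name, List<Expr> children) implements Expr {}

final class BucketFinder {
    /** Depth-first search that returns the first BucketExpr found at any depth, if there is one. */
    static Optional<BucketExpr> findBucket(Expr root) {
        if (root instanceof BucketExpr bucket) {
            return Optional.of(bucket);
        }
        for (Expr child : root.children()) {
            Optional<BucketExpr> found = findBucket(child);
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }
}
```

This handles a bucket wrapped in any number of functions, but it returns only the first match, so an expression that contains more than one bucket would still need a decision on which one to use.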

arisonl commented Nov 20, 2025

Is it possible to get an estimate of when this could land? It is needed in order to complete the UX of Log Pattern Analysis in Discover, which is a very popular feature. cc @dimkots
