
dnhatn
Member

@dnhatn dnhatn commented Aug 5, 2025

This change introduces several optimizations for constant blocks:

  1. When reading keyword fields that are index-sorted and the query range is large enough, we can return a constant block of BytesRef values to enable downstream optimizations.

  2. Enable shortcuts for BytesRefBlockHash and BytesRefLongBlockHash when handling constant blocks. These optimizations are quick wins for time-series aggregations.

  3. Enable shortcuts for CountAggregator and ValuesAggregator for constant blocks.
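The three constant-block shortcuts above can be sketched roughly as follows. This is a minimal illustration that uses hypothetical simplified types (`ConstantBlockSketch`, `isConstantRun`, `KeyHash`, `countConstant` are illustrative names, not the Elasticsearch `Block`/`BlockHash` API), with `String` standing in for `BytesRef`. For item 1 it assumes a single-valued keyword field that every document has and that the index is sorted by. Under that assumption, per-document ordinals are non-decreasing in doc order, so equal ordinals at both ends of a range imply a constant run.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ConstantBlockSketch {

    /**
     * Hypothetical check for item 1. Assumes ordinals are non-decreasing in doc
     * order (single-valued field, index sorted by it, no missing values), so
     * equal ordinals at both ends of [from, to) mean every doc in between matches.
     */
    static boolean isConstantRun(int[] sortedOrds, int from, int to) {
        if (to - from <= 1) {
            return true; // empty or single-document range
        }
        return sortedOrds[from] == sortedOrds[to - 1];
    }

    /** Hypothetical hash for item 2: maps keys (String stands in for BytesRef) to dense group ids. */
    static final class KeyHash {
        private final Map<String, Integer> ids = new HashMap<>();

        int add(String key) {
            // assign the next dense group id the first time a key is seen
            return ids.computeIfAbsent(key, k -> ids.size());
        }

        /** Generic path: one hash lookup per row. */
        int[] addAll(String[] keys) {
            int[] groups = new int[keys.length];
            for (int i = 0; i < keys.length; i++) {
                groups[i] = add(keys[i]);
            }
            return groups;
        }

        /** Constant shortcut: a single lookup, then the same group id for every row. */
        int[] addConstant(String key, int positionCount) {
            int[] groups = new int[positionCount];
            Arrays.fill(groups, add(key));
            return groups;
        }
    }

    /** Hypothetical count shortcut for item 3: a constant, non-null page adds all its rows to one group. */
    static void countConstant(long[] counts, int groupId, int positionCount) {
        counts[groupId] += positionCount;
    }

    public static void main(String[] args) {
        int[] ords = {0, 0, 1, 1, 1, 1, 2};
        System.out.println(isConstantRun(ords, 2, 6)); // true: docs 2..5 all have ordinal 1
        System.out.println(isConstantRun(ords, 1, 6)); // false: ordinal 0 at doc 1, ordinal 1 at doc 5

        KeyHash hash = new KeyHash();
        int[] perRow = hash.addAll(new String[] {"host-a", "host-a", "host-a"});
        int[] constant = hash.addConstant("host-a", 3);
        System.out.println(Arrays.equals(perRow, constant)); // true

        long[] counts = new long[1];
        countConstant(counts, constant[0], constant.length);
        System.out.println(counts[0]); // 3
    }
}
```

Both hash paths produce the same group ids; the constant path just replaces one lookup per row with a single lookup per page, which is where time-series aggregations (often grouping by a `_tsid`-like key that is constant across a page) would be expected to benefit.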

@dnhatn
Member Author

dnhatn commented Aug 5, 2025

Here are the TimeSeriesAggregationOperator profile results before and after this change:

Before:

{
    "operator": "TimeSeriesAggregationOperator[blockHash=BytesRefLongBlockHash{keys=[BytesRefKey[channel=3], LongKey[channel=2]], entries=546, size=56368b}, aggregators=[GroupingAggregator[aggregatorFunction=SumDoubleGroupingAggregatorFunction[channels=[4]], mode=INITIAL], GroupingAggregator[aggregatorFunction=CountGroupingAggregatorFunction[channels=[4]], mode=INITIAL], GroupingAggregator[aggregatorFunction=ValuesBytesRefGroupingAggregatorFunction[channels=[5]], mode=INITIAL]]]",
    "status": {
        "hash_nanos": 15198487, <-15ms
        "aggregation_nanos": 29086469, <- 29ms
        "pages_processed": 546,
        "rows_received": 982982,
        "rows_emitted": 546,
        "emit_nanos": 148580
    }
}

After:

{
    "operator": "TimeSeriesAggregationOperator[blockHash=BytesRefLongBlockHash{keys=[BytesRefKey[channel=3], LongKey[channel=2]], entries=546, size=56368b}, aggregators=[GroupingAggregator[aggregatorFunction=SumDoubleGroupingAggregatorFunction[channels=[4]], mode=INITIAL], GroupingAggregator[aggregatorFunction=CountGroupingAggregatorFunction[channels=[4]], mode=INITIAL], GroupingAggregator[aggregatorFunction=ValuesBytesRefGroupingAggregatorFunction[channels=[5]], mode=INITIAL]]]",
    "status": {
        "hash_nanos": 3291481, <- 3ms
        "aggregation_nanos": 22296535, <- 22ms
        "pages_processed": 546,
        "rows_received": 982982,
        "rows_emitted": 546,
        "emit_nanos": 130129
    }
}
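To put the two profiles side by side, the ratios below are plain arithmetic on the `hash_nanos` and `aggregation_nanos` values copied from the JSON above (a single run, so treat them as indicative). The class and method names are illustrative only.

```java
import java.util.Locale;

public class BenchmarkRatios {
    // hash_nanos and aggregation_nanos copied from the before/after profiles
    static final long HASH_BEFORE = 15_198_487L;
    static final long HASH_AFTER = 3_291_481L;
    static final long AGG_BEFORE = 29_086_469L;
    static final long AGG_AFTER = 22_296_535L;

    /** Speedup factor: how many times faster "after" is than "before". */
    static double speedup(long before, long after) {
        return (double) before / after;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM's default locale
        System.out.printf(Locale.ROOT, "hash: %.2fx%n", speedup(HASH_BEFORE, HASH_AFTER));
        System.out.printf(Locale.ROOT, "aggregation: %.2fx%n", speedup(AGG_BEFORE, AGG_AFTER));
        System.out.printf(Locale.ROOT, "hash + aggregation: %.2fx%n",
                speedup(HASH_BEFORE + AGG_BEFORE, HASH_AFTER + AGG_AFTER));
    }
}
```

Running it prints `hash: 4.62x`, `aggregation: 1.30x`, and `hash + aggregation: 1.73x`, i.e. most of the gain comes from the block hash shortcut.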

try {
bytes = blockFactory.newConstantBytesRefVector(v, 1);
ordinals = blockFactory.newConstantIntVector(0, ords.length);
final var result = new OrdinalBytesRefBlock(ordinals.asBlock(), bytes);
Member Author

Here, we return an ordinal constant block rather than a direct constant block so that ordinal optimizations still apply when the constant optimization is not available.
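A minimal sketch of the trade-off described in this comment, using hypothetical simplified types (`OrdinalBlock`, `chooseFastPath`, `groupIdsViaOrdinals` are illustrative names, not the Elasticsearch `OrdinalBytesRefBlock` API) and assuming single-valued, non-null rows. A constant page encoded as an ordinal block with a one-entry dictionary can take the constant shortcut when a consumer supports it. A consumer that does not can still fall back to the ordinal path, which hashes each dictionary entry once instead of hashing every row.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OrdinalConstantSketch {

    /** Hypothetical ordinal block: each row stores an ordinal indexing into a dictionary of distinct values. */
    record OrdinalBlock(int[] ordinals, String[] dictionary) {
        /** With a one-entry dictionary every valid ordinal is 0, so the block is constant. */
        boolean isConstant() {
            return dictionary.length == 1;
        }
    }

    /** Picks the path a hypothetical consumer would take for this block. */
    static String chooseFastPath(OrdinalBlock block, boolean consumerSupportsConstant) {
        if (consumerSupportsConstant && block.isConstant()) {
            return "constant"; // a single lookup for the whole page
        }
        return "ordinal"; // still avoids hashing every row
    }

    /** Ordinal path: hash each dictionary entry once, then map rows through their ordinals. */
    static int[] groupIdsViaOrdinals(OrdinalBlock block, Map<String, Integer> hash) {
        String[] dict = block.dictionary();
        int[] dictGroupIds = new int[dict.length];
        for (int i = 0; i < dict.length; i++) {
            // assign the next dense group id the first time a value is seen
            dictGroupIds[i] = hash.computeIfAbsent(dict[i], k -> hash.size());
        }
        int[] ords = block.ordinals();
        int[] groups = new int[ords.length];
        for (int p = 0; p < ords.length; p++) {
            groups[p] = dictGroupIds[ords[p]];
        }
        return groups;
    }

    public static void main(String[] args) {
        // A constant keyword page as an ordinal block: one dictionary entry, all ordinals 0
        OrdinalBlock page = new OrdinalBlock(new int[5], new String[] {"host-a"});
        System.out.println(chooseFastPath(page, true));  // constant
        System.out.println(chooseFastPath(page, false)); // ordinal
        System.out.println(Arrays.toString(groupIdsViaOrdinals(page, new HashMap<>()))); // [0, 0, 0, 0, 0]
    }
}
```

Returning a plain constant block would only help consumers that special-case constants. The ordinal shape keeps the existing ordinal fast path available to everyone else at little extra cost.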

@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@martijnvg
Member

This speedup is amazing!

Member

@martijnvg martijnvg left a comment

This looks good! I left two comments.

@dnhatn dnhatn requested a review from martijnvg August 7, 2025 05:49
@dnhatn dnhatn marked this pull request as ready for review August 7, 2025 05:49
@dnhatn dnhatn requested a review from nik9000 August 7, 2025 05:50
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 7, 2025
Member

@martijnvg martijnvg left a comment

LGTM 👍

try {
bytes = blockFactory.newConstantBytesRefVector(v, 1);
ordinals = blockFactory.newConstantIntVector(0, ords.length);
final var result = new OrdinalBytesRefBlock(ordinals.asBlock(), bytes);
Member

Should this be a ConstantBytesRefBlock?

Member Author

Yeah, it should be, but here we need to return an ordinal constant block instead of a direct constant block so that ordinal optimizations still apply when the constant optimization is not available: #132456 (comment)

@dnhatn dnhatn requested a review from nik9000 August 7, 2025 18:29
Member

@nik9000 nik9000 left a comment

Got it. It's worth adding a comment about why that's not a ConstantBlock I think.

@dnhatn
Member Author

dnhatn commented Aug 7, 2025

It's worth adding a comment about why that's not a ConstantBlock I think.

++, I added a comment in 276759a.

@dnhatn
Member Author

dnhatn commented Aug 7, 2025

@martijnvg @nik9000 Thank you!

@dnhatn dnhatn merged commit 6196ef5 into elastic:main Aug 7, 2025
32 checks passed
@dnhatn dnhatn deleted the constant-blocks branch August 7, 2025 23:22
@dnhatn dnhatn mentioned this pull request Aug 8, 2025

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

4 participants