

@dnhatn
Member

@dnhatn dnhatn commented Aug 10, 2025

Kahan (compensated) summation can be expensive, and for time-series aggregations, lossy summation, which gives up some precision, can be a good trade-off for performance. This change introduces a lossy summation mode and makes it the default for time-series aggregations. The two summation modes for sum and avg are used internally and are not exposed to users.
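To make the trade-off concrete, here is a minimal sketch contrasting Kahan summation with plain floating-point addition. It assumes "lossy" means one unadjusted `+=` per value; the class and method names (`SummationSketch`, `lossySum`, `compensatedSum`, `oneFollowedByTiny`) are hypothetical and are not the Elasticsearch aggregator classes.

```java
import java.util.Arrays;

/** Hypothetical sketch: compensated (Kahan) vs. plain ("lossy") summation. */
public class SummationSketch {

    /** Plain summation: one add per value; rounding error simply accumulates. */
    static double lossySum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    /** Kahan summation: carries the rounding error of each add in a compensation term. */
    static double compensatedSum(double[] values) {
        double sum = 0.0;
        double delta = 0.0; // rounding error from the previous add
        for (double v : values) {
            double y = v - delta;  // fold the previous rounding error into this value
            double t = sum + y;    // low-order bits of y may be lost here
            delta = (t - sum) - y; // what was actually added minus y: this add's rounding error
            sum = t;
        }
        return sum;
    }

    /** 1.0 followed by n copies of 1e-16, each smaller than half an ulp of 1.0. */
    static double[] oneFollowedByTiny(int n) {
        double[] values = new double[n + 1];
        values[0] = 1.0;
        Arrays.fill(values, 1, n + 1, 1e-16);
        return values;
    }

    public static void main(String[] args) {
        double[] values = oneFollowedByTiny(1_000_000); // exact sum is about 1.0 + 1e-10
        System.out.println("lossy:       " + lossySum(values)); // stays at 1.0
        System.out.println("compensated: " + compensatedSum(values));
    }
}
```

Kahan does three extra dependent floating-point operations per value, and the dependency through `delta` limits pipelining; that per-value overhead is plausibly what the lossy mode avoids, at the cost of dropping tiny contributions like the ones above.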

@dnhatn dnhatn force-pushed the sum-aggregator branch 3 times, most recently from beb2b80 to 7c0d9f6, August 10, 2025 23:57
@dnhatn dnhatn changed the title "Add lossy sum for time-series aggregations" to "Use lossy summation for time-series aggregations" Aug 11, 2025
@dnhatn
Member Author

dnhatn commented Aug 11, 2025

Before (compensated SumDoubleGroupingAggregatorFunction):

{
    "operator": "TimeSeriesAggregationOperator[blockHash=BytesRefLongBlockHash{keys=[BytesRefKey[channel=3], LongKey[channel=2]], entries=546, size=56368b}, aggregators=[GroupingAggregator[aggregatorFunction=SumDoubleGroupingAggregatorFunction[channels=[4]], mode=INITIAL], GroupingAggregator[aggregatorFunction=CountGroupingAggregatorFunction[channels=[4]], mode=INITIAL], GroupingAggregator[aggregatorFunction=ValuesBytesRefGroupingAggregatorFunction[channels=[5]], mode=INITIAL]]]",
    "status": {
        "hash_nanos": 2949462,
        "aggregation_nanos": 22679014, // <- 22ms
        "pages_processed": 546,
        "rows_received": 982982,
        "rows_emitted": 546,
        "emit_nanos": 121951
    }
}

After (LossySumDoubleGroupingAggregatorFunction):

{
    "operator": "TimeSeriesAggregationOperator[blockHash=BytesRefLongBlockHash{keys=[BytesRefKey[channel=3], LongKey[channel=2]], entries=546, size=56368b}, aggregators=[GroupingAggregator[aggregatorFunction=LossySumDoubleGroupingAggregatorFunction[channels=[4]], mode=INITIAL], GroupingAggregator[aggregatorFunction=CountGroupingAggregatorFunction[channels=[4]], mode=INITIAL], GroupingAggregator[aggregatorFunction=ValuesBytesRefGroupingAggregatorFunction[channels=[5]], mode=INITIAL]]]",
    "status": {
        "hash_nanos": 2770991,
        "aggregation_nanos": 15664657, // <- 15ms
        "pages_processed": 546,
        "rows_received": 982982,
        "rows_emitted": 546,
        "emit_nanos": 72400
    }
}
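Normalizing the two profiles by rows_received makes the comparison easier to read. This is a small worked calculation using only the numbers above; note that aggregation_nanos covers all three aggregators (sum, count, values), so attributing the whole difference to the sum swap assumes the other two cost the same in both runs, and a single profile is not a benchmark. The class name `ProfileArithmetic` is hypothetical.

```java
/** Hypothetical helper: per-row cost and relative reduction from the two profiles. */
public class ProfileArithmetic {

    /** Nanoseconds of aggregation work per input row. */
    static double nanosPerRow(long aggregationNanos, long rows) {
        return (double) aggregationNanos / rows;
    }

    /** Fractional reduction in time going from {@code before} to {@code after}. */
    static double reduction(long before, long after) {
        return 1.0 - (double) after / before;
    }

    public static void main(String[] args) {
        long rows = 982_982;
        long compensated = 22_679_014; // run with SumDoubleGroupingAggregatorFunction
        long lossy = 15_664_657;       // run with LossySumDoubleGroupingAggregatorFunction
        System.out.printf("compensated: %.2f ns/row%n", nanosPerRow(compensated, rows));
        System.out.printf("lossy:       %.2f ns/row%n", nanosPerRow(lossy, rows));
        System.out.printf("reduction:   %.1f%%%n", 100 * reduction(compensated, lossy));
    }
}
```

That works out to roughly 23.1 ns versus 15.9 ns of aggregation time per row, about a 31% reduction.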

Member

@martijnvg martijnvg left a comment


Thanks Nhat, this is a great speedup for the TS command!
I left a question, but other than that this looks good to me.

 ) Expression field
 ) {
-    this(source, field, Literal.TRUE);
+    this(source, field, Literal.TRUE, SummationMode.COMPENSATED_LITERAL);
Member


Should we also consider using LOSSY_LITERAL for avg when the function is used in the context of the TS source command? If so, maybe we can do that in a follow-up?

Member Author


++. I will do it in a follow-up, as these may require bigger changes.

Member


That makes sense!
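The constructor change keeps compensated summation as the default for the user-facing Avg and Sum, while the thread discusses letting TS pick the lossy mode for avg as well. A minimal sketch of that shape, assuming the mode is fixed once at construction and avg is derived from the same sum state as sum / count; `SummationModeSketch`, `SumAccumulator`, and this `SummationMode` enum are hypothetical stand-ins, not the ES|QL classes or SummationMode.COMPENSATED_LITERAL/LOSSY_LITERAL themselves.

```java
/** Hypothetical sketch: a sum/avg accumulator whose summation mode is chosen at construction. */
public class SummationModeSketch {

    enum SummationMode { COMPENSATED, LOSSY }

    static final class SumAccumulator {
        private final SummationMode mode;
        private double sum;
        private double delta; // Kahan compensation; never updated in LOSSY mode
        private long count;

        SumAccumulator() {
            this(SummationMode.COMPENSATED); // user-facing default stays precise
        }

        SumAccumulator(SummationMode mode) {
            this.mode = mode;
        }

        void add(double v) {
            count++;
            if (mode == SummationMode.LOSSY) {
                sum += v; // one add, no compensation
                return;
            }
            double y = v - delta;
            double t = sum + y;
            delta = (t - sum) - y;
            sum = t;
        }

        SummationMode mode() { return mode; }
        double sum() { return sum; }
        double delta() { return delta; }

        /** Avg reuses the sum state: sum / count (this sketch returns NaN when there are no values). */
        double avg() { return count == 0 ? Double.NaN : sum / count; }
    }

    public static void main(String[] args) {
        SumAccumulator precise = new SumAccumulator();
        SumAccumulator fast = new SumAccumulator(SummationMode.LOSSY); // e.g. chosen internally for TS
        for (double v : new double[] {1.0, 2.0, 3.0, 4.0}) {
            precise.add(v);
            fast.add(v);
        }
        System.out.println(precise.avg() + " " + fast.avg()); // prints "2.5 2.5"
    }
}
```

Because avg only adds a count on top of the sum state, switching its summation mode is mostly a matter of threading the mode through, though the real change may need more plumbing, as noted above.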

 )
 public Sum(Source source, @Param(name = "number", type = { "aggregate_metric_double", "double", "integer", "long" }) Expression field) {
-    this(source, field, Literal.TRUE);
+    this(source, field, Literal.TRUE, SummationMode.COMPENSATED_LITERAL);
Member


Same question as for the avg function.

@dnhatn dnhatn requested a review from nik9000 August 11, 2025 03:50
@dnhatn dnhatn marked this pull request as ready for review August 11, 2025 03:50
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels Aug 11, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@dnhatn dnhatn requested review from kkrik-es and martijnvg August 11, 2025 17:01

public static void combineIntermediate(SumState state, double inValue, double zeroDelta, boolean seen) {
assert zeroDelta == 0.0 : zeroDelta;
if (seen) {
Contributor


Should this be seen == false?

Member Author


Here, seen indicates that the input value is valid.
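A minimal sketch of the combine step discussed here, assuming each intermediate row carries (value, delta, seen) as the combineIntermediate signature suggests: in lossy mode no compensation term is ever produced, so the delta must be zero, and seen == true means inValue holds real data, so only those rows are merged. The SumState body below is hypothetical, not the Elasticsearch class.

```java
/** Hypothetical sketch of merging lossy-sum intermediate state. */
public class LossyCombineSketch {

    /** Hypothetical per-group state; not the Elasticsearch SumState class. */
    static final class SumState {
        double value;
        boolean seen; // true once at least one valid value was merged

        void add(double v) {
            value += v; // plain (lossy) addition, no compensation term
            seen = true;
        }
    }

    /**
     * Merges one intermediate (value, delta, seen) triple into the state.
     * seen == true means inValue is valid; seen == false means the upstream
     * group had no values, so inValue is ignored.
     */
    static void combineIntermediate(SumState state, double inValue, double zeroDelta, boolean seen) {
        assert zeroDelta == 0.0 : zeroDelta; // lossy producers never emit a delta (checked only with -ea)
        if (seen) {
            state.add(inValue);
        }
    }

    public static void main(String[] args) {
        SumState state = new SumState();
        combineIntermediate(state, 1.5, 0.0, true);
        combineIntermediate(state, 99.0, 0.0, false); // ignored: no valid input
        combineIntermediate(state, 2.5, 0.0, true);
        System.out.println(state.value + " seen=" + state.seen); // prints "4.0 seen=true"
    }
}
```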

private static Avg readFrom(StreamInput in) throws IOException {
Source source = Source.readFrom((PlanStreamInput) in);
Expression field = in.readNamedWriteable(Expression.class);
Expression filter = in.getTransportVersion().onOrAfter(TransportVersions.V_8_16_0)
Contributor


Why do we use 8.16 here? Maybe add a comment?

Member Author


This was copied from the super class. I took a new approach in 8ec481f
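The readFrom snippet reads the filter only when the sender's transport version is on or after 8.16. A minimal sketch of that version-gating pattern using plain java.io streams rather than Elasticsearch's StreamInput/TransportVersions; the version constants, the AggSpec record, and the "TRUE" fallback are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of version-gated (de)serialization. */
public class VersionGatedSerialization {
    // Hypothetical wire-version ids; not Elasticsearch TransportVersions.
    static final int V_BEFORE_FILTER = 1;
    static final int V_WITH_FILTER = 2;

    /** Hypothetical stand-in for an aggregate's serialized fields. */
    record AggSpec(String field, String filter) {}

    static void writeTo(DataOutputStream out, int peerVersion, AggSpec spec) throws IOException {
        out.writeUTF(spec.field());
        if (peerVersion >= V_WITH_FILTER) { // only peers that know the field receive it
            out.writeUTF(spec.filter());
        }
    }

    static AggSpec readFrom(DataInputStream in, int peerVersion) throws IOException {
        String field = in.readUTF();
        // Older peers never wrote the filter, so substitute a default (assumed "TRUE" here).
        String filter = peerVersion >= V_WITH_FILTER ? in.readUTF() : "TRUE";
        return new AggSpec(field, filter);
    }

    /** Writes then reads back as if talking to a peer at peerVersion. */
    static AggSpec roundTrip(AggSpec spec, int peerVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeTo(out, peerVersion, spec);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readFrom(in, peerVersion);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AggSpec spec = new AggSpec("cpu", "host == 'a'");
        System.out.println(roundTrip(spec, V_WITH_FILTER));
        System.out.println(roundTrip(spec, V_BEFORE_FILTER));
    }
}
```

The reader and writer must agree on the same version cutoff, which is why a comment explaining where a specific version constant comes from is useful.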

Contributor

@kkrik-es kkrik-es left a comment


Nice.

@dnhatn dnhatn requested a review from limotova August 11, 2025 23:54
@dnhatn dnhatn removed the request for review from martijnvg August 11, 2025 23:54
Contributor

@limotova limotova left a comment


LGTM!

Member

@martijnvg martijnvg left a comment


LGTM 👍

@dnhatn
Member Author

dnhatn commented Aug 12, 2025

Thanks friends!

@dnhatn dnhatn merged commit ad4831c into elastic:main Aug 12, 2025
33 checks passed
@dnhatn dnhatn deleted the sum-aggregator branch August 12, 2025 23:25
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Sep 23, 2025
BASE=30024e14edd361c4b9af134d6ddfc04ab1a061bc
HEAD=8ec481f7a2bbe403cea5e2e52bb9985aa3473ac2
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 2, 2025
BASE=30024e14edd361c4b9af134d6ddfc04ab1a061bc
HEAD=8ec481f7a2bbe403cea5e2e52bb9985aa3473ac2
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 7, 2025
BASE=30024e14edd361c4b9af134d6ddfc04ab1a061bc
HEAD=8ec481f7a2bbe403cea5e2e52bb9985aa3473ac2
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 17, 2025
BASE=30024e14edd361c4b9af134d6ddfc04ab1a061bc
HEAD=8ec481f7a2bbe403cea5e2e52bb9985aa3473ac2
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 23, 2025
BASE=30024e14edd361c4b9af134d6ddfc04ab1a061bc
HEAD=8ec481f7a2bbe403cea5e2e52bb9985aa3473ac2
Branch=main

Labels

:Analytics/ES|QL AKA ESQL >non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.2.0


5 participants