
Conversation

@JonasKunz (Contributor) commented Aug 22, 2025

Adds a barebones ES|QL data type and block type for exponential histograms.

To keep this PR as small as possible, I've reduced the type to just storing the scale of a histogram and nothing else.
Everything else will be added in follow-up PRs.

In this PR I've marked the shortcuts and disabled tests that definitely need work before we can eventually put this into tech preview with TODO(b/133393).
Note that I've also excluded the type from some tests (e.g. TopN) without a TODO(b/133393): these cover functionality that I think won't be needed, at least for a tech preview. Please review them carefully and let me know if any of them cover functionality that should work, so that I can add the TODO(b/133393) there as well.

Initially this PR also included some CSV tests, but I decided to remove them for now, as they would require implementing a blockloader, increasing the size of the PR unnecessarily. I'll add them back together with the blockloader in a follow-up PR.

@github-actions bot commented Sep 5, 2025

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.


When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level


@JonasKunz changed the title from "PoC: ES|QL block type for exponential histograms" to "ES|QL block type for exponential histograms" on Sep 5, 2025
@JonasKunz added the :StorageEngine/ES|QL (Timeseries / metrics / logsdb capabilities in ES|QL) and >feature labels on Sep 5, 2025
@JonasKunz marked this pull request as ready for review September 5, 2025 09:54
@JonasKunz requested a review from kkrik-es September 5, 2025 09:54
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@kkrik-es requested a review from dnhatn September 5, 2025 11:47

@kkrik-es (Contributor) left a comment:

I'll leave it to the esql experts to approve.

JonasKunz and others added 2 commits September 5, 2025 14:30
@dnhatn (Member) left a comment:

I looked at the implementation of ExponentialHistogramFieldMapper. I think the Block/Builder should be similar to AggregateMetricDoubleArrayBlock. The confusing part is that the input/output of this block/builder is a decoded Histogram, but it should be an encoded Histogram in BytesRef. I've left some comments, and I think we can iterate on this quickly. Thanks Jonas!


@Override
public ExponentialHistogramBlockBuilder endPositionEntry() {
encodedHistogramsBuilder.endPositionEntry();

@dnhatn (Member):
Can we throw UnsupportedOperationException here - we should not have multi-valued with ExponentialHistogram.
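
A minimal sketch of the suggested change, reusing the method shapes from the diff context above:

```java
@Override
public ExponentialHistogramBlockBuilder beginPositionEntry() {
    // exponential histograms are single-valued: reject multi-value position entries
    throw new UnsupportedOperationException("exponential histograms do not support multi-values");
}

@Override
public ExponentialHistogramBlockBuilder endPositionEntry() {
    throw new UnsupportedOperationException("exponential histograms do not support multi-values");
}
```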

@JonasKunz (Contributor, Author):

The exponential_histogram field does not support multi-values for technical reasons: the histograms are stored split across multiple doc values, and we can't guarantee the order in the multi-value case. There isn't really a use case justifying the effort required to support this.

For ES|QL, my understanding is that blocks are not only used when loading data, but also for intermediate and end results. So while loading a field of type exponential_histogram can never produce multi-values, why wouldn't we allow users to construct multi-values, e.g. via the VALUES aggregation?

I don't really see much of a use case for constructing multi-valued exponential histogram columns this way. But is supporting it really that much effort? To me it felt like much more effort not to support it, because a lot of tests construct multi-values and would then need to be adapted. But my impression might be wrong here.


@Override
public ExponentialHistogramBlockBuilder beginPositionEntry() {
encodedHistogramsBuilder.beginPositionEntry();

@dnhatn (Member):
Can we throw UnsupportedOperationException here - we should not have multi-valued with ExponentialHistogram.

this.tempScratch = new BytesRef(new byte[INITIAL_SCRATCH_SIZE], 0, INITIAL_SCRATCH_SIZE);
}

public ExponentialHistogramBlockBuilder append(@Nullable ExponentialHistogram value) {

@dnhatn (Member):

Can we pass the encoded ExponentialHistogram as a BytesRef here instead? I think this part caused confusion for me. In the compute lifecycle, we load BytesRef from binary doc_values and store them in BytesRef this block. When the actual ExponentialHistogram is needed, we retrieve the BytesRef and decode it.
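
A minimal sketch of that shape (the appendEncoded name is hypothetical, assuming the inner builder is a plain BytesRef block builder):

```java
// Hypothetical builder method: callers hand over the histogram bytes exactly as
// loaded from binary doc_values; nothing is decoded while building the block.
public ExponentialHistogramBlockBuilder appendEncoded(BytesRef encodedHistogram) {
    encodedHistogramsBuilder.appendBytesRef(encodedHistogram);
    return this;
}
```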

@JonasKunz (Contributor, Author):

In both the ES|QL block and for storage in doc values we use byte arrays, yes. The problem here is that we will use them differently, because different properties matter most in each case.

For the Elasticsearch field, to my understanding the most important aspect is disk size, followed by decoding speed (for queries), with encoding speed (ingestion) least important, as that work is done only once.
Of course we make reasonable tradeoffs here, e.g. we don't do an optimization that saves 2% disk space while doubling the per-document ingestion time.

For the ES|QL blocks, to my understanding the most important things are decoding performance (the block is used as input of a query) and encoding performance (the block is used as output of a query or an intermediate result). We care a lot less about size, which in this case is heap consumption.

For this reason, we will be using different encodings of exponential histograms when storing them vs. when handling them with ES|QL. That's why I think it is correct to use the ExponentialHistogram interface as the abstraction for moving data into a block, as it isolates these two similar yet different implementations from each other.
Another example where this difference is very visible is the zero threshold: that value is only useful in combination with the histogram buckets, so for the ES|QL block we would just store it in the byte[]s as well. For storage, however, we've extracted it into a separate doc value to allow for good compression.

I do agree that it is wasteful for the blockloader to decode the doc values and then encode them again. However, for me this is an optimization to be added later, as a "special case" which is allowed to look through the abstraction for performance improvements (see the sketch after this list):

  • The append(ExponentialHistogram value) still works on any ExponentialHistogram implementation.
  • We add a special case in it: we detect whether the passed histogram is a CompressedExponentialHistogram. In that case we gain direct access to the byte[] storing the buckets and copy it as-is, saving the decode + encode work.
  • Per row in the block, we remember the encoding and still return ExponentialHistograms on access, decoding with whatever encoding was used for that row. This way the optimization remains an implementation detail, not leaking into consumers.
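
A rough sketch of that special case (encodedBuckets() and encodeIntoScratch() are hypothetical helpers for illustration):

```java
public ExponentialHistogramBlockBuilder append(@Nullable ExponentialHistogram value) {
    if (value == null) {
        appendNull();
        return this;
    }
    if (value instanceof CompressedExponentialHistogram compressed) {
        // fast path: the buckets already carry the storage encoding, so copy the
        // raw bytes as-is, saving the decode + encode round trip
        encodedHistogramsBuilder.appendBytesRef(compressed.encodedBuckets());
    } else {
        // slow path: serialize the histogram into the scratch buffer first
        encodedHistogramsBuilder.appendBytesRef(encodeIntoScratch(value));
    }
    // per row we'd also remember which encoding was used, so that reads can
    // decode accordingly without consumers noticing the difference
    return this;
}
```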

return new ExponentialHistogramBlockBuilder(estimatedSize, this);
}

public final ExponentialHistogramBlock newConstantExponentialHistogramBlock(ExponentialHistogram value, int positionCount) {

@dnhatn (Member):

Let's add the constant later - not sure if we need this?

@JonasKunz (Contributor, Author):

This is needed for testing. While we don't allow the construction of exponential_histogram literals via a query (at least for now), tests do construct literals. And to turn those literals into blocks, constant blocks are used.

So if we remove the constant block, we get a lot of test failures like this one:

REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:test" --tests "org.elasticsearch.xpack.esql.expression.function.scalar.nulls.IsNullTests.testEvaluate {TestCase=non-null exponential_histogram}" -Dtests.seed=E0138310B809341F -Dtests.locale=hi-Deva-IN -Dtests.timezone=Etc/GMT+2 -Druntime.java=25

unsupported element type [EXPONENTIAL_HISTOGRAM]
java.lang.UnsupportedOperationException: unsupported element type [EXPONENTIAL_HISTOGRAM]
	at __randomizedtesting.SeedInfo.seed([E0138310B809341F:CABEDCCC64462444]:0)
	at org.elasticsearch.compute.data.BlockUtils.constantBlock(BlockUtils.java:259)
	at org.elasticsearch.compute.data.BlockUtils.constantBlock(BlockUtils.java:245)
	at org.elasticsearch.compute.data.BlockUtils.fromListRow(BlockUtils.java:107)
	at org.elasticsearch.compute.data.BlockUtils.fromListRow(BlockUtils.java:77)
	at org.elasticsearch.xpack.esql.expression.function.AbstractFunctionTestCase.row(AbstractFunctionTestCase.java:577)
	at org.elasticsearch.xpack.esql.expression.function.AbstractScalarFunctionTestCase.testEvaluate(AbstractScalarFunctionTestCase.java:117)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)

And I don't think we'd want to exclude exponential histogram blocks from IsNullTests for example.

@JonasKunz (Contributor, Author):

I tried to "properly" implement the constant block. What I did was create constant-blocks for the sub-blocks (e.g. constant block for min, constant block for zero_threshold, etc).
Unfortunately it seems like something is wrong with the serialization of those blocks, as then several tests fail when trying to deserialize the block.

For that reason I reverted to the current, non-optimal solution, as it is only used in tests anyway AFAIK

}

@Override
public ExponentialHistogram getExponentialHistogram(int valueIndex) {

@dnhatn (Member):

Can we return the encoded histogram and let callers decode and use it - we also need to pass the scratch parameter. We can add a helper method in this class for decoding.
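
A sketch of the suggested shape (method names are illustrative, not taken from the PR):

```java
// Hand out the raw encoded bytes; the caller-supplied scratch avoids allocations.
public BytesRef getEncodedHistogram(int valueIndex, BytesRef scratch) {
    return encodedHistograms.getBytesRef(valueIndex, scratch);
}

// Keep the decoding logic in one helper instead of repeating it in every caller.
public static ExponentialHistogram decodeHistogram(BytesRef encoded) {
    return CompressedExponentialHistogram.decode(encoded); // hypothetical decode entry point
}
```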

@JonasKunz (Contributor, Author) commented Oct 20, 2025:

That's kind of what I intended to do, but abstracted away.
The key here is that ExponentialHistogram intentionally does not allow random access to the buckets; you can only iterate over them.
This allows us to do the decoding lazily in the ExponentialHistogram implementation we return here, without having to repeat that logic in every consumer of this block.

So for that reason I don't see the benefit of your suggestion here.

However, what I'm going to implement here is a suggestion already made by @nik9000:
To minimize allocations (not trusting inlining + scalar replacement), we should provide an "Accessor" similar to e.g. Lucene doc value iterators. This accessor can then reuse the returned ExponentialHistogram instance across calls to minimize allocations.
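
A sketch of such an accessor, loosely modeled on Lucene's doc-value iterators (all names below are illustrative):

```java
// One mutable histogram view is re-pointed at each requested position, so scanning
// N rows allocates no per-row objects; buckets still decode lazily during iteration.
public final class ExponentialHistogramBlockAccessor {
    private final ExponentialHistogramBlock block;
    private final BytesRef scratch = new BytesRef();
    // hypothetical mutable implementation of the ExponentialHistogram interface
    private final ReusableExponentialHistogram reused = new ReusableExponentialHistogram();

    public ExponentialHistogramBlockAccessor(ExponentialHistogramBlock block) {
        this.block = block;
    }

    public ExponentialHistogram get(int valueIndex) {
        block.readEncodedHistogram(valueIndex, scratch); // hypothetical raw-bytes accessor
        reused.reset(scratch);                           // re-point the reused view, no allocation
        return reused;
    }
}
```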

@JonasKunz (Contributor, Author) commented Oct 21, 2025

@dnhatn: To summarize my understanding of what we discussed in the meeting:

IIUC, the problem with the current approach in this PR with the BlockBuilder is that it is effectively impossible to reuse the smart BlockLoaders defined in BlockDocValuesReader. Those effectively peek through the NumericDocValues abstraction to do performance optimizations: for example, if a doc value is in fact constant, they can load it as a constant block.

With my current approach, we'd always have to iterate over each document and decode: not necessarily decoding the histogram buckets, but decoding the doc values from how they are stored in Lucene.

Is that correct?

What confuses me a little is that, IIUC, this is not what's happening right now for AggregateMetricDouble: there we iterate over the values and copy them manually, instead of reusing the existing readers (Code).

So I'd propose to do the following in the follow-up PR, which will also contain the blockloader for exponential histograms:

  • Add the min/max/sum/count/zero_threshold sub-blocks to the ExponentialHistogramBlock. The internal formatting will be exactly how we format things for storage in the ExponentialHistogramFieldMapper
  • Implement the BlockLoader by delegating to the loaders from BlockDocValuesReader. Pass the resulting blocks to a new static ExponentialHistogramBlockBuilder.buildDirect(factory, min, max, sum, count, zeroThreshold, buckets). This method will take the blocks and just wrap them in the ExponentialHistogramBlock (see the sketch below).
  • ExponentialHistogramBlockBuilder.addExponentialHistogram will be changed to first encode the data into the disk format and we'll add a big TODO here to optimize by using a CPU-optimized encoding instead and adding support for that mode to the block implementation. This method will for now be only used in tests anyway.

So essentially, for the non-RowStrideReader path the ExponentialHistogramBlockBuilder won't be actually instantiated. Instead we load the sub-blocks directly and construct the ExponentialHistogramBlock from them.
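
A sketch of that buildDirect entry point (the signature follows the list above; the concrete sub-block types are assumptions):

```java
// The sub-blocks arrive straight from the existing doc-value BlockLoaders and are
// merely wrapped; no per-document decoding happens on this path.
public static ExponentialHistogramBlock buildDirect(
    BlockFactory factory,   // part of the proposed signature; may be needed for accounting
    DoubleBlock min,
    DoubleBlock max,
    DoubleBlock sum,
    LongBlock count,
    DoubleBlock zeroThreshold,
    BytesRefBlock buckets
) {
    return new ExponentialHistogramBlock(min, max, sum, count, zeroThreshold, buckets);
}
```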

We'll see if this works out or if there are other things I stumble across.

Is this the correct approach in your opinion?
Is there anything left to do in this PR (except for fixing the merge conflict)?

@dnhatn (Member) commented Oct 22, 2025

What confuses me a little is that, IIUC, this is not what's happening right now for AggregateMetricDouble: there we iterate over the values and copy them manually, instead of reusing the existing readers.

I understand - we should reuse the existing DoublesBlockLoader/IntsBlockLoader to read sub-blocks. Similarly, for the histogram block, there will be at least two blocks: one for zeroThreshold and one for the buckets. The buckets block will be loaded using BytesRefsFromCustomBinary, and zeroThreshold can be loaded with LongsBlockLoader. This approach avoids combining these values during reading. When accessing the histogram value, we can retrieve zeroThreshold and decode the encoded histogram BytesRef from the buckets block.
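
A sketch of that access path (decode(...) is a hypothetical helper; the long-to-double mapping for zeroThreshold is an assumption):

```java
// Fetch the raw pieces from the two sub-blocks and decode only when the
// histogram value is actually needed.
ExponentialHistogram histogramAt(int position, BytesRef scratch) {
    long zeroThresholdBits = zeroThresholdBlock.getLong(position); // via LongsBlockLoader, as above
    double zeroThreshold = Double.longBitsToDouble(zeroThresholdBits);
    BytesRef encodedBuckets = bucketsBlock.getBytesRef(position, scratch); // via BytesRefsFromCustomBinary
    return decode(zeroThreshold, encodedBuckets);
}
```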

Is there anything left to do in this PR (except for fixing the merge conflict)?

Just to confirm - you prefer to experiment with our discussion in follow-up PRs. If so, I'll need to take another look to make sure we don’t break existing things.

@JonasKunz force-pushed the exp-histo-esql branch 2 times, most recently from 14d8614 to f491a55 on October 23, 2025 12:28