
Commit cccacaf

Add Spectator Histogram SQL functions
1 parent ccfcb1c commit cccacaf

11 files changed, +1352 -7 lines changed


docs/development/extensions-contrib/spectator-histogram.md

Lines changed: 136 additions & 6 deletions
@@ -79,8 +79,6 @@ Also see the [limitations](#limitations] of this extension.
 * Supports positive long integer values within the range of [0, 2^53). Negatives are
 coerced to 0.
 * Does not support decimals.
-* Does not support Druid SQL queries, only native queries.
-* Does not support vectorized queries.
 * Generates 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles ranges from 0.1% to 3%, exclusive. See [Bucket boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.
 
 :::tip
@@ -134,7 +132,11 @@ To use SpectatorHistogram, make sure you [include](../../configuration/extension
 druid.extensions.loadList=["druid-spectator-histogram"]
 ```
 
-## Aggregators
+## Native Query Components
+
+The following sections describe the aggregators and post-aggregators for use with [native Druid queries](../../querying/querying.md).
+
+### Aggregators
 
 The result of the aggregation is a histogram that is built by ingesting numeric values from
 the raw data, or from combining pre-aggregated histograms. The result is represented in
@@ -207,9 +209,9 @@ To get the population size (count of events contributing to the histogram):
 | name | A String for the output (result) name of the aggregation. | yes |
 | fieldName | A String for the name of the input field containing pre-aggregated histograms. | yes |
 
-## Post Aggregators
+### Post Aggregators
 
-### Percentile (singular)
+#### Percentile (singular)
 This returns a single percentile calculation based on the distribution of the values in the aggregated histogram.
 
 ```
@@ -231,7 +233,7 @@ This returns a single percentile calculation based on the distribution of the va
 | field | A field reference pointing to the aggregated histogram. | yes |
 | percentile | A single decimal percentile between 0.0 and 100.0 | yes |
 
-### Percentiles (multiple)
+#### Percentiles (multiple)
 This returns an array of percentiles corresponding to those requested.
 
 ```
@@ -272,6 +274,134 @@ array of percentiles.
 | field | A field reference pointing to the aggregated histogram. | yes |
 | percentiles | Non-empty array of decimal percentiles between 0.0 and 100.0 | yes |
 
+#### Count Post-Aggregator
+
+This returns the total count of observations (data points) that were recorded in the histogram.
+This is useful for understanding the population size without needing a separate count metric.
+
+```json
+{
+  "type": "countSpectatorHistogram",
+  "name": "<output name>",
+  "field": {
+    "type": "fieldAccess",
+    "fieldName": "<name of aggregated SpectatorHistogram>"
+  }
+}
+```
+
+| Property | Description | Required? |
+|----------|------------------------------------------------------------|-----------|
+| type | This String should always be "countSpectatorHistogram" | yes |
+| name | A String for the output (result) name of the calculation. | yes |
+| field | A field reference pointing to the aggregated histogram. | yes |
+
+## SQL Functions
+
+In addition to the native query aggregators and post-aggregators, this extension provides SQL functions for easier use in Druid SQL queries.
+
+### SPECTATOR_COUNT
+
+Returns the total count of observations (data points) in a Spectator histogram.
+
+**Syntax:**
+```sql
+SPECTATOR_COUNT(expr)
+```
+
+**Arguments:**
+- `expr`: A numeric column to aggregate into a histogram, or a pre-aggregated Spectator histogram column.
+
+**Returns:** BIGINT - the total number of observations.
+
+**Example:**
+```sql
+SELECT
+  SPECTATOR_COUNT(hist_added) AS total_count,
+  SPECTATOR_COUNT(added) AS total_count_from_raw
+FROM wikipedia
+```
+
+### SPECTATOR_PERCENTILE
+
+Computes approximate percentile values from a Spectator histogram. This function supports two forms: a single percentile or multiple percentiles.
+
+#### Single Percentile
+
+**Syntax:**
+```sql
+SPECTATOR_PERCENTILE(expr, percentile)
+```
+
+**Arguments:**
+- `expr`: A numeric column to aggregate into a histogram, or a pre-aggregated Spectator histogram column.
+- `percentile`: A numeric value between 0 and 100 representing the desired percentile.
+
+**Returns:** DOUBLE - the approximate value at the specified percentile.
+
+**Example:**
+```sql
+SELECT
+  SPECTATOR_PERCENTILE(hist_added, 50) AS median_added,
+  SPECTATOR_PERCENTILE(hist_added, 99) AS p99_added,
+  SPECTATOR_PERCENTILE(added, 95) AS p95_from_raw
+FROM wikipedia
+```
+
+#### Multiple Percentiles (Array)
+
+**Syntax:**
+```sql
+SPECTATOR_PERCENTILE(expr, ARRAY[p1, p2, ...])
+```
+
+**Arguments:**
+- `expr`: A numeric column to aggregate into a histogram, or a pre-aggregated Spectator histogram column.
+- `ARRAY[p1, p2, ...]`: An array of numeric values between 0 and 100 representing the desired percentiles.
+
+**Returns:** DOUBLE ARRAY - an array of approximate values at the specified percentiles, in the same order as requested.
+
+**Example:**
+```sql
+SELECT
+  SPECTATOR_PERCENTILE(hist_added, ARRAY[25, 50, 75, 99]) AS percentiles
+FROM wikipedia
+```
+
+This returns an array like `[200.5, 341.0, 468.5, 675.9]` representing the 25th, 50th, 75th, and 99th percentiles.
+
+Using the array form is more efficient than calling `SPECTATOR_PERCENTILE` multiple times for different percentiles, as the underlying histogram is only aggregated once.
+
+### Combined Example
+
+You can use both functions together in a single query. Multiple aggregations on the same column share the underlying histogram aggregator for efficiency:
+
+```sql
+SELECT
+  countryName,
+  SPECTATOR_COUNT(hist_added) AS observation_count,
+  SPECTATOR_PERCENTILE(hist_added, 50) AS median_added,
+  SPECTATOR_PERCENTILE(hist_added, 90) AS p90_added,
+  SPECTATOR_PERCENTILE(hist_added, 99) AS p99_added
+FROM wikipedia
+GROUP BY countryName
+ORDER BY observation_count DESC
+LIMIT 10
+```
+
+Or using the array form to get multiple percentiles in a single column:
+
+```sql
+SELECT
+  countryName,
+  SPECTATOR_COUNT(hist_added) AS observation_count,
+  SPECTATOR_PERCENTILE(hist_added, ARRAY[50, 90, 99]) AS percentiles
+FROM wikipedia
+GROUP BY countryName
+ORDER BY observation_count DESC
+LIMIT 10
+```
+
 ## Examples
 
 ### Example Ingestion Spec
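
The array form documented in the diff above takes percentiles between 0 and 100. As a rough sketch of how a client might assemble such a query and check the range before sending it, the following helper builds the SQL text. `SpectatorSqlBuilder` and `percentilesQuery` are hypothetical names that are not part of this commit, and the non-empty check is an assumption carried over from the native `percentiles` property rather than something the SQL docs state.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical client-side helper; not part of the Druid extension.
final class SpectatorSqlBuilder {
    private SpectatorSqlBuilder() {}

    // Builds: SELECT SPECTATOR_PERCENTILE(<column>, ARRAY[p1, p2, ...]) AS percentiles FROM <table>
    // Column and table names are inserted verbatim, so callers must pass trusted identifiers only.
    static String percentilesQuery(String column, String table, List<Double> percentiles) {
        // Assumption: an empty array is rejected, mirroring the native "non-empty" requirement.
        if (percentiles.isEmpty()) {
            throw new IllegalArgumentException("at least one percentile is required");
        }
        StringJoiner array = new StringJoiner(", ", "ARRAY[", "]");
        for (double p : percentiles) {
            // Written as a negated range check so that NaN is rejected as well.
            if (!(p >= 0.0 && p <= 100.0)) {
                throw new IllegalArgumentException("percentile out of range [0, 100]: " + p);
            }
            array.add(String.valueOf(p));
        }
        return "SELECT SPECTATOR_PERCENTILE(" + column + ", " + array + ") AS percentiles FROM " + table;
    }
}
```

Calling `SpectatorSqlBuilder.percentilesQuery("hist_added", "wikipedia", List.of(25.0, 50.0, 99.0))` yields a single `SPECTATOR_PERCENTILE` call, which matches the docs' advice to prefer the array form over repeated single-percentile calls.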

docs/querying/sql-aggregations.md

Lines changed: 12 additions & 0 deletions
@@ -157,3 +157,15 @@ Load the T-Digest extension to use the following functions. See the [T-Digest ex
 |--------|-----|-------|
 |`TDIGEST_QUANTILE(expr, quantileFraction, [compression])`|Builds a T-Digest sketch on values produced by `expr` and returns the value for the quantile. Compression parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches.|`Double.NaN`|
 |`TDIGEST_GENERATE_SKETCH(expr, [compression])`|Builds a T-Digest sketch on values produced by `expr`. Compression parameter (default value 100) determines the accuracy and size of the sketch Higher compression means higher accuracy but more space to store sketches.|Empty base64 encoded T-Digest sketch STRING|
+
+## Histogram functions
+
+### Spectator Histogram
+
+Load the [Spectator Histogram extension](../development/extensions-contrib/spectator-histogram.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`SPECTATOR_COUNT(expr)`|Counts the total number of observations (data points) in a Spectator histogram. The `expr` can be either a numeric column (which will be aggregated into a histogram) or a pre-aggregated [Spectator histogram](../development/extensions-contrib/spectator-histogram.md) column.|`0`|
+|`SPECTATOR_PERCENTILE(expr, percentile)`|Computes an approximate percentile value from a Spectator histogram. The `expr` can be either a numeric column (which will be aggregated into a histogram) or a pre-aggregated [Spectator histogram](../development/extensions-contrib/spectator-histogram.md) column. The `percentile` should be between 0 and 100.|`NaN`|
+|`SPECTATOR_PERCENTILE(expr, ARRAY[p1, p2, ...])`|Computes multiple approximate percentile values from a Spectator histogram and returns them as a DOUBLE ARRAY. The `expr` can be either a numeric column (which will be aggregated into a histogram) or a pre-aggregated [Spectator histogram](../development/extensions-contrib/spectator-histogram.md) column. Each percentile value in the array should be between 0 and 100. This is more efficient than calling `SPECTATOR_PERCENTILE` multiple times for different percentiles.|`null`|
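
The Default column in the table above lists `0`, `NaN`, and `null` for the three functions. Assuming that column denotes the value returned when no rows are aggregated, a client may want to treat those values as "no data" rather than as real measurements. The sketch below shows one way to do that; `SpectatorResultDefaults` and its methods are hypothetical names, not part of Druid.

```java
import java.util.OptionalDouble;

// Hypothetical client-side helper; not part of the Druid extension.
final class SpectatorResultDefaults {
    private SpectatorResultDefaults() {}

    // NaN is the documented default of the single-percentile form.
    // Both NaN and a missing value are mapped to an empty result.
    static OptionalDouble percentileOrEmpty(Double value) {
        if (value == null || value.isNaN()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(value);
    }

    // null is the documented default of the array form; map it to an empty array.
    static double[] percentilesOrEmpty(double[] values) {
        return values == null ? new double[0] : values;
    }
}
```

Mapping the defaults to `OptionalDouble.empty()` and an empty array keeps downstream code from averaging or plotting a `NaN` by accident.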

extensions-contrib/spectator-histogram/pom.xml

Lines changed: 31 additions & 1 deletion
@@ -38,7 +38,7 @@
     <dependency>
       <groupId>com.netflix.spectator</groupId>
       <artifactId>spectator-api</artifactId>
-      <version>1.7.0</version>
+      <version>1.9.0</version>
     </dependency>
     <dependency>
       <groupId>com.google.guava</groupId>
@@ -120,6 +120,36 @@
       <artifactId>junit</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.junit.jupiter</groupId>
+      <artifactId>junit-jupiter-api</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.junit.jupiter</groupId>
+      <artifactId>junit-jupiter-engine</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.junit.jupiter</groupId>
+      <artifactId>junit-jupiter-migrationsupport</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.junit.jupiter</groupId>
+      <artifactId>junit-jupiter-params</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.junit.vintage</groupId>
+      <artifactId>junit-vintage-engine</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.reflections</groupId>
+      <artifactId>reflections</artifactId>
+      <scope>test</scope>
+    </dependency>
     <dependency>
       <groupId>org.apache.druid</groupId>
       <artifactId>druid-processing</artifactId>

SpectatorHistogramCountPostAggregator.java

@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.spectator.histogram;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.primitives.Longs;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.PostAggregator;
+import org.apache.druid.query.aggregation.post.PostAggregatorIds;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.column.ColumnType;
+
+import java.util.Comparator;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Post-aggregator that returns the total count of observations in a SpectatorHistogram.
+ * This is the sum of all bucket counts.
+ */
+public class SpectatorHistogramCountPostAggregator implements PostAggregator
+{
+  private final String name;
+  private final PostAggregator field;
+
+  public static final String TYPE_NAME = "countSpectatorHistogram";
+
+  @JsonCreator
+  public SpectatorHistogramCountPostAggregator(
+      @JsonProperty("name") final String name,
+      @JsonProperty("field") final PostAggregator field
+  )
+  {
+    this.name = Preconditions.checkNotNull(name, "name is null");
+    this.field = Preconditions.checkNotNull(field, "field is null");
+  }
+
+  @Override
+  @JsonProperty
+  public String getName()
+  {
+    return name;
+  }
+
+  @Override
+  public ColumnType getType(ColumnInspector signature)
+  {
+    return ColumnType.LONG;
+  }
+
+  @JsonProperty
+  public PostAggregator getField()
+  {
+    return field;
+  }
+
+  @Override
+  public Object compute(final Map<String, Object> combinedAggregators)
+  {
+    final SpectatorHistogram sketch = (SpectatorHistogram) field.compute(combinedAggregators);
+    if (sketch == null) {
+      return null;
+    }
+    return sketch.getSum();
+  }
+
+  @Override
+  public Comparator<Long> getComparator()
+  {
+    return Longs::compare;
+  }
+
+  @Override
+  public Set<String> getDependentFields()
+  {
+    return field.getDependentFields();
+  }
+
+  @Override
+  public String toString()
+  {
+    return getClass().getSimpleName() + "{" +
+           "name='" + name + '\'' +
+           ", field=" + field +
+           "}";
+  }
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(
+        PostAggregatorIds.SPECTATOR_HISTOGRAM_SKETCH_COUNT_CACHE_TYPE_ID)
+        .appendCacheable(field)
+        .build();
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    SpectatorHistogramCountPostAggregator that = (SpectatorHistogramCountPostAggregator) o;
+    return Objects.equals(name, that.name) &&
+           Objects.equals(field, that.field);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(name, field);
+  }
+
+  @Override
+  public PostAggregator decorate(final Map<String, AggregatorFactory> map)
+  {
+    return this;
+  }
+}
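
The class comment above describes the count as the sum of all bucket counts, and `compute` returns `null` when the referenced histogram is absent. A minimal stand-alone sketch of that behaviour follows. It models the histogram as a plain map from bucket index to observation count, which is an assumption made for illustration; `BucketCountSketch` is a hypothetical name and does not reflect how `SpectatorHistogram` stores its data.

```java
import java.util.Map;

// Hypothetical illustration; a plain map stands in for the real SpectatorHistogram.
final class BucketCountSketch {
    private BucketCountSketch() {}

    // Returns the total number of observations, or null when there is no histogram,
    // mirroring the null pass-through in the post-aggregator's compute() method.
    static Long totalCount(Map<Integer, Long> bucketCounts) {
        if (bucketCounts == null) {
            return null;
        }
        long total = 0L;
        for (long count : bucketCounts.values()) {
            total += count; // each bucket contributes its observation count
        }
        return total;
    }
}
```

With buckets `{3: 2, 10: 5, 40: 1}` the helper returns 8, which is the population size the docs say `SPECTATOR_COUNT` and `countSpectatorHistogram` report without needing a separate count metric.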
