docs/development/extensions-contrib/spectator-histogram.md
136 additions & 6 deletions
@@ -79,8 +79,6 @@ Also see the [limitations](#limitations) of this extension.
 * Supports positive long integer values within the range of [0, 2^53). Negatives are
   coerced to 0.
 * Does not support decimals.
-* Does not support Druid SQL queries, only native queries.
-* Does not support vectorized queries.
 * Generates 276 fixed buckets with increasing bucket widths. In practice, the observed error of computed percentiles ranges from 0.1% to 3%, exclusive. See [Bucket boundaries](#histogram-bucket-boundaries) for the full list of bucket boundaries.

 :::tip
@@ -134,7 +132,11 @@ To use SpectatorHistogram, make sure you [include](../../configuration/extension
| type | This String should always be "countSpectatorHistogram" | yes |
+| name | A String for the output (result) name of the calculation. | yes |
+| field | A field reference pointing to the aggregated histogram. | yes |
+
+## SQL Functions
+
+In addition to the native query aggregators and post-aggregators, this extension provides SQL functions for easier use in Druid SQL queries.
+
+### SPECTATOR_COUNT
+
+Returns the total count of observations (data points) in a Spectator histogram.
+
+**Syntax:**
+```sql
+SPECTATOR_COUNT(expr)
+```
+
+**Arguments:**
+- `expr`: A numeric column to aggregate into a histogram, or a pre-aggregated Spectator histogram column.
+
+**Returns:** BIGINT - the total number of observations.
+
+**Example:**
+```sql
+SELECT
+  SPECTATOR_COUNT(hist_added) AS total_count,
+  SPECTATOR_COUNT(added) AS total_count_from_raw
+FROM wikipedia
+```
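
As a rough illustration of what `SPECTATOR_COUNT` reports, the Python sketch below models a histogram as a mapping from bucket index to observation count and sums the counts. The dictionary representation, the helper names `merge_histograms` and `total_count`, and the sample values are assumptions made for illustration; this is not the extension's implementation.

```python
from collections import Counter


def merge_histograms(histograms):
    """Merge histograms by adding counts bucket by bucket."""
    merged = Counter()
    for hist in histograms:
        merged.update(hist)  # Counter.update adds counts instead of replacing them
    return dict(merged)


def total_count(histogram):
    """Total observations = sum of the per-bucket counts."""
    return sum(histogram.values())


# Hypothetical pre-aggregated rows: bucket index -> count
rows = [{4: 8, 5: 15}, {5: 2, 9: 1}]
merged = merge_histograms(rows)
print(total_count(merged))  # 26
```

Under this model the count does not depend on bucket boundaries at all, which is why it is exact even though percentiles are approximate.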
+
+### SPECTATOR_PERCENTILE
+
+Computes approximate percentile values from a Spectator histogram. This function supports two forms: a single percentile or multiple percentiles.
+
+#### Single Percentile
+
+**Syntax:**
+```sql
+SPECTATOR_PERCENTILE(expr, percentile)
+```
+
+**Arguments:**
+- `expr`: A numeric column to aggregate into a histogram, or a pre-aggregated Spectator histogram column.
+- `percentile`: A numeric value between 0 and 100 representing the desired percentile.
+
+**Returns:** DOUBLE - the approximate value at the specified percentile.
+
+**Example:**
+```sql
+SELECT
+  SPECTATOR_PERCENTILE(hist_added, 50) AS median_added,
+  SPECTATOR_PERCENTILE(hist_added, 99) AS p99_added,
+  SPECTATOR_PERCENTILE(added, 95) AS p95_from_raw
+FROM wikipedia
+```
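
To see why a percentile read from fixed buckets is approximate, here is a minimal Python sketch of a generic technique: find the bucket that contains the target rank, then interpolate linearly inside it. The bucket boundaries, the counts, and the function name `approx_percentile` are hypothetical, and the sketch does not claim to reproduce the extension's 276 bucket boundaries or its exact algorithm.

```python
def approx_percentile(bounds, counts, percentile):
    """Estimate a percentile from bucketed counts.

    bounds[i] and bounds[i + 1] are the lower and upper edge of bucket i;
    counts[i] is the number of observations that fell into that bucket.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    total = sum(counts)
    if total == 0:
        return float("nan")  # no observations to rank
    target = total * percentile / 100.0
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= target:
            # Interpolate linearly inside the bucket holding the target rank.
            fraction = (target - seen) / count
            return bounds[i] + fraction * (bounds[i + 1] - bounds[i])
        seen += count
    return float(bounds[-1])


# Hypothetical boundaries with increasing widths, and made-up counts.
bounds = [0, 10, 20, 40, 80]
counts = [10, 20, 50, 20]
print(approx_percentile(bounds, counts, 50))  # 28.0
```

Because the true values inside a bucket are unknown, the estimate can be off by at most the width of that bucket, so wider buckets at larger values mean a larger absolute error.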
+
+#### Multiple Percentiles (Array)
+
+**Syntax:**
+```sql
+SPECTATOR_PERCENTILE(expr, ARRAY[p1, p2, ...])
+```
+
+**Arguments:**
+- `expr`: A numeric column to aggregate into a histogram, or a pre-aggregated Spectator histogram column.
+- `ARRAY[p1, p2, ...]`: An array of numeric values between 0 and 100 representing the desired percentiles.
+
+**Returns:** DOUBLE ARRAY - an array of approximate values at the specified percentiles, in the same order as requested.
+
+**Example:**
+```sql
+SELECT
+  SPECTATOR_PERCENTILE(hist_added, ARRAY[25, 50, 75, 99]) AS percentiles
+FROM wikipedia
+```
+
+This returns an array like `[200.5, 341.0, 468.5, 675.9]` representing the 25th, 50th, 75th, and 99th percentiles.
+
+Using the array form is more efficient than calling `SPECTATOR_PERCENTILE` multiple times for different percentiles, as the underlying histogram is only aggregated once.
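
The efficiency argument can be sketched in Python: the shared step (here, building cumulative counts, standing in for aggregating the histogram) runs once, and each requested percentile is then a cheap lookup. The function name `approx_percentiles`, the bucket boundaries, and the counts are hypothetical; this is a model of the idea under those assumptions, not the extension's code.

```python
from bisect import bisect_left
from itertools import accumulate


def approx_percentiles(bounds, counts, percentiles):
    """Estimate several percentiles from one histogram in a single pass.

    bounds[i] and bounds[i + 1] are the edges of bucket i; counts[i] is its count.
    Results are returned in the same order as the requested percentiles.
    """
    cumulative = list(accumulate(counts))  # built once, reused for every percentile
    total = cumulative[-1] if cumulative else 0
    results = []
    for p in percentiles:
        if total == 0:
            results.append(float("nan"))
            continue
        target = total * p / 100.0
        # First bucket whose running total reaches the target rank.
        i = bisect_left(cumulative, target)
        while counts[i] == 0:  # only possible when target is 0
            i += 1
        before = cumulative[i] - counts[i]
        fraction = (target - before) / counts[i]
        results.append(bounds[i] + fraction * (bounds[i + 1] - bounds[i]))
    return results


bounds = [0, 10, 20, 40, 80]
counts = [10, 20, 50, 20]
print(approx_percentiles(bounds, counts, [25, 50, 75, 99]))
```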
+
+### Combined Example
+
+You can use both functions together in a single query. Multiple aggregations on the same column share the underlying histogram aggregator for efficiency:
+
+```sql
+SELECT
+  countryName,
+  SPECTATOR_COUNT(hist_added) AS observation_count,
+  SPECTATOR_PERCENTILE(hist_added, 50) AS median_added,
+  SPECTATOR_PERCENTILE(hist_added, 90) AS p90_added,
+  SPECTATOR_PERCENTILE(hist_added, 99) AS p99_added
+FROM wikipedia
+GROUP BY countryName
+ORDER BY observation_count DESC
+LIMIT 10
+```
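
For readers who want to try a query like this from a script, the sketch below builds a request for Druid's SQL HTTP endpoint (`/druid/v2/sql`) using only the Python standard library. The base URL `http://localhost:8888`, the helper name `build_sql_request`, and the shortened query are assumptions for illustration; the request is constructed but not sent, so the block runs without a cluster.

```python
import json
import urllib.request

# A shortened version of the combined query; table and column names follow the example.
QUERY = """
SELECT
  countryName,
  SPECTATOR_COUNT(hist_added) AS observation_count,
  SPECTATOR_PERCENTILE(hist_added, 50) AS median_added
FROM wikipedia
GROUP BY countryName
ORDER BY observation_count DESC
LIMIT 10
"""


def build_sql_request(query, base_url="http://localhost:8888"):
    """Build (but do not send) a POST request for the Druid SQL endpoint."""
    body = json.dumps({"query": query.strip()}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(QUERY)
print(request.full_url)

# To execute against a running cluster (assumed reachable at base_url):
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```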
+
+Or using the array form to get multiple percentiles in a single column:
+
+```sql
+SELECT
+  countryName,
+  SPECTATOR_COUNT(hist_added) AS observation_count,
+  SPECTATOR_PERCENTILE(hist_added, ARRAY[50, 90, 99]) AS percentiles
docs/querying/sql-aggregations.md
12 additions & 0 deletions
@@ -157,3 +157,15 @@ Load the T-Digest extension to use the following functions. See the [T-Digest ex
 |--------|-----|-------|
 |`TDIGEST_QUANTILE(expr, quantileFraction, [compression])`|Builds a T-Digest sketch on values produced by `expr` and returns the value for the quantile. Compression parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches.|`Double.NaN`|
 |`TDIGEST_GENERATE_SKETCH(expr, [compression])`|Builds a T-Digest sketch on values produced by `expr`. Compression parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches.|Empty base64 encoded T-Digest sketch STRING|
+
+## Histogram functions
+
+### Spectator Histogram
+
+Load the [Spectator Histogram extension](../development/extensions-contrib/spectator-histogram.md) to use the following functions.
+
+|Function|Notes|Default|
+|--------|-----|-------|
+|`SPECTATOR_COUNT(expr)`|Counts the total number of observations (data points) in a Spectator histogram. The `expr` can be either a numeric column (which will be aggregated into a histogram) or a pre-aggregated [Spectator histogram](../development/extensions-contrib/spectator-histogram.md) column.|`0`|
+|`SPECTATOR_PERCENTILE(expr, percentile)`|Computes an approximate percentile value from a Spectator histogram. The `expr` can be either a numeric column (which will be aggregated into a histogram) or a pre-aggregated [Spectator histogram](../development/extensions-contrib/spectator-histogram.md) column. The `percentile` should be between 0 and 100.|`NaN`|
+|`SPECTATOR_PERCENTILE(expr, ARRAY[p1, p2, ...])`|Computes multiple approximate percentile values from a Spectator histogram and returns them as a DOUBLE ARRAY. The `expr` can be either a numeric column (which will be aggregated into a histogram) or a pre-aggregated [Spectator histogram](../development/extensions-contrib/spectator-histogram.md) column. Each percentile value in the array should be between 0 and 100. This is more efficient than calling `SPECTATOR_PERCENTILE` multiple times for different percentiles.|`null`|