@@ -197,11 +199,48 @@ The following versions of aggregates are supported:
 - `count(distinct col)` - counts distinct values of the given column.
 - `sum(col)` - sums values in the given column.
 - `avg(col)` - calculates the average of the given column.
+- `diffix.count_histogram(aid, bin_size=1)` - computes a histogram that describes the distribution of rows among entities.
+  See [below](#diffixcount_histogramaid-bin_size) for details.

 Results of these aggregates are anonymized by applying noise as described in the specification.

-Each of the `count(...)`, `sum(...)`, `avg(...)` has an accompanying aggregate, which returns the approximate magnitude of noise added during anonymization (in terms of its standard deviation).
-These are: `diffix.count_noise(...)`, `diffix.sum_noise(...)`, `diffix.avg_noise(...)` respectively.
+Each of the `count(...)`, `sum(...)`, `avg(...)` has an accompanying aggregate,
+which returns the approximate magnitude of noise added during anonymization (in terms of its standard deviation).
+These are: `diffix.count_noise(...)`, `diffix.sum_noise(...)`, `diffix.avg_noise(...)`, respectively.
+
+### diffix.count_histogram(aid, bin_size)
+
+Returns a 2-dimensional array of shape `bigint[][2]`, where each entry is a pair of `[row_count, num_entities]`.
+The `row_count` represents the number of rows contributed by `num_entities` distinct protected entities.
+
+**Example:**
+
+```
+SELECT diffix.count_histogram(account)
+FROM transactions;
+
+count_histogram
+--------------------------------
+{{NULL,7},{1,15},{2,13},{4,6}}
+(1 row)
+```
+
+The result of the above query can be interpreted as:
+15 accounts have made a single transaction (1 row in result bucket), 13 accounts have made 2 transactions (2 rows),
+6 accounts have made 4 transactions, and 7 accounts have made some other number of transactions (identified by the `NULL` count).
+
+The reported `num_entities` is a noisy value, but not the `row_count` itself. Bins with insufficient `num_entities` are merged to
+a suppress bin of shape `{NULL, num_entities}` where `num_entities` is also noisy. The suppress bin may itself be suppressed.
+
+The optional `bin_size` parameter allows generalizing the bins' `row_count` to minimize suppression.
+It acts identically to the `diffix.floor_by()` function.
+
+The histogram array can be unwrapped to a set of pairs by using [diffix.unnest_histogram()](#diffixunnest_histogramhistogram).
+
+**Restrictions:** The `aid` parameter must be a reference to a column tagged as an AID (identifier of a protected entity).
+
+In untrusted mode, `bin_size` is restricted to a money style number:
+1, 2, or 5 preceded by or followed by zeros ⟨... 0.1, 0.2, 0.5, 1, 2, 5, 10, ...⟩.

 ## Numeric generalization functions

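As a rough illustration of the money-style restriction on `bin_size` introduced in the hunk above, the following Python sketch checks whether a value is 1, 2, or 5 scaled by a power of ten. The helper name `is_money_style` is hypothetical and not part of the documented `diffix` functions; the actual validation performed in untrusted mode may differ in detail.

```python
from decimal import Decimal


def is_money_style(value) -> bool:
    """Return True if value is 1, 2, or 5 times a power of ten (hypothetical helper)."""
    # Going through str() avoids binary floating-point artifacts such as 0.1 + 0.2.
    d = Decimal(str(value))
    if not d.is_finite() or d <= 0:
        return False
    # normalize() strips trailing zeros (500 -> 5E+2, 0.20 -> 2E-1),
    # so a money-style number is left with the single digit 1, 2, or 5.
    return d.normalize().as_tuple().digits in ((1,), (2,), (5,))


# Filter a list of candidate bin sizes down to the money-style ones.
candidates = [1, 3, 5, 20, 25, 0.5]
allowed = [b for b in candidates if is_money_style(b)]
print(allowed)
```

Under this reading of the rule, values such as 3 or 25 would be rejected as `bin_size` in untrusted mode, while 0.5, 5, and 20 would be accepted.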
@@ -265,3 +304,22 @@ GROUP BY 1
 ### diffix.is_suppress_bin(*)

 Aggregate that returns `true` only for the suppress bin, `false` otherwise.
+
+### diffix.unnest_histogram(histogram)
+
+Unnests a 2-dimensional array into a result set of 1-dimensional arrays.
+
+**Example:**
+
+```
+SELECT diffix.unnest_histogram(diffix.count_histogram(account)) AS bins