README.md: 111 additions & 6 deletions
@@ -60,6 +60,35 @@ Computes a cryptographic hash of the input value using the specified algorithm.
 **Note:** Different data types with the same value will produce different hashes (e.g., `42::INTEGER` vs `42::BIGINT` vs `'42'::VARCHAR`).

+### crypto_hash_agg()
+
+**Syntax:**
+```sql
+crypto_hash_agg(algorithm, value ORDER BY sort_expression) → BLOB
+```
+
+An aggregate function that computes a cryptographic hash over multiple rows of data. This is useful for creating checksums of entire datasets, detecting changes in groups of records, or generating deterministic identifiers for sets of values.
+
+**Parameters:**
+- `algorithm` (VARCHAR): The hash algorithm name (same algorithms as `crypto_hash`)
+- `value`: The column/expression to hash (supports same data types as `crypto_hash`)
+- `ORDER BY`: **Required** - ensures deterministic ordering of values before hashing
+
+**Returns:** BLOB containing the raw hash bytes, or NULL for empty result sets
+
+**Important Notes:**
+- The `ORDER BY` clause is **mandatory** because hash aggregation is order-dependent
+- Values are hashed sequentially in the order specified by `ORDER BY`
+- For `VARCHAR` and `BLOB` types, each value's length is hashed before its content (same as list hashing)
+- The function produces the same hash as `crypto_hash()` would produce for an equivalent list
+- Empty result sets return `NULL`
+
+**Use Cases:**
+- **Dataset Checksums**: Verify data integrity across tables or partitions
+- **Change Detection**: Detect if any values in a group have changed
+- **Merkle-like Hashing**: Create hierarchical hashes of grouped data
+- **Deterministic IDs**: Generate stable identifiers for sets of values
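
The notes on `crypto_hash_agg()` above (sequential hashing in `ORDER BY` order, a length prefix for variable-length values, `NULL` for empty input, equivalence with list hashing) can be modelled outside the database. The following is a minimal Python sketch using only `hashlib` and `struct`. It is not the extension's implementation: the helper names `encode_value`, `hash_agg`, and `hash_list` are hypothetical, and it assumes `VARCHAR` values are hashed as their UTF-8 bytes.

```python
import hashlib
import struct


def encode_value(value):
    """Length-prefix a variable-length value: [8-byte length][content].

    Assumption: strings are hashed as UTF-8 bytes; the length is a
    64-bit unsigned integer in native byte order.
    """
    if isinstance(value, str):
        value = value.encode("utf-8")
    return struct.pack("=Q", len(value)) + value


def hash_list(values, algorithm="sha256"):
    """Hash an already-ordered list of values, one element at a time."""
    h = hashlib.new(algorithm)
    for value in values:
        h.update(encode_value(value))
    return h.digest()


def hash_agg(rows, algorithm="sha256"):
    """Model of an ordered aggregate hash over (sort_key, value) rows.

    Rows are sorted by sort_key first, mirroring the mandatory ORDER BY.
    Returns None for an empty input, mirroring NULL for empty result sets.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    if not ordered:
        return None
    return hash_list([value for _, value in ordered], algorithm)


rows = [(3, "c"), (1, "a"), (2, "b")]

# Arrival order does not matter once the sort key fixes the order.
same_set = hash_agg(rows) == hash_agg(sorted(rows))

# The aggregate matches hashing the equivalent ordered list.
matches_list = hash_agg(rows) == hash_list(["a", "b", "c"])

# Assigning the same values to different sort positions changes the hash.
order_sensitive = hash_agg([(1, "a"), (2, "b")]) != hash_agg([(1, "b"), (2, "a")])
```

In this model the sort key plays the role of `sort_expression`: without it, the digest would depend on whatever order the rows happened to arrive in.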
+-- Create a checksum for an entire table partition
+SELECT
+    partition_date,
+    lower(to_hex(crypto_hash_agg('blake3', transaction_id ORDER BY transaction_id))) AS partition_checksum
+FROM transactions
+GROUP BY partition_date;
+
+-- Detect changes in a dataset by comparing checksums
+WITH current_hash AS (
+    SELECT crypto_hash_agg('sha2-256', data ORDER BY id) AS hash
+    FROM critical_table
+)
+SELECT hash = '\x<expected_hash_value>'::BLOB AS data_unchanged
+FROM current_hash;
+```
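
The change-detection query above compares a freshly computed aggregate hash against a stored expected value. A rough Python sketch of the same workflow follows, assuming string payloads hashed as length-prefixed UTF-8 bytes; `dataset_checksum` and the sample rows are hypothetical and only model the idea, they do not call the extension.

```python
import hashlib
import struct


def dataset_checksum(rows):
    """Return a lowercase hex checksum over {id: data} rows, or None if empty.

    Rows are visited in ascending id order, mirroring `ORDER BY id`.
    """
    if not rows:
        return None
    h = hashlib.sha256()
    for row_id in sorted(rows):
        payload = rows[row_id].encode("utf-8")
        # Length prefix (native-order uint64) followed by the content.
        h.update(struct.pack("=Q", len(payload)) + payload)
    return h.hexdigest()


baseline = {1: "alice", 2: "bob", 3: "carol"}
expected = dataset_checksum(baseline)  # stored when the data was known to be good

# Same rows loaded in a different order: the checksum is unchanged.
reloaded = {3: "carol", 1: "alice", 2: "bob"}
data_unchanged = dataset_checksum(reloaded) == expected  # True

# A single edited value changes the checksum.
tampered = {**baseline, 2: "bobby"}
data_tampered = dataset_checksum(tampered) != expected  # True
```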
+
+### Merkle-Style Hierarchical Hashing
+```sql
+-- Create hierarchical hashes for efficient change detection
+-- Level 1: Hash individual user transactions
+WITH user_hashes AS (
+    SELECT
+        user_id,
+        crypto_hash_agg('sha2-256', transaction_id ORDER BY timestamp) AS user_hash
+    FROM transactions
+    GROUP BY user_id
+)
+-- Level 2: Hash all user hashes to get global hash
+SELECT
+    lower(to_hex(crypto_hash_agg('sha2-256', user_hash ORDER BY user_id))) AS global_hash
+FROM user_hashes;
+```
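
The two-level scheme above can be sketched in Python to show why it helps with change detection: editing one user's transactions changes that user's hash and the global hash, while every other user's hash stays the same. This is a model under stated assumptions, not the extension's code. The function names and sample data are hypothetical, transaction ids are assumed to be strings hashed as UTF-8, and ties on `timestamp` are broken by transaction id here, whereas the SQL's `ORDER BY timestamp` alone would leave the order of tied rows unspecified.

```python
import hashlib
import struct
from collections import defaultdict


def _update_blob(h, blob):
    """Feed [8-byte native-order length][content] into the hash object."""
    h.update(struct.pack("=Q", len(blob)) + blob)


def user_hashes(transactions):
    """Level 1: one digest per user over (user_id, timestamp, txn_id) rows."""
    by_user = defaultdict(list)
    for user_id, timestamp, txn_id in transactions:
        by_user[user_id].append((timestamp, txn_id))
    result = {}
    for user_id, txns in by_user.items():
        h = hashlib.sha256()
        # Sorting the tuples orders by timestamp, then txn_id for ties.
        for _, txn_id in sorted(txns):
            _update_blob(h, txn_id.encode("utf-8"))
        result[user_id] = h.digest()
    return result


def global_hash(per_user):
    """Level 2: hash the per-user digests in user_id order, as lowercase hex."""
    h = hashlib.sha256()
    for user_id in sorted(per_user):
        _update_blob(h, per_user[user_id])
    return h.hexdigest()


before = user_hashes([(1, 10, "t1"), (1, 20, "t2"), (2, 15, "t3")])
after = user_hashes([(1, 10, "t1"), (1, 20, "t2"), (2, 15, "t3-modified")])

user1_same = before[1] == after[1]                      # untouched user
user2_changed = before[2] != after[2]                   # edited user
root_changed = global_hash(before) != global_hash(after)
```

Comparing the global hash first and descending into per-user hashes only on a mismatch keeps the common "nothing changed" check to a single comparison.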
+
 ## Important Notes

-1. **Output Format**: Both `crypto_hash()` and `crypto_hmac()` return raw binary data as `BLOB`. Use `to_hex()` to convert to hexadecimal strings, or `lower(to_hex(...))` for lowercase hex.
+1. **Output Format**: `crypto_hash()`, `crypto_hash_agg()`, and `crypto_hmac()` all return raw binary data as `BLOB`. Use `to_hex()` to convert to hexadecimal strings, or `lower(to_hex(...))` for lowercase hex.

 2. **Type Sensitivity**: The hash is computed on the binary representation of the data type. The same numeric value with different types will produce different hashes:
-3. **NULL Handling**: Both functions return `NULL` if the input value is `NULL`.
+3. **NULL Handling**: `crypto_hash()` and `crypto_hmac()` return `NULL` if the input value is `NULL`. `crypto_hash_agg()` returns `NULL` for empty result sets.

-4. **List Hashing with Length Encoding**:
-   - For fixed-length types (integers, floats, dates, etc.) in lists, only the raw binary data is hashed
-   - For variable-length types (`VARCHAR` and `BLOB`) in lists, each element is hashed as: `[8-byte length][content]`
+4. **List and Aggregate Hashing with Length Encoding**:
+   - Applies to both `crypto_hash()` when hashing lists and `crypto_hash_agg()` when aggregating values
+   - For fixed-length types (integers, floats, dates, etc.), only the raw binary data is hashed
+   - For variable-length types (`VARCHAR` and `BLOB`), each element is hashed as: `[8-byte length][content]`
    - The length is encoded as a 64-bit unsigned integer (`uint64_t`) in native byte order
    - This prevents element-boundary collisions, where `['ab', 'c']` would otherwise hash the same as `['a', 'bc']`
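
Two of the notes above, type sensitivity and length encoding, are easy to reproduce with a short Python sketch. It only models the described behaviour with `hashlib` and `struct`. The helper names are hypothetical, strings are assumed to be hashed as UTF-8 bytes, and `INTEGER` and `BIGINT` are assumed to be 4-byte and 8-byte native-order integers.

```python
import hashlib
import struct


def hash_concat(items):
    """Hash the elements back to back, with no boundary information."""
    h = hashlib.sha256()
    for item in items:
        h.update(item.encode("utf-8"))
    return h.hexdigest()


def hash_length_prefixed(items):
    """Hash each element as [8-byte native-order length][content]."""
    h = hashlib.sha256()
    for item in items:
        data = item.encode("utf-8")
        h.update(struct.pack("=Q", len(data)))
        h.update(data)
    return h.hexdigest()


# Without length prefixes both lists feed the bytes b"abc" to the hash.
collides = hash_concat(["ab", "c"]) == hash_concat(["a", "bc"])  # True

# With length prefixes the element boundaries become part of the input.
distinct = hash_length_prefixed(["ab", "c"]) != hash_length_prefixed(["a", "bc"])  # True

# Type sensitivity: the same value 42 has different binary representations.
as_int32 = hashlib.sha256(struct.pack("=i", 42)).hexdigest()  # 4 bytes
as_int64 = hashlib.sha256(struct.pack("=q", 42)).hexdigest()  # 8 bytes
as_text = hashlib.sha256(b"42").hexdigest()                   # 2 ASCII bytes
all_different = len({as_int32, as_int64, as_text}) == 3
```

Because the length is written in native byte order, a digest produced this way on a little-endian machine would not match one produced on a big-endian machine for the same strings.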