@@ -4,17 +4,21 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
4
4
5
5
## Overview
6
6
7
-
This is a DuckDB extension that adds cryptographic hash functions and HMAC calculation capabilities using OpenSSL.
7
+
This is a DuckDB extension that adds cryptographic hash functions, HMAC calculation capabilities, and cryptographically secure random byte generation using OpenSSL and BLAKE3.
8
8
9
9
## Architecture
10
10
11
-
The extension is implemented in C++ and uses OpenSSL's EVP API for cryptographic operations:
11
+
The extension is implemented in C++ and uses OpenSSL's EVP API for most cryptographic operations, with BLAKE3 provided by a vendored library:
12
12
13
13
1. **C++ implementation** (`src/`):
14
-
- `src/crypto_extension.cpp`: Main extension entry point, registers functions with DuckDB
15
-
- `src/crypto_hash.cpp`: Core hash and HMAC implementations using OpenSSL
14
+
- `src/crypto_extension.cpp`: Main extension entry point, registers all functions with DuckDB
15
+
- `src/crypto_hash.cpp`: Core hash, HMAC, and random byte implementations
Computes an HMAC using the specified algorithm, key, and message. All algorithms are supported; BLAKE3 additionally requires the key to be exactly 32 bytes.
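As a rough illustration of what an HMAC call computes, the following Python sketch uses the standard library's `hmac` module, not the extension itself. The helper name `hmac_hex` is hypothetical, and Python's algorithm names (`sha256`) differ from the extension's (`sha2-256`). BLAKE3 is left out because it is not part of Python's standard library.

```python
import hmac


def hmac_hex(algorithm: str, key: bytes, message: bytes) -> str:
    """Return the HMAC of `message` under `key` as a hex string.

    `algorithm` is a hashlib name such as "sha256". This helper is a
    hypothetical illustration and is not part of the extension.
    """
    return hmac.new(key, message, digestmod=algorithm).hexdigest()


# Key and message taken from RFC 4231, test case 2.
tag = hmac_hex("sha256", b"Jefe", b"what do ya want for nothing?")
print(tag)
```

Comparing a value like this against the extension's output for the same key and message is one way to sanity-check a new algorithm mapping.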
92
+
93
+
### crypto_hash_agg()
94
+
**Syntax**: `crypto_hash_agg(algorithm, value ORDER BY sort_expression) → BLOB`
95
+
96
+
Aggregate function that computes a hash over multiple rows. **ORDER BY is required** to ensure deterministic results. Produces the same hash as `crypto_hash()` would for an equivalent ordered list. Returns NULL for empty result sets.
97
+
98
+
### crypto_random_bytes()
99
+
**Syntax**: `crypto_random_bytes(length) → BLOB`
100
+
101
+
Generates cryptographically secure random bytes using OpenSSL's `RAND_bytes()`. Length must be between 1 and 4,294,967,295 bytes (4 GiB - 1, the maximum BLOB size in DuckDB). The function is marked VOLATILE, so each call produces different random bytes.
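The length check described above can be sketched in Python. This is an analogue built on the standard library's `secrets` module, not the extension's C++ code; the names `random_bytes` and `MAX_BLOB_LEN` are hypothetical.

```python
import secrets

# Upper bound quoted in the description above (2**32 - 1 bytes).
MAX_BLOB_LEN = 4_294_967_295


def random_bytes(length: int) -> bytes:
    """Return `length` bytes from the OS CSPRNG after validating the range."""
    if not 1 <= length <= MAX_BLOB_LEN:
        raise ValueError(
            f"length must be between 1 and {MAX_BLOB_LEN}, got {length}"
        )
    return secrets.token_bytes(length)


nonce = random_bytes(16)
print(len(nonce))
```

Rejecting out-of-range lengths before touching the random source mirrors the split the document describes between input validation errors and OpenSSL failures.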
102
+
69
103
## Supported Hash Algorithms
70
104
71
-
The extension supports these algorithms (defined in `src/crypto_hash.cpp:getDigestByName()`):
72
-
- blake3
73
-
- blake2b-512
74
-
- keccak224, keccak256, keccak384, keccak512 (mapped to SHA3 variants)
75
-
- md4, md5
76
-
- sha1
77
-
- sha2-224, sha2-256, sha2-384, sha2-512
78
-
- sha3-224, sha3-256, sha3-384, sha3-512
105
+
The extension supports these algorithms (defined in `src/crypto_hash.cpp:GetDigestMap()`):
106
+
- **blake3** - 32 bytes (separate vendored library, not from OpenSSL)
107
+
- **blake2b-512** - 64 bytes
108
+
- **keccak224, keccak256, keccak384, keccak512** - mapped to SHA3 variants
109
+
- **md4, md5** - 16 bytes (deprecated, may not work on some systems)
110
+
- **sha1** - 20 bytes
111
+
- **sha2-224, sha2-256, sha2-384, sha2-512** - SHA-2 family
112
+
- **sha3-224, sha3-256, sha3-384, sha3-512** - SHA-3 family
79
113
80
-
Both `crypto_hash()` and `crypto_hmac()` support all these algorithms.
114
+
All three hashing functions (`crypto_hash()`, `crypto_hmac()`, and `crypto_hash_agg()`) support every algorithm listed above.
81
115
82
116
**Note**: Keccak is mapped to SHA3 in OpenSSL. True (pre-standardization) Keccak uses a different padding suffix than SHA3, so its digests differ from SHA3's for the same input.
83
117
84
118
## Development Workflow
85
119
86
120
### Adding a New Hash Algorithm
87
121
88
-
1. Add the new algorithm case to `getDigestByName()` in `src/crypto_hash.cpp`
89
-
2. Return the appropriate OpenSSL EVP_MD function (e.g., `EVP_sha512_256()`)
90
-
3. Update error messages to include the new algorithm name
91
-
4. Add tests to `test/sql/crypto.test`
92
-
5. Update README.md with the new algorithm
122
+
1. Add the new algorithm to `GetDigestMap()` in `src/crypto_hash.cpp`
123
+
2. Map it to the appropriate OpenSSL EVP_MD function (e.g., `{"sha2-512", []() { return EVP_sha512(); }}`)
124
+
3. Add test vectors to `test/sql/crypto.test` and/or `test/sql/crypto_hash.test`
125
+
4. Update README.md with the new algorithm
126
+
5. The `LookupAlgorithm()` function in `src/crypto_extension.cpp` handles algorithm lookup for the main hash function
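The name-to-factory pattern behind these steps can be sketched in Python. This is an analogue only: the extension's real table is the C++ `GetDigestMap()`, while `DIGEST_MAP` and `lookup_algorithm` here are hypothetical names and the entries are a small subset backed by `hashlib` instead of OpenSSL's EVP functions.

```python
import hashlib

# Maps an extension-style algorithm name to a zero-argument factory,
# in the same spirit as the {"sha2-512", lambda} entries shown above.
DIGEST_MAP = {
    "sha2-256": hashlib.sha256,
    "sha2-512": hashlib.sha512,
    "sha3-256": hashlib.sha3_256,
    "blake2b-512": hashlib.blake2b,  # hashlib's default output is 64 bytes
}


def lookup_algorithm(name: str):
    """Return the factory for `name`, or raise with the list of valid names."""
    try:
        return DIGEST_MAP[name]
    except KeyError:
        valid = ", ".join(sorted(DIGEST_MAP))
        raise ValueError(f"Unknown algorithm {name!r}; expected one of: {valid}") from None


hasher = lookup_algorithm("sha2-256")()
hasher.update(b"abc")
print(hasher.hexdigest())
```

With this layout, adding an algorithm is a one-line change to the table, and the error message picks up the new name without further edits.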
127
+
128
+
### Adding a New Supported Data Type
129
+
130
+
1. Update `CryptoScalarHashFun()` in `src/crypto_extension.cpp` to handle the new type
131
+
2. Ensure the type's binary representation is hashable
132
+
3. Add corresponding test cases in `test/sql/crypto.test`
133
+
4. Update README.md to document the new type support
134
+
5. For aggregate support, update `RegisterHashAggType()` calls in `LoadInternal()`
135
+
136
+
## Key Implementation Details
137
+
138
+
### Algorithm Handling
139
+
- **BLAKE3**: Handled separately using the vendored BLAKE3 library (`blake3.h`), not through OpenSSL
140
+
- **OpenSSL algorithms**: Use the EVP API through `GetDigestMap()`, which maps each algorithm name to a lambda that calls the appropriate `EVP_*()` function
141
+
- **Algorithm lookup**: `LookupAlgorithm()` in `src/crypto_extension.cpp` returns `nullptr` for BLAKE3, or the EVP_MD pointer for OpenSSL algorithms
142
+
143
+
### List Hashing
144
+
- Lists are hashed element-by-element in order
145
+
- For VARCHAR/BLOB elements: each element is hashed as `[8-byte length][content]` so that element boundaries are unambiguous (for example, `['ab', 'c']` and `['a', 'bc']` produce different hashes)
146
+
- For fixed-length types: only the raw binary data is hashed
147
+
- List hashing code is in `HashListElementBlake3()` and `HashListElementEVP()` helper functions
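The framing for variable-length elements can be sketched in Python. One practical effect of the length prefix is that lists whose concatenated contents are identical still hash differently. This is an analogue using `hashlib`; the function name `hash_list` is hypothetical, and the little-endian byte order of the 8-byte length is an assumption, since the text above does not state it.

```python
import hashlib
import struct


def hash_list(elements, algorithm: str = "sha256") -> bytes:
    """Hash a list of byte strings as repeated [8-byte length][content] frames."""
    hasher = hashlib.new(algorithm)
    for element in elements:
        if element is None:
            raise ValueError("NULL list elements are not supported")
        # Assumption: the length is an unsigned 64-bit little-endian integer.
        hasher.update(struct.pack("<Q", len(element)))
        hasher.update(element)
    return hasher.digest()


framed_a = hash_list([b"ab", b"c"])
framed_b = hash_list([b"a", b"bc"])
print(framed_a != framed_b)
```

Without the length frames, both lists would feed the bytes `abc` to the hash function and collide.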
148
+
149
+
### Aggregate Function
150
+
- `crypto_hash_agg()` uses `HashAggregateState` to maintain state across rows
151
+
- Requires ORDER BY clause - enforced by checking for combining operations
152
+
- Produces identical output to `crypto_hash()` on an equivalent ordered list
153
+
- Supports the same algorithms and types as the scalar function
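The behaviour listed above can be sketched in Python as an incremental hash over ordered rows. This is an analogue, not the extension's `HashAggregateState`: the class `HashAggregateSketch` and the helper `hash_int32_rows` are hypothetical, the rows are fixed-width 32-bit integers so only raw bytes are hashed, and little-endian encoding is an assumption.

```python
import hashlib
import struct
from typing import Iterable, Optional


class HashAggregateSketch:
    """Keeps one running hash and remembers whether any row was seen."""

    def __init__(self, algorithm: str) -> None:
        self._hasher = hashlib.new(algorithm)
        self._saw_row = False

    def update(self, value: bytes) -> None:
        self._hasher.update(value)
        self._saw_row = True

    def finalize(self) -> Optional[bytes]:
        # An empty input yields None, matching the NULL result for empty sets.
        return self._hasher.digest() if self._saw_row else None


def hash_int32_rows(rows: Iterable[int], algorithm: str = "sha256") -> Optional[bytes]:
    """Hash rows in sorted order; sorted() stands in for the ORDER BY clause."""
    state = HashAggregateSketch(algorithm)
    for value in sorted(rows):
        state.update(struct.pack("<i", value))  # assumption: little-endian int32
    return state.finalize()


digest = hash_int32_rows([3, 1, 2])
print(digest.hex())
```

Because the rows are sorted before hashing, the digest does not depend on the order in which the rows arrive, which is the determinism the ORDER BY requirement is meant to provide.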
93
154
94
155
### Error Handling
95
156
96
157
The C++ implementation throws DuckDB exceptions:
97
-
- `InvalidInputException`: For invalid algorithm names or input validation failures
98
-
- `InternalException`: For OpenSSL operation failures
158
+
- `InvalidInputException`: For invalid algorithm names, unsupported types, NULL list elements, and invalid random byte lengths
159
+
- `InternalException`: For OpenSSL operation failures (context creation, digest operations, random byte generation)
99
160
100
161
Exceptions are caught by DuckDB's executor and presented to the user.
101
162
163
+
## Testing
164
+
165
+
The extension has comprehensive SQL-based tests:
166
+
- `test/sql/crypto.test` - Main test suite covering all functions, algorithms, data types, and error cases
167
+
- `test/sql/crypto_hash.test` - Focused test suite for hash algorithms with known test vectors
168
+
169
+
Tests are run using:
170
+
```sh
171
+
make test_debug # Run tests with debug build
172
+
```
173
+
174
+
Test coverage includes:
175
+
- All hash algorithms with known test vectors
176
+
- All supported data types (integers, floats, booleans, dates, UUIDs, etc.)