Commit a97126c

committed
fix: add claude
1 parent 56f0a7d commit a97126c

1 file changed

+102
-21
lines changed

CLAUDE.md

Lines changed: 102 additions & 21 deletions
@@ -4,17 +4,21 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
44

55
## Overview
66

7-
This is a DuckDB extension that adds cryptographic hash functions and HMAC calculation capabilities using OpenSSL.
7+
This is a DuckDB extension that adds cryptographic hash functions, HMAC calculation capabilities, and cryptographically secure random byte generation using OpenSSL and BLAKE3.
88

99
## Architecture
1010

11-
The extension is implemented in C++ and uses OpenSSL's EVP API for cryptographic operations:
11+
The extension is implemented in C++ and uses OpenSSL's EVP API for most cryptographic operations, with BLAKE3 provided by a vendored library:
1212

1313
1. **C++ implementation** (`src/`):
14-
- `src/crypto_extension.cpp`: Main extension entry point, registers functions with DuckDB
15-
- `src/crypto_hash.cpp`: Core hash and HMAC implementations using OpenSSL
14+
- `src/crypto_extension.cpp`: Main extension entry point, registers all functions with DuckDB
15+
- `src/crypto_hash.cpp`: Core hash, HMAC, and random byte implementations
1616
- `src/query_farm_telemetry.cpp`: Telemetry integration
17-
- Implements DuckDB scalar functions: `crypto_hash()` and `crypto_hmac()`
17+
- Implements four DuckDB functions:
18+
- `crypto_hash()` - Scalar function for hashing various data types
19+
- `crypto_hmac()` - Scalar function for HMAC computation
20+
- `crypto_hash_agg()` - Aggregate function for hashing multiple rows
21+
- `crypto_random_bytes()` - Scalar function for generating random bytes
1822

1923
### Build Integration
2024

@@ -66,39 +70,116 @@ Runs SQL tests located in `test/sql/*.test` but uses the debug build.
6670

6771
Launches DuckDB with the extension already loaded.
6872

73+
## Implemented Functions
74+
75+
### crypto_hash()
76+
**Syntax**: `crypto_hash(algorithm, value) → BLOB`
77+
78+
Computes a cryptographic hash of the input value. Supports multiple data types:
79+
- **Strings**: VARCHAR, BLOB
80+
- **Integers**: TINYINT, SMALLINT, INTEGER, BIGINT, HUGEINT, UTINYINT, USMALLINT, UINTEGER, UBIGINT, UHUGEINT
81+
- **Floating point**: FLOAT, DOUBLE
82+
- **Other**: BOOLEAN, DATE, TIME, TIMESTAMP, UUID
83+
- **Lists**: Lists of supported fixed-length types, as well as VARCHAR and BLOB (e.g., `INTEGER[]`, `VARCHAR[]`, `BLOB[]`)
84+
- NULL elements inside lists are not supported
85+
- Nested lists (lists of lists, lists of structs, etc.) are not supported
86+
- For VARCHAR/BLOB lists, each element's length is hashed before its content so that element boundaries are unambiguous (for example, `['ab', 'c']` and `['a', 'bc']` hash differently)
87+
88+
### crypto_hmac()
89+
**Syntax**: `crypto_hmac(algorithm, key, message) → BLOB`
90+
91+
Computes an HMAC using the specified algorithm, key, and message. All algorithms are supported; BLAKE3 additionally requires the key to be exactly 32 bytes.
92+
93+
### crypto_hash_agg()
94+
**Syntax**: `crypto_hash_agg(algorithm, value ORDER BY sort_expression) → BLOB`
95+
96+
Aggregate function that computes a hash over multiple rows. **ORDER BY is required** to ensure deterministic results. Produces the same hash as `crypto_hash()` would for an equivalent ordered list. Returns NULL for empty result sets.
97+
98+
### crypto_random_bytes()
99+
**Syntax**: `crypto_random_bytes(length) → BLOB`
100+
101+
Generates cryptographically secure random bytes using OpenSSL's `RAND_bytes()`. Length must be between 1 and 4,294,967,295 bytes (4 GiB - 1, the maximum BLOB size in DuckDB). This function is marked as VOLATILE, so it is re-evaluated on every call and each call produces different random bytes.
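
The length bounds above can be sketched as a small validation helper. This is only an illustration: `ValidateRandomLength` and `kMaxRandomBytes` are hypothetical names, the signed 64-bit input type is an assumption, and `std::invalid_argument` stands in for DuckDB's `InvalidInputException`.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Upper bound quoted in the text: 4,294,967,295 bytes (2^32 - 1).
constexpr std::int64_t kMaxRandomBytes = 4294967295LL;

// Hypothetical helper: rejects lengths outside [1, kMaxRandomBytes].
// Assumption: the SQL argument arrives as a signed 64-bit integer.
std::uint32_t ValidateRandomLength(std::int64_t requested) {
    if (requested < 1 || requested > kMaxRandomBytes) {
        throw std::invalid_argument(
            "length must be between 1 and 4294967295, got " + std::to_string(requested));
    }
    // Safe narrowing: the range check above guarantees the value fits in 32 bits.
    return static_cast<std::uint32_t>(requested);
}
```

Under these assumptions, `ValidateRandomLength(16)` returns 16, while `0`, negative values, and `4294967296` are rejected before any random bytes would be requested.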
102+
69103
## Supported Hash Algorithms
70104

71-
The extension supports these algorithms (defined in `src/crypto_hash.cpp:getDigestByName()`):
72-
- blake3
73-
- blake2b-512
74-
- keccak224, keccak256, keccak384, keccak512 (mapped to SHA3 variants)
75-
- md4, md5
76-
- sha1
77-
- sha2-224, sha2-256, sha2-384, sha2-512
78-
- sha3-224, sha3-256, sha3-384, sha3-512
105+
The extension supports these algorithms (defined in `src/crypto_hash.cpp:GetDigestMap()`):
106+
- **blake3** - 32 bytes (separate vendored library, not from OpenSSL)
107+
- **blake2b-512** - 64 bytes
108+
- **keccak224, keccak256, keccak384, keccak512** - mapped to SHA3 variants
109+
- **md4, md5** - 16 bytes (deprecated, may not work on some systems)
110+
- **sha1** - 20 bytes
111+
- **sha2-224, sha2-256, sha2-384, sha2-512** - SHA-2 family
112+
- **sha3-224, sha3-256, sha3-384, sha3-512** - SHA-3 family
79113

80-
Both `crypto_hash()` and `crypto_hmac()` support all these algorithms.
114+
`crypto_hash()`, `crypto_hmac()`, and `crypto_hash_agg()` all support every algorithm listed above; `crypto_random_bytes()` does not take an algorithm.
81115

82116
**Note**: The keccak names are mapped to OpenSSL's SHA3 implementations. Original (pre-standardization) Keccak uses different padding than SHA3, so its digests do not match SHA3 digests for the same input.
83117

84118
## Development Workflow
85119

86120
### Adding a New Hash Algorithm
87121

88-
1. Add the new algorithm case to `getDigestByName()` in `src/crypto_hash.cpp`
89-
2. Return the appropriate OpenSSL EVP_MD function (e.g., `EVP_sha512_256()`)
90-
3. Update error messages to include the new algorithm name
91-
4. Add tests to `test/sql/crypto.test`
92-
5. Update README.md with the new algorithm
122+
1. Add the new algorithm to `GetDigestMap()` in `src/crypto_hash.cpp`
123+
2. Map it to the appropriate OpenSSL EVP_MD function (e.g., `{"sha2-512", []() { return EVP_sha512(); }}`)
124+
3. Add test vectors to `test/sql/crypto.test` and/or `test/sql/crypto_hash.test`
125+
4. Update README.md with the new algorithm
126+
5. The `LookupAlgorithm()` function in `src/crypto_extension.cpp` handles algorithm lookup for the main hash function
127+
128+
### Adding a New Supported Data Type
129+
130+
1. Update `CryptoScalarHashFun()` in `src/crypto_extension.cpp` to handle the new type
131+
2. Ensure the type's binary representation is hashable
132+
3. Add corresponding test cases in `test/sql/crypto.test`
133+
4. Update README.md to document the new type support
134+
5. For aggregate support, update `RegisterHashAggType()` calls in `LoadInternal()`
135+
136+
## Key Implementation Details
137+
138+
### Algorithm Handling
139+
- **BLAKE3**: Handled separately using the vendored BLAKE3 library (`blake3.h`), not through OpenSSL
140+
- **OpenSSL algorithms**: Use the EVP API through `GetDigestMap()` which returns a lambda that calls the appropriate `EVP_*()` function
141+
- **Algorithm lookup**: `LookupAlgorithm()` in `src/crypto_extension.cpp` returns `nullptr` for BLAKE3, or the EVP_MD pointer for OpenSSL algorithms
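
The lookup pattern described above (a name-to-lambda map plus a `nullptr` result for BLAKE3) might look roughly like the sketch below. It does not use OpenSSL: `DigestInfo` is a hypothetical stand-in for `EVP_MD`, the `...Sketch` function names are invented for illustration, and `std::invalid_argument` stands in for `InvalidInputException`. Throwing on unknown names is also an assumption.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for OpenSSL's EVP_MD so the sketch builds without OpenSSL.
struct DigestInfo {
    const char *name;
    int size_bytes;
};

using DigestFactory = std::function<const DigestInfo *()>;

// Mirrors the described shape of GetDigestMap(): algorithm name -> lambda
// that returns the digest descriptor when called.
const std::unordered_map<std::string, DigestFactory> &GetDigestMapSketch() {
    static const DigestInfo sha2_512{"sha2-512", 64};
    static const DigestInfo sha1{"sha1", 20};
    static const std::unordered_map<std::string, DigestFactory> digest_map = {
        {"sha2-512", []() { return &sha2_512; }},
        {"sha1", []() { return &sha1; }},
    };
    return digest_map;
}

// Returns nullptr for "blake3" (handled on a separate code path), the digest
// descriptor for known names, and throws for anything else.
const DigestInfo *LookupAlgorithmSketch(const std::string &name) {
    if (name == "blake3") {
        return nullptr;
    }
    const auto &digest_map = GetDigestMapSketch();
    const auto it = digest_map.find(name);
    if (it == digest_map.end()) {
        throw std::invalid_argument("Unknown hash algorithm: " + name);
    }
    return it->second();
}
```

With this shape, adding an algorithm is a one-line change to the map, and callers branch on the `nullptr` result to choose between the BLAKE3 path and the EVP path.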
142+
143+
### List Hashing
144+
- Lists are hashed element-by-element in order
145+
- For VARCHAR/BLOB elements: each element is hashed as `[8-byte length][content]` so that element boundaries are unambiguous and differently split lists cannot produce the same byte stream
146+
- For fixed-length types: only the raw binary data is hashed
147+
- List hashing code is in `HashListElementBlake3()` and `HashListElementEVP()` helper functions
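
To illustrate the `[8-byte length][content]` layout, here is a minimal sketch that builds the byte stream a hash would consume for a VARCHAR/BLOB list. The function names are hypothetical, and the little-endian byte order of the length is an assumption; the text only says the length occupies 8 bytes.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Appends one variable-length element as [8-byte length][content].
// Assumption: the length is written little-endian.
void AppendLengthPrefixed(std::string &buffer, const std::string &element) {
    const std::uint64_t length = element.size();
    for (int shift = 0; shift < 64; shift += 8) {
        buffer.push_back(static_cast<char>((length >> shift) & 0xFF));
    }
    buffer.append(element);
}

// Builds the byte stream that would be fed to the hash for a list of strings.
std::string EncodeStringList(const std::vector<std::string> &elements) {
    std::string buffer;
    for (const auto &element : elements) {
        AppendLengthPrefixed(buffer, element);
    }
    return buffer;
}
```

Without the prefix, `['ab', 'c']` and `['a', 'bc']` would both reduce to the bytes `abc`; with it, the two lists produce different byte streams and therefore different hashes.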
148+
149+
### Aggregate Function
150+
- `crypto_hash_agg()` uses `HashAggregateState` to maintain state across rows
151+
- Requires ORDER BY clause - enforced by checking for combining operations
152+
- Produces identical output to `crypto_hash()` on an equivalent ordered list
153+
- Supports the same algorithms and types as the scalar function
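
The equivalence between row-by-row aggregation and hashing an ordered list follows from how streaming digests work: feeding the same bytes in the same order gives the same result, however the bytes are chunked. The sketch below demonstrates this with FNV-1a (64-bit), used purely as a small non-cryptographic stand-in; the extension itself uses OpenSSL or BLAKE3. `StreamingHashState`, `HashRows`, and `HashWhole` are hypothetical names, and rows are fed as raw bytes, as the text describes for fixed-length types.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a (64-bit): a non-cryptographic stand-in for a streaming digest.
struct StreamingHashState {
    std::uint64_t value = 14695981039346656037ULL; // FNV offset basis

    void Update(const std::string &bytes) {
        for (unsigned char byte : bytes) {
            value ^= byte;
            value *= 1099511628211ULL; // FNV prime
        }
    }
};

// Aggregate-style: feed rows one at a time, in the order they arrive.
std::uint64_t HashRows(const std::vector<std::string> &ordered_rows) {
    StreamingHashState state;
    for (const auto &row : ordered_rows) {
        state.Update(row);
    }
    return state.value;
}

// Scalar-style: hash the same bytes in a single call.
std::uint64_t HashWhole(const std::string &bytes) {
    StreamingHashState state;
    state.Update(bytes);
    return state.value;
}
```

Because the state depends on the order in which bytes arrive, feeding the same rows in a different order would in general give a different value, which is why the aggregate needs an explicit ORDER BY to be deterministic.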
93154

94155
### Error Handling
95156

96157
The C++ implementation throws DuckDB exceptions:
97-
- `InvalidInputException`: For invalid algorithm names or input validation failures
98-
- `InternalException`: For OpenSSL operation failures
158+
- `InvalidInputException`: For invalid algorithm names, unsupported types, NULL list elements, invalid random byte lengths
159+
- `InternalException`: For OpenSSL operation failures (context creation, digest operations, random byte generation)
99160

100161
Exceptions are caught by DuckDB's executor and presented to the user.
101162

163+
## Testing
164+
165+
The extension has comprehensive SQL-based tests:
166+
- `test/sql/crypto.test` - Main test suite covering all functions, algorithms, data types, and error cases
167+
- `test/sql/crypto_hash.test` - Focused test suite for hash algorithms with known test vectors
168+
169+
Tests are run using:
170+
```sh
171+
make test_debug # Run tests with debug build
172+
```
173+
174+
Test coverage includes:
175+
- All hash algorithms with known test vectors
176+
- All supported data types (integers, floats, booleans, dates, UUIDs, etc.)
177+
- List hashing with different element types
178+
- HMAC computation with all algorithms
179+
- Aggregate hashing with ORDER BY requirements
180+
- Random byte generation with various lengths
181+
- Error cases (invalid algorithms, unsupported types, NULL list elements, etc.)
182+
102183
## CI/CD
103184

104185
The repository uses the DuckDB extension template's CI system:
