Commit 56f0a7d

feat: add crypto_random_bytes
1 parent 6e6ab43 commit 56f0a7d

6 files changed: +282 -4 lines changed

CLAUDE.md

Lines changed: 3 additions & 2 deletions
@@ -53,10 +53,10 @@ Builds:
 ### Testing
 
 ```sh
-make test
+make test_debug
 ```
 
-Runs SQL tests located in `test/sql/crypto.test`. but using the release build.
+Runs SQL tests located in `test/sql/*.test` but uses the debug build.
 
 ### Running the Extension
 
@@ -69,6 +69,7 @@ Launches DuckDB with the extension already loaded.
 ## Supported Hash Algorithms
 
 The extension supports these algorithms (defined in `src/crypto_hash.cpp:getDigestByName()`):
+- blake3
 - blake2b-512
 - keccak224, keccak256, keccak384, keccak512 (mapped to SHA3 variants)
 - md4, md5

README.md

Lines changed: 89 additions & 2 deletions
@@ -1,8 +1,8 @@
 # Crypto Hash/HMAC Extension for DuckDB
 
-This extension, `crypto`, adds cryptographic hash functions and HMAC (Hash-based Message Authentication Code) calculation to DuckDB.
+This extension, `crypto`, adds cryptographic hash functions, HMAC (Hash-based Message Authentication Code) calculation, and cryptographically secure random byte generation to DuckDB.
 
-While DuckDB already includes basic hash functions like `hash()` and `sha256()`, this extension provides additional algorithms including Blake3, SHA-3, and supports hashing of various data types beyond just strings.
+While DuckDB already includes basic hash functions like `hash()` and `sha256()`, this extension provides additional algorithms including Blake3, SHA-3, supports hashing of various data types beyond just strings, and includes secure random number generation using OpenSSL.
 
 ## Installation
 
@@ -31,6 +31,10 @@ SELECT lower(to_hex(crypto_hash('sha2-256', 42)));
 -- Calculate HMAC with a secret key
 SELECT lower(to_hex(crypto_hmac('sha2-256', 'my-secret-key', 'important message')));
 -- Result: 97f324adef061b4ad0abeb6be543913d7db6ba8e6e7f33cd3c4395d619b56df4
+
+-- Generate 32 cryptographically secure random bytes
+SELECT lower(to_hex(crypto_random_bytes(32)));
+-- Result: (random hex string, different each time)
 ```
 
 ## Hash Functions
@@ -177,6 +181,63 @@ FROM (VALUES (1), (2), (3), (4), (5)) t(value);
 -- true (aggregate hash matches list hash)
 ```
 
+## Random Byte Generation
+
+### crypto_random_bytes()
+
+**Syntax:**
+```sql
+crypto_random_bytes(length) → BLOB
+```
+
+Generates cryptographically secure random bytes using OpenSSL's `RAND_bytes()` function. This is useful for generating random keys, salts, nonces, and other cryptographic material.
+
+**Parameters:**
+- `length` (BIGINT): The number of random bytes to generate (must be between 1 and 4,294,967,295)
+
+**Returns:** BLOB containing the requested number of cryptographically secure random bytes
+
+**Security:** Uses OpenSSL's `RAND_bytes()`, which provides cryptographically strong random numbers suitable for security-sensitive applications like key generation and cryptographic operations.
+
+**Limits:**
+- Minimum length: 1 byte
+- Maximum length: 4,294,967,295 bytes (4GB - 1, the maximum BLOB size in DuckDB)
+- Requesting 0 or negative bytes raises an `InvalidInputException`
+- Requesting more than 4GB raises an `InvalidInputException`
+
+### Examples
+
+```sql
+-- Generate 32 random bytes (suitable for AES-256 key)
+SELECT crypto_random_bytes(32);
+
+-- Generate random bytes and convert to hex for display
+SELECT lower(to_hex(crypto_random_bytes(16)));
+-- Example output: 3f7a2b8c9d1e4f6a8b2c3d4e5f6a7b8c
+
+-- Generate a random salt for password hashing
+SELECT crypto_random_bytes(16) AS salt;
+
+-- Use random bytes as an HMAC key
+SELECT crypto_hmac('sha2-256', crypto_random_bytes(32), 'message to authenticate');
+
+-- Generate multiple random values in a table
+CREATE TABLE api_keys (id INTEGER, api_key BLOB);
+INSERT INTO api_keys
+SELECT id, crypto_random_bytes(32)
+FROM range(10) t(id);
+
+-- Verify randomness (each call produces different output)
+SELECT crypto_random_bytes(16) != crypto_random_bytes(16);
+-- true
+
+-- Generate a 128-bit random UUID-like value
+SELECT lower(to_hex(crypto_random_bytes(16)));
+
+-- Create a random nonce for cryptographic operations
+SELECT crypto_random_bytes(12) AS nonce; -- 96-bit nonce for AES-GCM
+```
+
 ## HMAC Functions
 
 ### crypto_hmac()
@@ -229,6 +290,32 @@ FROM api_requests;
 
 ## Common Use Cases
 
+### Generating Cryptographic Keys and Salts
+```sql
+-- Generate a random AES-256 encryption key
+SELECT crypto_random_bytes(32) AS encryption_key;
+
+-- Generate random salts for password hashing
+CREATE TABLE users (
+    id INTEGER,
+    username VARCHAR,
+    password_hash BLOB,
+    salt BLOB
+);
+
+INSERT INTO users (id, username, salt)
+VALUES (1, 'alice', crypto_random_bytes(16));
+
+-- Generate random HMAC keys
+SELECT crypto_random_bytes(32) AS hmac_key;
+
+-- Create a table with random API keys
+CREATE TABLE api_credentials (
+    user_id INTEGER,
+    api_key BLOB DEFAULT crypto_random_bytes(32)
+);
+```
+
 ### Generating Unique IDs
 ```sql
 -- Generate unique IDs from multiple columns

src/crypto_extension.cpp

Lines changed: 61 additions & 0 deletions
@@ -300,6 +300,62 @@ namespace duckdb
     });
 }
 
+inline void CryptoScalarRandomBytesFun(DataChunk &args, ExpressionState &state, Vector &result)
+{
+    // This is called with one argument: the number of bytes to generate
+    auto &length_vector = args.data[0];
+    auto count = args.size();
+
+    UnifiedVectorFormat length_data;
+    length_vector.ToUnifiedFormat(count, length_data);
+    auto lengths = UnifiedVectorFormat::GetData<int64_t>(length_data);
+
+    auto results = FlatVector::GetData<string_t>(result);
+
+    // Process each row
+    for (idx_t i = 0; i < count; i++)
+    {
+        auto length_idx = length_data.sel->get_index(i);
+
+        if (!length_data.validity.RowIsValid(length_idx))
+        {
+            FlatVector::SetNull(result, i, true);
+            continue;
+        }
+
+        int64_t length = lengths[length_idx];
+
+        // Validate length before allocation (CryptoRandomBytes will validate too, but we need to prevent allocation issues)
+        if (length <= 0)
+        {
+            throw InvalidInputException("Random bytes length must be greater than 0");
+        }
+
+        // DuckDB BLOB maximum size is 4GB (2^32 - 1 bytes)
+        constexpr int64_t MAX_BLOB_SIZE = 4294967295LL; // 4GB - 1
+        if (length > MAX_BLOB_SIZE)
+        {
+            throw InvalidInputException(
+                "Random bytes length must be less than or equal to " +
+                std::to_string(MAX_BLOB_SIZE) + " bytes (4GB)");
+        }
+
+        // Allocate buffer for random bytes
+        auto buffer = std::unique_ptr<unsigned char[]>(new unsigned char[length]);
+
+        // Generate random bytes (will also validate length)
+        CryptoRandomBytes(length, buffer.get());
+
+        // Add result as BLOB
+        results[i] = StringVector::AddStringOrBlob(result, string_t(reinterpret_cast<const char *>(buffer.get()), length));
+    }
+
+    if (count == 1)
+    {
+        result.SetVectorType(VectorType::CONSTANT_VECTOR);
+    }
+}
+
 struct HashAggregateState
 {
     bool is_touched;
@@ -565,6 +621,11 @@ namespace duckdb
     auto crypto_hmac_scalar_function = ScalarFunction("crypto_hmac", {LogicalType::VARCHAR, LogicalType::VARCHAR, LogicalType::VARCHAR}, LogicalType::BLOB, CryptoScalarHmacFun);
     loader.RegisterFunction(crypto_hmac_scalar_function);
 
+    auto crypto_random_bytes_scalar_function = ScalarFunction(
+        "crypto_random_bytes",
+        {LogicalType::BIGINT}, LogicalType::BLOB, CryptoScalarRandomBytesFun, nullptr, nullptr, nullptr, nullptr, LogicalTypeId::INVALID, FunctionStability::VOLATILE);
+    loader.RegisterFunction(crypto_random_bytes_scalar_function);
+
     auto agg_set = AggregateFunctionSet("crypto_hash_agg");
 
     // Variable-size types (include size prefix to prevent length extension attacks)
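
The range check on `length` appears twice in this commit: once in `CryptoScalarRandomBytesFun` and once in `CryptoRandomBytes`. A minimal sketch of how the check could live in a single helper follows. `ValidateRandomLength` is a hypothetical name that does not exist in the commit, and `std::invalid_argument` is used as a stand-in for DuckDB's `InvalidInputException` so the sketch compiles without DuckDB headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the commit): validates a requested byte
// count once and returns it as size_t, ready to be used for allocation.
// std::invalid_argument is a stand-in for duckdb::InvalidInputException.
inline std::size_t ValidateRandomLength(std::int64_t length)
{
    // Same upper bound the commit uses: 2^32 - 1 bytes.
    constexpr std::int64_t MAX_BLOB_SIZE = 4294967295LL;

    if (length <= 0)
    {
        throw std::invalid_argument("Random bytes length must be greater than 0");
    }
    if (length > MAX_BLOB_SIZE)
    {
        throw std::invalid_argument(
            "Random bytes length must be less than or equal to " +
            std::to_string(MAX_BLOB_SIZE) + " bytes (4GB)");
    }
    return static_cast<std::size_t>(length);
}
```

Both call sites could then call the helper before allocating or generating bytes, which would keep the two error messages from drifting apart.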

src/crypto_hash.cpp

Lines changed: 28 additions & 0 deletions
@@ -2,6 +2,7 @@
 #include "duckdb/common/string_util.hpp"
 #include <openssl/evp.h>
 #include <openssl/hmac.h>
+#include <openssl/rand.h>
 #include <cstring>
 #include <unordered_map>
 #include <functional>
@@ -152,4 +153,31 @@ namespace duckdb
     }
 }
 
+void CryptoRandomBytes(int64_t length, unsigned char *result)
+{
+    // Validate input length
+    if (length <= 0)
+    {
+        throw InvalidInputException("Random bytes length must be greater than 0");
+    }
+
+    // DuckDB BLOB maximum size is 4GB (2^32 - 1 bytes)
+    constexpr int64_t MAX_BLOB_SIZE = 4294967295LL; // 4GB - 1
+    if (length > MAX_BLOB_SIZE)
+    {
+        throw InvalidInputException(
+            "Random bytes length must be less than or equal to " +
+            std::to_string(MAX_BLOB_SIZE) + " bytes (4GB)");
+    }
+
+    // Generate random bytes using OpenSSL's RAND_bytes
+    // RAND_bytes is cryptographically secure and automatically seeds itself
+    int rand_result = RAND_bytes(result, static_cast<int>(length));
+
+    if (rand_result != 1)
+    {
+        throw InternalException("Failed to generate random bytes");
+    }
+}
+
 } // namespace duckdb
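
One detail worth checking in `CryptoRandomBytes`: `RAND_bytes()` takes its length as an `int`, while the accepted range goes up to 4,294,967,295, so `static_cast<int>(length)` cannot represent requests above `INT_MAX` (2,147,483,647). One possible remedy, sketched here under assumptions, is to fill the buffer in chunks no larger than `INT_MAX`. `FillInChunks` and its `max_chunk` parameter are hypothetical names that are not in the commit. The `fill` callback stands in for `RAND_bytes` (same `int(unsigned char *, int)` shape, returning 1 on success) so the sketch builds without OpenSSL.

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not part of the commit): fills `length` bytes of
// `result` by calling `fill` repeatedly with at most `max_chunk` bytes per call.
// `fill` stands in for OpenSSL's RAND_bytes(unsigned char *, int).
template <typename Fill>
void FillInChunks(unsigned char *result, std::int64_t length, Fill fill, int max_chunk = INT_MAX)
{
    if (length < 0 || max_chunk <= 0)
    {
        throw std::invalid_argument("length must be >= 0 and max_chunk must be > 0");
    }

    std::int64_t offset = 0;
    while (offset < length)
    {
        // Never request more bytes than an int can represent.
        const int chunk = static_cast<int>(std::min<std::int64_t>(length - offset, max_chunk));

        // Mirror the commit's success check: anything other than 1 is a failure.
        if (fill(result + offset, chunk) != 1)
        {
            throw std::runtime_error("Failed to generate random bytes");
        }
        offset += chunk;
    }
}
```

With OpenSSL available, the call would presumably look like `FillInChunks(result, length, RAND_bytes)`. An alternative would be to lower the documented maximum to `INT_MAX` instead of chunking.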

src/include/crypto_hash.hpp

Lines changed: 3 additions & 0 deletions
@@ -22,4 +22,7 @@ void CryptoHash(const std::string& algorithm, const char* data, size_t data_len,
 // Compute an HMAC (Hash-based Message Authentication Code)
 void CryptoHmac(const std::string& algorithm, const std::string& key, const std::string& data, unsigned char* result, unsigned int& result_len);
 
+// Generate cryptographically secure random bytes
+void CryptoRandomBytes(int64_t length, unsigned char* result);
+
 } // namespace duckdb

test/sql/crypto.test

Lines changed: 98 additions & 0 deletions
@@ -1038,3 +1038,101 @@ SELECT crypto_hash_agg('blake3', data ORDER BY data) =
 FROM (VALUES ('hello'), ('world')) t(data);
 ----
 true
+
+# Test crypto_random_bytes() - Random byte generation
+# =====================================================
+
+# Test basic random byte generation (32 bytes)
+query I
+SELECT octet_length(crypto_random_bytes(32));
+----
+32
+
+# Test small random byte generation (1 byte)
+query I
+SELECT octet_length(crypto_random_bytes(1));
+----
+1
+
+# Test larger random byte generation (1024 bytes)
+query I
+SELECT octet_length(crypto_random_bytes(1024));
+----
+1024
+
+# Test that random bytes are different on each call
+query I
+SELECT crypto_random_bytes(32) != crypto_random_bytes(32);
+----
+true
+
+# Test with different lengths produce different results
+query I
+SELECT octet_length(crypto_random_bytes(16)) != octet_length(crypto_random_bytes(32));
+----
+true
+
+# Test error case: zero bytes
+statement error
+SELECT crypto_random_bytes(0);
+----
+Invalid Input Error: Random bytes length must be greater than 0
+
+# Test error case: negative length
+statement error
+SELECT crypto_random_bytes(-1);
+----
+Invalid Input Error: Random bytes length must be greater than 0
+
+# Test error case: exceeds 4GB limit
+statement error
+SELECT crypto_random_bytes(4294967296);
+----
+Invalid Input Error: Random bytes length must be less than or equal to 4294967295 bytes (4GB)
+
+# Test NULL input returns NULL
+query I
+SELECT crypto_random_bytes(NULL::BIGINT) IS NULL;
+----
+true
+
+# Test that results are truly random (non-zero variance)
+# Generate 10 random bytes and verify they're not all the same
+query I
+SELECT crypto_random_bytes(10) != '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'::BLOB;
+----
+true
+
+# Test at the maximum allowed size (4GB - 1 bytes)
+# Note: We'll test with a smaller size for practicality (1MB)
+query I
+SELECT octet_length(crypto_random_bytes(1048576));
+----
+1048576
+
+# Test using random bytes with crypto_hash
+query I
+SELECT octet_length(crypto_hash('sha2-256', crypto_random_bytes(64)));
+----
+32
+
+# Test in a table context (multiple rows)
+# The function is volatile, so each call produces different random bytes
+query I
+SELECT count(DISTINCT random_data)
+FROM (
+    SELECT crypto_random_bytes(16) AS random_data
+    FROM range(10)
+) subq;
+----
+10
+
+# Test that different calls produce different random values
+query I
+SELECT (
+    SELECT crypto_random_bytes(8)
+) != (
+    SELECT crypto_random_bytes(8)
+);
+----
+true