Commit dc1c899

Add PK value encoding rules for paths
- Keep = sign in paths (Hive convention, widely supported)
- Simple types used directly: integers, dates, timestamps, strings
- Conversion to path-safe strings only when necessary:
  - Path-unsafe characters (/, \) get URL-encoded
  - Long strings truncated with hash suffix
  - Binary/complex types hashed
1 parent 93559a4 commit dc1c899

1 file changed: +24 -0 lines changed

docs/src/design/tables/file-type-spec.md

Lines changed: 24 additions & 0 deletions
@@ -284,6 +284,30 @@ This enables:

The **random token** is stored in the JSON metadata to complete the full path.

### Primary Key Value Encoding

Primary key values are encoded directly in paths when they are simple, path-safe types:
- **Integers**: Used directly (`subject_id=123`)
- **Dates**: ISO format (`session_date=2025-01-15`)
- **Timestamps**: ISO format with `:` replaced by `-` so the value is path-safe (`created=2025-01-15T10-30-00`)
- **Simple strings**: Used directly if path-safe (`experiment=baseline`)

**Conversion to path-safe strings** is applied only when necessary:
- Strings containing `/`, `\`, or other path-unsafe characters (URL-encoded)
- Very long strings (truncated with hash suffix)
- Binary or complex types (hashed)

```python
# Direct encoding (no conversion needed)
subject_id=123
session_date=2025-01-15
trial_type=control

# Converted encoding (path-unsafe characters or long values)
filename=my%2Ffile.dat         # "/" URL-encoded as %2F
description=a1b2c3d4_abc123    # long string truncated + hash
```
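The rules above could be implemented roughly as follows. This is a minimal sketch, not the project's actual encoder: the function names `encode_pk_value` and `encode_pk_segment`, the length threshold `MAX_LEN`, the prefix and digest lengths, and the choice of SHA-256 are all assumptions made for illustration, since the spec does not fix them.

```python
import hashlib
from datetime import date, datetime
from urllib.parse import quote

# Hypothetical limits; the spec does not state the real values.
MAX_LEN = 32     # strings longer than this are truncated + hashed
PREFIX_LEN = 8   # characters kept from a long string
HASH_LEN = 8     # hex characters kept from the digest


def _digest(data: bytes) -> str:
    """Short hex digest (SHA-256 is an assumed choice of hash)."""
    return hashlib.sha256(data).hexdigest()[:HASH_LEN]


def encode_pk_value(value) -> str:
    """Encode one primary key value as a path-safe string."""
    if isinstance(value, int):
        return str(value)
    # datetime must be checked before date: datetime subclasses date.
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%dT%H-%M-%S")
    if isinstance(value, date):
        return value.isoformat()
    if isinstance(value, str):
        if len(value) > MAX_LEN:
            prefix = quote(value[:PREFIX_LEN], safe="")
            return f"{prefix}_{_digest(value.encode('utf-8'))}"
        # safe="" makes quote() percent-encode "/" as well.
        return quote(value, safe="")
    if isinstance(value, (bytes, bytearray)):
        return _digest(bytes(value))
    # Complex types: hashing repr() is an assumption, not from the spec.
    return _digest(repr(value).encode("utf-8"))


def encode_pk_segment(name: str, value) -> str:
    """Build one Hive-style `name=value` path segment."""
    return f"{name}={encode_pk_value(value)}"


print(encode_pk_segment("subject_id", 123))
print(encode_pk_segment("created", datetime(2025, 1, 15, 10, 30, 0)))
print(encode_pk_segment("filename", "my/file.dat"))
```

Under these assumptions, path-safe strings such as `baseline` pass through `quote` unchanged, so conversion only alters values that need it.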
### Filename Collision Avoidance

To prevent filename collisions, each stored file receives a **random hash suffix** appended to its basename:
