Commit 93559a4

Update path structure: field after PK, add partition pattern
Path changes:
- Field name now comes after all primary key attributes
- Groups related files together (all fields for same record in same dir)

Partitioning:
- partition_pattern config promotes PK attributes to path root
- Enables grouping by high-level attributes (subject, experiment)
- Example: {subject_id} moves subject to path start for data locality
1 parent 9d3e194 commit 93559a4

1 file changed: +44 -11 lines changed

docs/src/design/tables/file-type-spec.md

Lines changed: 44 additions & 11 deletions
````diff
@@ -162,7 +162,7 @@ The `file` type is stored as a `JSON` column in MySQL containing:
 **File example:**
 ```json
 {
-"path": "my_schema/objects/Recording/raw_data/subject_id=123/session_id=45/recording_Ax7bQ2kM.dat",
+"path": "my_schema/objects/Recording/subject_id=123/session_id=45/raw_data/recording_Ax7bQ2kM.dat",
 "size": 12345,
 "hash": "sha256:abcdef1234...",
 "original_name": "recording.dat",
````
````diff
@@ -175,7 +175,7 @@ The `file` type is stored as a `JSON` column in MySQL containing:
 **Folder example:**
 ```json
 {
-"path": "my_schema/objects/Recording/raw_data/subject_id=123/session_id=45/data_folder_pL9nR4wE",
+"path": "my_schema/objects/Recording/subject_id=123/session_id=45/raw_data/data_folder_pL9nR4wE",
 "size": 567890,
 "hash": "sha256:fedcba9876...",
 "original_name": "data_folder",
````
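The effect of moving the field name after the primary key can be checked by building both layouts for two file attributes of one `Recording` row and comparing their deepest common directory. This is a minimal sketch. The helper `build_path` and the second attribute `spike_data` (with its filename) are hypothetical and chosen only for illustration. Location and partitioning are omitted.

```python
import posixpath

def build_path(schema, table, field, pk, filename, field_first):
    """Hypothetical helper: assemble a storage path in the old or new layout."""
    pk_segments = [f"{attr}={value}" for attr, value in pk.items()]
    if field_first:
        # Old layout: .../{Table}/{field}/{pk...}/{filename}
        segments = [schema, "objects", table, field, *pk_segments, filename]
    else:
        # New layout: .../{Table}/{pk...}/{field}/{filename}
        segments = [schema, "objects", table, *pk_segments, field, filename]
    return "/".join(segments)

PK = {"subject_id": 123, "session_id": 45}
# "raw_data" matches the spec example; "spike_data" and its filename are hypothetical.
FILES = {"raw_data": "recording_Ax7bQ2kM.dat", "spike_data": "spikes_Tq8mW3zP.npy"}

def record_prefix(field_first):
    """Deepest directory shared by all files of the single record above."""
    paths = [build_path("my_schema", "Recording", field, PK, name, field_first)
             for field, name in FILES.items()]
    return posixpath.commonpath(paths)

print("old:", record_prefix(field_first=True))   # old: my_schema/objects/Recording
print("new:", record_prefix(field_first=False))  # new: my_schema/objects/Recording/subject_id=123/session_id=45
```

In the old layout the two files diverge immediately after the table directory. In the new layout they share every segment up to the record's last primary key attribute, and each field becomes a subdirectory of that record directory.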
````diff
@@ -205,20 +205,43 @@ Storage paths are **deterministically constructed** from record metadata, enabli
 ### Path Components

 1. **Location** - from configuration (`object_storage.location`)
-2. **Schema name** - from the table's schema
-3. **Object directory** - `objects/`
-4. **Table name** - the table class name
-5. **Field name** - the attribute name
-6. **Primary key encoding** - all PK attributes and values
-7. **Suffixed filename** - original name with random hash suffix
+2. **Partition attributes** - promoted PK attributes (if `partition_pattern` configured)
+3. **Schema name** - from the table's schema
+4. **Object directory** - `objects/`
+5. **Table name** - the table class name
+6. **Primary key encoding** - remaining PK attributes and values
+7. **Field name** - the attribute name
+8. **Suffixed filename** - original name with random hash suffix

 ### Path Template

+**Without partitioning:**
 ```
-{location}/{schema}/objects/{Table}/{field}/{pk_attr1}={pk_val1}/{pk_attr2}={pk_val2}/.../{basename}_{token}.{ext}
+{location}/{schema}/objects/{Table}/{pk_attr1}={pk_val1}/{pk_attr2}={pk_val2}/.../field/{basename}_{token}.{ext}
 ```

-### Example
+**With partitioning:**
+```
+{location}/{partition_attr}={val}/.../schema/objects/{Table}/{remaining_pk_attrs}/.../field/{basename}_{token}.{ext}
+```
+
+### Partitioning
+
+The **partition pattern** allows promoting certain primary key attributes to the beginning of the path (after `location`). This organizes storage by high-level attributes like subject or experiment, enabling:
+- Efficient data locality for related records
+- Easier manual browsing of storage
+- Potential for storage tiering by partition
+
+**Configuration:**
+```json
+{
+"object_storage.partition_pattern": "{subject_id}/{experiment_id}"
+}
+```
+
+Partition attributes are extracted from the primary key and placed at the path root. Remaining PK attributes appear in their normal position.
+
+### Example Without Partitioning

 For a table:
 ```python
````
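The path template and partitioning rule in this hunk can be expressed as a single path-building function. The function first pulls attribute names out of `partition_pattern` and emits those `attr=value` segments directly after the location. It then emits the schema, `objects`, the table, the remaining PK segments in their original order, the field, and the suffixed filename.

This is a minimal sketch, not the library's implementation, and several choices in it are assumptions:
- `object_path`, `partition_attrs` and `make_token` are hypothetical names.
- The pattern is assumed to contain only `{attr}` placeholders.
- The 8-character letters-and-digits token is inferred from examples such as `Ax7bQ2kM`.
- Pattern attributes missing from a table's primary key are skipped.

```python
import re
import secrets
import string

def make_token(length=8):
    """Random suffix; assumes ASCII letters and digits, like "Ax7bQ2kM"."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

def partition_attrs(pattern):
    """Attribute names from a pattern such as "{subject_id}/{experiment_id}"."""
    return re.findall(r"\{(\w+)\}", pattern or "")

def object_path(location, schema, table, field, pk, original_name, token,
                partition_pattern=None):
    # Assumption: pattern attributes absent from this table's PK are skipped.
    promoted = [a for a in partition_attrs(partition_pattern) if a in pk]
    # Promoted attributes come right after the location, in pattern order.
    head = [f"{a}={pk[a]}" for a in promoted]
    # Remaining PK attributes keep their primary-key order.
    rest = [f"{a}={v}" for a, v in pk.items() if a not in promoted]
    # Put the token before the last extension; folder names have no extension.
    base, dot, ext = original_name.rpartition(".")
    filename = f"{base}_{token}.{ext}" if dot else f"{original_name}_{token}"
    return "/".join([location, *head, schema, "objects", table, *rest, field, filename])

pk = {"subject_id": 123, "session_id": 45}
print(object_path("my_project", "my_schema", "Recording", "raw_data", pk,
                  "recording.dat", make_token()))
print(object_path("my_project", "my_schema", "Recording", "raw_data", pk,
                  "recording.dat", "Ax7bQ2kM", partition_pattern="{subject_id}"))
# my_project/subject_id=123/my_schema/objects/Recording/session_id=45/raw_data/recording_Ax7bQ2kM.dat
```

Skipping absent attributes, rather than raising an error, is one possible choice when a global pattern such as `{subject_id}/{experiment_id}` is applied to tables whose keys lack some of those attributes. The spec text does not say which behavior is intended.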
````diff
@@ -235,9 +258,19 @@ class Recording(dj.Manual):
 Inserting `{"subject_id": 123, "session_id": 45, "raw_data": "/path/to/recording.dat"}` produces:

 ```
-my_project/my_schema/objects/Recording/raw_data/subject_id=123/session_id=45/recording_Ax7bQ2kM.dat
+my_project/my_schema/objects/Recording/subject_id=123/session_id=45/raw_data/recording_Ax7bQ2kM.dat
+```
+
+### Example With Partitioning
+
+With `partition_pattern = "{subject_id}"`:
+
+```
+my_project/subject_id=123/my_schema/objects/Recording/session_id=45/raw_data/recording_Ax7bQ2kM.dat
 ```

+The `subject_id` is promoted to the path root, grouping all files for subject 123 together regardless of schema or table.
+
 ### Deterministic Bidirectional Mapping

 The path structure (excluding the random token) is fully deterministic:
````
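The bidirectional mapping can be illustrated by parsing a path back into its components. This is a minimal sketch, not a DataJoint API, and `parse_object_path` is a hypothetical name.

It assumes the following:
- The configured location is known.
- Schema names, field names and PK values contain no `/`.
- Schema and field names contain no `=`.

Under those assumptions, any leading `attr=value` segments can be read as promoted partition attributes without consulting `partition_pattern`.

```python
def parse_object_path(path, location):
    """Hypothetical inverse of the path template; PK values come back as strings."""
    prefix = location.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {location!r}")
    parts = path[len(prefix):].split("/")
    key, i = {}, 0

    def take_key_segments():
        # Consume consecutive "attr=value" segments starting at parts[i].
        nonlocal i
        while i < len(parts) and "=" in parts[i]:
            attr, _, value = parts[i].partition("=")
            key[attr] = value
            i += 1

    take_key_segments()  # promoted partition attributes, if any
    # Need schema, "objects", table, field and filename after the partition segments.
    if len(parts) < i + 5 or parts[i + 1] != "objects":
        raise ValueError(f"unexpected layout: {path!r}")
    schema, table = parts[i], parts[i + 2]
    i += 3
    take_key_segments()  # remaining primary key attributes
    if len(parts) != i + 2:
        raise ValueError(f"unexpected layout: {path!r}")
    field, filename = parts[i], parts[i + 1]
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # folders carry no extension
        stem, ext = filename, ""
    basename, _, token = stem.rpartition("_")
    return {"schema": schema, "table": table, "field": field, "key": key,
            "basename": basename, "token": token, "ext": ext}

print(parse_object_path(
    "my_project/subject_id=123/my_schema/objects/Recording/session_id=45/raw_data/recording_Ax7bQ2kM.dat",
    "my_project"))
```

The returned `key` lists promoted attributes first. Recovering the table's declared primary key order and the original value types therefore still requires the table definition. The token is the one component that cannot be recomputed from record metadata, since it is random.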