Commit 6c6349b

Restructure store paths: objects/ after table, rename store config
- Rename store metadata: dj-store-meta.json → datajoint_store.json
- Move objects/ directory after table name in path hierarchy
- Path is now: {schema}/{Table}/objects/{pk_attrs}/{field}_{token}{ext}
- Allows table folders to contain both tabular data and objects
- Update all path examples and JSON samples
1 parent 36806cc commit 6c6349b

1 file changed: +31 -31 lines changed

docs/src/design/tables/file-type-spec.md

Lines changed: 31 additions & 31 deletions
@@ -42,28 +42,26 @@ A DataJoint project creates a structured hierarchical storage pattern:
 
 ```
 📁 project_name/
-├── datajoint.json
-├── 📁 schema_name1/
-├── 📁 schema_name2/
-├── 📁 schema_name3/
-│ ├── schema.py
-│ ├── 📁 tables/
-│ │ ├── table1/key1-value1.parquet
-│ │ ├── table2/key2-value2.parquet
-│ │ ...
-│ ├── 📁 objects/
-│ │ ├── table1-field1/key3-value3.zarr
-│ │ ├── table1-field2/key3-value3.gif
-│ │ ...
+├── datajoint_store.json # store metadata (not client config)
+├── 📁 schema_name/
+│ ├── 📁 Table1/
+│ │ ├── data.parquet # tabular data export (future)
+│ │ └── 📁 objects/ # object storage for this table
+│ │ ├── pk1=val1/pk2=val2/field1_token.dat
+│ │ └── pk1=val1/pk2=val2/field2_token.zarr
+│ ├── 📁 Table2/
+│ │ ├── data.parquet
+│ │ └── 📁 objects/
+│ │ └── ...
 ```
 
 ### Object Storage Keys
 
 When using cloud object storage:
 
 ```
-s3://bucket/project_name/schema_name3/objects/table1/key1-value1.parquet
-s3://bucket/project_name/schema_name3/objects/table1-field1/key3-value3.zarr
+s3://bucket/project_name/schema_name/Table1/objects/pk1=val1/field_token.dat
+s3://bucket/project_name/schema_name/Table1/objects/pk1=val1/field_token.zarr
 ```
 
 ## Configuration
@@ -145,24 +143,24 @@ The partition pattern is configured **per pipeline** (one per settings file). Pl
 **Example with partitioning:**
 
 ```
-s3://my-bucket/my_project/subject123/session45/schema_name/objects/Recording-raw_data/recording.dat
+s3://my-bucket/my_project/subject_id=123/session_id=45/schema_name/Recording/objects/raw_data_Ax7bQ2kM.dat
 ```
 
-If no partition pattern is specified, files are organized directly under `{location}/{schema}/objects/`.
+If no partition pattern is specified, files are organized directly under `{location}/{schema}/{Table}/objects/`.
 
-## Store Metadata (`dj-store-meta.json`)
+## Store Metadata (`datajoint_store.json`)
 
-Each object store contains a metadata file at its root that identifies the store and enables verification by DataJoint clients.
+Each object store contains a metadata file at its root that identifies the store and enables verification by DataJoint clients. This file is named `datajoint_store.json` to distinguish it from client configuration files (`datajoint.json`).
 
 ### Location
 
 ```
-{location}/dj-store-meta.json
+{location}/datajoint_store.json
 ```
 
 For cloud storage:
 ```
-s3://bucket/my_project/dj-store-meta.json
+s3://bucket/my_project/datajoint_store.json
 ```
 
 ### Content
@@ -193,7 +191,7 @@ The store metadata file is created when the first `object` attribute is used:
 ┌─────────────────────────────────────────────────────────┐
 │ 1. Client attempts first file operation │
 ├─────────────────────────────────────────────────────────┤
-│ 2. Check if dj-store-meta.json exists
+│ 2. Check if datajoint_store.json exists │
 │ ├─ If exists: verify project_name matches │
 │ └─ If not: create with current project_name │
 ├─────────────────────────────────────────────────────────┤
@@ -205,7 +203,7 @@ The store metadata file is created when the first `object` attribute is used:
 
 DataJoint performs a basic verification on connect to ensure store-database cohesion:
 
-1. **On connect**: Client reads `dj-store-meta.json` from store
+1. **On connect**: Client reads `datajoint_store.json` from store
 2. **Verify**: `project_name` in client settings matches store metadata
 3. **On mismatch**: Raise `DataJointError` with descriptive message
 
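Reviewer note: the check-or-create flow and the verification steps in the two hunks above can be sketched roughly as follows. This is a minimal illustration on a local filesystem, not DataJoint's implementation: the function name `verify_or_create_store` and the exception `StoreMismatchError` (a stand-in for `DataJointError`) are hypothetical, and the metadata file is assumed to hold only `project_name`, since the full `### Content` section is not part of this diff.

```python
import json
from pathlib import Path


class StoreMismatchError(Exception):
    """Hypothetical stand-in for DataJointError in this sketch."""


def verify_or_create_store(location, project_name):
    """Verify datajoint_store.json at the store root, creating it if absent."""
    meta_path = Path(location) / "datajoint_store.json"

    if meta_path.exists():
        # Existing store: project_name must match the client settings.
        stored = json.loads(meta_path.read_text())
        if stored.get("project_name") != project_name:
            raise StoreMismatchError(
                f"Store at {location} belongs to project "
                f"{stored.get('project_name')!r}, not {project_name!r}"
            )
        return stored

    # New store: record the current project_name (other fields omitted here).
    meta = {"project_name": project_name}
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta
```

A cloud-backed store would replace the `pathlib` calls with the corresponding object-storage reads and writes; the control flow stays the same.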
@@ -248,7 +246,7 @@ The `object` type is stored as a `JSON` column in MySQL containing:
 **File example:**
 ```json
 {
-"path": "my_schema/objects/Recording/subject_id=123/session_id=45/raw_data_Ax7bQ2kM.dat",
+"path": "my_schema/Recording/objects/subject_id=123/session_id=45/raw_data_Ax7bQ2kM.dat",
 "size": 12345,
 "hash": "sha256:abcdef1234...",
 "ext": ".dat",
@@ -261,7 +259,7 @@ The `object` type is stored as a `JSON` column in MySQL containing:
 **Folder example:**
 ```json
 {
-"path": "my_schema/objects/Recording/subject_id=123/session_id=45/raw_data_pL9nR4wE",
+"path": "my_schema/Recording/objects/subject_id=123/session_id=45/raw_data_pL9nR4wE",
 "size": 567890,
 "hash": "sha256:fedcba9876...",
 "ext": null,
@@ -314,23 +312,25 @@ Storage paths are **deterministically constructed** from record metadata, enabli
 1. **Location** - from configuration (`object_storage.location`)
 2. **Partition attributes** - promoted PK attributes (if `partition_pattern` configured)
 3. **Schema name** - from the table's schema
-4. **Object directory** - `objects/`
-5. **Table name** - the table class name
+4. **Table name** - the table class name
+5. **Object directory** - `objects/`
 6. **Primary key encoding** - remaining PK attributes and values
 7. **Suffixed filename** - `{field}_{token}{ext}`
 
 ### Path Template
 
 **Without partitioning:**
 ```
-{location}/{schema}/objects/{Table}/{pk_attr1}={pk_val1}/{pk_attr2}={pk_val2}/.../{field}_{token}{ext}
+{location}/{schema}/{Table}/objects/{pk_attr1}={pk_val1}/{pk_attr2}={pk_val2}/.../{field}_{token}{ext}
 ```
 
 **With partitioning:**
 ```
-{location}/{partition_attr}={val}/.../schema/objects/{Table}/{remaining_pk_attrs}/.../{field}_{token}{ext}
+{location}/{partition_attr}={val}/.../schema/{Table}/objects/{remaining_pk_attrs}/.../{field}_{token}{ext}
 ```
 
+Note: The `objects/` directory follows the table name, allowing each table folder to also contain tabular data exports (e.g., `data.parquet`) alongside the objects.
+
 ### Partitioning
 
 The **partition pattern** allows promoting certain primary key attributes to the beginning of the path (after `location`). This organizes storage by high-level attributes like subject or experiment, enabling:
@@ -364,7 +364,7 @@ class Recording(dj.Manual):
 Inserting `{"subject_id": 123, "session_id": 45, "raw_data": "/path/to/recording.dat"}` produces:
 
 ```
-my_project/my_schema/objects/Recording/subject_id=123/session_id=45/raw_data_Ax7bQ2kM.dat
+my_project/my_schema/Recording/objects/subject_id=123/session_id=45/raw_data_Ax7bQ2kM.dat
 ```
 
 Note: The filename is `raw_data` (field name) with `.dat` extension (from source file).
@@ -374,7 +374,7 @@ Note: The filename is `raw_data` (field name) with `.dat` extension (from source
 With `partition_pattern = "{subject_id}"`:
 
 ```
-my_project/subject_id=123/my_schema/objects/Recording/session_id=45/raw_data_Ax7bQ2kM.dat
+my_project/subject_id=123/my_schema/Recording/objects/session_id=45/raw_data_Ax7bQ2kM.dat
 ```
 
 The `subject_id` is promoted to the path root, grouping all files for subject 123 together regardless of schema or table.
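
Reviewer note: the new path ordering can be sanity-checked with a small sketch that assembles the seven components in the order listed in the updated spec. The function `build_object_path` is hypothetical and not part of DataJoint; it assumes the partition pattern is a string of `{attr}` placeholders and that the token is supplied by the caller rather than generated.

```python
import re


def build_object_path(location, schema, table, primary_key, field, token,
                      ext="", partition_pattern=None):
    """Assemble a storage path in the post-commit layout (illustrative only)."""
    # Attribute names promoted by the pattern, e.g. "{subject_id}" -> ["subject_id"].
    promoted = re.findall(r"\{(\w+)\}", partition_pattern or "")

    parts = [location]
    # Promoted PK attributes come right after the location.
    parts += [f"{name}={primary_key[name]}" for name in promoted]
    # Schema, then table, then the objects/ directory (the order this commit introduces).
    parts += [schema, table, "objects"]
    # Remaining PK attributes, in primary-key order.
    parts += [f"{k}={v}" for k, v in primary_key.items() if k not in promoted]
    # Suffixed filename: {field}_{token}{ext}.
    parts.append(f"{field}_{token}{ext}")
    return "/".join(parts)


key = {"subject_id": 123, "session_id": 45}
print(build_object_path("my_project", "my_schema", "Recording", key,
                        "raw_data", "Ax7bQ2kM", ".dat"))
print(build_object_path("my_project", "my_schema", "Recording", key,
                        "raw_data", "Ax7bQ2kM", ".dat",
                        partition_pattern="{subject_id}"))
```

With the inputs from the `Recording` example, the two calls reproduce the unpartitioned and partitioned paths shown on the `+` lines of the last two hunks.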
