- Replace "Single Storage Backend" with "Default and Named Stores"
- Add object@store syntax for named stores in table definitions
- Add Named Stores configuration section with stores.<name> prefix
- Update JSON schema with store, url, and path fields
- Update Access Control Patterns for multiple buckets
- Update orphan cleanup for per-store operation with delimiter-based
listing for efficient Zarr enumeration
@@ -50,18 +50,41 @@ This is fundamentally different from **external references**, where DataJoint me
50
50
51
51
## Storage Architecture
52
52
53
-
### Single Storage Backend Per Pipeline
53
+
### Default and Named Stores
54
54
55
-
Each DataJoint pipeline has **one** associated storage backend configured in `datajoint.json`. DataJoint fully controls the path structure within this backend.
55
+
Each DataJoint pipeline has a **default storage backend** plus optional **named stores**, all configured in `datajoint.json`. DataJoint fully controls the path structure within each store.
56
56
57
-
**Why single backend?** The object store is a logical extension of the schema—its integrity must be verifiable as a unit. With a single backend:
58
-
- Schema completeness can be verified with one listing operation
59
-
- Orphan detection is straightforward
60
-
- Migration requires only config changes, not mass URL updates in the database
57
+
```python
58
+
@schema
59
+
class Recording(dj.Manual):
60
+
    definition = """
61
+
subject_id : int
62
+
session_id : int
63
+
---
64
+
raw_data : object # uses default store
65
+
published : object@public # uses 'public' named store
66
+
"""
67
+
```
68
+
69
+
**All stores follow OAS principles:**
70
+
- DataJoint owns the lifecycle (insert/delete/fetch as a unit)
71
+
- Same deterministic path structure (`project/schema/Table/objects/...`)
72
+
- Same access control alignment with database
73
+
- Each store has its own `datajoint_store.json` metadata file
74
+
75
+
**Why support multiple stores?**
76
+
- Different access policies (private vs public buckets)
| Row-level | Per-object ACL or signed URLs | Future enhancement |
74
97
75
-
**Example: Private and public data in one bucket**
76
-
77
-
Rather than using separate buckets, use prefix-based policies:
98
+
**Example: Private and public data in separate stores**
78
99
79
100
```
80
-
s3://my-bucket/my_project/
81
-
├── internal_schema/ ← restricted IAM policy
82
-
│ └── ProcessingResults/
83
-
│ └── objects/...
84
-
└── publications/ ← public bucket policy
101
+
# Default store (private)
102
+
s3://internal-bucket/my_project/
103
+
└── lab_schema/
104
+
└── ProcessingResults/
105
+
└── objects/...
106
+
107
+
# Named 'public' store
108
+
s3://public-bucket/my_project/
109
+
└── lab_schema/
85
110
└── PublishedDatasets/
86
111
└── objects/...
87
112
```
88
113
89
-
This achieves the same access separation as multiple buckets while maintaining schema integrity in a single backend.
114
+
Alternatively, use prefix-based policies within a single bucket if preferred.
90
115
91
116
**Row-level access control** (access to objects for specific primary key values) is not directly supported by object store policies. Future versions may address this via DataJoint-generated signed URLs that project database permissions onto object access.
92
117
@@ -156,6 +181,42 @@ For local filesystem storage:
156
181
}
157
182
```
158
183
184
+
### Named Stores
185
+
186
+
Additional stores can be defined using the `object_storage.stores.<name>` prefix:
Named stores inherit `project_name` from the default configuration but can override all other settings. Use named stores with the `object@store_name` syntax:
202
+
203
+
```python
204
+
@schema
205
+
class Dataset(dj.Manual):
206
+
    definition = """
207
+
dataset_id : int
208
+
---
209
+
internal_data : object # default store (internal-bucket)
210
+
published_data : object@public # public store (public-bucket)
211
+
"""
212
+
```
213
+
214
+
Each named store:
215
+
- Must be explicitly configured (no ad-hoc URLs)
216
+
- Has its own `datajoint_store.json` metadata file
217
+
- Follows the same OAS lifecycle semantics as the default store
218
+
- Credentials are managed at the platform level, aligned with database access control
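
The inheritance rule for named stores can be sketched as follows. This is a minimal illustration, assuming the settings have been loaded into a plain dict with an `object_storage` section and a nested `stores` mapping. The helper name `resolve_store` and the `protocol` and `bucket` keys are hypothetical, not confirmed DataJoint API, and the sketch assumes that only `project_name` is inherited.

```python
from typing import Optional


def resolve_store(config: dict, store: Optional[str] = None) -> dict:
    """Return the effective settings for the default store or a named store.

    Hypothetical helper: the config layout and key names are assumptions.
    """
    section = config["object_storage"]
    # Everything except the nested "stores" mapping describes the default store.
    default = {k: v for k, v in section.items() if k != "stores"}
    if store is None:
        return default

    stores = section.get("stores", {})
    if store not in stores:
        # Named stores must be explicitly configured (no ad-hoc URLs).
        raise KeyError(f"store {store!r} is not configured")

    # The named store supplies its own settings; project_name is inherited.
    resolved = dict(stores[store])
    resolved["project_name"] = default["project_name"]
    return resolved


config = {
    "object_storage": {
        "project_name": "my_project",
        "protocol": "s3",
        "bucket": "internal-bucket",
        "stores": {"public": {"protocol": "s3", "bucket": "public-bucket"}},
    }
}

public_settings = resolve_store(config, "public")
default_settings = resolve_store(config)
```

Rejecting unknown store names up front mirrors the requirement that every named store be explicitly configured.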
219
+
159
220
### Settings Schema
160
221
161
222
| Setting | Type | Required | Description |
@@ -320,20 +381,24 @@ class Recording(dj.Manual):
320
381
subject_id : int
321
382
session_id : int
322
383
---
323
-
raw_data : object # managed file storage
324
-
processed : object # another object attribute
384
+
raw_data : object # uses default store
385
+
processed : object # another object attribute (default store)
386
+
published : object@public # uses named 'public' store
325
387
"""
326
388
```
327
389
328
-
Note: No `@store` suffix needed - storage is determined by pipeline configuration.
390
+
- `object`: uses the default storage backend
391
+
- `object@store_name`: uses a named store (must be configured in settings)
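
As an illustration of the two forms, the sketch below splits an attribute type into its type name and optional store name. The function `parse_object_type` is a hypothetical helper written for this example; it is not DataJoint's actual declaration parser.

```python
from typing import Optional, Tuple


def parse_object_type(type_spec: str) -> Tuple[str, Optional[str]]:
    """Split 'object' or 'object@store_name' into (type_name, store_name).

    A store_name of None means the default store.
    """
    type_name, sep, store = type_spec.strip().partition("@")
    if type_name != "object":
        raise ValueError(f"not an object type: {type_spec!r}")
    if sep and not store:
        # 'object@' with nothing after the '@' is malformed.
        raise ValueError("missing store name after '@'")
    return type_name, (store or None)


default_spec = parse_object_type("object")
named_spec = parse_object_type("object@public")
```

Whether the returned store name is actually configured would still need to be checked against the settings.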
329
392
330
393
## Database Storage
331
394
332
395
The `object` type is stored as a `JSON` column in MySQL containing:
@@ -386,7 +457,9 @@ The `object` type is stored as a `JSON` column in MySQL containing:
386
457
387
458
| Field | Type | Required | Description |
388
459
|-------|------|----------|-------------|
389
-
| `path` | string | Yes | Full path/key within storage backend (includes token) |
460
+
| `store` | string/null | Yes | Store name (e.g., `"public"`), or `null` for default store |
461
+
| `url` | string | Yes | Full URL including protocol and bucket (e.g., `s3://bucket/path`) |
462
+
| `path` | string | Yes | Relative path within store (excludes protocol/bucket, includes token) |
390
463
| `size` | integer/null | No | Total size in bytes (sum for folders), or null if not computed. See [Performance Considerations](#performance-considerations). |
391
464
| `hash` | string/null | Yes | Content hash with algorithm prefix, or null (default) |
392
465
| `ext` | string/null | Yes | File extension as tooling hint (e.g., `.dat`, `.zarr`) or null. See [Extension Field](#extension-field). |
@@ -395,6 +468,11 @@ The `object` type is stored as a `JSON` column in MySQL containing:
395
468
| `mime_type` | string | No | MIME type (files only, auto-detected from extension) |
396
469
| `item_count` | integer | No | Number of files (folders only), or null if not computed. See [Performance Considerations](#performance-considerations). |
397
470
471
+
**Why both `url` and `path`?**
472
+
- `url`: Self-describing, enables cross-validation, robust to config changes
473
+
- `path`: Enables store name re-derivation at migration time, consistent structure across stores
474
+
- At migration, the store name can be derived by matching `url` against configured stores
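
Deriving the store name from `url` might look like the sketch below. It assumes each configured store can be reduced to a root URL and uses `None` as the key for the default store. The helper `derive_store_name` and the `store_roots` mapping are hypothetical names introduced for this example.

```python
from typing import Dict, Optional


def derive_store_name(url: str, store_roots: Dict[Optional[str], str]) -> Optional[str]:
    """Return the name of the store whose root URL prefixes `url`.

    `store_roots` maps store name (None for the default store) to its root URL.
    """
    matches = [
        (len(root), name)
        for name, root in store_roots.items()
        # Compare against "root/" so "s3://bucket" does not match "s3://bucket2/...".
        if url.startswith(root.rstrip("/") + "/")
    ]
    if not matches:
        raise LookupError(f"no configured store matches {url!r}")
    # Prefer the longest matching root so nested prefixes resolve unambiguously.
    return max(matches, key=lambda m: m[0])[1]


store_roots = {
    None: "s3://internal-bucket/my_project",
    "public": "s3://public-bucket/my_project",
}

published_store = derive_store_name(
    "s3://public-bucket/my_project/lab_schema/PublishedDatasets/objects/example",
    store_roots,
)
```

Raising when nothing matches keeps a stale or mistyped `url` from being silently assigned to the default store.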
475
+
398
476
### Extension Field
399
477
400
478
The `ext` field is a **tooling hint** that preserves the original file extension or provides a conventional suffix for directory-based formats. It is:
@@ -937,18 +1015,36 @@ Orphaned files (files in storage without corresponding database records) may acc
937
1015
938
1016
### Orphan Cleanup Procedure
939
1017
940
-
Orphan cleanup is a **separate maintenance operation** provided via the `schema.object_storage` utility object.
1018
+
Orphan cleanup is a **separate maintenance operation** provided via the `schema.object_storage` utility object. Cleanup operates **per-store**, iterating through all configured stores.
941
1019
942
1020
```python
943
1021
# Maintenance utility methods (not a hidden table)
944
-
schema.object_storage.find_orphaned(grace_period_minutes=30) # List orphaned files
1022
+
schema.object_storage.find_orphaned(grace_period_minutes=30) # List orphaned files (all stores)
1023
+
schema.object_storage.find_orphaned(store="public") # List orphaned files (specific store)
**Note**: `schema.object_storage` is a utility object, not a hidden table. Unlike `attach@store` which uses `~external_*` tables, the `object` type stores all metadata inline in JSON columns and has no hidden tables.
951
1030
1031
+
**Efficient listing for Zarr and large stores:**
1032
+
1033
+
For stores with Zarr arrays (potentially millions of chunk objects), cleanup uses **delimiter-based listing** to enumerate only root object names, not individual chunks:
1034
+
1035
+
```python
1036
+
# S3 API with delimiter - lists "directories" only