Commit fb8c0cb

Add Augmented Schema vs External References section
Clarifies the architectural distinction between the `object` type (AUS) and `filepath@store` (external references) to address a reviewer question about multi-cloud scenarios.
1 parent 15418c3 commit fb8c0cb

1 file changed: +25 -0 lines changed

docs/src/design/tables/file-type-spec.md

Lines changed: 25 additions & 0 deletions
@@ -23,6 +23,31 @@ Once an object is **finalized** (either via copy-insert or staged-insert complet
| **Copy** | Small files, existing data | Local file → copy to storage → insert record |
| **Staged** | Large objects, Zarr/HDF5 | Reserve path → write directly to storage → finalize record |

### Augmented Schema vs External References

The `object` type implements **Augmented Schema (AUS)**, a paradigm in which the object store becomes a true extension of the relational database:

- **DataJoint fully controls** the object store lifecycle
- **Only DataJoint writes** to the object store (users may have direct read access)
- **Tight coupling** between database and object store
- **Joint transaction management** on objects and database records
- **Single backend per pipeline**: all managed objects live together

This is fundamentally different from **external references**, where DataJoint merely points to user-managed data:

| Aspect | `object` (Augmented Schema) | `filepath@store` (External Reference) |
|--------|----------------------------|--------------------------------------|
| **Ownership** | DataJoint owns the data | User owns the data |
| **Writes** | Only via DataJoint | User writes directly |
| **Deletion** | DataJoint deletes on record delete | User manages lifecycle |
| **Multi-backend** | Single backend per pipeline | Multiple named stores |
| **Use case** | Pipeline-generated data | Collaborator data, legacy assets |

**When to use each:**

- Use `object` for data that DataJoint should own and manage as part of the schema (e.g., processed results, derived datasets)
- Use `filepath@store` for referencing externally-managed data across multiple backends (e.g., collaborator data on different cloud providers, legacy data that shouldn't be moved)
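
The ownership and deletion rows of the comparison can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `Store`, `ObjectAttr`, `FilepathAttr`, `choose_attribute_type`, and the store names are hypothetical illustrations and are not part of the DataJoint API; the real behavior is whatever the spec and implementation define.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for an object store (path -> bytes)."""
    name: str
    blobs: dict = field(default_factory=dict)


@dataclass
class ObjectAttr:
    """Models `object`: the pipeline owns the data in its single backend."""
    store: Store
    path: str

    def on_record_delete(self) -> None:
        # Owned data is removed together with the record.
        self.store.blobs.pop(self.path, None)


@dataclass
class FilepathAttr:
    """Models `filepath@store`: a pointer to user-managed data."""
    store: Store
    path: str

    def on_record_delete(self) -> None:
        # Only the reference goes away; the user manages the file lifecycle.
        pass


def choose_attribute_type(pipeline_owns_data: bool, store_name: str) -> str:
    """Return the attribute type string suggested by the guidance above."""
    return "object" if pipeline_owns_data else f"filepath@{store_name}"


# One managed backend for the pipeline, plus two user-managed named stores.
managed = Store("pipeline_backend")
collab_s3 = Store("collab_s3", {"raw/session1.bin": b"raw"})
legacy_nas = Store("legacy_nas", {"old/scan.tif": b"scan"})

# Pipeline-generated result, written through the (modelled) managed path.
managed.blobs["results/session1.zarr"] = b"processed"
record = {
    "result": ObjectAttr(managed, "results/session1.zarr"),
    "raw": FilepathAttr(collab_s3, "raw/session1.bin"),
    "scan": FilepathAttr(legacy_nas, "old/scan.tif"),
}

# Deleting the record removes owned objects but leaves external files alone.
for attr in record.values():
    attr.on_record_delete()

print(sorted(managed.blobs))                      # []
print(sorted(collab_s3.blobs))                    # ['raw/session1.bin']
print(choose_attribute_type(False, "collab_s3"))  # filepath@collab_s3
```

In this sketch the managed object disappears with its record, while the collaborator and legacy files remain untouched in their own stores, which mirrors the multi-cloud scenario the section addresses.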
## Storage Architecture

### Single Storage Backend Per Pipeline
