@@ -12,7 +12,7 @@ This document defines a three-layer type architecture:
 ┌───────────────────────────────────────────────────────────────────┐
 │ AttributeTypes (Layer 3) │
 │ │
-│ Built-in: <blob> <attach> <object@> <content@> <filepath@> │
+│ Built-in: <blob> <attach> <object@> <hash@> <filepath@> │
 │ User: <custom> <mytype> ... │
 ├───────────────────────────────────────────────────────────────────┤
 │ Core DataJoint Types (Layer 2) │
@@ -39,7 +39,7 @@ This document defines a three-layer type architecture:
 | Region | Path Pattern | Addressing | Use Case |
 |--------|--------------|------------|----------|
 | Object | `{schema}/{table}/{pk}/` | Primary key | Large objects, Zarr, HDF5 |
-| Content | `_content/{hash}` | Content hash | Deduplicated blobs/files |
+| Hash | `_hash/{hash}` | SHA256 hash | Deduplicated blobs/files |

 ### External References

@@ -193,7 +193,7 @@ The `@` character in AttributeType syntax indicates **external storage** (object
 - **`@` alone**: Use default store - e.g., `<blob@>`
 - **`@name`**: Use named store - e.g., `<blob@cold>`

-Some types support both modes (`<blob>`, `<attach>`), others are external-only (`<object@>`, `<content@>`, `<filepath@>`).
+Some types support both modes (`<blob>`, `<attach>`), others are external-only (`<object@>`, `<hash@>`, `<filepath@>`).

 ### Type Resolution and Chaining

@@ -204,16 +204,16 @@ returns the appropriate dtype based on storage mode:
 Resolution at declaration time:

 <blob> → get_dtype(False) → "bytes" → LONGBLOB/BYTEA
-<blob@> → get_dtype(True) → "<content>" → json → JSON/JSONB
-<blob@cold> → get_dtype(True) → "<content>" → json (store=cold)
+<blob@> → get_dtype(True) → "<hash>" → json → JSON/JSONB
+<blob@cold> → get_dtype(True) → "<hash>" → json (store=cold)

 <attach> → get_dtype(False) → "bytes" → LONGBLOB/BYTEA
-<attach@> → get_dtype(True) → "<content>" → json → JSON/JSONB
+<attach@> → get_dtype(True) → "<hash>" → json → JSON/JSONB

 <object@> → get_dtype(True) → "json" → JSON/JSONB
 <object> → get_dtype(False) → ERROR (external only)

-<content@> → get_dtype(True) → "json" → JSON/JSONB
+<hash@> → get_dtype(True) → "json" → JSON/JSONB
 <filepath@s> → get_dtype(True) → "json" → JSON/JSONB
 ```

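The resolution chart above can be read as a repeated `get_dtype()` lookup that stops at the first core type. The following is a minimal sketch, not DataJoint's implementation: the regex for the `<name>` / `<name@>` / `<name@store>` syntax and the names `parse_spec`, `resolve`, `dual_mode`, `external_only`, and `GET_DTYPE` are hypothetical, and a chained type is assumed to inherit the outer storage mode and store name.

```python
import re

# Hypothetical sketch, not DataJoint's parser or type registry.
# Matches <name>, <name@>, and <name@store>.
TYPE_SPEC = re.compile(r"^<(?P<name>[a-z_][a-z0-9_]*)(?P<at>@(?P<store>[a-z0-9_]*))?>$")


def dual_mode(is_external):
    # <blob>, <attach>: bytes in the database, or chain to <hash> externally
    return "<hash>" if is_external else "bytes"


def external_only(is_external):
    # <hash>, <object>, <filepath>: JSON metadata, external storage only
    if not is_external:
        raise ValueError("this type requires @ (external storage only)")
    return "json"


GET_DTYPE = {
    "blob": dual_mode,
    "attach": dual_mode,
    "hash": external_only,
    "object": external_only,
    "filepath": external_only,
}


def parse_spec(spec):
    """Split '<blob@cold>' into ('blob', True, 'cold'); '@' alone gives store None."""
    m = TYPE_SPEC.match(spec)
    if m is None:
        raise ValueError(f"not an AttributeType spec: {spec!r}")
    return m.group("name"), m.group("at") is not None, m.group("store") or None


def resolve(spec):
    """Follow get_dtype() until a core type (no angle brackets) is reached."""
    name, is_external, store = parse_spec(spec)
    chain = [spec]
    dtype = GET_DTYPE[name](is_external)
    while dtype.startswith("<"):
        chain.append(dtype)
        # Assumption: a chained type inherits the outer storage mode and store.
        dtype = GET_DTYPE[dtype.strip("<>")](is_external)
    chain.append(dtype)
    return chain, store
```

For example, `resolve("<blob@cold>")` returns `(["<blob@cold>", "<hash>", "json"], "cold")`, while `resolve("<object>")` raises because the type is external-only.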
@@ -262,15 +262,15 @@ class ObjectType(AttributeType):
         return ObjectRef(store=get_store(stored["store"]), path=stored["path"])
 ```

-### `<content@>` / `<content@store>` - Content-Addressed Storage
+### `<hash@>` / `<hash@store>` - Hash-Addressed Storage

 **Built-in AttributeType. External only.**

-Content-addressed storage with deduplication:
+Hash-addressed storage with deduplication:

 - **Single blob only**: stores a single file or serialized object (not folders)
 - **Per-project scope**: content is shared across all schemas in a project (not per-schema)
-- Path derived from content hash: `_content/{hash[:2]}/{hash[2:4]}/{hash}`
+- Path derived from content hash: `_hash/{hash[:2]}/{hash[2:4]}/{hash}`
 - Many-to-one: multiple rows (even across schemas) can reference same content
 - Reference counted for garbage collection
 - Deduplication: identical content stored once across the entire project
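The path rule in the bullet list above is a pure function of the bytes. A minimal sketch, assuming lowercase hex SHA256 digests; `hash_id_of` and `hash_path` are hypothetical helpers (the `garbage_collect` code later in this diff calls a `hash_path` whose definition is not shown here).

```python
import hashlib


def hash_id_of(data: bytes) -> str:
    """SHA256 hex digest used as the address (64 lowercase hex characters)."""
    return hashlib.sha256(data).hexdigest()


def hash_path(hash_id: str) -> str:
    """Map a hash to `_hash/{hash[:2]}/{hash[2:4]}/{hash}` (hypothetical helper).

    The two 2-character prefixes fan objects out over up to 256 * 256
    subdirectories.
    """
    if len(hash_id) != 64:
        raise ValueError("expected a 64-character SHA256 hex digest")
    return f"_hash/{hash_id[:2]}/{hash_id[2:4]}/{hash_id}"
```

Identical payloads always map to the same path, which is why a single `store.exists(path)` check is enough to deduplicate. The two-level prefix is a common way to keep any one directory small on file systems and object stores, presumably the motivation here as well.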
@@ -282,48 +282,48 @@ store_root/
 ├── {schema}/{table}/{pk}/ # object storage (path-addressed by PK)
 │   └── {attribute}/
 │
-└── _content/ # content storage (content-addressed)
+└── _hash/ # content storage (hash-addressed)
     └── {hash[:2]}/{hash[2:4]}/{hash}
 ```

 #### Implementation

 ```python
-class ContentType(AttributeType):
-    """Content-addressed storage. External only."""
-    type_name = "content"
+class HashType(AttributeType):
+    """Hash-addressed storage. External only."""
+    type_name = "hash"

     def get_dtype(self, is_external: bool) -> str:
         if not is_external:
-            raise DataJointError("<content> requires @ (external storage only)")
+            raise DataJointError("<hash> requires @ (external storage only)")
         return "json"

     def encode(self, data: bytes, *, key=None, store_name=None) -> dict:
         """Store content, return metadata as JSON."""
-        content_hash = hashlib.sha256(data).hexdigest()
+        hash_id = hashlib.sha256(data).hexdigest()
         store = get_store(store_name or dj.config['stores']['default'])
-        path = f"_content/{content_hash[:2]}/{content_hash[2:4]}/{content_hash}"
+        path = f"_hash/{hash_id[:2]}/{hash_id[2:4]}/{hash_id}"

         if not store.exists(path):
             store.put(path, data)
-            ContentRegistry().insert1({
-                'content_hash': content_hash,
+            HashRegistry().insert1({
+                'hash_id': hash_id,
                 'store': store_name,
                 'size': len(data)
             }, skip_duplicates=True)

-        return {"hash": content_hash, "store": store_name, "size": len(data)}
+        return {"hash": hash_id, "store": store_name, "size": len(data)}

     def decode(self, stored: dict, *, key=None) -> bytes:
         """Retrieve content by hash."""
         store = get_store(stored["store"])
-        path = f"_content/{stored['hash'][:2]}/{stored['hash'][2:4]}/{stored['hash']}"
+        path = f"_hash/{stored['hash'][:2]}/{stored['hash'][2:4]}/{stored['hash']}"
         return store.get(path)
 ```

 #### Database Column

-The `<content@>` type stores JSON metadata:
+The `<hash@>` type stores JSON metadata:

 ```sql
 -- content column (MySQL)
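The deduplication behavior of `HashType.encode()`/`decode()` can be exercised without a database. This is a minimal sketch that swaps in a dict-backed `MemoryStore` for the object store and a plain dict for `HashRegistry`; both stand-ins, the default store name `"main"`, and the free functions `encode`/`decode` are hypothetical.

```python
import hashlib

# Hypothetical stand-ins: MemoryStore replaces the object store backend,
# and a dict replaces HashRegistry. Not DataJoint API.


class MemoryStore:
    def __init__(self):
        self.objects = {}
        self.put_count = 0  # counts physical writes to show deduplication

    def exists(self, path):
        return path in self.objects

    def put(self, path, data):
        self.objects[path] = data
        self.put_count += 1

    def get(self, path):
        return self.objects[path]


def _path(hash_id):
    return f"_hash/{hash_id[:2]}/{hash_id[2:4]}/{hash_id}"


def encode(data, store, registry, store_name="main"):
    """Write each distinct payload once; return the JSON metadata for the column."""
    hash_id = hashlib.sha256(data).hexdigest()
    path = _path(hash_id)
    if not store.exists(path):
        store.put(path, data)
        # setdefault behaves like insert1(..., skip_duplicates=True)
        registry.setdefault(hash_id, {"store": store_name, "size": len(data)})
    return {"hash": hash_id, "store": store_name, "size": len(data)}


def decode(stored, store):
    """Fetch the payload using only the hash recorded in the metadata."""
    return store.get(_path(stored["hash"]))
```

Calling `encode(b"spike train", store, registry)` twice returns equal metadata dicts, leaves `store.put_count` at 1, and adds a single registry entry; `decode()` needs nothing but the stored hash to find the bytes again.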
@@ -442,7 +442,7 @@ column_name JSONB NOT NULL
 ```

 The `json` database type:
-- Used as dtype by built-in AttributeTypes (`<object@>`, `<content@>`, `<filepath@store>`)
+- Used as dtype by built-in AttributeTypes (`<object@>`, `<hash@>`, `<filepath@store>`)
 - Stores arbitrary JSON-serializable data
 - Automatically uses appropriate type for database backend
 - Supports JSON path queries where available
@@ -457,7 +457,7 @@ Serializes Python objects (NumPy arrays, dicts, lists, etc.) using DataJoint's
 blob format. Compatible with MATLAB.

 - **`<blob>`**: Stored in database (`bytes` → `LONGBLOB`/`BYTEA`)
-- **`<blob@>`**: Stored externally via `<content@>` with deduplication
+- **`<blob@>`**: Stored externally via `<hash@>` with deduplication
 - **`<blob@store>`**: Stored in specific named store

 ```python
@@ -467,7 +467,7 @@ class BlobType(AttributeType):
     type_name = "blob"

     def get_dtype(self, is_external: bool) -> str:
-        return "<content>" if is_external else "bytes"
+        return "<hash>" if is_external else "bytes"

     def encode(self, value, *, key=None, store_name=None) -> bytes:
         from . import blob
@@ -497,7 +497,7 @@ class ProcessedData(dj.Computed):
 Stores files with filename preserved. On fetch, extracts to configured download path.

 - **`<attach>`**: Stored in database (`bytes` → `LONGBLOB`/`BYTEA`)
-- **`<attach@>`**: Stored externally via `<content@>` with deduplication
+- **`<attach@>`**: Stored externally via `<hash@>` with deduplication
 - **`<attach@store>`**: Stored in specific named store

 ```python
@@ -507,7 +507,7 @@ class AttachType(AttributeType):
     type_name = "attach"

     def get_dtype(self, is_external: bool) -> str:
-        return "<content>" if is_external else "bytes"
+        return "<hash>" if is_external else "bytes"

     def encode(self, filepath, *, key=None, store_name=None) -> bytes:
         path = Path(filepath)
@@ -567,7 +567,7 @@ class ImageType(AttributeType):
     type_name = "image"

     def get_dtype(self, is_external: bool) -> str:
-        return "<content>" if is_external else "bytes"
+        return "<hash>" if is_external else "bytes"

     def encode(self, image, *, key=None, store_name=None) -> bytes:
         # Convert PIL Image to PNG bytes
@@ -584,31 +584,31 @@ class ImageType(AttributeType):
 | Type | get_dtype | Resolves To | Storage Location | Dedup | Returns |
 |------|-----------|-------------|------------------|-------|---------|
 | `<blob>` | `bytes` | `LONGBLOB`/`BYTEA` | Database | No | Python object |
-| `<blob@>` | `<content>` | `json` | `_content/{hash}` | Yes | Python object |
-| `<blob@s>` | `<content>` | `json` | `_content/{hash}` | Yes | Python object |
+| `<blob@>` | `<hash>` | `json` | `_hash/{hash}` | Yes | Python object |
+| `<blob@s>` | `<hash>` | `json` | `_hash/{hash}` | Yes | Python object |
 | `<attach>` | `bytes` | `LONGBLOB`/`BYTEA` | Database | No | Local file path |
-| `<attach@>` | `<content>` | `json` | `_content/{hash}` | Yes | Local file path |
-| `<attach@s>` | `<content>` | `json` | `_content/{hash}` | Yes | Local file path |
+| `<attach@>` | `<hash>` | `json` | `_hash/{hash}` | Yes | Local file path |
+| `<attach@s>` | `<hash>` | `json` | `_hash/{hash}` | Yes | Local file path |
 | `<object@>` | `json` | `JSON`/`JSONB` | `{schema}/{table}/{pk}/` | No | ObjectRef |
 | `<object@s>` | `json` | `JSON`/`JSONB` | `{schema}/{table}/{pk}/` | No | ObjectRef |
-| `<content@>` | `json` | `JSON`/`JSONB` | `_content/{hash}` | Yes | bytes |
-| `<content@s>` | `json` | `JSON`/`JSONB` | `_content/{hash}` | Yes | bytes |
+| `<hash@>` | `json` | `JSON`/`JSONB` | `_hash/{hash}` | Yes | bytes |
+| `<hash@s>` | `json` | `JSON`/`JSONB` | `_hash/{hash}` | Yes | bytes |
 | `<filepath@s>` | `json` | `JSON`/`JSONB` | Configured store | No | ObjectRef |

-## Reference Counting for Content Type
+## Reference Counting for Hash Type

-The `ContentRegistry` is a **project-level** table that tracks content-addressed objects
+The `HashRegistry` is a **project-level** table that tracks hash-addressed objects
 across all schemas. This differs from the legacy `~external_*` tables which were per-schema.

 ```python
-class ContentRegistry:
+class HashRegistry:
     """
-    Project-level content registry.
-    Stored in a designated database (e.g., `{project}_content`).
+    Project-level hash registry.
+    Stored in a designated database (e.g., `{project}_hash`).
     """
     definition = """
-    # Content-addressed object registry (project-wide)
-    content_hash : char(64) # SHA256 hex
+    # Hash-addressed object registry (project-wide)
+    hash_id : char(64) # SHA256 hex
     ---
     store : varchar(64) # Store name
     size : bigint unsigned # Size in bytes
@@ -620,34 +620,34 @@ Garbage collection scans **all schemas** in the project:

 ```python
 def garbage_collect(project):
-    """Remove content not referenced by any table in any schema."""
+    """Remove data not referenced by any table in any schema."""
     # Get all registered hashes
-    registered = set(ContentRegistry().fetch('content_hash', 'store'))
+    registered = set(HashRegistry().fetch('hash_id', 'store'))

     # Get all referenced hashes from ALL schemas in the project
     referenced = set()
     for schema in project.schemas:
         for table in schema.tables:
             for attr in table.heading.attributes:
-                if attr.type in ('content', 'content@...'):
+                if attr.type in ('hash', 'hash@...'):
                     hashes = table.fetch(attr.name)
                     referenced.update((h, attr.store) for h in hashes)

-    # Delete orphaned content
-    for content_hash, store in (registered - referenced):
+    # Delete orphaned data
+    for hash_id, store in (registered - referenced):
         store_backend = get_store(store)
-        store_backend.delete(content_path(content_hash))
-        (ContentRegistry() & {'content_hash': content_hash}).delete()
+        store_backend.delete(hash_path(hash_id))
+        (HashRegistry() & {'hash_id': hash_id}).delete()
 ```

 ## Built-in AttributeType Comparison

-| Feature | `<blob>` | `<attach>` | `<object@>` | `<content@>` | `<filepath@>` |
+| Feature | `<blob>` | `<attach>` | `<object@>` | `<hash@>` | `<filepath@>` |
 |---------|----------|------------|-------------|--------------|---------------|
 | Storage modes | Both | Both | External only | External only | External only |
 | Internal dtype | `bytes` | `bytes` | N/A | N/A | N/A |
-| External dtype | `<content>` | `<content>` | `json` | `json` | `json` |
-| Addressing | Content hash | Content hash | Primary key | Content hash | Relative path |
+| External dtype | `<hash>` | `<hash>` | `json` | `json` | `json` |
+| Addressing | Hash | Hash | Primary key | Hash | Relative path |
 | Deduplication | Yes (external) | Yes (external) | No | Yes | No |
 | Structure | Single blob | Single file | Files, folders | Single blob | Any |
 | Returns | Python object | Local path | ObjectRef | bytes | ObjectRef |
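The sweep step of `garbage_collect` in the hunk above is a set difference over `(hash_id, store)` pairs. The minimal sketch below isolates it, assuming both sides have already been collected into Python sets of tuples and that deletion is delegated to a caller-supplied callback; `sweep_orphans` and its signature are hypothetical.

```python
# Hypothetical sketch of the sweep phase only; collecting `registered` and
# `referenced` from the registry and from every schema is assumed done.


def sweep_orphans(registered, referenced, delete_object):
    """Delete registered (hash_id, store) pairs that nothing references.

    registered: set of (hash_id, store) pairs from the registry (updated in place)
    referenced: set of (hash_id, store) pairs found in any table of any schema
    delete_object: callback(store, hash_id) that removes one stored object
    Returns the set of orphans that were deleted.
    """
    orphans = registered - referenced
    for hash_id, store in sorted(orphans):  # sorted only for a stable order
        delete_object(store, hash_id)
    registered -= orphans  # mirrors deleting the orphaned registry rows
    return orphans
```

Keying on the pair rather than on the hash alone means the same bytes held in two stores are tracked, and deleted, independently.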
@@ -657,7 +657,7 @@ def garbage_collect(project):
 - **`<blob>`**: Serialized Python objects (NumPy arrays, dicts). Use `<blob@>` for large/duplicated data
 - **`<attach>`**: File attachments with filename preserved. Use `<attach@>` for large files
 - **`<object@>`**: Large/complex file structures (Zarr, HDF5) where DataJoint controls organization
-- **`<content@>`**: Raw bytes with deduplication (typically used via `<blob@>` or `<attach@>`)
+- **`<hash@>`**: Raw bytes with deduplication (typically used via `<blob@>` or `<attach@>`)
 - **`<filepath@store>`**: Portable references to externally-managed files
 - **`varchar`**: Arbitrary URLs/paths where ObjectRef semantics aren't needed

@@ -671,9 +671,9 @@ def garbage_collect(project):
 3. **AttributeTypes use angle brackets**: `<blob>`, `<object@store>`, `<filepath@main>` - distinguishes from core types
 4. **`@` indicates external storage**: No `@` = database, `@` present = object store
 5. **`get_dtype(is_external)` method**: Types resolve dtype at declaration time based on storage mode
-6. **AttributeTypes are composable**: `<blob@>` uses `<content@>`, which uses `json`
+6. **AttributeTypes are composable**: `<blob@>` uses `<hash@>`, which uses `json`
 7. **Built-in external types use JSON dtype**: Stores metadata (path, hash, store name, etc.)
-8. **Two OAS regions**: object (PK-addressed) and content (hash-addressed) within managed stores
+8. **Two OAS regions**: object (PK-addressed) and hash (hash-addressed) within managed stores
 9. **Filepath for portability**: `<filepath@store>` uses relative paths within stores for environment portability
 10. **No `uri` type**: For arbitrary URLs, use `varchar`—simpler and more transparent
 11. **Naming conventions**:
@@ -682,7 +682,7 @@ def garbage_collect(project):
    - `@` alone = default store
    - `@name` = named store
 12. **Dual-mode types**: `<blob>` and `<attach>` support both internal and external storage
-13. **External-only types**: `<object@>`, `<content@>`, `<filepath@>` require `@`
+13. **External-only types**: `<object@>`, `<hash@>`, `<filepath@>` require `@`
 14. **Transparent access**: AttributeTypes return Python objects or file paths
 15. **Lazy access**: `<object@>` and `<filepath@store>` return ObjectRef

@@ -699,20 +699,20 @@ def garbage_collect(project):
 ### Migration from Legacy `~external_*` Stores

 Legacy external storage used per-schema `~external_{store}` tables. Migration to the new
-per-project `ContentRegistry` requires:
+per-project `HashRegistry` requires:

 ```python
 def migrate_external_store(schema, store_name):
     """
-    Migrate legacy ~external_{store} to new ContentRegistry.
+    Migrate legacy ~external_{store} to new HashRegistry.

     1. Read all entries from ~external_{store}
     2. For each entry:
        - Fetch content from legacy location
        - Compute SHA256 hash
-       - Copy to _content/{hash}/ if not exists
+       - Copy to _hash/{hash}/ if not exists
        - Update table column from UUID to hash
-       - Register in ContentRegistry
+       - Register in HashRegistry
     3. After all schemas migrated, drop ~external_{store} tables
     """
     external_table = schema.external[store_name]
@@ -724,17 +724,17 @@ def migrate_external_store(schema, store_name):
         content = external_table.get(legacy_uuid)

         # Compute new content hash
-        content_hash = hashlib.sha256(content).hexdigest()
+        hash_id = hashlib.sha256(content).hexdigest()

         # Store in new location if not exists
-        new_path = f"_content/{content_hash[:2]}/{content_hash[2:4]}/{content_hash}"
+        new_path = f"_hash/{hash_id[:2]}/{hash_id[2:4]}/{hash_id}"
         store = get_store(store_name)
         if not store.exists(new_path):
             store.put(new_path, content)

-        # Register in project-wide ContentRegistry
-        ContentRegistry().insert1({
-            'content_hash': content_hash,
+        # Register in project-wide HashRegistry
+        HashRegistry().insert1({
+            'hash_id': hash_id,
             'store': store_name,
             'size': len(content)
         }, skip_duplicates=True)
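The migration loop in the hunk above can be tried against in-memory data. A minimal sketch, assuming the legacy `~external_{store}` table is represented as a dict from UUID to bytes, the object store and `HashRegistry` as dicts, and that the migrated column holds the same `{"hash", "store", "size"}` metadata that `<hash@>` writes; `migrate_rows` and its parameters are hypothetical.

```python
import hashlib

# Hypothetical in-memory sketch of migrate_external_store; dicts stand in for
# the legacy ~external_{store} table, the object store, and HashRegistry.


def migrate_rows(rows, column, legacy_blobs, new_store, registry, store_name):
    """Replace legacy UUID references in rows[*][column] with <hash@> metadata.

    rows: list of row dicts whose `column` holds a legacy UUID
    legacy_blobs: dict UUID -> bytes (stand-in for ~external_{store})
    new_store: dict path -> bytes (stand-in for the object store)
    registry: dict hash_id -> info (stand-in for HashRegistry)
    """
    for row in rows:
        content = legacy_blobs[row[column]]
        hash_id = hashlib.sha256(content).hexdigest()
        path = f"_hash/{hash_id[:2]}/{hash_id[2:4]}/{hash_id}"
        if path not in new_store:  # copy only if not already migrated
            new_store[path] = content
        registry.setdefault(hash_id, {"store": store_name, "size": len(content)})
        # Assumption: the migrated column holds the same JSON metadata <hash@> writes.
        row[column] = {"hash": hash_id, "store": store_name, "size": len(content)}
    return rows
```

If two legacy UUIDs refer to identical bytes, both rows end up pointing at a single hash-addressed object, so the migration itself deduplicates.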
@@ -755,4 +755,4 @@ def migrate_external_store(schema, store_name):
 ## Open Questions

 1. How long should the backward compatibility layer support legacy `~external_*` format?
-2. Should `<content@>` (without store name) use a default store or require explicit store name?
+2. Should `<hash@>` (without store name) use a default store or require explicit store name?