@@ -609,49 +609,31 @@ class ImageType(AttributeType):
609609| `<hash@s>` | `json` | `JSON`/`JSONB` | `_hash/{hash}` | Yes | bytes |
610610| `<filepath@s>` | `json` | `JSON`/`JSONB` | Configured store | No | ObjectRef |
611611
612- ## Reference Counting for Hash Type
612+ ## Garbage Collection for Hash Storage
613613
614- The `HashRegistry` is a **project-level** table that tracks hash-addressed objects
615- across all schemas. This differs from the legacy `~external_*` tables, which were per-schema.
614+ Hash metadata (hash, store, size) is stored directly in each table's JSON column, so no separate
615+ registry table is needed. Garbage collection scans all tables for referenced hashes and deletes stored files that no row references:
616616
617617```python
618- class HashRegistry:
619-     """
620-     Project-level hash registry.
621-     Stored in a designated database (e.g., `{project}_hash`).
622-     """
623-     definition = """
624-     # Hash-addressed object registry (project-wide)
625-     hash_id : char(64) # SHA256 hex
626-     ---
627-     store : varchar(64) # Store name
628-     size : uint64 # Size in bytes
629-     created = CURRENT_TIMESTAMP : datetime
630-     """
631- ```
632-
633- Garbage collection scans **all schemas** in the project:
634-
635- ```python
636- def garbage_collect(project):
637-     """Remove data not referenced by any table in any schema."""
638-     # Get all registered hashes
639-     registered = set(HashRegistry().fetch('hash_id', 'store'))
618+ def garbage_collect(project, store_name):
619+     """Remove hash-addressed data not referenced by any table."""
620+     # Scan store for all hash files
621+     store = get_store(store_name)
622+     all_hashes = set(store.list_hashes())  # from _hash/ directory
640623
641-     # Get all referenced hashes from ALL schemas in the project
624+     # Scan all tables for referenced hashes
642625     referenced = set()
643626     for schema in project.schemas:
644627         for table in schema.tables:
645628             for attr in table.heading.attributes:
646-                 if attr.type in ('hash', 'hash@...'):
647-                     hashes = table.fetch(attr.name)
648-                     referenced.update((h, attr.store) for h in hashes)
649-
650-     # Delete orphaned data
651-     for hash_id, store in (registered - referenced):
652-         store_backend = get_store(store)
653-         store_backend.delete(hash_path(hash_id))
654-         (HashRegistry() & {'hash_id': hash_id}).delete()
629+                 if uses_hash_storage(attr):  # <blob@>, <attach@>, <hash@>
630+                     for row in table.fetch(attr.name):
631+                         if row and row.get('store') == store_name:
632+                             referenced.add(row['hash'])
633+
634+     # Delete orphaned files
635+     for hash_id in (all_hashes - referenced):
636+         store.delete(hash_path(hash_id))
655637```
656638
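Editorial note on the hunk above: the new `garbage_collect` is a mark-and-sweep pass. It collects every hash referenced from a JSON column, then deletes whatever remains under `_hash/`. The sketch below shows that flow as a runnable example under stated assumptions. `InMemoryStore`, its `put` method, and the `tables` argument are hypothetical stand-ins invented for illustration, not the project's API. Only the `_hash/{hash}` path layout and the `hash`/`store`/`size` metadata keys are taken from the spec.

```python
import hashlib


def hash_path(hash_id):
    """Path layout taken from the spec's type table: _hash/{hash}."""
    return f"_hash/{hash_id}"


class InMemoryStore:
    """Hypothetical stand-in for a configured store (illustration only)."""

    def __init__(self):
        self._files = {}  # path -> bytes

    def put(self, data):
        # Content-addressed write: the SHA256 hex digest is the identifier.
        hash_id = hashlib.sha256(data).hexdigest()
        self._files[hash_path(hash_id)] = data
        return hash_id

    def list_hashes(self):
        prefix = "_hash/"
        return [p[len(prefix):] for p in self._files if p.startswith(prefix)]

    def delete(self, path):
        self._files.pop(path, None)


def garbage_collect(store, store_name, tables):
    """Delete hash files in `store` that no row references.

    `tables` is assumed to be an iterable of row lists, where each row maps
    an attribute name to a metadata dict ({'hash', 'store', 'size'}) or None.
    """
    # Mark: collect hashes referenced from any row for this store.
    referenced = set()
    for rows in tables:
        for row in rows:
            for value in row.values():
                if isinstance(value, dict) and value.get("store") == store_name:
                    referenced.add(value["hash"])

    # Sweep: delete every stored hash that was not marked.
    orphans = set(store.list_hashes()) - referenced
    for hash_id in orphans:
        store.delete(hash_path(hash_id))
    return orphans


store = InMemoryStore()
kept = store.put(b"still referenced")
orphan = store.put(b"no longer referenced")
tables = [[{"image": {"hash": kept, "store": "main", "size": 16}}, {"image": None}]]
removed = garbage_collect(store, "main", tables)
```

One consequence of dropping the registry is that the sweep has to list the store itself, so its cost grows with the number of files under `_hash/` as well as the number of rows scanned. The sketch also ignores concurrent inserts. A real implementation would presumably need to guard against deleting an object that was written to the store but not yet referenced by a committed row.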
657639## Built-in AttributeType Comparison