@@ -245,6 +245,19 @@ The `object` type is stored as a `JSON` column in MySQL containing:
 
 **File example:**
 ```json
+{
+  "path": "my_schema/Recording/objects/subject_id=123/session_id=45/raw_data_Ax7bQ2kM.dat",
+  "size": 12345,
+  "hash": null,
+  "ext": ".dat",
+  "is_dir": false,
+  "timestamp": "2025-01-15T10:30:00Z",
+  "mime_type": "application/octet-stream"
+}
+```
+
+**File with optional hash:**
+```json
 {
   "path": "my_schema/Recording/objects/subject_id=123/session_id=45/raw_data_Ax7bQ2kM.dat",
   "size": 12345,
@@ -261,7 +274,7 @@ The `object` type is stored as a `JSON` column in MySQL containing:
 {
   "path": "my_schema/Recording/objects/subject_id=123/session_id=45/raw_data_pL9nR4wE",
   "size": 567890,
-  "hash": "sha256:fedcba9876...",
+  "hash": null,
   "ext": null,
   "is_dir": true,
   "timestamp": "2025-01-15T10:30:00Z",
@@ -275,13 +288,59 @@ The `object` type is stored as a `JSON` column in MySQL containing:
 |-------|------|----------|-------------|
 | `path` | string | Yes | Full path/key within storage backend (includes token) |
 | `size` | integer | Yes | Total size in bytes (sum for folders) |
-| `hash` | string | Yes | Content hash with algorithm prefix |
+| `hash` | string/null | Yes | Content hash with algorithm prefix, or null (default) |
 | `ext` | string/null | Yes | File extension (e.g., `.dat`, `.zarr`) or null |
 | `is_dir` | boolean | Yes | True if stored content is a directory |
 | `timestamp` | string | Yes | ISO 8601 upload timestamp |
 | `mime_type` | string | No | MIME type (files only, auto-detected from extension) |
 | `item_count` | integer | No | Number of files (folders only) |
 
+### Content Hashing
+
+By default, **no content hash is computed** to avoid performance overhead for large objects. Storage backend integrity is trusted.
+
+**Optional hashing** can be requested per-insert:
+
+```python
+# Default - no hash (fast)
+Recording.insert1({..., "raw_data": "/path/to/large.dat"})
+
+# Request hash computation
+Recording.insert1({..., "raw_data": "/path/to/important.dat"}, hash="sha256")
+```
+
+Supported hash algorithms: `sha256`, `md5`, `xxhash` (xxh3, faster for large files)
+
+**Staged inserts never compute hashes** - data is written directly to storage without a local copy to hash.
+
+### Folder Manifests
+
+For folders (directories), a **manifest file** is created alongside the folder to enable integrity verification without computing content hashes:
+
+```
+raw_data_pL9nR4wE/
+raw_data_pL9nR4wE.manifest.json
+```
+
+**Manifest content:**
+```json
+{
+  "files": [
+    {"path": "file1.dat", "size": 1234},
+    {"path": "subdir/file2.dat", "size": 5678},
+    {"path": "subdir/file3.dat", "size": 91011}
+  ],
+  "total_size": 567890,
+  "item_count": 42,
+  "created": "2025-01-15T10:30:00Z"
+}
+```
+
+The manifest enables:
+- Quick verification that all expected files exist
+- Size validation without reading file contents
+- Detection of missing or extra files
+
 ### Filename Convention
 
 The stored filename is **always derived from the field name**:
@@ -736,7 +795,7 @@ file_ref = record["raw_data"]
 # Access metadata (no I/O)
 print(file_ref.path)  # Full storage path
 print(file_ref.size)  # File size in bytes
-print(file_ref.hash)  # Content hash
+print(file_ref.hash)  # Content hash (if computed) or None
 print(file_ref.ext)  # File extension (e.g., ".dat") or None
 print(file_ref.is_dir)  # True if stored content is a folder
 
@@ -840,7 +899,7 @@ class ObjectRef:
 
     path: str
     size: int
-    hash: str
+    hash: str | None  # content hash (if computed) or None
     ext: str | None  # file extension (e.g., ".dat") or None
     is_dir: bool
     timestamp: datetime
@@ -875,6 +934,18 @@ class ObjectRef:
     # Common operations
     def download(self, destination: Path | str, subpath: str | None = None) -> Path: ...
     def exists(self, subpath: str | None = None) -> bool: ...
+
+    # Integrity verification
+    def verify(self) -> bool:
+        """
+        Verify object integrity.
+
+        For files: checks size matches, and hash if available.
+        For folders: validates manifest (all files exist with correct sizes).
+
+        Returns True if valid, raises IntegrityError with details if not.
+        """
+        ...
 ```
 
 #### fsspec Integration
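
Reading aid for the folder-manifest and `verify()` changes in this diff: the manifest check described for folders (all expected files present, sizes correct, no extras) could be sketched roughly as below. This is a local-filesystem illustration only, not the project's implementation. `build_manifest` and `verify_manifest` are hypothetical names that do not appear in the diff, `IntegrityError` is defined here as a stand-in because the diff names it without showing its definition, and a real implementation would presumably go through the storage backend rather than `pathlib`.

```python
from datetime import datetime, timezone
from pathlib import Path


class IntegrityError(Exception):
    """Local stand-in; the diff names IntegrityError but does not define it."""


def build_manifest(folder: Path) -> dict:
    """Hypothetical helper: list every file under `folder` with its size."""
    files = [
        {"path": p.relative_to(folder).as_posix(), "size": p.stat().st_size}
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    ]
    return {
        "files": files,
        "total_size": sum(f["size"] for f in files),
        "item_count": len(files),
        "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def verify_manifest(folder: Path, manifest: dict) -> bool:
    """Hypothetical helper: compare a folder to a manifest by name and size only."""
    expected = {f["path"]: f["size"] for f in manifest["files"]}
    actual = {
        p.relative_to(folder).as_posix(): p.stat().st_size
        for p in folder.rglob("*")
        if p.is_file()
    }

    problems = []
    # Files listed in the manifest but absent from the folder.
    for path in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing: {path}")
    # Files present in the folder but not listed in the manifest.
    for path in sorted(actual.keys() - expected.keys()):
        problems.append(f"extra: {path}")
    # Files present in both whose sizes disagree; contents are never read.
    for path in sorted(expected.keys() & actual.keys()):
        if expected[path] != actual[path]:
            problems.append(
                f"size mismatch: {path} "
                f"(expected {expected[path]}, found {actual[path]})"
            )

    if problems:
        raise IntegrityError("; ".join(problems))
    return True
```

Because the manifest sits alongside the folder rather than inside it, the recursive listing never picks up the manifest file itself. Comparing sizes only will not detect a same-size content change; detecting that requires the optional content hash.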