- **Length**: Configurable via `object_storage.hash_length` (default: 8, range: 4-16)
- **Generation**: Cryptographically random using `secrets.token_urlsafe()`

Each character is drawn from the 64-symbol URL-safe Base64 alphabet (`A-Z`, `a-z`, `0-9`, `-`, `_`). At the default length of 8, that gives 64^8 = 2^48 ≈ 281 trillion possible suffixes.
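The suffix rules above can be sketched as follows. This is a minimal illustration, not DataJoint's code: the helpers `make_suffix` and `combinations` are hypothetical. One detail matters here. `secrets.token_urlsafe(n)` takes a count of random *bytes*, not characters. The sketch therefore requests `hash_length` bytes, which encode to more than `hash_length` characters, and truncates the result to exactly the configured length.

```python
import secrets

MIN_LEN, MAX_LEN = 4, 16  # documented range for object_storage.hash_length


def make_suffix(hash_length: int = 8) -> str:
    """Return a random URL-safe suffix of exactly `hash_length` characters.

    Hypothetical helper for illustration; not DataJoint's actual function.
    """
    if not MIN_LEN <= hash_length <= MAX_LEN:
        raise ValueError(f"hash_length must be in [{MIN_LEN}, {MAX_LEN}]")
    # token_urlsafe(n) encodes n random bytes as unpadded URL-safe Base64,
    # producing ceil(4n/3) > n characters, so the first n are all full
    # 6-bit symbols and slicing never comes up short.
    return secrets.token_urlsafe(hash_length)[:hash_length]


def combinations(hash_length: int = 8) -> int:
    """Number of distinct suffixes: 64 choices per character."""
    return 64 ** hash_length


print(make_suffix())    # random, e.g. 8 characters from [A-Za-z0-9_-]
print(combinations(8))  # 281474976710656, i.e. 2**48
```

Truncating a longer token is one simple way to hit an exact length. Another option is to pick the byte count so the encoding length matches: for example, 6 bytes encode to exactly 8 characters.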
#### Rationale

- Makes collisions vanishingly rare without requiring existence checks before upload
- Preserves the original filename for human readability
- URL-safe for web-based access to cloud storage
- Filesystem-safe across all supported platforms
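The collision guarantee in the first rationale point is probabilistic rather than absolute. Below is a minimal sketch of the standard birthday-bound estimate. It assumes suffixes are independent and uniformly random. It also assumes that a collision only matters between objects that would otherwise share the same location and original filename; that is an assumption about the path layout, which this section does not spell out. The function name is hypothetical.

```python
import math


def collision_probability(n: int, hash_length: int = 8) -> float:
    """Approximate P(at least two of n random suffixes are identical).

    Birthday bound: p ≈ 1 - exp(-n(n-1) / (2N)), where
    N = 64 ** hash_length is the number of possible suffixes.
    """
    space = 64 ** hash_length
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (1_000, 1_000_000):
    print(f"{n:,} same-named objects: p ≈ {collision_probability(n):.1e}")
```

At the default length of 8, this estimate is roughly 1.8e-09 for a thousand objects sharing a base name and roughly 1.8e-03 for a million. Shorter `hash_length` values shrink the space quickly: 64^4 is only about 16.8 million.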
### No Deduplication

Each insert stores a separate copy of the file, even if identical content was previously stored. This ensures:
1. Resolve storage backend from pipeline configuration
2. Read file content (from path or stream)
3. Compute content hash (SHA-256)
4. Generate storage path with random suffix
5. Upload file to storage backend via `fsspec`
6. Build JSON metadata structure
7. Store JSON in database column
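The insert steps above can be sketched end to end with an `fsspec` filesystem. This is an illustrative sketch under stated assumptions, not DataJoint's implementation:

- Step 1 is taken as already done: `fs` stands for the resolved backend, and an in-memory filesystem is used here.
- The `{stem}_{token}{ext}` path layout, the `root` argument, and the metadata keys (`path`, `size`, `hash`, `original_name`) are all hypothetical.
- The function only returns the JSON string that step 7 would write to the column.

```python
import hashlib
import json
import secrets
from pathlib import PurePosixPath

import fsspec


def read_content(source) -> bytes:
    """Step 2: accept raw bytes, a binary stream, or a local file path."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if hasattr(source, "read"):
        return source.read()
    with open(source, "rb") as f:
        return f.read()


def store_object(fs, root: str, filename: str, source, hash_length: int = 8) -> str:
    """Sketch of steps 2-6; returns the JSON destined for the column (step 7).

    `root`, the path layout, and the metadata keys are hypothetical.
    """
    data = read_content(source)
    digest = hashlib.sha256(data).hexdigest()                 # step 3
    name = PurePosixPath(filename)
    token = secrets.token_urlsafe(hash_length)[:hash_length]  # step 4
    path = f"{root}/{name.stem}_{token}{name.suffix}"
    fs.pipe(path, data)                                       # step 5
    metadata = {                                              # step 6
        "path": path,
        "size": len(data),
        "hash": f"sha256:{digest}",
        "original_name": name.name,
    }
    return json.dumps(metadata)


fs = fsspec.filesystem("memory")  # stand-in for a configured backend
meta = json.loads(store_object(fs, "/store/my_table", "data.bin", b"hello"))
print(meta["path"])  # /store/my_table/data_<8 random chars>.bin
```

Calling `store_object` twice with the same bytes yields two distinct paths. This matches the No Deduplication behavior, and the matching SHA-256 values in the metadata still make identical content detectable.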
## Transaction Handling

File uploads and database inserts must be coordinated to maintain consistency. Because storage backends cannot take part in a distributed transaction with MySQL, DataJoint uses an **upload-first** strategy with cleanup on failure: the file is uploaded before the row is inserted, and the uploaded file is deleted if the insert fails.
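A minimal sketch of the upload-first pattern follows. It assumes the database insert can be represented as a callable `db_insert`, a hypothetical stand-in for the MySQL insert; a real implementation would run inside the database transaction. An in-memory `fsspec` filesystem stands in for the storage backend. The object is uploaded first, so a committed row never references a missing file. If the insert raises, the just-uploaded object is removed and the original error is re-raised.

```python
import fsspec


def upload_then_insert(fs, path: str, data: bytes, db_insert) -> None:
    """Upload-first insert with cleanup on failure (illustrative sketch).

    `db_insert` is a hypothetical callable standing in for the MySQL insert
    of the row that references `path`.
    """
    fs.pipe(path, data)  # 1. upload the object first
    try:
        db_insert(path)  # 2. then insert the row that references it
    except BaseException:
        try:
            fs.rm(path)  # 3. remove the now-unreferenced object
        except Exception:
            pass  # cleanup failed: the object stays orphaned in storage
        raise  # re-raise the original insert error, not the cleanup error


def failing_insert(path: str) -> None:
    raise RuntimeError("simulated insert failure")


fs = fsspec.filesystem("memory")
try:
    upload_then_insert(fs, "/store/t/a_demo.bin", b"x", failing_insert)
except RuntimeError as err:
    print("insert failed:", err)
print("object still present:", fs.exists("/store/t/a_demo.bin"))
```

With this ordering, the worst outcome of a failure between the two steps is an orphaned object in storage, not a row pointing at a missing file. Orphans left behind when cleanup itself fails would have to be found and removed by a separate process.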