Commit af6cef2

Add store metadata and client verification mechanism
Store metadata (dj-store-meta.json):
- Located at store root with project_name, created, format_version
- Lists schemas using the store
- Created on first file operation

Client verification:
- project_name required in client settings
- Must match store metadata on connect
- Raises DataJointError on mismatch
- Ensures all clients use same configuration

Also renamed hash_length to token_length throughout spec.
1 parent 4f15c90 commit af6cef2

1 file changed, +96 -8 lines changed

docs/src/design/tables/file-type-spec.md

Lines changed: 96 additions & 8 deletions
@@ -69,34 +69,37 @@ Object storage is configured in `datajoint.json` using the existing settings sys
     "database.host": "localhost",
     "database.user": "datajoint",

+    "object_storage.project_name": "my_project",
     "object_storage.protocol": "s3",
     "object_storage.endpoint": "s3.amazonaws.com",
     "object_storage.bucket": "my-bucket",
     "object_storage.location": "my_project",
-    "object_storage.partition_pattern": "subject{subject_id}/session{session_id}"
+    "object_storage.partition_pattern": "{subject_id}/{session_id}"
 }
 ```

 For local filesystem storage:

 ```json
 {
+    "object_storage.project_name": "my_project",
     "object_storage.protocol": "file",
     "object_storage.location": "/data/my_project",
-    "object_storage.partition_pattern": "subject{subject_id}/session{session_id}"
+    "object_storage.partition_pattern": "{subject_id}/{session_id}"
 }
 ```

 ### Settings Schema

 | Setting | Type | Required | Description |
 |---------|------|----------|-------------|
+| `object_storage.project_name` | string | Yes | Unique project identifier (must match store metadata) |
 | `object_storage.protocol` | string | Yes | Storage backend: `file`, `s3`, `gcs`, `azure` |
 | `object_storage.location` | string | Yes | Base path or bucket prefix |
 | `object_storage.bucket` | string | For cloud | Bucket name (S3, GCS, Azure) |
 | `object_storage.endpoint` | string | For S3 | S3 endpoint URL |
 | `object_storage.partition_pattern` | string | No | Path pattern with `{attribute}` placeholders |
-| `object_storage.hash_length` | int | No | Random suffix length for filenames (default: 8, range: 4-16) |
+| `object_storage.token_length` | int | No | Random suffix length for filenames (default: 8, range: 4-16) |
 | `object_storage.access_key` | string | For cloud | Access key (can use secrets file) |
 | `object_storage.secret_key` | string | For cloud | Secret key (can use secrets file) |

@@ -139,6 +142,90 @@ s3://my-bucket/my_project/subject123/session45/schema_name/objects/Recording-raw

 If no partition pattern is specified, files are organized directly under `{location}/{schema}/objects/`.
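
Editor's sketch (not part of the commit): one way the object prefix could be assembled with and without a partition pattern. The helper name `object_prefix` and its arguments are hypothetical, and the ordering location / partition / schema / `objects` is inferred from the example path in the hunk header rather than taken from DataJoint's code.

```python
from string import Formatter


def object_prefix(location, schema, partition_pattern=None, key=None):
    """Hypothetical helper: build the directory prefix for a schema's objects."""
    key = key or {}
    parts = [location.rstrip("/")]
    if partition_pattern:
        # Collect the {attribute} placeholder names used by the pattern.
        fields = [name for _, name, _, _ in Formatter().parse(partition_pattern) if name]
        missing = [name for name in fields if name not in key]
        if missing:
            raise KeyError(f"partition attributes missing from key: {missing}")
        # Substitute the record's attribute values into the pattern.
        parts.append(partition_pattern.format(**key))
    parts += [schema, "objects"]
    return "/".join(parts)


print(object_prefix("my_project", "schema_name"))
print(object_prefix("my_project", "schema_name",
                    "{subject_id}/{session_id}",
                    {"subject_id": 123, "session_id": 45}))
```

Without a pattern the prefix collapses to `{location}/{schema}/objects`, matching the sentence above.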

+## Store Metadata (`dj-store-meta.json`)
+
+Each object store contains a metadata file at its root that identifies the store and enables verification by DataJoint clients.
+
+### Location
+
+```
+{location}/dj-store-meta.json
+```
+
+For cloud storage:
+```
+s3://bucket/my_project/dj-store-meta.json
+```
+
+### Content
+
+```json
+{
+    "project_name": "my_project",
+    "created": "2025-01-15T10:30:00Z",
+    "format_version": "1.0",
+    "datajoint_version": "0.15.0",
+    "schemas": ["schema1", "schema2"]
+}
+```
+
+### Schema
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `project_name` | string | Yes | Unique project identifier |
+| `created` | string | Yes | ISO 8601 timestamp of store creation |
+| `format_version` | string | Yes | Store format version for compatibility |
+| `datajoint_version` | string | Yes | DataJoint version that created the store |
+| `schemas` | array | No | List of schemas using this store (updated on schema creation) |
+
+### Store Initialization
+
+The store metadata file is created when the first `file` attribute is used:
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ 1. Client attempts first file operation                 │
+├─────────────────────────────────────────────────────────┤
+│ 2. Check if dj-store-meta.json exists                   │
+│    ├─ If exists: verify project_name matches            │
+│    └─ If not: create with current project_name          │
+├─────────────────────────────────────────────────────────┤
+│ 3. On mismatch: raise DataJointError                    │
+└─────────────────────────────────────────────────────────┘
+```
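
Editor's sketch (not part of the commit): the "create if missing" branch of the flow above, written for a local `file`-protocol store only. The helper name `load_or_create_store_metadata`, the constant `META_FILENAME`, and the default version string are assumptions for illustration; a cloud backend would need its own read and write calls. Verifying the project name against the returned metadata is left to the caller.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

META_FILENAME = "dj-store-meta.json"


def load_or_create_store_metadata(location, project_name, datajoint_version="0.15.0"):
    """Hypothetical helper: return the store metadata, creating it on first use."""
    meta_path = Path(location) / META_FILENAME
    if meta_path.exists():
        # Store already initialized: return what is on disk unchanged.
        return json.loads(meta_path.read_text())
    metadata = {
        "project_name": project_name,
        # ISO 8601 timestamp in UTC, e.g. 2025-01-15T10:30:00Z
        "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "format_version": "1.0",
        "datajoint_version": datajoint_version,
        "schemas": [],
    }
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    meta_path.write_text(json.dumps(metadata, indent=2))
    return metadata
```

The exists-then-write sequence is not atomic, so two clients initializing the same store at once could race; a real implementation would likely need an exclusive create or a backend-specific conditional write.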
+
+### Client Verification
+
+All DataJoint clients must use **identical `project_name`** settings to ensure store-database cohesion:
+
+1. **On connect**: Client reads `dj-store-meta.json` from store
+2. **Verify**: `project_name` in client settings matches store metadata
+3. **On mismatch**: Raise `DataJointError` with descriptive message
+
+```python
+# Example error
+DataJointError: Object store project name mismatch.
+Client configured: "project_a"
+Store metadata: "project_b"
+Ensure all clients use the same object_storage.project_name setting.
+```
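
Editor's sketch (not part of the commit): the verification step as a plain function over an already-loaded metadata dict. `verify_project_name` is a hypothetical name, and the local `DataJointError` class is a stand-in so the snippet runs without DataJoint installed; the message text follows the example error above.

```python
class DataJointError(Exception):
    """Stand-in for DataJoint's own exception class (assumption for this sketch)."""


def verify_project_name(client_project_name, store_metadata):
    """Hypothetical check: raise if the client and store disagree on project_name."""
    stored = store_metadata.get("project_name")
    if stored != client_project_name:
        raise DataJointError(
            "Object store project name mismatch.\n"
            f'  Client configured: "{client_project_name}"\n'
            f'  Store metadata: "{stored}"\n'
            "Ensure all clients use the same object_storage.project_name setting."
        )


# Matching names pass silently; a mismatch raises with both names in the message.
verify_project_name("project_a", {"project_name": "project_a"})
```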
+
+### Schema Registration
+
+When a schema first uses the `file` type, it is added to the `schemas` list in the metadata:
+
+```python
+# After creating Recording table with file attribute in my_schema
+# dj-store-meta.json is updated:
+{
+    "project_name": "my_project",
+    "schemas": ["my_schema"]  # my_schema added
+}
+```
+
+This provides a record of which schemas have data in the store.
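
Editor's sketch (not part of the commit): schema registration as a read-modify-write on a local metadata file. `register_schema` is a hypothetical helper; the spec does not say how the update is performed, and on an object store this sequence would not be atomic.

```python
import json
from pathlib import Path


def register_schema(meta_path, schema_name):
    """Hypothetical helper: add schema_name to the `schemas` list if absent."""
    meta_path = Path(meta_path)
    metadata = json.loads(meta_path.read_text())
    # `schemas` is optional in the metadata schema, so create it when missing.
    schemas = metadata.setdefault("schemas", [])
    if schema_name not in schemas:
        schemas.append(schema_name)
        meta_path.write_text(json.dumps(metadata, indent=2))
    return metadata
```

Registering the same schema twice leaves the file unchanged, so the call is safe to repeat whenever a table with a `file` attribute is declared.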
+
 ## Syntax

 ```python
@@ -211,7 +298,7 @@ Storage paths are **deterministically constructed** from record metadata, enabli
 5. **Table name** - the table class name
 6. **Primary key encoding** - remaining PK attributes and values
 7. **Field name** - the attribute name
-8. **Suffixed filename** - original name with random hash suffix
+8. **Suffixed filename** - original name with random token suffix

 ### Path Template

@@ -310,7 +397,7 @@ description=a1b2c3d4_abc123 # long string truncated + hash

 ### Filename Collision Avoidance

-To prevent filename collisions, each stored file receives a **random hash suffix** appended to its basename:
+To prevent filename collisions, each stored file receives a **random token suffix** appended to its basename:

 ```
 original: recording.dat
@@ -320,10 +407,10 @@ original: image.analysis.tiff
 stored: image.analysis_pL9nR4wE.tiff
 ```

-#### Hash Suffix Specification
+#### Token Suffix Specification

 - **Alphabet**: URL-safe and filename-safe Base64 characters: `A-Z`, `a-z`, `0-9`, `-`, `_`
-- **Length**: Configurable via `object_storage.hash_length` (default: 8, range: 4-16)
+- **Length**: Configurable via `object_storage.token_length` (default: 8, range: 4-16)
 - **Generation**: Cryptographically random using `secrets.token_urlsafe()`

 At 8 characters with 64 possible values per character: 64^8 = 281 trillion combinations.
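
Editor's sketch (not part of the commit): one way to produce a suffix of exactly `token_length` characters with `secrets.token_urlsafe()`. The helper name `add_token_suffix` is hypothetical. Note that `token_urlsafe(n)` takes a byte count and returns more than `n` characters, so this sketch trims the result; the spec does not state how the length is enforced.

```python
import secrets
from pathlib import PurePosixPath


def add_token_suffix(filename, token_length=8):
    """Hypothetical helper: insert a random token before the final extension."""
    if not 4 <= token_length <= 16:
        raise ValueError("token_length must be between 4 and 16")
    # token_urlsafe(n) encodes n random bytes as about 1.3 characters each,
    # so it always yields at least n characters; keep the first token_length.
    token = secrets.token_urlsafe(token_length)[:token_length]
    path = PurePosixPath(filename)
    # stem keeps inner dots, e.g. "image.analysis" for "image.analysis.tiff".
    return f"{path.stem}_{token}{path.suffix}"


print(add_token_suffix("recording.dat"))
print(add_token_suffix("image.analysis.tiff"))
print(64 ** 8)  # size of the token space at the default length
```

Each retained character still carries 6 random bits, so an 8-character token has 48 bits of entropy, which is the 64^8 figure quoted above.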
@@ -511,12 +598,13 @@ class ObjectStorageSettings(BaseSettings):
         validate_assignment=True,
     )

+    project_name: str | None = None  # Must match store metadata
     protocol: Literal["file", "s3", "gcs", "azure"] | None = None
     location: str | None = None
     bucket: str | None = None
     endpoint: str | None = None
     partition_pattern: str | None = None
-    hash_length: int = Field(default=8, ge=4, le=16)
+    token_length: int = Field(default=8, ge=4, le=16)
     access_key: str | None = None
     secret_key: SecretStr | None = None
 ```
