Commit d2c284b

Revert page to legacy docs
1 parent cdefe87 commit d2c284b

1 file changed

+56
-36
lines changed

docs/src/design/tables/filepath.md

Lines changed: 56 additions & 36 deletions
@@ -1,44 +1,48 @@
11
# Filepath Datatype
22

3-
## Configuration & usage
3+
Note: Filepath Datatype is available as a preview feature in DataJoint Python v0.12.
4+
This means that the feature must be explicitly enabled. To do so, make sure
5+
to set the environment variable `FILEPATH_FEATURE_SWITCH=TRUE` prior to use.
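
As a minimal sketch, the switch can also be set from within Python rather than in the shell. Setting it before `datajoint` is imported is a conservative assumption made here, not a documented requirement:

```python
import os

# Enable the preview feature for this process only.
# Assumption: setting the variable before importing datajoint is the safest order.
os.environ["FILEPATH_FEATURE_SWITCH"] = "TRUE"

# import datajoint as dj  # import after the switch is set
```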
46

5-
https://github.com/datajoint/datajoint-python/issues/481
7+
## Configuration & Usage
68

7-
The `filepath` attribute type links DataJoint records to files already
9+
Corresponding to issue
10+
[#481](https://github.com/datajoint/datajoint-python/issues/481),
11+
the `filepath` attribute type links DataJoint records to files already
812
managed outside of DataJoint. This can aid in sharing data with
9-
other systems, such as allowing an image viewer application to
13+
other systems, such as allowing an image viewer application to
1014
directly use files from a DataJoint pipeline, or to allow downstream
11-
tables to reference data which lives outside of the DataJoint
12-
pipeline.
15+
tables to reference data which reside outside of DataJoint
16+
pipelines.
1317

1418
To define a table using the `filepath` datatype, an existing DataJoint
1519
[store](../../sysadmin/external-store.md) should be created and then referenced in the
1620
new table definition. For example, given a simple store:
1721

18-
```json
19-
dj.config['stores'] = {
20-
'data': {
21-
'protocol': 'file',
22-
'location': '/data',
23-
'stage': '/data'
24-
}
25-
}
22+
```python
23+
dj.config['stores'] = {
24+
'data': {
25+
'protocol': 'file',
26+
'location': '/data',
27+
'stage': '/data'
28+
}
29+
}
2630
```
2731

28-
We can define an ScanImages table as follows:
32+
we can define a `ScanImages` table as follows:
2933

3034
```python
3135
@schema
3236
class ScanImages(dj.Manual):
33-
definition = """
34-
-> Session
35-
image_id: int
36-
---
37-
image_path: filepath@data
38-
"""
37+
definition = """
38+
-> Session
39+
image_id: int
40+
---
41+
image_path: filepath@data
42+
"""
3943
```
4044

41-
This table can now be used for tracking paths within the '/data' area.
45+
This table can now be used for tracking paths within the `/data` local directory.
4246
For example:
4347

4448
```python
@@ -50,27 +54,43 @@ For example:
5054
As can be seen from the example, unlike [blob](blobs.md) records, file
5155
paths are managed as path locations to the underlying file.
5256

53-
## Filepath integrity notes
57+
## Integrity Notes
5458

5559
Unlike other data in DataJoint, data in `filepath` records are
5660
deliberately intended for shared use outside of DataJoint. To help
57-
ensure integrity of filepath records, DataJoint will record a
58-
checksum of the file data on insert, and will verify this checksum
59-
on fetch. However, since the underlying file data may be shared
61+
ensure integrity of `filepath` records, DataJoint will record a
62+
checksum of the file data on `insert`, and will verify this checksum
63+
on `fetch`. However, since the underlying file data may be shared
6064
with other applications, special care should be taken to ensure
6165
records stored in `filepath` attributes are not modified outside
6266
of the pipeline, or, if they are, that records in the pipeline are
63-
updated accordingly. A safe method of changing filepath data is
67+
updated accordingly. A safe method of changing `filepath` data is
6468
as follows:
6569

66-
1. Delete filepath database record
67-
- This will ensure that any downstream records in the pipeline depending
68-
on the `filepath` record are purged from the database
69-
2. Modify filepath data
70-
3. Re-insert corresponding filepath record
71-
- This will add the record back to DataJoint with an updated file checksum
72-
4. Compute any downstream dependencies, if needed
73-
- This will ensure that downstream results dependent on the filepath
74-
record are updated to reflect the newer filepath contents.
70+
1. Delete the `filepath` database record.
71+
This will ensure that any downstream records in the pipeline depending
72+
on the `filepath` record are purged from the database.
73+
2. Modify `filepath` data.
74+
3. Re-insert the corresponding `filepath` record.
75+
This will add the record back to DataJoint with an updated file checksum.
76+
4. Compute any downstream dependencies, if needed.
77+
This will ensure that downstream results dependent on the `filepath`
78+
record are updated to reflect the newer `filepath` contents.
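
The four steps above can be sketched as a small helper. The function name `replace_filepath_record` and its parameters are hypothetical; it only assumes that the table supports restriction with `&`, `delete()`, and `insert1()`, and that downstream tables support `populate()`, as DataJoint tables do. Note that `delete()` may prompt for confirmation depending on configuration.

```python
def replace_filepath_record(table, key, new_row, modify_file, downstream=()):
    """Hypothetical helper: safely change the file behind a filepath record.

    table       -- table holding the filepath attribute (e.g. ScanImages)
    key         -- restriction identifying the record to replace
    new_row     -- row to re-insert once the file has been modified
    modify_file -- callable that changes the file on disk
    downstream  -- tables to recompute afterwards (assumed to have populate())
    """
    # 1. Delete the record; dependent downstream records are purged as well.
    (table & key).delete()
    # 2. Modify the file while no record references it.
    modify_file()
    # 3. Re-insert the record so that a fresh checksum is stored.
    table.insert1(new_row)
    # 4. Recompute downstream results, if any were given.
    for dependent in downstream:
        dependent.populate()
```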
79+
80+
### Disable Fetch Verification
81+
82+
Note: Skipping the checksum is not recommended, as the checksum ensures file integrity, i.e.
83+
downloaded files are not corrupted. With S3 stores, most of the time to complete a
84+
`.fetch()` is from the file download itself as opposed to evaluating the checksum. This
85+
option will primarily benefit `filepath` usage connected to a local `file` store.
86+
87+
To skip checksum verification for large files, you can set a size threshold in bytes
88+
above which checksums are no longer evaluated, as in the example below:
89+
90+
```python
91+
dj.config["filepath_checksum_size_limit"] = 5 * 1024**3 # Skip for all files greater than 5GiB
92+
```
93+
94+
The default is `None`, which means checksums are always verified.
7595

7696
<!-- TODO: purging filepath data -->
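
To illustrate the idea of a size-limited checksum verification, here is a minimal standalone sketch. It is not DataJoint's implementation: the function names, the use of SHA-256, and the `size_limit` parameter are assumptions chosen for illustration, mirroring the "skip for files greater than the limit" behavior described above.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return a hex digest of the file contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_on_fetch(path, recorded_checksum, size_limit=None):
    """Compare the file against the checksum recorded at insert time.

    With size_limit=None every file is verified; otherwise files larger
    than size_limit bytes are accepted without being read.
    """
    if size_limit is not None and os.path.getsize(path) > size_limit:
        return True  # verification skipped for large files
    return file_checksum(path) == recorded_checksum
```

The trade-off is visible in the early return: a large file is never read, which saves time on a local store but also means that a modification made outside the pipeline goes undetected.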
