Commit bceb89b

Revert page to legacy docs
1 parent 6da43bf commit bceb89b

1 file changed: +65 -109 lines changed

docs/src/design/tables/tiers.md

Lines changed: 65 additions & 109 deletions
@@ -1,112 +1,68 @@
1-
# Table Tiers
1+
# Data Tiers
22

3-
The key to reproducibility in DataJoint is clear data provenance. In any experiment,
4-
there are stages for data entry, ingestion, and processing or analysis. DataJoint
5-
helps make these stages explicit with data tiers, indicating data origin.
3+
DataJoint assigns all tables to one of the following data tiers that differentiate how
4+
the data originate.
65

7-
| Table Type | Description | Example |
8-
| -- | -- | -- |
9-
| Lookup | Small reference tables containing general information or settings. | Analysis parameter set. |
10-
| Manual | Data entered by hand or with external helper scripts. | Manual subject metadata entry. |
11-
| Imported | Data ingested automatically from outside files. | Loading a raw data file. |
12-
| Computed | Data computed automatically entirely inside the pipeline. | Running analyses and storing results. |
13-
| Part\* | Data in a many-to-one relationship with the corresponding master table. While all other types correspond to their data tier, Part tables inherit the tier of their master table. | Independent unit results from a given analysis. |
14-
15-
Lookup and Manual tables generally handle manually added data. Imported and Computed
16-
tables both allow for automation, but differ in the source of information. And Part
17-
tables have a unique relationship to their corresponding Master table.
18-
19-
## Data Entry: Lookup and Manual
20-
21-
Manual tables are populated during experiments through a variety of interfaces. Not all
22-
manual information is entered by typing. Automated software can enter it directly into
23-
the database. What makes a manual table manual is that it does not perform any
24-
computations within the DataJoint pipeline.
25-
26-
Lookup tables contain basic facts that are not specific to an experiment and are fairly
27-
persistent. In GUIs, lookup tables are often used for drop-down menus or radio buttons.
28-
In Computed tables, the contents of Lookup tables are often used to specify alternative
29-
methods for computations. Unlike Manual tables, Lookup tables can specify contents in
30-
the schema definition.
31-
32-
Lookup tables are especially useful for entities with many unique features. Rather than
33-
adding many primary keys, this information can be retrieved through an index. For an
34-
example, see *ClusteringParamSet* in Element Array Ephys.
35-
36-
<!-- TODO: Add link to ephys ClusteringParamSet -->
37-
38-
While this distinction is useful for structuring a pipeline, it is not enforced, and
39-
left to the best judgement of the researcher.
40-
41-
## Automation: Imported and Computed
42-
43-
Auto-populated tables are used to define, execute, and coordinate computations in a
44-
DataJoint pipeline. These tables belong to one of the two auto-populated data tiers:
45-
*Imported* and *Computed*. The difference is not strictly enforced, but the convention
46-
helps researchers understand data provenance at a glance.
47-
48-
*Imported* tables require access to external files, such as raw storage, outside the
49-
database. If an entry were deleted, it could be retrieved from the raw files on disk.
50-
An *EphysRecording* table, for example, would load metadata and raw data from
51-
experimental recordings.
52-
53-
<!-- TODO: Add link to EphysRecording -->
54-
55-
*Computed* tables only require access to other data within the pipeline. If an entry were
56-
deleted, it could be recovered by simply running the relevant command. For
57-
analysis, many pipelines feature a task table that pairs sets of primary keys ready
58-
for computation. The
59-
[*PoseEstimationTask*](https://datajoint.com/docs/elements/element-deeplabcut/0.2/api/element_deeplabcut/model/#element_deeplabcut.model.PoseEstimationTask)
60-
in Element DeepLabCut pairs videos and models. The
61-
[*PoseEstimation*](https://datajoint.com/docs/elements/element-deeplabcut/0.2/api/element_deeplabcut/model/#element_deeplabcut.model.PoseEstimationTask)
62-
table executes these computations and stores the results.
63-
64-
Data should never be directly inserted into auto-populated tables. Instead, these tables
65-
specify a [`make` method](../make-method).
6+
## Table tiers
667

67-
## Master-Part Relationship
68-
69-
An entity in one table might be inseparably associated with a group of entities in
70-
another, forming a **master-part** relationship, with two important features.
71-
72-
1. Part tables permit a many-to-one relationship with the master.
73-
74-
2. Data entry and deletion should impact all part tables as well as the master.
75-
76-
If you're considering adding a Part table, consider whether or not there could be a
77-
reason to modify the part but not the master. If so, Manual and/or Lookup tables are
78-
likely more appropriate. Populate and delete commands should always target the master,
79-
and never individual parts. This facilitates data integrity by treating the entire
80-
process as one transaction. Either (a) all data are inserted/committed or deleted, or
81-
(b) the entire transaction is rolled back. This ensures that partial results never
82-
appear in the database.
83-
84-
As an example, Element Calcium Imaging features a *MotionCorrection* computed table
85-
segmenting an image into masks. The resulting correction is inseparable from the rigid
86-
and nonrigid correction parameters that it produces, with
87-
*MotionCorrection.RigidMotionCorrection* and *MotionCorrection.NonRigidMotionCorrection*
88-
part tables.
89-
90-
<!-- TODO: Add calcium imaging link -->
91-
92-
The master-part relationship cannot be chained or nested. DataJoint does not allow part
93-
tables of other part tables. However, it is common to have a master table with multiple
94-
part tables that depend on each other. See link above.
95-
96-
## Example
97-
98-
<!-- "src/images/concepts-table-tiers-diagram.md"-->
99-
100-
In this example, the experimenter first enters information into the Manual tables, shown
101-
in green. They enter information about a mouse, then a session, and then each scan
102-
performed, with the stimuli. Next the automated portion of the pipeline takes over,
103-
importing the raw data and performing image alignment, shown in blue. Computed tables
104-
are shown in red. Image segmentation identifies cells in the images, and extraction of
105-
calcium traces. In grey, the segmentation method is a Lookup table. Finally, the
106-
receptive field (RF) computation is performed by relating the imaging signals to the
107-
visual stimulus information.
108-
109-
For more information on table dependencies and diagrams, see their respective articles:
110-
111-
- [Dependencies](./dependencies)
112-
- [Diagrams](../diagrams)
8+
| Tier | Superclass | Description |
9+
| -- | -- | -- |
10+
| Lookup | `dj.Lookup` | Small tables containing general facts and settings of the data pipeline; not specific to any experiment or dataset. |
11+
| Manual | `dj.Manual` | Data entered from outside the pipeline, either by hand or with external helper scripts. |
12+
| Imported | `dj.Imported` | Data ingested automatically inside the pipeline but requiring access to data outside the pipeline. |
13+
| Computed | `dj.Computed` | Data computed automatically entirely inside the pipeline. |
14+
15+
Table data tiers indicate to database administrators how valuable the data are.
16+
Manual data are the most valuable, as re-entry may be tedious or impossible.
17+
Computed data are safe to delete, as the data can always be recomputed from within DataJoint.
18+
Imported data are safer than manual data but less safe than computed data because of
19+
dependency on external data sources.
20+
With these considerations, database administrators may opt not to back up computed
21+
data, for example, or to back up imported data less frequently than manual data.
22+
23+
The data tier of a table is specified by the superclass of its class.
24+
For example, the User class in [definitions](declare.md) uses the `dj.Manual`
25+
superclass.
26+
Therefore, the corresponding User table on the database would be of the Manual tier.
27+
Furthermore, the classes for **imported** and **computed** tables have additional
28+
capabilities for automated processing as described in
29+
[Auto-populate](../../compute/populate.md).
30+
31+
## Internal conventions for naming tables
32+
33+
On the server side, DataJoint uses a naming scheme to generate a table name
34+
corresponding to a given class.
35+
The naming scheme includes prefixes specifying each table's data tier.
36+
37+
First, the name of the class is converted from `CamelCase` to `snake_case`
38+
([separation by underscores](https://en.wikipedia.org/wiki/Snake_case)).
39+
Then the name is prefixed according to the data tier.
40+
41+
- `Manual` tables have no prefix.
42+
- `Lookup` tables are prefixed with `#`.
43+
- `Imported` tables are prefixed with `_`, a single underscore.
44+
- `Computed` tables are prefixed with `__`, two underscores.
45+
46+
For example:
47+
48+
The table for the class `StructuralScan` subclassing `dj.Manual` will be named
49+
`structural_scan`.
50+
51+
The table for the class `SpatialFilter` subclassing `dj.Lookup` will be named
52+
`#spatial_filter`.
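
The naming scheme described above can be sketched in plain Python. This is an illustration only, not DataJoint's actual implementation: `TIER_PREFIX` and `table_name_for` are hypothetical names, and the CamelCase conversion assumes simple class names without runs of consecutive capitals.

```python
import re

# Prefix for each data tier, as listed above (hypothetical constant name).
TIER_PREFIX = {
    "Manual": "",
    "Lookup": "#",
    "Imported": "_",
    "Computed": "__",
}


def table_name_for(class_name: str, tier: str) -> str:
    """Return the server-side table name for a class of the given tier."""
    # Insert an underscore before every capital except the first, then lowercase.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()
    return TIER_PREFIX[tier] + snake


print(table_name_for("StructuralScan", "Manual"))  # structural_scan
print(table_name_for("SpatialFilter", "Lookup"))   # #spatial_filter
```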
53+
54+
Again, the internal table names including prefixes are used only on the server side.
55+
These are never visible to the user, and DataJoint users do not need to know these
56+
conventions.
57+
However, database administrators may use these naming patterns to set backup policies
58+
or to restrict access based on data tiers.
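
As one illustration of how an administrator might use these prefixes, the sketch below infers a tier from a server-side table name. The function `tier_of` is hypothetical and not part of DataJoint, and it assumes every name passed in follows the scheme described here. The order of the checks matters because the Computed prefix `__` also begins with the Imported prefix `_`.

```python
def tier_of(table_name: str) -> str:
    """Guess the data tier of a table from its server-side name."""
    # Test the two-underscore prefix first: any name starting
    # with "__" also starts with "_".
    if table_name.startswith("__"):
        return "Computed"
    if table_name.startswith("_"):
        return "Imported"
    if table_name.startswith("#"):
        return "Lookup"
    return "Manual"


# Hypothetical table names; skip computed tables when choosing what to back up.
tables = ["structural_scan", "#spatial_filter", "_ephys", "__spike_sorting"]
to_back_up = [name for name in tables if tier_of(name) != "Computed"]
print(to_back_up)
```

A policy like this reflects the idea that computed data can be regenerated from within the pipeline, while the other tiers depend on manual entry or external sources.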
59+
60+
## Part tables
61+
62+
[Part tables](master-part.md) do not have their own tier.
63+
Instead, they share the same tier as their master table.
64+
The prefix for part tables also differs from the other tiers.
65+
They are prefixed by the name of their master table, separated by two underscores.
66+
67+
For example, the table for the class `Channel(dj.Part)` with the master
68+
`Ephys(dj.Imported)` will be named `_ephys__channel`.
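
A minimal sketch of the part-table rule, in plain Python with hypothetical helper names (`to_snake_case`, `part_table_name`) rather than DataJoint's own code. It assumes simple CamelCase class names and builds the part table's name from the master's prefixed name, two underscores, and the part's name.

```python
import re

# Prefix for each data tier of the master table (hypothetical constant name).
TIER_PREFIX = {"Manual": "", "Lookup": "#", "Imported": "_", "Computed": "__"}


def to_snake_case(class_name: str) -> str:
    """Convert a simple CamelCase class name to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def part_table_name(master_class: str, master_tier: str, part_class: str) -> str:
    """Return the server-side name of a part table."""
    # The master's own table name carries the tier prefix.
    master_name = TIER_PREFIX[master_tier] + to_snake_case(master_class)
    # The part name is appended after two underscores.
    return master_name + "__" + to_snake_case(part_class)


print(part_table_name("Ephys", "Imported", "Channel"))  # _ephys__channel
```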
