|
51 | 51 | "If you visit the [documentation for DataJoint](https://docs.datajoint.io/introduction/Data-pipelines.html), we define a data pipeline as follows:\n", |
52 | 52 | "> A data pipeline is a sequence of steps (more generally a directed acyclic graph) with integrated storage at each step. These steps may be thought of as nodes in a graph.\n", |
53 | 53 | "\n", |
54 | | - ">* Nodes in this graph are represented as database **tables**. Examples of such tables include \"Subject\", \"Session\", \"Implantation\", \"Experimenter\", \"Equipment\", but also \"OptoWaveform\", \"OptoStimParams\", or \"Neuronal spikes\". \n", |
| 54 | + ">* Nodes in this graph are represented as database **tables**. Examples of such tables include `Subject`, `Session`, `Implantation`, `Experimenter`, `Equipment`, but also `OptoWaveform`, `OptoStimParams`, or `NeuronalSpikes`. \n", |
55 | 55 | "\n", |
56 | 56 | ">* The data pipeline is formed by making these tables interdependent (as the nodes are connected in a network). A **dependency** is a situation where a step of the data pipeline is dependent on a result from a sequentially previous step before it can complete its execution. A dependency graph forms an entire cohesive data pipeline. \n", |
57 | 57 | "\n", |
58 | | - "1. define these \"things\" as tables in which you can store the information about them\n", |
59 | | - "2. define the relationships (in particular the dependencies) between the \"things\"\n", |
| 58 | + "In order to create a data pipeline, you need to know the \"things\" in your experiments\n", |
| 59 | + "and the relationships between them. Within the pipeline you will then:\n", |
60 | 60 | "\n", |
61 | | - "A data pipeline can then serve as a map that describes everything that goes on in your experiment, capturing what is collected, what is processed, and what is analyzed/computed. A well designed data pipeline not only let's you organize your data well, but can bring out logical clarity to your experiment, and may even bring about new insights by making how everything in your experiment relates together obvious.\n", |
| 61 | + "1. define these \"things\" as tables in which you can store the information about them.\n", |
| 62 | + "2. define the relationships (in particular the dependencies) between the \"things\".\n", |
| 63 | + "\n", |
| 64 | + "The data pipeline can then serve as a map that describes everything that goes on in your experiment, capturing what is collected, what is processed, and what is analyzed/computed. A well-designed data pipeline not only lets you organize your data well, but can also bring logical clarity to your experiment, and may even bring about new insights by making it obvious how everything in your experiment relates.\n", |
62 | 65 | "\n", |
63 | 66 | "Let's go ahead and build a pipeline together from scratch to better understand what a data pipeline is all about." |
64 | 67 | ] |
|
67 | 70 | "cell_type": "markdown", |
68 | 71 | "metadata": {}, |
69 | 72 | "source": [ |
70 | | - "##### Practical examples" |
| 73 | + "#### Practical examples" |
71 | 74 | ] |
72 | 75 | }, |
73 | 76 | { |
|
131 | 134 | "cell_type": "markdown", |
132 | 135 | "metadata": {}, |
133 | 136 | "source": [ |
134 | | - "Just by going through the description, we can start to identify **entities** that needs to be stored and represented in our data pipeline:\n", |
135 | | - "\n", |
136 | | - "* mouse\n", |
137 | | - "* experimental session\n", |
138 | | - "\n", |
139 | | - "For ephys:\n", |
140 | | - "\n", |
141 | | - ">* neuron\n", |
142 | | - ">* spikes\n", |
143 | | - "\n", |
144 | | - "For calcium imaging:\n", |
145 | | - "\n", |
146 | | - ">* scan\n", |
147 | | - ">* regions of interest\n", |
148 | | - ">* trace" |
| 137 | + "Just by going through the description, we can start to identify **entities** that need to be stored and represented in our data pipeline:\n", |
| 138 | + "\n", |
| 139 | + "* mouse\n", |
| 140 | + "* experimental session\n", |
| 141 | + "\n", |
| 142 | + "For ephys:\n", |
| 143 | + "\n", |
| 144 | + ">* neuron\n", |
| 145 | + ">* spikes\n", |
| 146 | + "\n", |
| 147 | + "For calcium imaging:\n", |
| 148 | + "\n", |
| 149 | + ">* scan\n", |
| 150 | + ">* regions of interest\n", |
| 151 | + ">* trace" |
149 | 152 | ] |
150 | 153 | }, |
151 | 154 | { |
|
159 | 162 | "cell_type": "markdown", |
160 | 163 | "metadata": {}, |
161 | 164 | "source": [ |
162 | | - "### Schemas and tables" |
| 165 | + "### Schemas and tables" |
163 | 166 | ] |
164 | 167 | }, |
165 | 168 | { |
|
173 | 176 | "cell_type": "markdown", |
174 | 177 | "metadata": {}, |
175 | 178 | "source": [ |
176 | | - "In a data pipeline, we represent these **entities** as **tables**. Different *kinds* of entities become distinct tables, and each table row is a single example (instance) of the entity's category. \n", |
177 | | - "\n", |
178 | | - "For example, if we have a `Mouse` table, each row in the mouse table represents a single mouse. \n", |
179 | | - "\n", |
180 | | - "It is essential to think about what information will **uniquely identify** each entry. \n", |
181 | | - "\n", |
182 | | - "In this case, the information that uniquely identifies the `Mouse` table is their **mouse IDs** - a unique ID number assigned to each animal in the lab. This attribute is named the **primary key** of the table.\n", |
183 | | - "\n", |
184 | | - "| Mouse_ID (*Primary key attribute*)|\n", |
185 | | - "|:--------: | \n", |
186 | | - "| 11234 |\n", |
187 | | - "| 11432 |" |
| 179 | + "In a data pipeline, we represent these **entities** as **tables**. Different *kinds* of entities become distinct tables, and each table row is a single example (instance) of the entity's category. \n", |
| 180 | + "\n", |
| 181 | + "For example, if we have a `Mouse` table, each row in the mouse table represents a single mouse. \n", |
| 182 | + "\n", |
| 183 | + "It is essential to think about what information will **uniquely identify** each entry. \n", |
| 184 | + "\n", |
| 185 | + "In this case, the information that uniquely identifies each entry in the `Mouse` table is the\n", |
| 186 | + "**mouse ID** - a unique ID number assigned to each animal in the lab. This attribute is\n", |
| 187 | + "named the **primary key** of the table. By convention, table attributes are lower case\n", |
| 188 | + "and do not contain spaces.\n", |
| 189 | + "\n", |
| 190 | + "| `mouse_id*` (*Primary key attribute*)|\n", |
| 191 | + "|:--------: | \n", |
| 192 | + "| 11234 |\n", |
| 193 | + "| 11432 |" |
188 | 194 | ] |
189 | 195 | }, |
190 | 196 | { |
|
197 | 203 | "\n", |
198 | 204 | "Such an attribute is called the **primary key** of the table: the subset of table attributes uniquely identifying each entity in the table. A **secondary attribute** is any field in a table that is not part of the primary key.\n", |
199 | 205 | "\n", |
200 | | - "| Mouse_ID (*Primary key attribute*) \n", |
| 206 | + "| `mouse_id*` (*Primary key attribute*) \n", |
201 | 207 | "|:--------:| \n", |
202 | 208 | "| 11234 (*Secondary attribute*)\n", |
203 | 209 | "| 11432 (*Secondary attribute*)" |
|
207 | 213 | "cell_type": "markdown", |
208 | 214 | "metadata": {}, |
209 | 215 | "source": [ |
210 | | - "Once we have successfully identified the table's primary key, we can now think about what other columns, or **non-primary key attributes** - additional information **about each entry in the table that need to be stored as well**." |
| 216 | + "Once we have successfully identified the table's primary key, we can now think about what other columns, or **non-primary key attributes**, are needed: additional information **about each entry in the table that needs to be stored as well**." |
211 | 217 | ] |
212 | 218 | }, |
213 | 219 | { |
|
221 | 227 | "cell_type": "markdown", |
222 | 228 | "metadata": {}, |
223 | 229 | "source": [ |
224 | | - "| Mouse_ID | DOB | sex |\n", |
| 230 | + "| `mouse_id*` | `dob` | `sex` |\n", |
225 | 231 | "|:--------:|------------|--------|\n", |
226 | 232 | "| 11234 | 2017-11-17 | M |\n", |
227 | 233 | "| 11432 | 2018-03-04 | F |" |
|
231 | 237 | "cell_type": "markdown", |
232 | 238 | "metadata": {}, |
233 | 239 | "source": [ |
234 | | - "Now that we have an idea of how to represent information about the mouse, let's create the table using **DataJoint**!" |
| 240 | + "Now that we have an idea of how to represent information about the mouse, let's create the table using **DataJoint**!" |
235 | 241 | ] |
236 | 242 | }, |
237 | 243 | { |
|
245 | 251 | "cell_type": "markdown", |
246 | 252 | "metadata": {}, |
247 | 253 | "source": [ |
248 | | - "##### Schema" |
| 254 | + "##### Schema" |
249 | 255 | ] |
250 | 256 | }, |
251 | 257 | { |
|
283 | 289 | "cell_type": "markdown", |
284 | 290 | "metadata": {}, |
285 | 291 | "source": [ |
286 | | - "##### Table" |
| 292 | + "##### Table" |
287 | 293 | ] |
288 | 294 | }, |
289 | 295 | { |
290 | 296 | "cell_type": "markdown", |
291 | 297 | "metadata": {}, |
292 | 298 | "source": [ |
293 | | - "In DataJoint, you define each table as a `class`, and provide the table definition (e.g., attribute definitions) as the `definition` static string property. The class will inherit from the `dj.Manual` class provided by DataJoint (more on this later)." |
| 299 | + "In DataJoint, you define each table as a `class`, and provide the table definition (e.g. attribute definitions) as the `definition` static string property. The class will inherit from the `dj.Manual` class provided by DataJoint (more on this later)." |
294 | 300 | ] |
295 | 301 | }, |
296 | 302 | { |
|
330 | 336 | "cell_type": "markdown", |
331 | 337 | "metadata": {}, |
332 | 338 | "source": [ |
333 | | - "### Basic relational operators" |
| 339 | + "### Basic relational operators" |
334 | 340 | ] |
335 | 341 | }, |
336 | 342 | { |
|
477 | 483 | "cell_type": "markdown", |
478 | 484 | "metadata": {}, |
479 | 485 | "source": [ |
480 | | - "##### Data integrity" |
| 486 | + "##### Data integrity" |
481 | 487 | ] |
482 | 488 | }, |
483 | 489 | { |
|
563 | 569 | "cell_type": "markdown", |
564 | 570 | "metadata": {}, |
565 | 571 | "source": [ |
566 | | - "As with `mouse`, we should consider **what information (i.e., attributes) is needed to identify an `experimental session`** uniquely. Here is the relevant section of the project description:\n", |
567 | | - "\n", |
568 | | - "> * As a hard working neuroscientist, you perform experiments every day, sometimes working with **more than one mouse in a day**! However, on an any given day, **a mouse undergoes at most one recording session**.\n", |
569 | | - "> * For each experimental session, you would like to record **what mouse you worked with** and **when you performed the experiment**. You would also like to keep track of other helpful information such as the **experimental setup** you worked on." |
| 572 | + "As with `mouse`, we should consider **what information (i.e. attributes) is needed to identify an experimental `session`** uniquely. Here is the relevant section of the project description:\n", |
| 573 | + "\n", |
| 574 | + "> * As a hard-working neuroscientist, you perform experiments every day, sometimes working with **more than one mouse in a day**! However, on any given day, **a mouse undergoes at most one recording session**.\n", |
| 575 | + "> * For each experimental session, you would like to record **what mouse you worked with** and **when you performed the experiment**. You would also like to keep track of other helpful information such as the **experimental setup** you worked on." |
570 | 576 | ] |
571 | 577 | }, |
572 | 578 | { |
573 | 579 | "cell_type": "markdown", |
574 | 580 | "metadata": {}, |
575 | 581 | "source": [ |
576 | | - "Based on the above, it seems that you need to know these two data to uniquely identify a single experimental session:\n", |
577 | | - "\n", |
578 | | - "* the date of the session\n", |
579 | | - "* the mouse you recorded from in that session\n", |
580 | | - "\n", |
581 | | - "to uniquely identify a single experimental session." |
| 582 | + "Based on the above, it seems that you need to know the following data to uniquely identify a single experimental session:\n", |
| 583 | + "\n", |
| 584 | + "* the date of the session\n", |
| 585 | + "* the mouse you recorded from in that session" |
582 | 586 | ] |
583 | 587 | }, |
584 | 588 | { |
585 | 589 | "cell_type": "markdown", |
586 | 590 | "metadata": {}, |
587 | 591 | "source": [ |
588 | | - "Note that, to uniquely identify an experimental session (or simply a `Session`), we need to know the mouse that the session was about. In other words, a session cannot existing without a corresponding mouse! \n", |
589 | | - "\n", |
590 | | - "With **mouse** already represented as a table in our pipeline, we say that the session **depends on** the mouse! We could graphically represent this in an **entity relationship diagram (ERD)** by drawing the line between two tables, with the one below (**session**) depending on the one above (**mouse**)." |
| 592 | + "Note that, to uniquely identify an experimental session (or simply a `Session`), we need to know the mouse that the session was about. In other words, a session cannot exist without a corresponding mouse! \n", |
| 593 | + "\n", |
| 594 | + "With **mouse** already represented as a table in our pipeline, we say that the session **depends on** the mouse! We could graphically represent this in an **entity relationship diagram (ERD)** by drawing a line between the two tables, with the one below (**session**) depending on the one above (**mouse**)." |
591 | 595 | ] |
592 | 596 | }, |
593 | 597 | { |
|
778 | 782 | "cell_type": "markdown", |
779 | 783 | "metadata": {}, |
780 | 784 | "source": [ |
781 | | - "We will introduce significant types of queries used in DataJoint:\n", |
782 | | - "* 1. Restriction (`&`) and negative restriction (`-`): filter the data with certain conditions\n", |
783 | | - "* 2. Join (`*`): bring fields from different tables together\n", |
784 | | - "* 3. Projection (`.proj()`): focus on a subset of attributes\n", |
785 | | - "* 4. Fetch (`.fetch()`): pull the data from the database\n", |
786 | | - "* 5. Deletion (`.delete()`): delete entries and their dependencies\n", |
787 | | - "* 6. Drop (`.drop()`): drop the table from the schema" |
| 785 | + "We will introduce the major types of queries used in DataJoint:\n", |
| 786 | + "1. Restriction (`&`) and negative restriction (`-`): filter the data with certain conditions\n", |
| 787 | + "2. Join (`*`): bring fields from different tables together\n", |
| 788 | + "3. Projection (`.proj()`): focus on a subset of attributes\n", |
| 789 | + "\n", |
| 790 | + "Following the query operations, you might work with one or more of the following\n", |
| 791 | + "data manipulation operations supported by DataJoint:\n", |
| 792 | + " \n", |
| 793 | + "1. Fetch (`.fetch()`): pull the data from the database\n", |
| 794 | + "2. Deletion (`.delete()`): delete entries and their dependencies\n", |
| 795 | + "3. Drop (`.drop()`): drop the table from the schema" |
788 | 796 | ] |
789 | 797 | }, |
790 | 798 | { |
|
805 | 813 | "cell_type": "markdown", |
806 | 814 | "metadata": {}, |
807 | 815 | "source": [ |
808 | | - "##### Exact match" |
| 816 | + "#### Exact match" |
809 | 817 | ] |
810 | 818 | }, |
811 | 819 | { |
|
876 | 884 | "cell_type": "markdown", |
877 | 885 | "metadata": {}, |
878 | 886 | "source": [ |
879 | | - "### Inequality" |
| 887 | + "#### Inequality" |
880 | 888 | ] |
881 | 889 | }, |
882 | 890 | { |
|
1010 | 1018 | "cell_type": "markdown", |
1011 | 1019 | "metadata": {}, |
1012 | 1020 | "source": [ |
1013 | | - "### Restriction one table with another" |
| 1021 | + "#### Restrict one table with another" |
1014 | 1022 | ] |
1015 | 1023 | }, |
1016 | 1024 | { |
|
1033 | 1041 | "cell_type": "markdown", |
1034 | 1042 | "metadata": {}, |
1035 | 1043 | "source": [ |
1036 | | - "### Combining restrictions" |
| 1044 | + "#### Combine restrictions" |
1037 | 1045 | ] |
1038 | 1046 | }, |
1039 | 1047 | { |
|
1079 | 1087 | "cell_type": "markdown", |
1080 | 1088 | "metadata": {}, |
1081 | 1089 | "source": [ |
1082 | | - "### Negative restriction - with the `-` operator" |
| 1090 | + "#### Negative restriction: with the `-` operator" |
1083 | 1091 | ] |
1084 | 1092 | }, |
1085 | 1093 | { |
|
1134 | 1142 | "source": [ |
1135 | 1143 | "Behavior of join:\n", |
1136 | 1144 | "\n", |
1137 | | - "1. match the common field(s) of the primary keys in the two tables\n", |
1138 | | - "2. do a combination of the non-matched part of the primary key\n", |
1139 | | - "3. listing out the secondary attributes for each combination\n", |
1140 | | - "4. if two tables have secondary attributes that share a same name, it will throw an error. To join, we need to rename that attribute for at least one of the tables." |
| 1145 | + "1. Match the common field(s) of the primary keys in the two tables.\n", |
| 1146 | + "2. Do a combination of the non-matched part of the primary key.\n", |
| 1147 | + "3. List out the secondary attributes for each combination.\n", |
| 1148 | + "4. If two tables have secondary attributes that share the same name, the join will throw an error. To join, we need to rename that attribute for at least one of the tables." |
1141 | 1149 | ] |
1142 | 1150 | }, |
1143 | 1151 | { |
|