|
51 | 51 | "If you visit the [documentation for DataJoint](https://docs.datajoint.io/introduction/Data-pipelines.html), we define a data pipeline as follows:\n", |
52 | 52 | "> A data pipeline is a sequence of steps (more generally a directed acyclic graph) with integrated storage at each step. These steps may be thought of as nodes in a graph.\n", |
53 | 53 | "\n", |
54 | | - "While this is an accurate description, it may not be the most intuitive definition. Put succinctly, a data pipeline is a listing or a \"map\" of various \"things\" that you work with in a project, with line connecting things to each other to indicate their dependencies. The \"things\" in a data pipeline tends to be the *nouns* you find when describing a project. The \"things\" may include anything from mouse, experimenter, equipment, to experiment session, trial, two-photon scans, electric activities, to receptive fields, neuronal spikes, to figures for a publication! A data pipeline gives you a framework to:\n", |
| 54 | + ">* Nodes in this graph are represented as database **tables**. Examples of such tables include \"Subject\", \"Session\", \"Implantation\", \"Experimenter\", \"Equipment\", but also \"OptoWaveform\", \"OptoStimParams\", or \"Neuronal spikes\". \n", |
| 55 | + "\n", |
| 56 | + ">* The data pipeline is formed by making these tables interdependent, just as the nodes of a graph are connected. A **dependency** means that a step of the data pipeline needs a result from an earlier step before it can complete its execution. Together, these dependencies form one cohesive data pipeline, which gives you a framework to:\n", |
55 | 57 | "\n", |
56 | 58 | "1. define the entities in your project as tables in which you can store information about them\n", |
57 | 59 | "2. define the relationships (in particular, the dependencies) between these tables\n", |
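Because the dependencies in a pipeline must form a directed acyclic graph, the order in which tables can be filled in follows from a topological sort. The sketch below uses the standard-library `graphlib` module (Python 3.9+) on a hypothetical set of tables loosely based on the examples above; the table names and their dependencies are assumptions for illustration, not a real DataJoint schema.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it depends on.
dependencies = {
    "Subject": set(),
    "Experimenter": set(),
    "Equipment": set(),
    "Implantation": {"Subject"},
    "Session": {"Subject", "Experimenter", "Equipment"},
    "NeuronalSpikes": {"Session"},
}

# static_order() lists every table after all the tables it depends on,
# i.e. an order in which the pipeline steps could be filled in.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# A cycle (Session <-> NeuronalSpikes) is rejected: the graph must stay acyclic.
cyclic = {**dependencies, "Session": {"NeuronalSpikes"}}
try:
    TopologicalSorter(cyclic).prepare()
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```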
|
65 | 67 | "cell_type": "markdown", |
66 | 68 | "metadata": {}, |
67 | 69 | "source": [ |
68 | | - "# Building our first pipeline: " |
| 70 | + "##### Practical examples" |
69 | 71 | ] |
70 | 72 | }, |
71 | 73 | { |
|
129 | 131 | "cell_type": "markdown", |
130 | 132 | "metadata": {}, |
131 | 133 | "source": [ |
132 | | - "Just by going though the description, we can start to identify **things** or **entities** that we might want to store and represent in our data pipeline:\n", |
133 | | - "\n", |
134 | | - "* mouse\n", |
135 | | - "* experimental session\n", |
136 | | - "\n", |
137 | | - "For ephys:\n", |
138 | | - "\n", |
139 | | - ">* neuron\n", |
140 | | - ">* spikes\n", |
141 | | - "\n", |
142 | | - "For calcium imaging:\n", |
143 | | - "\n", |
144 | | - ">* scan\n", |
145 | | - ">* regions of interest\n", |
146 | | - ">* trace" |
| 134 | + "Just by going through the description, we can start to identify **entities** that need to be stored and represented in our data pipeline:\n", |
| 135 | + "\n", |
| 136 | + "* mouse\n", |
| 137 | + "* experimental session\n", |
| 138 | + "\n", |
| 139 | + "For ephys:\n", |
| 140 | + "\n", |
| 141 | + ">* neuron\n", |
| 142 | + ">* spikes\n", |
| 143 | + "\n", |
| 144 | + "For calcium imaging:\n", |
| 145 | + "\n", |
| 146 | + ">* scan\n", |
| 147 | + ">* regions of interest\n", |
| 148 | + ">* trace" |
147 | 149 | ] |
148 | 150 | }, |
149 | 151 | { |
|
157 | 159 | "cell_type": "markdown", |
158 | 160 | "metadata": {}, |
159 | 161 | "source": [ |
160 | | - "In DataJoint data pipeline, we represent these **entities** as **tables**. Different *kinds* of entities become distinct tables, and each row of the table is a single example (instance) of the category of entity. \n", |
161 | | - "\n", |
162 | | - "For example, if we have a `Mouse` table, then each row in the mouse table represents a single mouse!" |
| 162 | + "### Schemas and tables" |
| 163 | + ] |
| 164 | + }, |
| 165 | + { |
| 166 | + "cell_type": "markdown", |
| 167 | + "metadata": {}, |
| 168 | + "source": [ |
| 169 | + "##### Concepts" |
163 | 170 | ] |
164 | 171 | }, |
165 | 172 | { |
166 | 173 | "cell_type": "markdown", |
167 | 174 | "metadata": {}, |
168 | 175 | "source": [ |
169 | | - "When constructing such table, we need to figure out what it would take to **uniquely identify** each entry. Let's take the example of the **mouse** and think about what it would take to uniquely identify a mouse." |
| 176 | + "In a data pipeline, we represent these **entities** as **tables**. Different *kinds* of entities become distinct tables, and each table row is a single example (instance) of the entity's category. \n", |
| 177 | + "\n", |
| 178 | + "For example, if we have a `Mouse` table, each row in the mouse table represents a single mouse. \n", |
| 179 | + "\n", |
| 180 | + "It is essential to think about what information will **uniquely identify** each entry. \n", |
| 181 | + "\n", |
| 182 | + "In this case, each entry in the `Mouse` table is uniquely identified by its **mouse ID**, a unique ID number assigned to each animal in the lab. This attribute is called the **primary key** of the table.\n", |
| 183 | + "\n", |
| 184 | + "| mouse_id (*primary key attribute*) |\n", |
| 185 | + "|:--------: | \n", |
| 186 | + "| 11234 |\n", |
| 187 | + "| 11432 |" |
170 | 188 | ] |
171 | 189 | }, |
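To see why the primary key matters, here is a minimal in-memory sketch in plain Python (not DataJoint): rows are stored in a dictionary keyed by the primary key, so a second row with an existing `mouse_id` is rejected. The class `MouseTable` and its `add` method are hypothetical names chosen for illustration.

```python
# Minimal in-memory sketch (not DataJoint): rows are stored in a dict keyed by
# the primary key, so two rows can never share the same mouse_id.
class MouseTable:
    primary_key = ("mouse_id",)

    def __init__(self):
        self.rows = {}

    def add(self, **row):
        key = tuple(row[attr] for attr in self.primary_key)
        if key in self.rows:
            raise ValueError(f"duplicate primary key {key}")
        self.rows[key] = row


mouse = MouseTable()
mouse.add(mouse_id=11234)
mouse.add(mouse_id=11432)
try:
    mouse.add(mouse_id=11234)  # the same ID again
except ValueError as err:
    print(err)  # duplicate primary key (11234,)
```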
172 | 190 | { |
|
175 | 193 | "source": [ |
176 | 194 | "After some thought, we might conclude that each mouse can be uniquely identified by knowing its **mouse ID**.\n", |
177 | 195 | "\n", |
178 | | - "| mouse_id* |\n", |
179 | | - "|:--------:|\n", |
180 | | - "| 11234 |\n", |
181 | | - "| 11432 |" |
| 196 | + "The mouse ID is then a column in the table, or an **attribute**, that can be used to **uniquely identify** each mouse. \n", |
| 197 | + "\n", |
| 198 | + "Such an attribute is called the **primary key** of the table: the subset of table attributes that uniquely identifies each entity in the table. Any attribute that is not part of the primary key is called a **secondary attribute**.\n", |
| 199 | + "\n", |
| 200 | + "| mouse_id (*primary key attribute*) |\n", |
| 201 | + "|:--------:|\n", |
| 202 | + "| 11234 |\n", |
| 203 | + "| 11432 |" |
182 | 204 | ] |
183 | 205 | }, |
184 | 206 | { |
185 | 207 | "cell_type": "markdown", |
186 | 208 | "metadata": {}, |
187 | 209 | "source": [ |
188 | | - "Once we have successfully identified the primary key of the table, we can now think about what other columns, or **non-primary key attributes** that we would want to include in the table. These are additional information **about each entry in the table that we want to store**." |
| 210 | + "Once we have identified the table's primary key, we can think about which other columns, or **non-primary key attributes** (secondary attributes), to include: additional information **about each entry in the table that needs to be stored as well**." |
189 | 211 | ] |
190 | 212 | }, |
191 | 213 | { |
|
199 | 221 | "cell_type": "markdown", |
200 | 222 | "metadata": {}, |
201 | 223 | "source": [ |
202 | | - "| mouse_id* | dob | sex |\n", |
| 224 | + "| mouse_id (*primary key*) | dob | sex |\n", |
203 | 225 | "|:--------:|------------|--------|\n", |
204 | 226 | "| 11234 | 2017-11-17 | M |\n", |
205 | 227 | "| 11432 | 2018-03-04 | F |" |
|
209 | 231 | "cell_type": "markdown", |
210 | 232 | "metadata": {}, |
211 | 233 | "source": [ |
212 | | - "Now we have an idea on how to represent information about mouse, let's create the table using **DataJoint**!" |
| 234 | + "Now that we have an idea of how to represent information about the mouse, let's create the table using **DataJoint**!" |
213 | 235 | ] |
214 | 236 | }, |
215 | 237 | { |
216 | 238 | "cell_type": "markdown", |
217 | 239 | "metadata": {}, |
218 | 240 | "source": [ |
219 | | - "## Create a schema - house for your tables" |
| 241 | + "##### Practical example" |
| 242 | + ] |
| 243 | + }, |
| 244 | + { |
| 245 | + "cell_type": "markdown", |
| 246 | + "metadata": {}, |
| 247 | + "source": [ |
| 248 | + "##### Schema" |
220 | 249 | ] |
221 | 250 | }, |
222 | 251 | { |
|
254 | 283 | "cell_type": "markdown", |
255 | 284 | "metadata": {}, |
256 | 285 | "source": [ |
257 | | - "## Creating your first table" |
| 286 | + "##### Table" |
258 | 287 | ] |
259 | 288 | }, |
260 | 289 | { |
261 | 290 | "cell_type": "markdown", |
262 | 291 | "metadata": {}, |
263 | 292 | "source": [ |
264 | | - "In DataJoint, you define each table as a class, and provide the table definition (e.g. attribute definitions) as the `definition` static string property. The class will inherit from the `dj.Manual` class provided by DataJoint (more on this later)." |
| 293 | + "In DataJoint, you define each table as a Python `class` and provide the table definition (e.g., attribute definitions) as the `definition` class attribute, a multi-line string. The class will inherit from the `dj.Manual` class provided by DataJoint (more on this later)." |
265 | 294 | ] |
266 | 295 | }, |
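In DataJoint's definition syntax, each attribute line has the form `name[=default] : type  # comment`, and the attributes listed above the `---` separator form the primary key while those below it are secondary attributes. As a hedged illustration of that layout, the hypothetical helper `split_definition` below (plain Python, not part of DataJoint) separates the two groups; the `Mouse` definition string is an assumed example modeled on the table above.

```python
# Hypothetical helper (not part of DataJoint): split a DataJoint-style
# definition string into primary-key and secondary attribute names.
def split_definition(definition):
    primary, secondary = [], []
    current = primary  # attributes above the '---' line form the primary key
    for raw in definition.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the table comment
        if line.startswith("---"):
            current = secondary  # everything below is a secondary attribute
            continue
        # 'name=default : type  # comment' -> 'name'
        name = line.split("#", 1)[0].split(":", 1)[0].split("=", 1)[0].strip()
        current.append(name)
    return primary, secondary


mouse_definition = """
# Experimental animals
mouse_id      : int                        # unique animal ID
---
dob=null      : date                       # date of birth
sex="unknown" : enum('M', 'F', 'unknown')  # sex
"""

print(split_definition(mouse_definition))  # (['mouse_id'], ['dob', 'sex'])
```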
267 | 296 | { |
|
301 | 330 | "cell_type": "markdown", |
302 | 331 | "metadata": {}, |
303 | 332 | "source": [ |
304 | | - "## Insert entries with `insert1` and `insert` methods" |
| 333 | + "### Basic relational operators" |
| 334 | + ] |
| 335 | + }, |
| 336 | + { |
| 337 | + "cell_type": "markdown", |
| 338 | + "metadata": {}, |
| 339 | + "source": [ |
| 340 | + "##### Insert operators" |
305 | 341 | ] |
306 | 342 | }, |
307 | 343 | { |
|
441 | 477 | "cell_type": "markdown", |
442 | 478 | "metadata": {}, |
443 | 479 | "source": [ |
444 | | - "## Data integrity" |
| 480 | + "##### Data integrity" |
445 | 481 | ] |
446 | 482 | }, |
447 | 483 | { |
|
527 | 563 | "cell_type": "markdown", |
528 | 564 | "metadata": {}, |
529 | 565 | "source": [ |
530 | | - "As with mouse, we should think about **what information (i.e. attributes) is needed to uniquely identify an experimental session**. Here is the relevant section of the project description:\n", |
531 | | - "\n", |
532 | | - "> * As a hard working neuroscientist, you perform experiments every day, sometimes working with **more than one mouse in a day**! However, on an any given day, **a mouse undergoes at most one recording session**.\n", |
533 | | - "> * For each experimental session, you would like to record **what mouse you worked with** and **when you performed the experiment**. You would also like to keep track of other helpful information such as the **experimental setup** you worked on." |
| 566 | + "As with the `Mouse` table, we should consider **what information (i.e., attributes) is needed to uniquely identify an experimental session**. Here is the relevant section of the project description:\n", |
| 567 | + "\n", |
| 568 | + "> * As a hard-working neuroscientist, you perform experiments every day, sometimes working with **more than one mouse in a day**! However, on any given day, **a mouse undergoes at most one recording session**.\n", |
| 569 | + "> * For each experimental session, you would like to record **what mouse you worked with** and **when you performed the experiment**. You would also like to keep track of other helpful information such as the **experimental setup** you worked on." |
534 | 570 | ] |
535 | 571 | }, |
536 | 572 | { |
537 | 573 | "cell_type": "markdown", |
538 | 574 | "metadata": {}, |
539 | 575 | "source": [ |
540 | | - "Based on the above, it appears that you need to know:\n", |
541 | | - "\n", |
542 | | - "* the date of the session\n", |
543 | | - "* the mouse you recorded from in that session\n", |
544 | | - "\n", |
545 | | - "to uniquely identify a single experimental session." |
| 576 | + "Based on the above, it seems that you need to know two pieces of information:\n", |
| 577 | + "\n", |
| 578 | + "* the date of the session\n", |
| 579 | + "* the mouse you recorded from in that session\n", |
| 580 | + "\n", |
| 581 | + "to uniquely identify a single experimental session." |
546 | 582 | ] |
547 | 583 | }, |
548 | 584 | { |
549 | 585 | "cell_type": "markdown", |
550 | 586 | "metadata": {}, |
551 | 587 | "source": [ |
552 | | - "Note that, to uniquely identify an experimental session (or simply a **session**), we need to know the mouse that the session was about. In other words, a session cannot existing without a corresponding mouse! \n", |
553 | | - "\n", |
554 | | - "With **mouse** already represented as a table in our pipeline, we say that the session **depends on** the mouse! We could graphically represent this in an **entity relationship diagram (ERD)** by drawing the line between two tables, with the one below (**session**) depending on the one above (**mouse**)." |
| 588 | + "Note that, to uniquely identify an experimental session (or simply a `Session`), we need to know the mouse that the session was about. In other words, a session cannot exist without a corresponding mouse! \n", |
| 589 | + "\n", |
| 590 | + "With **mouse** already represented as a table in our pipeline, we say that the session **depends on** the mouse! We could graphically represent this in an **entity relationship diagram (ERD)** by drawing a line between the two tables, with the one below (**session**) depending on the one above (**mouse**)." |
555 | 591 | ] |
556 | 592 | }, |
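The two requirements above can be sketched in plain Python (not DataJoint): sessions are keyed by the pair `(mouse_id, session_date)`, so a mouse can have at most one session per day while several mice can share a date, and a session is accepted only if its mouse already exists. The function `add_session`, the `experiment_setup` field, and the session date are hypothetical.

```python
from datetime import date

# Minimal in-memory sketch (not DataJoint): sessions are keyed by the pair
# (mouse_id, session_date), and each session must refer to an existing mouse.
mice = {11234, 11432}
sessions = {}


def add_session(mouse_id, session_date, experiment_setup):
    if mouse_id not in mice:  # the parent entry must exist first
        raise LookupError(f"no mouse with mouse_id {mouse_id}")
    key = (mouse_id, session_date)
    if key in sessions:
        raise ValueError(f"mouse {mouse_id} already has a session on {session_date}")
    sessions[key] = {"experiment_setup": experiment_setup}


day = date(2018, 5, 1)  # hypothetical session date
add_session(11234, day, experiment_setup=1)
add_session(11432, day, experiment_setup=2)  # several mice on one day: allowed

for bad in [(11234, day, 3),   # same mouse, same day: rejected
            (99999, day, 1)]:  # unknown mouse: rejected
    try:
        add_session(*bad)
    except (LookupError, ValueError) as err:
        print(err)

print(len(sessions))  # 2
```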
557 | 593 | { |
|
560 | 596 | "source": [ |
561 | 597 | "Thus we will need both **mouse** and a new attribute **session_date** to uniquely identify a single session. \n", |
562 | 598 | "\n", |
563 | | - "Remember that a **mouse** is already uniquely identified by its primary key - **mouse_id**. In DataJoint, you can declare that **session** depends on the mouse, and DataJoint will automatically include the mouse's primary key (`mouse_id`) as part of the session's primary key, along side any additional attribute(s) you specify." |
| 599 | + "Remember that a **mouse** is uniquely identified by its primary key - **mouse_id**. In DataJoint, you can declare that **session** depends on the mouse, and DataJoint will automatically include the mouse's primary key (`mouse_id`) as part of the session's primary key, alongside any additional attribute(s) you specify." |
564 | 600 | ] |
565 | 601 | }, |
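How the mouse's primary key ends up in the session's primary key can be sketched with another hypothetical helper (plain Python, not DataJoint's implementation). It assumes DataJoint's convention that a `-> Mouse` line above the `---` separator declares a dependency that becomes part of the primary key; the helper replaces that line with the parent table's primary-key attributes. Both definition strings are assumed examples.

```python
# Hypothetical definitions in DataJoint's style; '-> Mouse' declares a dependency.
definitions = {
    "Mouse": """
        mouse_id : int             # unique animal ID
        ---
        dob      : date            # date of birth
        """,
    "Session": """
        -> Mouse
        session_date : date        # date of the session
        ---
        experiment_setup : int     # experimental setup ID
        """,
}


# Hypothetical helper (not part of DataJoint): compute a table's full primary
# key, where a '-> Parent' line above '---' contributes the parent's primary key.
def primary_key(table):
    key = []
    for raw in definitions[table].strip().splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("---"):
            break  # only lines above the separator belong to the primary key
        if line.startswith("->"):
            key.extend(primary_key(line[2:].strip()))  # inherit the parent's key
        elif line:
            key.append(line.split(":", 1)[0].split("=", 1)[0].strip())
    return key


print(primary_key("Session"))  # ['mouse_id', 'session_date']
```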
566 | 602 | { |
|