Commit 7b6fae1

Merge pull request #2 from MilagrosMarin/fix_update_tutorial
Fix the bug with the Jupyter Notebook version before merge
2 parents ce8b2e2 + bd0d539 commit 7b6fae1

3 files changed

+93
-59
lines changed

README.md

Lines changed: 7 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,21 +1,24 @@
11
# Welcome to DataJoint tutorials!
22

3-
DataJoint is an open-source library for science labs to design and build data pipelines for automated data analysis and sharing.
3+
DataJoint is an open-source library for scientific research labs to design and build
4+
data pipelines for automated data analysis and sharing.
45

5-
This document will guide you as a new DataJoint user through interactive tutorials organized in [Jupyter notebooks](https://jupyter-notebook.readthedocs.io/en/stable/) and written in [Python](https://www.python.org/).
6+
This document will guide you through interactive tutorials written in
7+
[Python](https://www.python.org/) and organized in [Jupyter
8+
notebooks](https://jupyter-notebook.readthedocs.io/en/stable/).
69

710
*Please note that these hands-on DataJoint tutorials are friendly to non-expert users, and advanced programming skills are not required.*
811

912

1013
## Table of contents
11-
- In the [tutorials](./tutorials) folder are interactive Jupyter notebooks to learn DataJoint. The calcium imaging and electrophysiology tutorials provide examples of defining and interacting with data pipelines. In addition, some fill-in-the-blank sections are included for you to code yourself!
14+
- The [tutorials](./tutorials) folder contains interactive Jupyter notebooks designed to teach DataJoint. The calcium imaging and electrophysiology tutorials provide examples of defining and interacting with data pipelines. In addition, some fill-in-the-blank sections are included for you to code yourself!
1215
- 01-DataJoint Basics
1316
- 02-Calcium Imaging Imported Tables
1417
- 03-Calcium Imaging Computed Tables
1518
- 04-Electrophysiology Imported Tables
1619
- 05-Electrophysiology Computed Tables
1720

18-
- In the [completed_tutorials](./completed_tutorials) folder are Jupyter notebooks with the code sections completed and solved.
21+
- The [completed_tutorials](./completed_tutorials) folder contains Jupyter notebooks with all code sections completed and solved.
1922

2023
- You will find the following notebooks in the [short_tutorials](./short_tutorials) folder:
2124
- DataJoint in 30min

tutorials/01-DataJoint Basics.ipynb

Lines changed: 83 additions & 47 deletions
@@ -51,7 +51,9 @@
5151
"If you visit the [documentation for DataJoint](https://docs.datajoint.io/introduction/Data-pipelines.html), we define a data pipeline as follows:\n",
5252
"> A data pipeline is a sequence of steps (more generally a directed acyclic graph) with integrated storage at each step. These steps may be thought of as nodes in a graph.\n",
5353
"\n",
54-
"While this is an accurate description, it may not be the most intuitive definition. Put succinctly, a data pipeline is a listing or a \"map\" of various \"things\" that you work with in a project, with line connecting things to each other to indicate their dependencies. The \"things\" in a data pipeline tends to be the *nouns* you find when describing a project. The \"things\" may include anything from mouse, experimenter, equipment, to experiment session, trial, two-photon scans, electric activities, to receptive fields, neuronal spikes, to figures for a publication! A data pipeline gives you a framework to:\n",
54+
">* Nodes in this graph are represented as database **tables**. Examples of such tables include \"Subject\", \"Session\", \"Implantation\", \"Experimenter\", \"Equipment\", but also \"OptoWaveform\", \"OptoStimParams\", or \"Neuronal spikes\". \n",
55+
"\n",
56+
">* The data pipeline is formed by making these tables interdependent (as the nodes are connected in a network). A **dependency** arises when a step of the data pipeline needs a result from an earlier step before it can complete its execution. The dependency graph forms the entire cohesive data pipeline. \n",
5557
"\n",
5658
"1. define these \"things\" as tables in which you can store the information about them\n",
5759
"2. define the relationships (in particular the dependencies) between the \"things\"\n",
@@ -65,7 +67,7 @@
6567
"cell_type": "markdown",
6668
"metadata": {},
6769
"source": [
68-
"# Building our first pipeline: "
70+
"##### Practical examples"
6971
]
7072
},
7173
{
@@ -129,21 +131,21 @@
129131
"cell_type": "markdown",
130132
"metadata": {},
131133
"source": [
132-
"Just by going though the description, we can start to identify **things** or **entities** that we might want to store and represent in our data pipeline:\n",
133-
"\n",
134-
"* mouse\n",
135-
"* experimental session\n",
136-
"\n",
137-
"For ephys:\n",
138-
"\n",
139-
">* neuron\n",
140-
">* spikes\n",
141-
"\n",
142-
"For calcium imaging:\n",
143-
"\n",
144-
">* scan\n",
145-
">* regions of interest\n",
146-
">* trace"
134+
"Just by going through the description, we can start to identify **entities** that need to be stored and represented in our data pipeline:\n",
135+
"\n",
136+
"* mouse\n",
137+
"* experimental session\n",
138+
"\n",
139+
"For ephys:\n",
140+
"\n",
141+
">* neuron\n",
142+
">* spikes\n",
143+
"\n",
144+
"For calcium imaging:\n",
145+
"\n",
146+
">* scan\n",
147+
">* regions of interest\n",
148+
">* trace"
147149
]
148150
},
149151
{
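The entities listed above, together with the idea that a pipeline is a directed acyclic graph of dependent tables, can be sketched with Python's standard `graphlib` module. This is a minimal illustration and not DataJoint code: the table names follow the entities in the tutorial, but the specific dependencies between them are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each table maps to the set of tables it depends on.
# The edges are illustrative assumptions, not taken from the tutorial's schema.
dependencies = {
    "Mouse": set(),
    "Session": {"Mouse"},
    "Scan": {"Session"},
    "Neuron": {"Session"},
    "Spikes": {"Neuron"},
}

# static_order() yields each table only after all tables it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any order produced this way places `Mouse` before `Session`, which mirrors the rule that upstream tables must be populated before the tables that depend on them.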
@@ -157,16 +159,32 @@
157159
"cell_type": "markdown",
158160
"metadata": {},
159161
"source": [
160-
"In DataJoint data pipeline, we represent these **entities** as **tables**. Different *kinds* of entities become distinct tables, and each row of the table is a single example (instance) of the category of entity. \n",
161-
"\n",
162-
"For example, if we have a `Mouse` table, then each row in the mouse table represents a single mouse!"
162+
"### Schemas and tables"
163+
]
164+
},
165+
{
166+
"cell_type": "markdown",
167+
"metadata": {},
168+
"source": [
169+
"##### Concepts"
163170
]
164171
},
165172
{
166173
"cell_type": "markdown",
167174
"metadata": {},
168175
"source": [
169-
"When constructing such table, we need to figure out what it would take to **uniquely identify** each entry. Let's take the example of the **mouse** and think about what it would take to uniquely identify a mouse."
176+
"In a data pipeline, we represent these **entities** as **tables**. Different *kinds* of entities become distinct tables, and each table row is a single example (instance) of the entity's category. \n",
177+
"\n",
178+
"For example, if we have a `Mouse` table, each row in the mouse table represents a single mouse. \n",
179+
"\n",
180+
"It is essential to think about what information will **uniquely identify** each entry. \n",
181+
"\n",
182+
"In this case, the information that uniquely identifies each entry in the `Mouse` table is its **mouse ID** - a unique ID number assigned to each animal in the lab. This attribute is named the **primary key** of the table.\n",
183+
"\n",
184+
"| Mouse_ID (*Primary key attribute*)|\n",
185+
"|:--------: | \n",
186+
"| 11234 |\n",
187+
"| 11432 |"
170188
]
171189
},
172190
{
@@ -175,17 +193,21 @@
175193
"source": [
176194
"After some thought, we might conclude that each mouse can be uniquely identified by knowing its **mouse ID** - a unique ID number assigned to each mouse in the lab. The mouse ID is then a column in the table or an **attribute** that can be used to **uniquely identify** each mouse. Such attribute is called the **primary key** of the table.\n",
177195
"\n",
178-
"| mouse_id* |\n",
179-
"|:--------:|\n",
180-
"| 11234 |\n",
181-
"| 11432 |"
196+
"The mouse ID is then a column in the table or an **attribute** that can be used to **uniquely identify** each mouse. \n",
197+
"\n",
198+
"Such an attribute is called the **primary key** of the table: the subset of table attributes uniquely identifying each entity in the table. A **secondary attribute** is any field in a table that is not part of the primary key.\n",
199+
"\n",
200+
"| Mouse_ID (*Primary key attribute*) |\n",
201+
"|:--------:|\n",
202+
"| 11234 |\n",
203+
"| 11432 |"
182204
]
183205
},
184206
{
185207
"cell_type": "markdown",
186208
"metadata": {},
187209
"source": [
188-
"Once we have successfully identified the primary key of the table, we can now think about what other columns, or **non-primary key attributes** that we would want to include in the table. These are additional information **about each entry in the table that we want to store**."
210+
"Once we have successfully identified the table's primary key, we can think about what other columns, or **non-primary key attributes**, to include: additional information **about each entry in the table that needs to be stored as well**."
189211
]
190212
},
191213
{
@@ -199,7 +221,7 @@
199221
"cell_type": "markdown",
200222
"metadata": {},
201223
"source": [
202-
"| mouse_id* | dob | sex |\n",
224+
"| Mouse_ID | DOB | sex |\n",
203225
"|:--------:|------------|--------|\n",
204226
"| 11234 | 2017-11-17 | M |\n",
205227
"| 11432 | 2018-03-04 | F |"
@@ -209,14 +231,21 @@
209231
"cell_type": "markdown",
210232
"metadata": {},
211233
"source": [
212-
"Now we have an idea on how to represent information about mouse, let's create the table using **DataJoint**!"
234+
"Now that we have an idea of how to represent information about the mouse, let's create the table using **DataJoint**!"
213235
]
214236
},
215237
{
216238
"cell_type": "markdown",
217239
"metadata": {},
218240
"source": [
219-
"## Create a schema - house for your tables"
241+
"##### Practical example"
242+
]
243+
},
244+
{
245+
"cell_type": "markdown",
246+
"metadata": {},
247+
"source": [
248+
"##### Schema"
220249
]
221250
},
222251
{
@@ -254,14 +283,14 @@
254283
"cell_type": "markdown",
255284
"metadata": {},
256285
"source": [
257-
"## Creating your first table"
286+
"##### Table"
258287
]
259288
},
260289
{
261290
"cell_type": "markdown",
262291
"metadata": {},
263292
"source": [
264-
"In DataJoint, you define each table as a class, and provide the table definition (e.g. attribute definitions) as the `definition` static string property. The class will inherit from the `dj.Manual` class provided by DataJoint (more on this later)."
293+
"In DataJoint, you define each table as a `class`, and provide the table definition (e.g., attribute definitions) as the `definition` static string property. The class will inherit from the `dj.Manual` class provided by DataJoint (more on this later)."
265294
]
266295
},
267296
{
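To make the primary key idea concrete without a database server, the `Mouse` table described above can be mimicked with Python's built-in `sqlite3` module. This is a sketch that swaps in plain SQL for DataJoint's `definition` string; the column names and the two example rows come from the tutorial, while the text column types and the duplicate row are assumptions for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE mouse (
        mouse_id INTEGER PRIMARY KEY,  -- primary key attribute
        dob      TEXT,                 -- secondary attribute (date of birth)
        sex      TEXT                  -- secondary attribute
    )
    """
)

# Insert the two example mice from the tutorial table.
conn.executemany(
    "INSERT INTO mouse VALUES (?, ?, ?)",
    [(11234, "2017-11-17", "M"), (11432, "2018-03-04", "F")],
)

# A second row reusing mouse_id 11234 violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO mouse VALUES (?, ?, ?)", (11234, "2019-01-01", "F"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

mouse_count = conn.execute("SELECT COUNT(*) FROM mouse").fetchone()[0]
print(duplicate_rejected, mouse_count)
```

The rejected insert is the behaviour the primary key exists to guarantee: no two rows can describe the same mouse.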
@@ -301,7 +330,14 @@
301330
"cell_type": "markdown",
302331
"metadata": {},
303332
"source": [
304-
"## Insert entries with `insert1` and `insert` methods"
333+
"### Basic relational operators"
334+
]
335+
},
336+
{
337+
"cell_type": "markdown",
338+
"metadata": {},
339+
"source": [
340+
"##### Insert operators"
305341
]
306342
},
307343
{
@@ -441,7 +477,7 @@
441477
"cell_type": "markdown",
442478
"metadata": {},
443479
"source": [
444-
"## Data integrity"
480+
"##### Data integrity"
445481
]
446482
},
447483
{
@@ -527,31 +563,31 @@
527563
"cell_type": "markdown",
528564
"metadata": {},
529565
"source": [
530-
"As with mouse, we should think about **what information (i.e. attributes) is needed to uniquely identify an experimental session**. Here is the relevant section of the project description:\n",
531-
"\n",
532-
"> * As a hard working neuroscientist, you perform experiments every day, sometimes working with **more than one mouse in a day**! However, on an any given day, **a mouse undergoes at most one recording session**.\n",
533-
"> * For each experimental session, you would like to record **what mouse you worked with** and **when you performed the experiment**. You would also like to keep track of other helpful information such as the **experimental setup** you worked on."
566+
"As with `mouse`, we should consider **what information (i.e., attributes) is needed to identify an `experimental session`** uniquely. Here is the relevant section of the project description:\n",
567+
"\n",
568+
"> * As a hard-working neuroscientist, you perform experiments every day, sometimes working with **more than one mouse in a day**! However, on any given day, **a mouse undergoes at most one recording session**.\n",
569+
"> * For each experimental session, you would like to record **what mouse you worked with** and **when you performed the experiment**. You would also like to keep track of other helpful information such as the **experimental setup** you worked on."
534570
]
535571
},
536572
{
537573
"cell_type": "markdown",
538574
"metadata": {},
539575
"source": [
540-
"Based on the above, it appears that you need to know:\n",
541-
"\n",
542-
"* the date of the session\n",
543-
"* the mouse you recorded from in that session\n",
544-
"\n",
545-
"to uniquely identify a single experimental session."
576+
"Based on the above, it seems that you need to know two pieces of information to uniquely identify a single experimental session:\n",
577+
"\n",
578+
"* the date of the session\n",
579+
"* the mouse you recorded from in that session"
546582
]
547583
},
548584
{
549585
"cell_type": "markdown",
550586
"metadata": {},
551587
"source": [
552-
"Note that, to uniquely identify an experimental session (or simply a **session**), we need to know the mouse that the session was about. In other words, a session cannot existing without a corresponding mouse! \n",
553-
"\n",
554-
"With **mouse** already represented as a table in our pipeline, we say that the session **depends on** the mouse! We could graphically represent this in an **entity relationship diagram (ERD)** by drawing the line between two tables, with the one below (**session**) depending on the one above (**mouse**)."
588+
"Note that, to uniquely identify an experimental session (or simply a `Session`), we need to know the mouse that the session was about. In other words, a session cannot exist without a corresponding mouse! \n",
589+
"\n",
590+
"With **mouse** already represented as a table in our pipeline, we say that the session **depends on** the mouse! We could graphically represent this in an **entity relationship diagram (ERD)** by drawing a line between the two tables, with the one below (**session**) depending on the one above (**mouse**)."
555591
]
556592
},
557593
{
@@ -560,7 +596,7 @@
560596
"source": [
561597
"Thus we will need both **mouse** and a new attribute **session_date** to uniquely identify a single session. \n",
562598
"\n",
563-
"Remember that a **mouse** is already uniquely identified by its primary key - **mouse_id**. In DataJoint, you can declare that **session** depends on the mouse, and DataJoint will automatically include the mouse's primary key (`mouse_id`) as part of the session's primary key, along side any additional attribute(s) you specify."
599+
"Remember that a **mouse** is uniquely identified by its primary key - **mouse_id**. In DataJoint, you can declare that **session** depends on the mouse, and DataJoint will automatically include the mouse's primary key (`mouse_id`) as part of the session's primary key, alongside any additional attribute(s) you specify."
564600
]
565601
},
566602
{
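The dependency of a session on a mouse, with `mouse_id` inherited into the session's primary key alongside `session_date`, can be approximated in plain SQL. The sketch below uses the standard `sqlite3` module rather than DataJoint, and the dates and the unknown mouse ID `99999` are made-up values for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE mouse (mouse_id INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE session (
        mouse_id     INTEGER NOT NULL REFERENCES mouse (mouse_id),
        session_date TEXT    NOT NULL,
        PRIMARY KEY (mouse_id, session_date)  -- composite primary key
    )
    """
)
conn.execute("INSERT INTO mouse VALUES (11234)")
conn.execute("INSERT INTO session VALUES (11234, '2018-05-15')")
conn.execute("INSERT INTO session VALUES (11234, '2018-05-16')")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Same mouse on the same day: violates the composite primary key.
same_day_rejected = rejected("INSERT INTO session VALUES (11234, '2018-05-15')")
# A session for a mouse that does not exist: violates the foreign key.
orphan_rejected = rejected("INSERT INTO session VALUES (99999, '2018-05-15')")

session_count = conn.execute("SELECT COUNT(*) FROM session").fetchone()[0]
print(same_day_rejected, orphan_rejected, session_count)
```

The composite key encodes the rule that a mouse undergoes at most one session per day, and the foreign key encodes the rule that a session cannot exist without its mouse.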

tutorials/02-Calcium Imaging Imported Tables.ipynb

Lines changed: 3 additions & 8 deletions
@@ -73,11 +73,7 @@
7373
"cell_type": "markdown",
7474
"metadata": {},
7575
"source": [
76-
"In the `data` folder in this `DataJoint-Tutorials`, you can find a small dataset of three different cases of calcium imaging scans:\n",
77-
"\n",
78-
"- `/workspaces/datajoint-tutorials/data/example_scan_02.tif`\n",
79-
"- `/workspaces/datajoint-tutorials/data/example_scan_03.tif`\n",
80-
"- `/workspaces/datajoint-tutorials/data/example_scan_01.tif`\n",
76+
"In the `data` folder in this `DataJoint-Tutorials`, you can find a small dataset of three different cases of calcium imaging scans: `example_scan_01.tif`, `example_scan_02.tif`, and `example_scan_03.tif`.\n",
8177
"\n",
8278
"As you might know, calcium imaging scans (raw data) are stored as *.tif* files. \n",
8379
"\n",
@@ -291,8 +287,7 @@
291287
"source": [
292288
"Particularly, this example contains 100 frames. \n",
293289
"\n",
294-
"Let's calculate the average of the images over the frames and plot the result.\n",
295-
"\n"
290+
"Let's calculate the average of the images over the frames and plot the result.\n"
296291
]
297292
},
298293
{
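Averaging a scan over its frames reduces a stack shaped (frames, height, width) to a single (height, width) image. The sketch below shows the computation using only the standard library on a made-up 3-frame, 2x2-pixel stack; a real scan would normally be handled with an array library, and the helper name `average_over_frames` is hypothetical.

```python
# Hypothetical tiny "scan": 3 frames, each 2x2 pixels (frames, height, width).
scan = [
    [[0, 2], [4, 6]],
    [[2, 4], [6, 8]],
    [[4, 6], [8, 10]],
]


def average_over_frames(frames):
    """Return the per-pixel mean across all frames as a 2D list."""
    n = len(frames)
    # zip(*frames) groups the matching rows of every frame;
    # zip(*rows) then groups the matching pixels within those rows.
    return [
        [sum(pixels) / n for pixels in zip(*rows)]
        for rows in zip(*frames)
    ]


avg_image = average_over_frames(scan)
print(avg_image)
```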
@@ -387,7 +382,7 @@
387382
"source": [
388383
"In DataJoint, the tier of the table indicates **the nature of the data and the data source for the table**. So far we have encountered two table tiers: `Manual` and `Imported`, and we will encounter the two other major tiers in this session. \n",
389384
"\n",
390-
"DataJoint tables in `Manual` tier, or simply **Manual tables** indicate that its contents are **manually** entered by either experimenters or a recording system, and its content **do not depend on external data files or other tables**. This is the most basic table type you will encounter, especially as the tables at the beginning of the pipeline. In the Diagram, `Manual` tables are depicted by green rectangles.\n",
385+
"DataJoint tables in the `Manual` tier, or simply **Manual tables**, indicate that their contents are **manually** entered by either experimenters or a recording system, and that their contents **do not depend on external data files or other tables**. This is the most basic table type you will encounter, especially among the tables at the beginning of the pipeline. In the diagram, `Manual` tables are depicted by green rectangles.\n",
391386
"\n",
392387
"On the other hand, **Imported tables** are understood to pull data (or *import* data) from external data files, and come equipped with functionalities to perform this importing process automatically, as we will see shortly! In the Diagram, `Imported` tables are depicted by blue ellipses."
393388
]
