
Commit ae5ad5f

Merge pull request #12 from dimitri-yatsenko/main

Refresh sections on relationships and diagrams.

2 parents f65bc4f + d996d03, commit ae5ad5f

26 files changed: +8094 additions, -7055 deletions

book/00-introduction/49-connect.ipynb

Lines changed: 56 additions & 10 deletions

@@ -28,19 +28,34 @@
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "# Connect with DataJoint\n\nDataJoint is the primary way to connect to the database in this book. The DataJoint client library reads the database credentials from the environment variables `DJ_HOST`, `DJ_USER`, and `DJ_PASS`. \n\nSimply importing the DataJoint library is sufficient—it will connect to the database automatically when needed. Here we call `dj.conn()` only to verify the connection, but this step is not required in normal use."
+ "source": [
+  "# Connect with DataJoint\n",
+  "\n",
+  "DataJoint is the primary way to connect to the database in this book. The DataJoint client library reads the database credentials from the environment variables `DJ_HOST`, `DJ_USER`, and `DJ_PASS`. \n",
+  "\n",
+  "Simply importing the DataJoint library is sufficient—it will connect to the database automatically when needed. Here we call `dj.conn()` only to verify the connection, but this step is not required in normal use."
+ ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
- "source": "import datajoint as dj\ndj.conn() # test the connection (optional)"
+ "source": [
+  "import datajoint as dj\n",
+  "dj.conn() # test the connection (optional)"
+ ]
 },
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "# Connect with SQL Magic\n\nSQL \"Jupyter magic\" allows executing SQL statements directly in Jupyter notebooks, implemented by the [`jupysql`](https://ploomber.io/blog/jupysql/) library. This is useful for quick interactive SQL queries and for learning SQL syntax. We will use SQL magic in this book for demonstrating SQL concepts, but it is not used as part of Python application code.\n\nThe following cell sets up the SQL magic connection to the database."
+ "source": [
+  "# Connect with SQL Magic\n",
+  "\n",
+  "SQL \"Jupyter magic\" allows executing SQL statements directly in Jupyter notebooks, implemented by the [`jupysql`](https://ploomber.io/blog/jupysql/) library. This is useful for quick interactive SQL queries and for learning SQL syntax. We will use SQL magic in this book for demonstrating SQL concepts, but it is not used as part of Python application code.\n",
+  "\n",
+  "The following cell sets up the SQL magic connection to the database."
+ ]
 },
 {
  "cell_type": "code",
@@ -51,43 +66,74 @@
 }
 },
 "outputs": [],
- "source": "%load_ext sql\n%sql mysql+pymysql://dev:devpass@db"
+ "source": [
+  "%load_ext sql\n",
+  "%sql mysql+pymysql://dev:devpass@db"
+ ]
 },
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "You can issue SQL commands from a Jupyter cell by starting it with `%%sql`.\nChange the cell type to `SQL` for appropriate syntax highlighting."
+ "source": [
+  "You can issue SQL commands from a Jupyter cell by starting it with `%%sql`.\n",
+  "Change the cell type to `SQL` for appropriate syntax highlighting."
+ ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
- "source": "%%sql\n-- show all users\nSELECT User FROM mysql.user"
+ "source": [
+  "%%sql\n",
+  "-- show all users\n",
+  "SELECT User FROM mysql.user"
+ ]
 },
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "# Connect with a Python MySQL Client\n\nTo issue SQL queries directly from Python code (outside of Jupyter magic), you can use a conventional SQL client library such as `pymysql`. This approach gives you full programmatic control over database interactions and is useful when you need to execute raw SQL within Python scripts."
+ "source": [
+  "# Connect with a Python MySQL Client\n",
+  "\n",
+  "To issue SQL queries directly from Python code (outside of Jupyter magic), you can use a conventional SQL client library such as `pymysql`. This approach gives you full programmatic control over database interactions and is useful when you need to execute raw SQL within Python scripts."
+ ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
- "source": "import os\nimport pymysql\n\n# create a database connection\nconn = pymysql.connect(\n host=os.environ['DJ_HOST'], \n user=os.environ['DJ_USER'], \n password=os.environ['DJ_PASS']\n)"
+ "source": [
+  "import os\n",
+  "import pymysql\n",
+  "\n",
+  "# create a database connection\n",
+  "conn = pymysql.connect(\n",
+  " host=os.environ['DJ_HOST'], \n",
+  " user=os.environ['DJ_USER'], \n",
+  " password=os.environ['DJ_PASS']\n",
+  ")"
+ ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
- "source": "# create a query cursor and issue an SQL query\ncur = conn.cursor()\ncur.execute('SELECT User FROM mysql.user')\ncur.fetchall()"
+ "source": [
+  "# create a query cursor and issue an SQL query\n",
+  "cur = conn.cursor()\n",
+  "cur.execute('SELECT User FROM mysql.user')\n",
+  "cur.fetchall()"
+ ]
 },
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "We are all set for executing all the database queries in this book!"
+ "source": [
+  "We are all set for executing all the database queries in this book!"
+ ]
 }
 ],
 "metadata": {

book/20-concepts/04-integrity.md

Lines changed: 5 additions & 8 deletions

@@ -1,8 +1,5 @@
 ---
 title: Data Integrity
-date: 2025-10-31
-authors:
-  - name: Dimitri Yatsenko
 ---
 
 # Why Data Integrity Matters
@@ -95,7 +92,7 @@ Entity integrity ensures a **one-to-one correspondence** between real-world enti
 **Example:** Each mouse in the lab has exactly one unique ID, and that ID refers to exactly one mouse—never two different mice sharing the same ID, and never one mouse having multiple IDs.
 
 **Covered in:**
-- [Primary Keys](../30-design/020-primary-key.md) — Entity integrity and the 1:1 correspondence guarantee (elaborated in detail)
+- [Primary Keys](../30-design/018-primary-key.md) — Entity integrity and the 1:1 correspondence guarantee (elaborated in detail)
 - [UUID](../85-special-topics/025-uuid.ipynb) — Universally unique identifiers
 
 ---
@@ -111,7 +108,7 @@ Referential integrity maintains logical associations across tables:
 **Example:** A recording session cannot reference a non-existent mouse.
 
 **Covered in:**
-- [Foreign Keys](../30-design/030-foreign-keys.ipynb) — Cross-table relationships
+- [Foreign Keys](../30-design/030-foreign-keys.md) — Cross-table relationships
 - [Relationships](../30-design/050-relationships.ipynb) — Dependency patterns
 
 ---
@@ -161,7 +158,7 @@ Workflow integrity maintains valid operation sequences through:
 **Example:** An analysis pipeline cannot compute results before acquiring raw data. If `NeuronAnalysis` depends on `SpikeData`, which depends on `RecordingSession`, the database enforces that recordings are created before spike detection, which occurs before analysis—maintaining the integrity of the entire scientific workflow.
 
 **Covered in:**
-- [Foreign Keys](../30-design/030-foreign-keys.ipynb) — How foreign keys encode workflow dependencies
+- [Foreign Keys](../30-design/030-foreign-keys.md) — How foreign keys encode workflow dependencies
 - [Populate](../40-operations/050-populate.ipynb) — Automatic workflow execution and dependency resolution
 
 ---
@@ -212,8 +209,8 @@ Now that you understand *why* integrity matters, the next chapter introduces how
 The [Design](../30-design/010-schema.ipynb) section then shows *how* to implement each constraint type:
 
 1. **[Tables](../30-design/015-table.ipynb)** — Basic structure with domain integrity
-2. **[Primary Keys](../30-design/020-primary-key.md)** — Entity integrity through unique identification
-3. **[Foreign Keys](../30-design/030-foreign-keys.ipynb)** — Referential integrity across tables
+2. **[Primary Keys](../30-design/018-primary-key.md)** — Entity integrity through unique identification
+3. **[Foreign Keys](../30-design/030-foreign-keys.md)** — Referential integrity across tables
 
 Each chapter builds on these foundational integrity concepts.
 ```
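
The mouse-and-session example in the updated text maps naturally onto a table definition. As a minimal sketch (the schema and table names here are hypothetical, not taken from the book's code), a DataJoint foreign key enforces the referential-integrity rule that a session cannot reference a non-existent mouse:

```python
import datajoint as dj

schema = dj.Schema('lab')  # hypothetical schema name

@schema
class Mouse(dj.Manual):
    definition = """
    # one row per real mouse (entity integrity via the primary key)
    mouse_id : int           # unique lab ID
    ---
    date_of_birth : date
    """

@schema
class Session(dj.Manual):
    definition = """
    # recording session; the foreign key enforces referential integrity
    -> Mouse
    session : int            # session number within this mouse
    ---
    session_date : date
    """
```

Inserting a `Session` row whose `mouse_id` is absent from `Mouse` raises an integrity error, and the same dependency ordering is what underpins the workflow-integrity guarantees described above.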

book/30-design/010-schema.ipynb

Lines changed: 2 additions & 2 deletions

@@ -3,7 +3,7 @@
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "---\ntitle: Schemas\nauthors:\n - name: Dimitri Yatsenko\n---\n\n# What is a schema?\n\nThe term schema has two related meanings in the context of databases:\n\n## 1. Schema as a Data Blueprint\nA **schema** is a formal specification of the structure of data and the rules governing its integrity.\nIt serves as a blueprint that defines how data is organized, stored, and accessed within a database.\nThis ensures that the database reflects the rules and requirements of the underlying business or research project it supports.\n\nIn structured data models, such as the relational model, a schema provides a robust framework for defining:\n* The structure of tables (relations) and their attributes (columns).\n* Rules and constraints that ensure data consistency, accuracy, and reliability.\n* Relationships between tables, such as primary keys (unique identifiers for records) and foreign keys (references to related records in other tables).\n\n### Aims of Good Schema Design\n* **Data Integrity**: Ensures consistency and prevents anomalies.\n* **Query Efficiency**: Facilitates fast and accurate data retrieval, supports complex queries, and optimizes database performance.\n* **Scalability**: Allows the database to grow and adapt as data volumes increase.\n\n### Key Elements of Schema Design\n* **Tables and Attributes**: Each table is defined with specific attributes (columns), each assigned a data type.\n* **Primary Keys**: Uniquely identify each record in a table.\n* **Foreign Keys**: Establish relationships between entities in tables.\n* **Indexes**: Support efficient queries.\n\nThrough careful schema design, database architects create systems that are both efficient and flexible, meeting the current and future needs of an organization. The schema acts as a living document that guides the structure, operations, and integrity of the database.\n\n## 2. Schema as a Database Module\n\nIn complex database designs, the term \"schema\" is also used to describe a distinct module of a larger database with its own namespace that groups related tables together. \nThis modular approach:\n* Separates tables into logical groups for better organization.\n* Avoids naming conflicts in large databases with multiple schemas."
+ "source": "---\ntitle: Schemas\n---\n\n# What is a schema?\n\nThe term schema has two related meanings in the context of databases:\n\n## 1. Schema as a Data Blueprint\nA **schema** is a formal specification of the structure of data and the rules governing its integrity.\nIt serves as a blueprint that defines how data is organized, stored, and accessed within a database.\nThis ensures that the database reflects the rules and requirements of the underlying business or research project it supports.\n\nIn structured data models, such as the relational model, a schema provides a robust framework for defining:\n* The structure of tables (relations) and their attributes (columns).\n* Rules and constraints that ensure data consistency, accuracy, and reliability.\n* Relationships between tables, such as primary keys (unique identifiers for records) and foreign keys (references to related records in other tables).\n\n### Aims of Good Schema Design\n* **Data Integrity**: Ensures consistency and prevents anomalies.\n* **Query Efficiency**: Facilitates fast and accurate data retrieval, supports complex queries, and optimizes database performance.\n* **Scalability**: Allows the database to grow and adapt as data volumes increase.\n\n### Key Elements of Schema Design\n* **Tables and Attributes**: Each table is defined with specific attributes (columns), each assigned a data type.\n* **Primary Keys**: Uniquely identify each record in a table.\n* **Foreign Keys**: Establish relationships between entities in tables.\n* **Indexes**: Support efficient queries.\n\nThrough careful schema design, database architects create systems that are both efficient and flexible, meeting the current and future needs of an organization. The schema acts as a living document that guides the structure, operations, and integrity of the database.\n\n## 2. Schema as a Database Module\n\nIn complex database designs, the term \"schema\" is also used to describe a distinct module of a larger database with its own namespace that groups related tables together. \nThis modular approach:\n* Separates tables into logical groups for better organization.\n* Avoids naming conflicts in large databases with multiple schemas."
 },
 {
  "cell_type": "markdown",
@@ -40,7 +40,7 @@
 {
  "cell_type": "markdown",
  "metadata": {},
- "source": "# Using the `schema` Object\n\nThe schema object groups related tables together and helps prevent naming conflicts.\n\nBy convention, the object created by `dj.Schema` is named `schema`. Typically, only one schema object is used in any given Python namespace, usually at the level of a Python module.\n\nThe schema object serves multiple purposes:\n* **Creating Tables**: Used as a *class decorator* (`@schema`) to declare tables within the schema. \nFor details, see the next section, [Create Tables](015-table.ipynb)\n* **Visualizing the Schema**: Generates diagrams to illustrate relationships between tables.\n* **Exporting Data**: Facilitates exporting data for external use or backup.\n\nWith this foundation, you are ready to begin declaring tables and building your data pipeline."
+ "source": "# Using the `schema` Object\n\nThe schema object groups related tables together and helps prevent naming conflicts.\n\nBy convention, the object created by `dj.Schema` is named `schema`. Typically, only one schema object is used in any given Python namespace, usually at the level of a Python module.\n\nThe schema object serves multiple purposes:\n* **Creating Tables**: Used as a *class decorator* (`@schema`) to declare tables within the schema. \nFor details, see the next section, [Tables](015-table.ipynb)\n* **Visualizing the Schema**: Generates diagrams to illustrate relationships between tables.\n* **Exporting Data**: Facilitates exporting data for external use or backup.\n\nWith this foundation, you are ready to begin declaring tables and building your data pipeline."
 },
 {
  "cell_type": "markdown",
