book/00-introduction/00-purpose.md
+7 -7 (7 additions & 7 deletions)
@@ -4,7 +4,7 @@ title: Purpose
4
4
5
5
## What is DataJoint?
6
6
7
-
**{index}`DataJoint` is a {index}`computational database` language and platform that enables scientists to design, implement, and manage data operations for research by unifying data structures and analysis code.** It provides {index}`data integrity`, {index}`automated computation`, {index}`reproducibility`, and seamless collaboration through a {index}`relational database` approach that coordinates relational databases, code repositories, and object storage.
7
+
**DataJoint is a computational database language and platform that enables scientists to design, implement, and manage data operations for research by unifying data structures and analysis code.** It provides data integrity, automated computation, reproducibility, and seamless collaboration through a relational database approach that coordinates relational databases, code repositories, and object storage.
8
8
9
9
## Who This Book Is For
10
10
@@ -28,18 +28,18 @@ Here's what makes DataJoint different: **your database schema IS your data proce
28
28
29
29
Traditional databases store and retrieve data. DataJoint does that too, but it also tracks what gets computed from what. Each table plays a specific role in your workflow:
30
30
31
-
- **{index}`Manual table`s**: Source data entered by researchers
32
-
- **{index}`Imported table`s**: Data acquired from instruments or external sources
33
-
- **{index}`Computed table`s**: Results automatically derived from upstream data
34
-
- **{index}`Lookup table`s**: Reference data and parameters
31
+
- **Manual tables**: Source data entered by researchers
32
+
- **Imported tables**: Data acquired from instruments or external sources
33
+
- **Computed tables**: Results automatically derived from upstream data
34
+
- **Lookup tables**: Reference data and parameters
35
35
36
36
This workflow perspective shapes everything:
37
37
38
38
**Schema as a Map**: Your database diagram becomes a visual flowchart showing exactly how data moves from raw inputs to final results. Dependencies are explicit, not hidden in scattered scripts.
39
39
40
40
**Intelligent Diagrams**: Different table types get distinct visual styles. One glance tells you what's manual, what's automatic, and how everything connects.
41
41
42
-
**{index}`Provenance`, Not Just Integrity**: {index}`Foreign key`s mean more than "this ID exists." They mean "this result was computed FROM this input." When upstream data changes, DataJoint ensures you can't accidentally keep stale downstream results. This is why DataJoint emphasizes INSERT and DELETE over UPDATE—changing input data without recomputing outputs breaks your science, even if the database technically remains consistent.
42
+
**Provenance, Not Just Integrity**: Foreign keys mean more than "this ID exists." They mean "this result was computed FROM this input." When upstream data changes, DataJoint ensures you can't accidentally keep stale downstream results. This is why DataJoint emphasizes INSERT and DELETE over UPDATE—changing input data without recomputing outputs breaks your science, even if the database technically remains consistent.
43
43
44
44
For scientific computing, this workflow-centric design is transformative. Your database doesn't just store results—it guarantees they're valid, reproducible, and traceable back to their origins.
45
45
@@ -62,7 +62,7 @@ This book provides the skills to transform research operations: from fragile scr
62
62
63
63
## DataJoint and SQL: Two Languages, One Foundation
64
64
65
-
**{index}`SQL` (Structured Query Language)** powers virtually every relational database. DataJoint wraps SQL in Pythonic syntax, automatically translating your code into optimized queries.
65
+
**SQL (Structured Query Language)** powers virtually every relational database. DataJoint wraps SQL in Pythonic syntax, automatically translating your code into optimized queries.
66
66
67
67
You could learn DataJoint without ever seeing SQL. But this book teaches both, side by side. You'll understand not just *what* works but *why*—and you'll be able to work directly with SQL when needed.
book/00-introduction/05-executive-summary.md
+9 -9 (9 additions & 9 deletions)
@@ -11,7 +11,7 @@ Standard database solutions address storage and querying but not computation. Da
11
11
12
12
## The DataJoint Solution
13
13
14
-
**DataJoint introduces the {index}`Relational Workflow Model`**—an extension of classical relational theory that treats computational transformations as first-class citizens of the data model. The database {index}`schema` becomes an executable specification: it defines not just what data exists, but how data flows through the pipeline and when computations should run.
14
+
**DataJoint introduces the Relational Workflow Model**—an extension of classical relational theory that treats computational transformations as first-class citizens of the data model. The database schema becomes an executable specification: it defines not just what data exists, but how data flows through the pipeline and when computations should run.
15
15
16
16
This creates what we call a **Computational Database**: a system where inserting new raw data automatically triggers all downstream analyses in dependency order, maintaining computational validity throughout. Think of it as a spreadsheet that auto-recalculates, but with the rigor of a relational database and the scale of distributed computing.
17
17
@@ -21,16 +21,16 @@ This creates what we call a **Computational Database**: a system where inserting
21
21
Unlike Entity-Relationship modeling that requires translation to SQL, DataJoint schemas are directly executable. The diagram *is* the implementation. Schema changes propagate immediately. Documentation cannot drift from reality because the schema is the documentation.
22
22
23
23
**Workflow-Aware Foreign Keys**
24
-
Foreign keys in DataJoint do more than enforce {index}`referential integrity`—they encode computational dependencies. A computed result that references raw data will be automatically deleted if that raw data is removed, preventing stale or orphaned results. This maintains *{index}`computational validity`*, not just *referential integrity*.
24
+
Foreign keys in DataJoint do more than enforce referential integrity—they encode computational dependencies. A computed result that references raw data will be automatically deleted if that raw data is removed, preventing stale or orphaned results. This maintains *computational validity*, not just *referential integrity*.
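The cascading behavior described above can be illustrated with plain SQL. The following is a minimal sketch, not DataJoint's own mechanism: it uses Python's built-in `sqlite3` module and an `ON DELETE CASCADE` constraint as a stand-in, and the table names `recording` and `spike_count` are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only for illustration;
# DataJoint itself targets MySQL/PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE recording (
    recording_id INTEGER PRIMARY KEY,
    raw_signal   TEXT NOT NULL
);
CREATE TABLE spike_count (
    recording_id INTEGER PRIMARY KEY
        REFERENCES recording (recording_id) ON DELETE CASCADE,
    n_spikes     INTEGER NOT NULL
);
""")

# Raw (upstream) data
conn.execute("INSERT INTO recording VALUES (1, '0,1,0,1,1')")
conn.execute("INSERT INTO recording VALUES (2, '1,1,1,0,0')")

# "Computed" (downstream) rows derived from each raw row
rows = conn.execute("SELECT recording_id, raw_signal FROM recording").fetchall()
for rec_id, raw in rows:
    conn.execute(
        "INSERT INTO spike_count VALUES (?, ?)",
        (rec_id, raw.split(",").count("1")),
    )

# Removing a raw row also removes the result that was computed from it
conn.execute("DELETE FROM recording WHERE recording_id = 1")
remaining = conn.execute(
    "SELECT recording_id, n_spikes FROM spike_count"
).fetchall()
```

After the delete, `remaining` holds only the result for recording 2, so no downstream row outlives the input it was computed from.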
25
25
26
26
**Declarative Computation**
27
-
Computations are defined declaratively through {index}`make() method`s attached to table definitions. The {index}`populate()` operation identifies all missing results and executes computations in dependency order. Parallelization, error handling, and job distribution are handled automatically.
27
+
Computations are defined declaratively through make() methods attached to table definitions. The populate() operation identifies all missing results and executes computations in dependency order. Parallelization, error handling, and job distribution are handled automatically.
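To make the idea of "identify missing results, then compute in dependency order" concrete, here is a toy sketch in plain Python. It is not DataJoint's implementation or API: the `tables` and `computed` dictionaries and the `populate` and `populate_all` functions are hypothetical stand-ins, and each computed table is assumed to have exactly one upstream table.

```python
from graphlib import TopologicalSorter

# Toy "tables": each maps a primary key to a stored value.
tables = {"Recording": {1: [3, 5, 4], 2: [10, 2]}, "Total": {}, "Report": {}}

# Hypothetical computed tables: name -> (upstream table, make function).
computed = {
    "Total": ("Recording", lambda samples: sum(samples)),
    "Report": ("Total", lambda total: f"total={total}"),
}


def populate(name):
    """Compute every result that is missing from the table `name`."""
    upstream, make = computed[name]
    missing = [key for key in tables[upstream] if key not in tables[name]]
    for key in missing:
        tables[name][key] = make(tables[upstream][key])
    return missing


def populate_all():
    """Populate all computed tables, upstream tables first."""
    # TopologicalSorter expects node -> set of predecessors
    graph = {name: {upstream} for name, (upstream, _) in computed.items()}
    order = [t for t in TopologicalSorter(graph).static_order() if t in computed]
    return {name: populate(name) for name in order}


first_run = populate_all()    # computes keys 1 and 2 for both tables
tables["Recording"][3] = [7]  # new raw data arrives
second_run = populate_all()   # only key 3 is missing, so only key 3 is computed
```

The second call does no redundant work: results that already exist are skipped, which is the property that lets a populate-style operation be rerun safely.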
28
28
29
-
**{index}`Immutability` by Design**
29
+
**Immutability by Design**
30
30
Computed results are immutable. Correcting upstream data requires deleting dependent results and recomputing—ensuring the database always represents a consistent computational state. This naturally provides complete provenance: every result can be traced to its source data and the exact code that produced it.
31
31
32
32
**Hybrid Storage Model**
33
-
Structured metadata lives in the relational database ({index}`MySQL`/{index}`PostgreSQL`). Large binary objects (images, recordings, arrays) live in scalable {index}`object storage` (S3, GCS, filesystem) with the database maintaining the mapping. Queries operate on metadata; computation accesses objects transparently.
33
+
Structured metadata lives in the relational database (MySQL/PostgreSQL). Large binary objects (images, recordings, arrays) live in scalable object storage (S3, GCS, filesystem) with the database maintaining the mapping. Queries operate on metadata; computation accesses objects transparently.
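The split between metadata and objects can be sketched with the standard library alone. This is an illustrative assumption, not how DataJoint stores objects: a temporary directory stands in for object storage, SQLite stands in for the relational database, and the helpers `insert_recording` and `fetch_payload` are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

store = Path(tempfile.mkdtemp())  # stands in for an object store (S3, GCS, ...)
db = sqlite3.connect(":memory:")  # stands in for the relational database
db.execute(
    "CREATE TABLE recording ("
    " recording_id INTEGER PRIMARY KEY,"
    " n_bytes INTEGER NOT NULL,"
    " object_path TEXT NOT NULL)"
)


def insert_recording(recording_id, payload):
    """Write the large object to the store; keep only metadata in the database."""
    path = store / f"recording_{recording_id}.bin"
    path.write_bytes(payload)
    db.execute(
        "INSERT INTO recording VALUES (?, ?, ?)",
        (recording_id, len(payload), str(path)),
    )


def fetch_payload(recording_id):
    """Follow the database's mapping to load the object."""
    (path,) = db.execute(
        "SELECT object_path FROM recording WHERE recording_id = ?",
        (recording_id,),
    ).fetchone()
    return Path(path).read_bytes()


insert_recording(1, b"\x00\x01\x02")
insert_recording(2, b"\xff" * 1000)

# The query touches metadata only; no object is read from the store.
large = [
    row[0]
    for row in db.execute("SELECT recording_id FROM recording WHERE n_bytes > 100")
]
```

Queries filter on the small metadata columns, and the payload is read from the store only when `fetch_payload` is called.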
34
34
35
35
## Architecture Overview
36
36
@@ -70,17 +70,17 @@ This book provides comprehensive coverage of DataJoint from foundations through
70
70
71
71
**Part II: Design**
72
72
- Schema design principles and table definitions
73
-
- {index}`Primary key`s, foreign keys, and dependency structures
74
-
- {index}`Master-part relationship`s for hierarchical data
75
-
- {index}`Normalization` through the lens of workflow entities
73
+
- Primary keys, foreign keys, and dependency structures
74
+
- Master-part relationships for hierarchical data
75
+
- Normalization through the lens of workflow entities
76
76
- Schema evolution and migration strategies
77
77
78
78
**Part III: Operations**
79
79
- Data insertion, deletion, and transaction handling
book/20-concepts/00-databases.md
+7 -7 (7 additions & 7 deletions)
@@ -5,7 +5,7 @@ title: Databases
5
5
## What is a Database?
6
6
7
7
```{card} Database
8
-
A **{index}`database`** is a dynamic (i.e. *time-varying*), systematically organized collection of data that plays an integral role in the operation of an enterprise.
8
+
A **database** is a dynamic (i.e. *time-varying*), systematically organized collection of data that plays an integral role in the operation of an enterprise.
9
9
It supports the enterprise's operations and is accessed by a variety of users in different ways. Examples of enterprises that rely on databases include hotels, airlines, stores, hospitals, universities, banks, and scientific studies.
10
10
11
11
The database not only tracks the current state of the enterprise's processes but also enforces essential *business rules*, ensuring that only valid transactions occur and preventing errors or inconsistencies. It serves as the **system of record**, the **single source of truth**, accurately reflecting the current state and ongoing activities.
@@ -25,7 +25,7 @@ Databases are crucial for the smooth and organized operation of various entities
25
25
## Database Management Systems (DBMS)
26
26
27
27
```{card} Database Management System
28
-
A {index}`Database Management System` ({index}`DBMS`) is a software system that serves as the computational engine powering a database.
28
+
A Database Management System (DBMS) is a software system that serves as the computational engine powering a database.
29
29
It defines and enforces the structure of the data, ensuring that the organization's rules are consistently applied.
30
30
A DBMS manages data storage and efficiently executes data updates and queries while safeguarding the data's structure and integrity, particularly in environments with multiple concurrent users.
31
31
@@ -50,7 +50,7 @@ One of the most critical features distinguishing databases from simple file stor
50
50
51
51
### Authentication and Authorization
52
52
53
-
Before you can work with a database, you must **{index}`authentication`**—prove your identity with a username and password. Once authenticated, the database enforces **{index}`authorization`** rules that determine what you can do:
53
+
Before you can work with a database, you must **authenticate**, that is, prove your identity with a username and password. Once authenticated, the database enforces **authorization** rules that determine what you can do:
54
54
55
55
- **Read**: View specific tables or columns
56
56
- **Write**: Add new data to certain tables
@@ -80,11 +80,11 @@ Modern databases typically separate data management from data use through distin
80
80
81
81
### Common Architectures
82
82
83
-
**{index}`Server-client architecture`** (most common): A database server program manages all data operations, while client programs (your scripts, applications, notebooks) connect to request data or submit changes. The server enforces all rules and access permissions consistently for every client. This is like a library where the librarian (server) manages the books and enforces checkout policies, while patrons (clients) request materials.
83
+
**Server-client architecture** (most common): A database server program manages all data operations, while client programs (your scripts, applications, notebooks) connect to request data or submit changes. The server enforces all rules and access permissions consistently for every client. This is like a library where the librarian (server) manages the books and enforces checkout policies, while patrons (clients) request materials.
84
84
The two most popular open-source relational database systems, MySQL and PostgreSQL, both implement a server-client architecture.
85
85
86
-
**{index}`Embedded database`s**: The database engine runs within your application itself—no separate server. This works for single-user applications like mobile apps or desktop software, but doesn't support multiple users accessing shared data simultaneously.
87
-
{index}`SQLite` is a common embedded database @10.14778/3554821.3554842.
86
+
**Embedded databases**: The database engine runs within your application itself—no separate server. This works for single-user applications like mobile apps or desktop software, but doesn't support multiple users accessing shared data simultaneously.
87
+
SQLite is a common embedded database @10.14778/3554821.3554842.
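As a small illustration of the embedded model, the sketch below uses Python's built-in `sqlite3` module: the database engine runs inside the calling process, with no server to start and no credentials to supply. The `subject` table and its rows are made up for the example.

```python
import sqlite3

# The engine lives inside this process; ":memory:" keeps the data in RAM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, species TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO subject VALUES (?, ?)",
    [(1, "mouse"), (2, "rat"), (3, "mouse")],
)

# Query the data directly from the same program
mice = [
    row[0]
    for row in conn.execute(
        "SELECT subject_id FROM subject WHERE species = 'mouse' ORDER BY subject_id"
    )
]
conn.close()
```

Passing a file path instead of `":memory:"` would persist the data, but the file would still be opened by one application at a time rather than served to many clients.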
88
88
89
89
**Distributed Databases**: Data and processing are spread across multiple servers working together. This provides high availability and can handle massive scale, but adds significant complexity. Systems like Google Spanner, Amazon DynamoDB, and CockroachDB use this approach.
90
90
@@ -106,7 +106,7 @@ Separating data management from data use provides critical advantages:
106
106
107
107
This book focuses on **DataJoint**, a framework that extends relational databases specifically for scientific workflows. DataJoint builds on the solid foundation of relational theory while adding capabilities essential for research: automated computation, data provenance, and reproducibility.
108
108
109
-
The {index}`relational data model`—introduced by {index}`Edgar F. Codd` in 1970—revolutionized data management by organizing data into tables with well-defined relationships. This model has dominated database systems for over five decades due to its mathematical rigor and versatility. Modern relational databases like MySQL and PostgreSQL continue to evolve, incorporating new capabilities for scalability and security while maintaining the core principles that make them reliable and powerful.
109
+
The relational data model—introduced by Edgar F. Codd in 1970—revolutionized data management by organizing data into tables with well-defined relationships. This model has dominated database systems for over five decades due to its mathematical rigor and versatility. Modern relational databases like MySQL and PostgreSQL continue to evolve, incorporating new capabilities for scalability and security while maintaining the core principles that make them reliable and powerful.
110
110
111
111
The following chapters build the conceptual foundation you need to understand DataJoint's approach:
112
112
- **Data Models**: What data models are and why schemas matter for scientific work