Commit a990e0d

Merge pull request dimitri-yatsenko#19 from dimitri-yatsenko/claude/fix-glossary-index-015ZtpfYQuusa2bPgFr8Y41p
Claude/fix glossary index 015 ztpf y quusa2b pg fr8 y41p
2 parents 72db81e + db7c703 commit a990e0d
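
Every hunk shown in this commit makes the same kind of change: an inline MyST `{index}` role is replaced by its bare term. The sketch below shows one way such an edit could be reproduced mechanically. The function name `strip_index_roles` and the regular expression are illustrative assumptions; nothing in the commit indicates how the change was actually produced.

```python
import re

# Matches an inline role such as {index}`Manual table` and captures the term.
INDEX_ROLE = re.compile(r"\{index\}`([^`]+)`")

def strip_index_roles(text: str) -> str:
    """Replace each inline {index}`term` role with the bare term."""
    return INDEX_ROLE.sub(r"\1", text)

before = "- **{index}`Manual table`s**: Source data entered by researchers"
after = strip_index_roles(before)
print(after)
```

Applied to the removed line 31 of `book/00-introduction/00-purpose.md`, this reproduces the added line 31. A trailing plural `s` outside the backticks is preserved because only the role itself is rewritten.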

10 files changed: +49 -53 lines changed

book/00-introduction/00-purpose.md

Lines changed: 7 additions & 7 deletions
@@ -4,7 +4,7 @@ title: Purpose

 ## What is DataJoint?

-**{index}`DataJoint` is a {index}`computational database` language and platform that enables scientists to design, implement, and manage data operations for research by unifying data structures and analysis code.** It provides {index}`data integrity`, {index}`automated computation`, {index}`reproducibility`, and seamless collaboration through a {index}`relational database` approach that coordinates relational databases, code repositories, and object storage.
+**DataJoint is a computational database language and platform that enables scientists to design, implement, and manage data operations for research by unifying data structures and analysis code.** It provides data integrity, automated computation, reproducibility, and seamless collaboration through a relational database approach that coordinates relational databases, code repositories, and object storage.

 ## Who This Book Is For

@@ -28,18 +28,18 @@ Here's what makes DataJoint different: **your database schema IS your data proce

 Traditional databases store and retrieve data. DataJoint does that too, but it also tracks what gets computed from what. Each table plays a specific role in your workflow:

-- **{index}`Manual table`s**: Source data entered by researchers
-- **{index}`Imported table`s**: Data acquired from instruments or external sources
-- **{index}`Computed table`s**: Results automatically derived from upstream data
-- **{index}`Lookup table`s**: Reference data and parameters
+- **Manual tables**: Source data entered by researchers
+- **Imported tables**: Data acquired from instruments or external sources
+- **Computed tables**: Results automatically derived from upstream data
+- **Lookup tables**: Reference data and parameters

 This workflow perspective shapes everything:

 **Schema as a Map**: Your database diagram becomes a visual flowchart showing exactly how data moves from raw inputs to final results. Dependencies are explicit, not hidden in scattered scripts.

 **Intelligent Diagrams**: Different table types get distinct visual styles. One glance tells you what's manual, what's automatic, and how everything connects.

-**{index}`Provenance`, Not Just Integrity**: {index}`Foreign key`s mean more than "this ID exists." They mean "this result was computed FROM this input." When upstream data changes, DataJoint ensures you can't accidentally keep stale downstream results. This is why DataJoint emphasizes INSERT and DELETE over UPDATE—changing input data without recomputing outputs breaks your science, even if the database technically remains consistent.
+**Provenance, Not Just Integrity**: Foreign keys mean more than "this ID exists." They mean "this result was computed FROM this input." When upstream data changes, DataJoint ensures you can't accidentally keep stale downstream results. This is why DataJoint emphasizes INSERT and DELETE over UPDATE—changing input data without recomputing outputs breaks your science, even if the database technically remains consistent.

 For scientific computing, this workflow-centric design is transformative. Your database doesn't just store results—it guarantees they're valid, reproducible, and traceable back to their origins.

@@ -62,7 +62,7 @@ This book provides the skills to transform research operations: from fragile scr

 ## DataJoint and SQL: Two Languages, One Foundation

-**{index}`SQL` (Structured Query Language)** powers virtually every relational database. DataJoint wraps SQL in Pythonic syntax, automatically translating your code into optimized queries.
+**SQL (Structured Query Language)** powers virtually every relational database. DataJoint wraps SQL in Pythonic syntax, automatically translating your code into optimized queries.

 You could learn DataJoint without ever seeing SQL. But this book teaches both, side by side. You'll understand not just *what* works but *why*—and you'll be able to work directly with SQL when needed.

book/00-introduction/05-executive-summary.md

Lines changed: 9 additions & 9 deletions
@@ -11,7 +11,7 @@ Standard database solutions address storage and querying but not computation. Da

 ## The DataJoint Solution

-**DataJoint introduces the {index}`Relational Workflow Model`**—an extension of classical relational theory that treats computational transformations as first-class citizens of the data model. The database {index}`schema` becomes an executable specification: it defines not just what data exists, but how data flows through the pipeline and when computations should run.
+**DataJoint introduces the Relational Workflow Model**—an extension of classical relational theory that treats computational transformations as first-class citizens of the data model. The database schema becomes an executable specification: it defines not just what data exists, but how data flows through the pipeline and when computations should run.

 This creates what we call a **Computational Database**: a system where inserting new raw data automatically triggers all downstream analyses in dependency order, maintaining computational validity throughout. Think of it as a spreadsheet that auto-recalculates, but with the rigor of a relational database and the scale of distributed computing.

@@ -21,16 +21,16 @@ This creates what we call a **Computational Database**: a system where inserting
 Unlike Entity-Relationship modeling that requires translation to SQL, DataJoint schemas are directly executable. The diagram *is* the implementation. Schema changes propagate immediately. Documentation cannot drift from reality because the schema is the documentation.

 **Workflow-Aware Foreign Keys**
-Foreign keys in DataJoint do more than enforce {index}`referential integrity`—they encode computational dependencies. A computed result that references raw data will be automatically deleted if that raw data is removed, preventing stale or orphaned results. This maintains *{index}`computational validity`*, not just *referential integrity*.
+Foreign keys in DataJoint do more than enforce referential integrity—they encode computational dependencies. A computed result that references raw data will be automatically deleted if that raw data is removed, preventing stale or orphaned results. This maintains *computational validity*, not just *referential integrity*.

 **Declarative Computation**
-Computations are defined declaratively through {index}`make() method`s attached to table definitions. The {index}`populate()` operation identifies all missing results and executes computations in dependency order. Parallelization, error handling, and job distribution are handled automatically.
+Computations are defined declaratively through make() methods attached to table definitions. The populate() operation identifies all missing results and executes computations in dependency order. Parallelization, error handling, and job distribution are handled automatically.

-**{index}`Immutability` by Design**
+**Immutability by Design**
 Computed results are immutable. Correcting upstream data requires deleting dependent results and recomputing—ensuring the database always represents a consistent computational state. This naturally provides complete provenance: every result can be traced to its source data and the exact code that produced it.

 **Hybrid Storage Model**
-Structured metadata lives in the relational database ({index}`MySQL`/{index}`PostgreSQL`). Large binary objects (images, recordings, arrays) live in scalable {index}`object storage` (S3, GCS, filesystem) with the database maintaining the mapping. Queries operate on metadata; computation accesses objects transparently.
+Structured metadata lives in the relational database (MySQL/PostgreSQL). Large binary objects (images, recordings, arrays) live in scalable object storage (S3, GCS, filesystem) with the database maintaining the mapping. Queries operate on metadata; computation accesses objects transparently.

 ## Architecture Overview

@@ -70,17 +70,17 @@ This book provides comprehensive coverage of DataJoint from foundations through

 **Part II: Design**
 - Schema design principles and table definitions
-- {index}`Primary key`s, foreign keys, and dependency structures
-- {index}`Master-part relationship`s for hierarchical data
-- {index}`Normalization` through the lens of workflow entities
+- Primary keys, foreign keys, and dependency structures
+- Master-part relationships for hierarchical data
+- Normalization through the lens of workflow entities
 - Schema evolution and migration strategies

 **Part III: Operations**
 - Data insertion, deletion, and transaction handling
 - Caching strategies for performance optimization

 **Part IV: Queries**
-- DataJoint's five-operator {index}`query algebra`: {index}`restriction`, {index}`projection`, {index}`join`, {index}`aggregation`, {index}`union`
+- DataJoint's five-operator query algebra: restriction, projection, join, aggregation, union
 - Comparison with SQL and when to use each
 - Complex query patterns and optimization
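
The executive summary text in this file describes two behaviors: computations run in dependency order, and deleting upstream data removes everything derived from it. The sketch below illustrates both ideas at the level of whole tables using only the Python standard library. It is not the DataJoint API; the table names, `populate_order`, and `cascade` are hypothetical, and a real pipeline tracks dependencies per row, not per table.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each table maps to the set of tables it depends on.
DEPENDENCIES = {
    "Session": set(),            # manual: entered by researchers
    "Recording": {"Session"},    # imported: acquired from an instrument
    "Spikes": {"Recording"},     # computed from Recording
    "Stats": {"Spikes"},         # computed from Spikes
}

def populate_order(deps):
    """List tables so that each one comes after everything upstream of it."""
    return list(TopologicalSorter(deps).static_order())

def cascade(deps, removed):
    """Return `removed` plus every table downstream of it."""
    stale = {removed}
    grew = True
    while grew:
        grew = False
        for table, upstream in deps.items():
            if table not in stale and upstream & stale:
                stale.add(table)
                grew = True
    return stale

print(populate_order(DEPENDENCIES))
print(sorted(cascade(DEPENDENCIES, "Recording")))
```

The first call gives an order in which results could be computed; the second gives the set that would have to be deleted and recomputed if `Recording` changed.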

book/20-concepts/00-databases.md

Lines changed: 7 additions & 7 deletions
@@ -5,7 +5,7 @@ title: Databases
 ## What is a Database?

 ```{card} Database
-A **{index}`database`** is a dynamic (i.e. *time-varying*), systematically organized collection of data that plays an integral role in the operation of an enterprise.
+A **database** is a dynamic (i.e. *time-varying*), systematically organized collection of data that plays an integral role in the operation of an enterprise.
 It supports the enterprise's operations and is accessed by a variety of users in different ways. Examples of enterprises that rely on databases include hotels, airlines, stores, hospitals, universities, banks, and scientific studies.

 The database not only tracks the current state of the enterprise's processes but also enforces essential *business rules*, ensuring that only valid transactions occur and preventing errors or inconsistencies. It serves as the **system of record**, the **single source of truth**, accurately reflecting the current state and ongoing activities.
@@ -25,7 +25,7 @@ Databases are crucial for the smooth and organized operation of various entities
 ## Database Management Systems (DBMS)

 ```{card} Database Management System
-A {index}`Database Management System` ({index}`DBMS`) is a software system that serves as the computational engine powering a database.
+A Database Management System (DBMS) is a software system that serves as the computational engine powering a database.
 It defines and enforces the structure of the data, ensuring that the organization's rules are consistently applied.
 A DBMS manages data storage and efficiently executes data updates and queries while safeguarding the data's structure and integrity, particularly in environments with multiple concurrent users.

@@ -50,7 +50,7 @@ One of the most critical features distinguishing databases from simple file stor

 ### Authentication and Authorization

-Before you can work with a database, you must **{index}`authentication`**—prove your identity with a username and password. Once authenticated, the database enforces **{index}`authorization`** rules that determine what you can do:
+Before you can work with a database, you must **authentication**—prove your identity with a username and password. Once authenticated, the database enforces **authorization** rules that determine what you can do:

 - **Read**: View specific tables or columns
 - **Write**: Add new data to certain tables
@@ -80,11 +80,11 @@ Modern databases typically separate data management from data use through distin

 ### Common Architectures

-**{index}`Server-client architecture`** (most common): A database server program manages all data operations, while client programs (your scripts, applications, notebooks) connect to request data or submit changes. The server enforces all rules and access permissions consistently for every client. This is like a library where the librarian (server) manages the books and enforces checkout policies, while patrons (clients) request materials.
+**Server-client architecture** (most common): A database server program manages all data operations, while client programs (your scripts, applications, notebooks) connect to request data or submit changes. The server enforces all rules and access permissions consistently for every client. This is like a library where the librarian (server) manages the books and enforces checkout policies, while patrons (clients) request materials.
 The two most popular open-source relational database systems: MySQL and PostgreSQL implement a server-client architecture.

-**{index}`Embedded database`s**: The database engine runs within your application itself—no separate server. This works for single-user applications like mobile apps or desktop software, but doesn't support multiple users accessing shared data simultaneously.
-{index}`SQLite` is a common embedded database @10.14778/3554821.3554842.
+**Embedded databases**: The database engine runs within your application itself—no separate server. This works for single-user applications like mobile apps or desktop software, but doesn't support multiple users accessing shared data simultaneously.
+SQLite is a common embedded database @10.14778/3554821.3554842.

 **Distributed Databases**: Data and processing are spread across multiple servers working together. This provides high availability and can handle massive scale, but adds significant complexity. Systems like Google Spanner, Amazon DynamoDB, and CockroachDB use this approach.

@@ -106,7 +106,7 @@ Separating data management from data use provides critical advantages:

 This book focuses on **DataJoint**, a framework that extends relational databases specifically for scientific workflows. DataJoint builds on the solid foundation of relational theory while adding capabilities essential for research: automated computation, data provenance, and reproducibility.

-The {index}`relational data model`—introduced by {index}`Edgar F. Codd` in 1970—revolutionized data management by organizing data into tables with well-defined relationships. This model has dominated database systems for over five decades due to its mathematical rigor and versatility. Modern relational databases like MySQL and PostgreSQL continue to evolve, incorporating new capabilities for scalability and security while maintaining the core principles that make them reliable and powerful.
+The relational data model—introduced by Edgar F. Codd in 1970—revolutionized data management by organizing data into tables with well-defined relationships. This model has dominated database systems for over five decades due to its mathematical rigor and versatility. Modern relational databases like MySQL and PostgreSQL continue to evolve, incorporating new capabilities for scalability and security while maintaining the core principles that make them reliable and powerful.

 The following chapters build the conceptual foundation you need to understand DataJoint's approach:
 - **Data Models**: What data models are and why schemas matter for scientific work
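
The chapter text in this file describes embedded databases and a database's role in rejecting invalid transactions. As a minimal sketch of both points, the example below uses Python's built-in `sqlite3` module, so the engine runs inside the script with no separate server. The `session` and `result` tables and their columns are hypothetical and do not come from the book.

```python
import sqlite3

# The database engine runs inside this process; ":memory:" keeps it off disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute(
    "CREATE TABLE session (session_id INTEGER PRIMARY KEY, subject TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE result ("
    " session_id INTEGER PRIMARY KEY REFERENCES session(session_id),"
    " mean_rate REAL NOT NULL)"
)

conn.execute("INSERT INTO session VALUES (1, 'subject_a')")
conn.execute("INSERT INTO result VALUES (1, 4.2)")  # valid: session 1 exists

try:
    # Invalid: there is no session 2, so the foreign key rule is violated.
    conn.execute("INSERT INTO result VALUES (2, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(rejected, row_count)
```

The rule lives in the table definition, so it applies to every statement that reaches the database regardless of which script issued it.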
