
Commit 9888c14

Merge pull request #7 from dimitri-yatsenko/main
Harmonize sections
2 parents: f1daf82 + f1792e9

29 files changed: +2352 −3581 lines

SIMPLIFICATION_RECOMMENDATIONS.md

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
# Recommendations for Simplifying Main Text Examples

This report identifies opportunities to simplify examples in the main text by referencing comprehensive examples in the `book/80-examples/` section.

## Executive Summary

After reviewing the main text chapters and the examples section, I identified several opportunities for simplification. However, many examples in the main text serve specific pedagogical purposes and are intentionally minimal to focus on particular concepts. The recommendations below balance simplification with pedagogical effectiveness.

## Examples Section Inventory

| Notebook | Domain | Key Features |
|----------|--------|--------------|
| `015-university.ipynb` | Academic administration | Complete schema with Students, Courses, Departments, Terms, Enrollments, Grades; synthetic data generation |
| `016-university-queries.ipynb` | Query patterns | Comprehensive query examples: restriction, joins, aggregation, universal sets |
| `010-classic-sales.ipynb` | E-commerce | MySQL sample database; workflow-centric business operations |
| `070-fractals.ipynb` | Computational pipeline | Table tiers (Manual, Lookup, Computed), populate mechanics, image processing |
| `075-blob-detection.ipynb` | Image analysis | Master-part relationships, parameter sweeps, computational workflows |

---

## Recommendation 1: Queries Chapter - Reference University Queries

**File**: `book/50-queries/020-restriction.ipynb`

**Current State**: Creates a standalone languages/fluency database example to demonstrate restriction patterns.

**Opportunity**: The restriction chapter could be simplified by:

1. Keeping the concise language/fluency example for basic concepts
2. Adding a cross-reference note at the end directing readers to `016-university-queries.ipynb` for more comprehensive query patterns

**Suggested Addition** (at end of chapter):

```markdown
## Further Practice

For comprehensive query examples covering all patterns discussed here,
see the [University Queries](../80-examples/016-university-queries.ipynb) example,
which demonstrates these concepts on a realistic academic database.
```

**Impact**: Low - additive, doesn't require removing existing content

---

## Recommendation 2: Relationships Chapter - Reference Classic Sales

**File**: `book/30-database-design/050-relationships.ipynb`

**Current State**: Creates 12 bank schemas (bank1-12) to demonstrate relationship patterns incrementally.

**Analysis**: The bank examples are intentionally minimal and incremental, which is pedagogically valuable. Each schema builds on the previous to illustrate specific cardinality concepts.

**Opportunity**: Add a cross-reference after the core patterns are established:

**Suggested Addition** (after the "Many-to-Many" section):

```markdown
:::{tip}
For a complete business database demonstrating these relationship patterns
in a realistic context, see the [Classic Sales](../80-examples/010-classic-sales.ipynb)
example, which models offices, employees, customers, orders, and products
as an integrated workflow.
:::
```

**Impact**: Low - additive only

---

## Recommendation 3: Master-Part Chapter - Reference Blob Detection

**File**: `book/30-database-design/053-master-part.ipynb`

**Current State**: Uses a polygon/vertex example for master-part relationships.

**Analysis**: The polygon/vertex example is appropriately minimal for introducing the concept. The chapter already mentions computational workflows.

**Opportunity**: Add a practical cross-reference:

**Suggested Addition** (in the "Master-Part in Computations" section):

```markdown
For a complete computational example demonstrating master-part relationships
in an image analysis pipeline, see the [Blob Detection](../80-examples/075-blob-detection.ipynb)
example, where `Detection` (master) and `Detection.Blob` (part) capture
aggregate results and per-feature details atomically.
```

**Impact**: Low - enhances existing content

---

## Recommendation 4: Computation Chapter - Already Well Cross-Referenced

**File**: `book/60-computation/010-computation.ipynb`

**Current State**: Already references `075-blob-detection.ipynb` extensively as a case study.

**Analysis**: This chapter demonstrates best practice - it explains concepts briefly and directs readers to the comprehensive example for implementation details.

**Recommendation**: No changes needed. This is a model for other chapters.

---

## Recommendation 5: Normalization Chapter - Potential for E-commerce Simplification

**File**: `book/30-database-design/055-normalization.ipynb`

**Current State**: Contains an extensive E-commerce Order Processing example (Order → Payment → Shipment → Delivery → DeliveryConfirmation) spanning ~100 lines.

**Analysis**: This example is integral to explaining workflow normalization principles. It demonstrates how traditional normalization approaches differ from workflow normalization.

**Opportunity**: Consider adding a reference to classic-sales after the e-commerce discussion:

**Suggested Addition**:

```markdown
:::{seealso}
The [Classic Sales](../80-examples/010-classic-sales.ipynb) example demonstrates
these workflow normalization principles in a complete business database with
offices, employees, customers, orders, and products.
:::
```

**Impact**: Low - additive only

---

## Recommendation 6: Concepts Chapter - Reference Fractals Example

**File**: `book/20-concepts/04-workflows.md`

**Current State**: Explains Relational Workflow Model concepts theoretically.

**Opportunity**: Add a reference to a practical implementation:

**Suggested Addition** (after the "Table Tiers: Workflow Roles" section):

```markdown
:::{tip}
For a hands-on demonstration of all table tiers working together in a
computational pipeline, see the [Julia Fractals](../80-examples/070-fractals.ipynb)
example, which shows Manual tables for experimental parameters, Lookup tables
for reference data, and Computed tables for derived results.
:::
```

**Impact**: Low - connects theory to practice

---

## Not Recommended for Simplification

### Bank Examples (050-relationships.ipynb)
The 12 bank schemas serve a clear pedagogical purpose: demonstrating relationship patterns incrementally. Replacing them with references would lose the step-by-step learning progression.

### Language/Fluency Examples (020-restriction.ipynb)
These are appropriately minimal for teaching restriction concepts. The university queries example is more complex and would overwhelm the focused explanation.

### Mouse/Cage Examples (055-normalization.ipynb)
These examples are tightly integrated with the normalization discussion and demonstrate the specific points about workflow normalization vs. entity normalization.

### Polygon/Vertex Example (053-master-part.ipynb)
This minimal example is ideal for introducing master-part concepts without distraction.

---

## Implementation Priority

| Priority | Recommendation | Effort | Impact |
|----------|---------------|--------|--------|
| 1 | Add blob-detection reference to master-part chapter | Low | High - connects concepts to practical example |
| 2 | Add fractals reference to concepts chapter | Low | Medium - connects theory to practice |
| 3 | Add university-queries reference to restriction chapter | Low | Medium - provides comprehensive practice |
| 4 | Add classic-sales reference to relationships chapter | Low | Low - supplementary |
| 5 | Add classic-sales reference to normalization chapter | Low | Low - supplementary |

---

## Conclusion

The main text examples are generally well-designed for their pedagogical purposes. The primary opportunity is to **add cross-references** to comprehensive examples rather than remove existing content. This approach:

1. Preserves the focused, incremental learning in main text chapters
2. Directs motivated readers to comprehensive examples for deeper exploration
3. Demonstrates how concepts apply in realistic, complete systems
4. Reduces duplication of effort for readers who explore multiple chapters

The computation chapter (`010-computation.ipynb`) already exemplifies best practice by referencing `075-blob-detection.ipynb` as a case study rather than duplicating the full implementation.

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
---
title: Executive Summary
subtitle: For Data Architects and Technical Leaders
---

## The Core Problem

Scientific and engineering organizations face a fundamental challenge: as data volumes grow and analyses become more complex, traditional approaches break down. File-based workflows become unmaintainable. Metadata gets separated from the data it describes. Computational provenance is lost. Teams duplicate effort because they cannot discover or trust each other's work. Reproducing results requires archaeological expeditions through old scripts and folder structures.

Standard database solutions address storage and querying but not computation. Data warehouses and lakes handle scale but not scientific workflows. Workflow engines (Airflow, Luigi, Snakemake) manage task orchestration but lack the data-model rigor needed for complex analytical dependencies. The result is a patchwork of tools that don't integrate cleanly, requiring custom glue code that itself becomes a maintenance burden.

## The DataJoint Solution

**DataJoint introduces the Relational Workflow Model**—an extension of classical relational theory that treats computational transformations as first-class citizens of the data model. The database schema becomes an executable specification: it defines not just what data exists, but how data flows through the pipeline and when computations should run.

This creates what we call a **Computational Database**: a system where inserting new raw data automatically triggers all downstream analyses in dependency order, maintaining computational validity throughout. Think of it as a spreadsheet that auto-recalculates, but with the rigor of a relational database and the scale of distributed computing.

### Key Differentiators

**Unified Design and Implementation**
Unlike Entity-Relationship modeling, which requires translation to SQL, DataJoint schemas are directly executable. The diagram *is* the implementation. Schema changes propagate immediately. Documentation cannot drift from reality because the schema is the documentation.

**Workflow-Aware Foreign Keys**
Foreign keys in DataJoint do more than enforce referential integrity—they encode computational dependencies. A computed result that references raw data will be automatically deleted if that raw data is removed, preventing stale or orphaned results. This maintains *computational validity*, not just *referential integrity*.
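
As a minimal sketch of what this looks like in practice (the schema name, tables, and attributes here are hypothetical, and a configured DataJoint connection is assumed):

```python
import datajoint as dj

schema = dj.Schema('demo_dependencies')  # hypothetical schema name

@schema
class Recording(dj.Manual):
    definition = """
    recording_id : int
    ---
    recorded_at : datetime
    """

@schema
class Segment(dj.Manual):
    definition = """
    -> Recording      # foreign key: referential integrity plus workflow dependency
    segment : int
    ---
    duration : float  # (s)
    """

# Deleting upstream data cascades to everything that depends on it,
# so no Segment row can outlive the Recording it came from:
(Recording & {'recording_id': 1}).delete()
```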

**Declarative Computation**
Computations are defined declaratively through `make()` methods attached to table definitions. The `populate()` operation identifies all missing results and executes computations in dependency order. Parallelization, error handling, and job distribution are handled automatically.
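
A minimal sketch of the pattern, with hypothetical table and attribute names:

```python
import datajoint as dj
import numpy as np

schema = dj.Schema('demo_compute')  # hypothetical schema name

@schema
class Trace(dj.Manual):
    definition = """
    trace_id : int
    ---
    trace : longblob   # raw signal
    """

@schema
class TraceStats(dj.Computed):
    definition = """
    -> Trace
    ---
    mean  : float
    stdev : float
    """

    def make(self, key):
        # Fetch the input identified by this key, compute, insert the result.
        trace = (Trace & key).fetch1('trace')
        self.insert1(dict(key, mean=float(np.mean(trace)), stdev=float(np.std(trace))))

# Compute every missing TraceStats entry in dependency order;
# reserve_jobs lets multiple workers safely share the work queue.
TraceStats.populate(reserve_jobs=True)
```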

**Immutability by Design**
Computed results are immutable. Correcting upstream data requires deleting dependent results and recomputing—ensuring the database always represents a consistent computational state. This naturally provides complete provenance: every result can be traced to its source data and the exact code that produced it.

**Hybrid Storage Model**
Structured metadata lives in the relational database (MySQL/PostgreSQL). Large binary objects (images, recordings, arrays) live in scalable object storage (S3, GCS, filesystem) with the database maintaining the mapping. Queries operate on metadata; computation accesses objects transparently.
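
As a rough sketch of how a store might be declared in the Python client (the store name, bucket, and credentials are placeholders):

```python
import datajoint as dj

# Register an object store: structured metadata stays in MySQL/PostgreSQL,
# while attributes declared as blob@shared are written to object storage.
dj.config['stores'] = {
    'shared': {                       # hypothetical store name
        'protocol': 's3',
        'endpoint': 's3.amazonaws.com',
        'bucket': 'my-lab-data',      # placeholder bucket
        'location': 'datajoint/blobs',
        'access_key': '...',          # fill in real credentials
        'secret_key': '...',
    }
}

schema = dj.Schema('demo_storage')    # hypothetical schema name

@schema
class Movie(dj.Manual):
    definition = """
    movie_id : int
    ---
    frames : blob@shared     # large array kept in the object store
    notes  : varchar(255)    # ordinary attribute kept in the database
    """
```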

## Architecture Overview

The **DataJoint Platform** implements this model through a layered architecture:

```{figure} ../images/Platform.png
:name: platform-architecture
:align: center
:width: 80%

The DataJoint Platform architecture: an open-source core (relational database, code repository, object store) surrounded by functional extensions for interactions, infrastructure, automation, and orchestration.
```

**Open-Source Core**
- Relational database (MySQL/PostgreSQL) as the system of record
- Code repository (Git) containing schema definitions and compute methods
- Object store for large data with structured key naming

**Functional Extensions**
- *Interactions*: Pipeline navigator, electronic lab notebook integration, visualization dashboards
- *Infrastructure*: Security, deployment automation, compute resource management
- *Automation*: Automated population, job orchestration, AI-assisted development
- *Orchestration*: Data ingest, cross-team collaboration, DOI-based publishing

The core is fully open source. Organizations can build DIY solutions or use managed platform services depending on their needs.

## What This Book Covers

This book provides comprehensive coverage of DataJoint from foundations through advanced applications:

**Part I: Concepts**
- Database fundamentals and why they matter for scientific work
- Data models: schema-on-write vs. schema-on-read, and why schemas enable mathematical guarantees
- Relational theory: the 150-year mathematical foundation from De Morgan through Codd
- The Relational Workflow Model: DataJoint's extension treating computation as first-class
- Scientific data pipelines: complete systems integrating database, compute, and collaboration

**Part II: Design**
- Schema design principles and table definitions
- Primary keys, foreign keys, and dependency structures
- Master-part relationships for hierarchical data
- Normalization through the lens of workflow entities
- Schema evolution and migration strategies

**Part III: Operations**
- Data insertion, deletion, and transaction handling
- Caching strategies for performance optimization

**Part IV: Queries**
- DataJoint's five-operator query algebra: restriction, projection, join, aggregation, union (see the sketch after this list)
- Comparison with SQL and when to use each
- Complex query patterns and optimization

**Part V: Computation**
- The `make()` method pattern for automated computation
- Parallel execution and distributed computing
- Error handling and resumable computation

**Part VI: Interfaces and Integration**
- Python and MATLAB APIs
- Web interfaces and visualization tools
- Integration with existing data systems

**Part VII: Examples and Exercises**
- Complete worked examples from neuroscience, imaging, and other domains
- Hands-on exercises for each major concept
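
To give a flavor of the query algebra named under Part IV, here is a sketch of the five operators applied to two hypothetical tables, `Student` and `Enrollment`:

```python
# Assumes Student and Enrollment are tables in a DataJoint schema.

texans   = Student & 'home_state = "TX"'            # restriction: rows matching a condition
names    = Student.proj('first_name', 'last_name')  # projection: keep or rename attributes
enrolled = Student * Enrollment                     # join: combine on shared attributes
counts   = Student.aggr(Enrollment, n='count(*)')   # aggregation: summarize matching rows
either   = texans + (Student & 'home_state = "CA"') # union: combine compatible results
```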

## Who Should Use DataJoint

DataJoint is designed for organizations where:

- **Data has structure**: Experiments, subjects, sessions, trials, measurements—your domain has natural entities and relationships
- **Analysis has dependencies**: Results depend on intermediate computations that depend on raw data
- **Reproducibility matters**: You need to trace any result back to its source data and methodology
- **Teams collaborate**: Multiple people work with shared data and build on each other's analyses
- **Scale is growing**: What worked for one researcher doesn't work for a team; what worked for one project doesn't work for ten

DataJoint is used in over a hundred neuroscience labs worldwide, supporting projects of varying sizes and complexity—from single-investigator studies to large multi-site collaborations. It handles multimodal data spanning neurophysiology, imaging, behavior, sequencing, and machine learning, scaling from gigabytes to petabytes while maintaining the same rigor.

## Getting Started

The **Concepts** section builds the theoretical foundation. If you prefer to learn by doing, the hands-on tutorial in **Relational Practice** provides immediate experience with a working database. The **Design** section then covers practical schema construction.

The [Blob Detection example](../80-examples/075-blob-detection.ipynb) demonstrates a complete image processing pipeline with all table tiers (Manual, Lookup, Imported, Computed) working together, providing a concrete reference implementation.

The [DataJoint Specs 2.0](../95-reference/SPECS_2_0.md) provides the formal specification for those requiring precise technical definitions.

To evaluate DataJoint for your organization, visit [datajoint.com](https://datajoint.com) to subscribe to a pilot project and experience the platform firsthand with guided support.

book/00-introduction/20-prerequisites.md

Lines changed: 6 additions & 4 deletions
@@ -1,8 +1,8 @@
-# Prerequisites and Essential Skills
+# Prerequisites
 
-This book teaches DataJoint and SQL for scientific data workflows. To get the most out of this course, you should be comfortable with a set of tools that form the bedrock of modern data science.
-While we will focus on database principles, we assume a working knowledge of the following.
-If you're new to these, we highly recommend exploring MIT's ["The Missing Semester of Your CS Education"](https://missing.csail.mit.edu/) to get up to speed.
+This book teaches the concept of relational data workflows in DataJoint.
+We provide some equivalent SQL for reference, but SQL knowledge is not required.
+To get the most out of this course, you should be comfortable with the following tools.
 
 ### Command-Line Proficiency
 
@@ -19,3 +19,5 @@ In collaborative science and software, version control is non-negotiable. We exp
 ### Jupyter Notebooks
 
 This textbook itself is built using Jupyter. You should know how to launch, navigate, and run code within Jupyter Notebooks or JupyterLab. The concept of "literate programming"—mixing executable code, text, and results—is central to reproducible science.
+
+(If you're new to these tools, MIT's ["The Missing Semester of Your CS Education"](https://missing.csail.mit.edu/) is an excellent resource.)
