Merged (changes from 12 commits)
5 changes: 2 additions & 3 deletions .devcontainer/docker-compose.yml
@@ -1,4 +1,3 @@
-version: '2.4'
 services:
   # Update this to the name of the service you want to work with in your docker-compose.yml file
   app:
@@ -7,13 +6,13 @@ services:
     # docker-compose.yml file (the first in the devcontainer.json "dockerComposeFile"
     # array). The sample below assumes your primary file is in the root of your project.
     container_name: datajoint-python-devcontainer
-    image: datajoint/datajoint-python-devcontainer:${PY_VER:-3.11}-${DISTRO:-buster}
+    image: datajoint/datajoint-python-devcontainer:${PY_VER:-3.11}-${DISTRO:-bookworm}
     build:
       context: .
       dockerfile: .devcontainer/Dockerfile
       args:
         - PY_VER=${PY_VER:-3.11}
-        - DISTRO=${DISTRO:-buster}
+        - DISTRO=${DISTRO:-bookworm}
 
     volumes:
       # Update this to wherever you want VS Code to mount the folder of your project
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -21,18 +21,18 @@ repos:
     hooks:
       - id: codespell
   - repo: https://github.com/pycqa/isort
-    rev: 5.12.0 # Use the latest stable version
+    rev: 6.0.1 # Use the latest stable version
     hooks:
       - id: isort
        args:
          - --profile=black # Optional, makes isort compatible with Black
   - repo: https://github.com/psf/black
-    rev: 24.2.0 # matching versions in pyproject.toml and github actions
+    rev: 25.1.0 # matching versions in pyproject.toml and github actions
     hooks:
       - id: black
        args: ["--check", "-v", "datajoint", "tests", "--diff"] # --required-version is conflicting with pre-commit
   - repo: https://github.com/PyCQA/flake8
-    rev: 7.1.2
+    rev: 7.3.0
     hooks:
       # syntax tests
       - id: flake8
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
 ## Release notes
 
+**Note:** This file is no longer updated. See the GitHub change log page for the
+latest release notes: <https://github.com/datajoint/datajoint-python/releases>.
+
 ### 0.14.3 -- Sep 23, 2024
 - Added - `dj.Top` restriction - PR [#1024](https://github.com/datajoint/datajoint-python/issues/1024) PR [#1084](https://github.com/datajoint/datajoint-python/pull/1084)
 - Fixed - Added encapsulating double quotes to comply with [DOT language](https://graphviz.org/doc/info/lang.html) - PR [#1177](https://github.com/datajoint/datajoint-python/pull/1177)
2 changes: 1 addition & 1 deletion Dockerfile
@@ -2,7 +2,7 @@ ARG IMAGE=mambaorg/micromamba:1.5-bookworm-slim
 FROM ${IMAGE}
 
 ARG CONDA_BIN=micromamba
-ARG PY_VER=3.9
+ARG PY_VER=3.11
 ARG HOST_UID=1000
 
 RUN ${CONDA_BIN} install --no-pin -qq -y -n base -c conda-forge \
187 changes: 187 additions & 0 deletions docs/src/compute/populate.md
@@ -65,6 +65,193 @@ The `make` callback does three things:
`make` may populate multiple entities in one call when `key` does not specify the
entire primary key of the populated table.

### Three-Part Make Pattern for Long Computations

For long-running computations, DataJoint provides an advanced pattern called the
**three-part make** that separates the `make` method into three distinct phases.
This pattern is essential for maintaining database performance and data integrity
during expensive computations.

#### The Problem: Long Transactions

Traditional `make` methods perform all operations within a single database transaction:

```python
def make(self, key):
    # all within one transaction
    data = (ParentTable & key).fetch1()     # fetch
    result = expensive_computation(data)    # compute (could take hours)
    self.insert1(dict(key, result=result))  # insert
```

This approach has significant limitations:
- **Database locks**: Long transactions hold locks on tables, blocking other operations
- **Connection timeouts**: Database connections may timeout during long computations
- **Memory pressure**: All fetched data must remain in memory throughout the computation
- **Failure recovery**: If the computation fails, the entire transaction is rolled back and all work is lost

#### The Solution: Three-Part Make Pattern

The three-part make pattern splits the `make` method into three distinct phases,
allowing the expensive computation to occur outside of database transactions:

```python
def make_fetch(self, key):
    """Phase 1: Fetch all required data from parent tables"""
    fetched_data = ((ParentTable & key).fetch1(),)
    return fetched_data  # must be a sequence, e.g., a tuple or list

def make_compute(self, key, *fetched_data):
    """Phase 2: Perform expensive computation (outside transaction)"""
    computed_result = expensive_computation(*fetched_data)
    return computed_result  # must be a sequence, e.g., a tuple or list

def make_insert(self, key, *computed_result):
    """Phase 3: Insert results into the current table"""
    self.insert1(dict(key, result=computed_result))
```

#### Execution Flow

To achieve data integrity without long transactions, the three-part make pattern follows this execution sequence:

```python
# Step 1: fetch the data and compute outside any transaction
fetched_data1 = self.make_fetch(key)
computed_result = self.make_compute(key, *fetched_data1)

# Step 2: begin a transaction and verify data consistency (pseudocode)
begin transaction:
    fetched_data2 = self.make_fetch(key)
    if fetched_data1 != fetched_data2:  # deep comparison
        cancel transaction  # the source data changed during the computation
    else:
        self.make_insert(key, *computed_result)
        commit transaction
```

#### Key Benefits

1. **Reduced Database Lock Time**: Only the fetch and insert operations occur within transactions, minimizing lock duration
2. **Connection Efficiency**: Database connections are only used briefly for data transfer
3. **Memory Management**: Fetched data can be processed and released during computation
4. **Fault Tolerance**: Computation failures don't affect database state
5. **Scalability**: Multiple computations can run concurrently without database contention

#### Referential Integrity Protection

The pattern includes a critical safety mechanism: **referential integrity verification**.
Before inserting results, the system:

1. Re-fetches the source data within the transaction
2. Compares it with the originally fetched data using deep hashing
3. Only proceeds with insertion if the data hasn't changed

This prevents the "phantom read" problem where source data changes during long computations,
ensuring that results remain consistent with their inputs.
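
As a rough sketch of this verification step: hash the fetched data before the computation, re-fetch inside the transaction, and compare. The example below assumes the third-party `deepdiff` package and an illustrative `content_hash` helper with sample data; it is one possible implementation, not a verbatim excerpt of DataJoint's internals:

```python
from deepdiff import DeepHash  # third-party package: pip install deepdiff

def content_hash(obj):
    """Return a stable, content-based hash of a nested Python object."""
    return DeepHash(obj)[obj]

# Phase 1 result, hashed before the long computation begins (sample data)
fetched_data = ({"image_id": 1, "image": [0.1, 0.2, 0.3]},)
hash_before = content_hash(fetched_data)

# ... long computation runs here ...

# Inside the insert transaction: re-fetch the source data and compare
refetched_data = ({"image_id": 1, "image": [0.1, 0.2, 0.3]},)
if content_hash(refetched_data) != hash_before:
    raise RuntimeError("Source data changed during computation; aborting insert.")
```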

#### Implementation Details

The pattern is implemented using Python generators in the `AutoPopulate` class:

```python
def make(self, key):
    # Step 1: Fetch data from parent tables
    fetched_data = self.make_fetch(key)
    computed_result = yield fetched_data

    # Step 2: Compute if not provided
    if computed_result is None:
        computed_result = self.make_compute(key, *fetched_data)
        yield computed_result

    # Step 3: Insert the computed result
    self.make_insert(key, *computed_result)
    yield
```
Thus, you can implement the three-part make pattern yourself by overriding `make` as a generator that uses `yield` to hand back the fetched data and the computed result, as shown above.
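
For intuition, here is a hedged sketch of a driver loop that consumes such a generator. The `deep_hash` argument and the exact transaction calls are illustrative assumptions, not a verbatim excerpt from `AutoPopulate`:

```python
def run_three_part_make(table, key, deep_hash):
    """Drive a generator-style make(): compute outside a transaction,
    then verify and insert inside one. `deep_hash` is a caller-supplied
    content-hash function (illustrative)."""
    gen = table.make(key)
    fetched_data = next(gen)       # phase 1: fetch (no transaction held)
    computed_result = next(gen)    # phase 2: compute (may take hours)

    table.connection.start_transaction()
    gen = table.make(key)          # restart the generator
    if deep_hash(next(gen)) != deep_hash(fetched_data):
        table.connection.cancel_transaction()
        raise RuntimeError("Source data changed during computation.")
    gen.send(computed_result)      # phase 3: skip recompute, insert
    table.connection.commit_transaction()
```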

#### Use Cases

This pattern is particularly valuable for:

- **Machine learning model training**: Hours-long training sessions
- **Image processing pipelines**: Large-scale image analysis
- **Statistical computations**: Complex statistical analyses
- **Data transformations**: ETL processes with heavy computation
- **Simulation runs**: Time-consuming simulations

#### Example: Long-Running Image Analysis

Here's an example of how to implement the three-part make pattern for a
long-running image analysis task:

```python
@schema
class ImageAnalysis(dj.Computed):
    definition = """
    # Complex image analysis results
    -> Image
    ---
    analysis_result : longblob
    processing_time : float
    """

    def make_fetch(self, key):
        """Fetch the image data needed for analysis"""
        return ((Image & key).fetch1('image'),)  # a one-element tuple

    def make_compute(self, key, image_data):
        """Perform expensive image analysis outside transaction"""
        import time
        start_time = time.time()

        # expensive computation that could take hours
        result = complex_image_analysis(image_data)
        processing_time = time.time() - start_time
        return result, processing_time

    def make_insert(self, key, analysis_result, processing_time):
        """Insert the analysis results"""
        self.insert1(dict(key,
                          analysis_result=analysis_result,
                          processing_time=processing_time))
```

The same effect can be achieved by overriding the `make` method as a generator function that uses the `yield` statement to hand back the fetched data and the computed result:

```python
@schema
class ImageAnalysis(dj.Computed):
    definition = """
    # Complex image analysis results
    -> Image
    ---
    analysis_result : longblob
    processing_time : float
    """

    def make(self, key):
        fetched_data = ((Image & key).fetch1('image'),)
        computed_result = yield fetched_data

        if computed_result is None:
            # expensive computation that could take hours
            import time
            start_time = time.time()
            image_data, = fetched_data  # unpack the one-element tuple
            result = complex_image_analysis(image_data)
            processing_time = time.time() - start_time
            computed_result = result, processing_time
            yield computed_result

        result, processing_time = computed_result
        self.insert1(dict(key,
                          analysis_result=result,
                          processing_time=processing_time))
        yield  # yield control back to the caller
```
We expect most users to prefer the three-part implementation over the generator form, whose control flow is conceptually more complex.
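
Either version is run like any other auto-populated table. For example (the `image_id` restriction below is illustrative):

```python
# compute all missing entries; each key runs fetch -> compute -> verified insert
ImageAnalysis.populate(display_progress=True)

# or restrict the work and reserve jobs for distributed workers
ImageAnalysis.populate(Image & "image_id < 100", reserve_jobs=True)
```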

## Populate

The inherited `populate` method of `dj.Imported` and `dj.Computed` automatically calls
1 change: 1 addition & 0 deletions pyproject.toml
@@ -27,6 +27,7 @@ dependencies = [
 requires-python = ">=3.9,<4.0"
 authors = [
     {name = "Dimitri Yatsenko", email = "[email protected]"},
+    {name = "Thinh Nguyen", email = "[email protected]"},
     {name = "Raphael Guzman"},
     {name = "Edgar Walker"},
     {name = "DataJoint Contributors", email = "[email protected]"},