Skip to content

Conversation

@dimitri-yatsenko
Copy link
Member

@dimitri-yatsenko dimitri-yatsenko commented Jan 8, 2026

Summary

This PR includes fixes and enhancements for DataJoint 2.0:

1. Fix: enable auto_mkdir for local filesystem in StorageBackend

Fixes an issue where staged_insert1 with <object@> fields fails when using Zarr 3.x on local filesystem storage.

2. Fix: dj.Top order_by inheritance (fixes #1242)

When chaining dj.Top restrictions, the second Top can now inherit the ordering from the first by using order_by=None:

(Table & dj.Top(100, "score DESC")) & dj.Top(10, order_by=None)
# Returns top 10 of top 100, ordered by score descending

3. New part_integrity parameter for delete/drop

Unified parameter for master-part integrity in delete and drop operations:

For delete():

Value Behavior
"enforce" (default) Error if parts would be deleted without masters
"ignore" Allow deleting parts without masters (breaks integrity)
"cascade" Also delete masters when parts are deleted

For drop():

Value Behavior
"enforce" (default) Error - drop master instead
"ignore" Allow direct drop

This replaces the previous parameters:

  • force_parts=Truepart_integrity="ignore"
  • force_masters=Truepart_integrity="cascade"
  • Part.drop(force=True)Part.drop(part_integrity="ignore")

4. API improvements

AutoPopulate.jobs: Now accessible on both class and instance
Schema.drop(): Changed force to prompt for clarity

5. GitHub Actions CI (fixes #1272)

Updated workflow to run tests automatically on PRs:

  • Uses docker-compose for MySQL and MinIO services
  • Python 3.10-3.13 matrix
  • Separate fast unit-tests job
  • Installs graphviz for ERD tests

6. Jobs 2.0: Use MySQL server time consistently

Fixed timezone mismatches between Python and MySQL that caused jobs not to be found in CI:

  • Use MySQL NOW(3) for all scheduling comparisons (matches CURRENT_TIMESTAMP(3) precision)
  • Use MySQL's CURRENT_TIMESTAMP default for scheduled_time when delay=0
  • Use MySQL server time for reserved_time, completed_time

Test plan

  • All tests pass (574+ tests)
  • New unit tests for Top.merge() (11 tests)
  • New integration tests for Top inheritance/merging (5 tests)
  • Jobs 2.0 tests pass in CI

Related Issues & PRs

🤖 Generated with Claude Code

The fsspec local filesystem needs auto_mkdir=True to automatically
create parent directories when writing files. This is required for
Zarr 3.x which doesn't create directories when opening arrays with
mode='w' through an FSMap.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@github-actions github-actions bot added the enhancement Indicates new improvements label Jan 8, 2026
dimitri-yatsenko and others added 3 commits January 8, 2026 00:52
When chaining dj.Top restrictions, the second Top can now inherit
the ordering from the first by using order_by=None:

    (Table & dj.Top(100, "score DESC")) & dj.Top(10, order_by=None)
    # Returns top 10 of top 100, ordered by score descending

This fixes an issue where preview (to_arrays with limit) would
override user-specified ordering with the default KEY ordering.

Changes:
- Top class now accepts order_by=None to mean "inherit"
- Added Top.merge() method for combining Tops with compatible ordering
- restrict() now merges Tops when order_by=None or identical
- Creates subquery only when orderings differ

Merge behavior:
- limit = min(existing_limit, new_limit)
- offset = existing_offset + new_offset
- order_by = preserved from existing Top

Co-Authored-By: Claude Opus 4.5 <[email protected]>
AutoPopulate.jobs:
- Changed from @Property to descriptor pattern
- Now accessible on both class and instance:
    Analysis.jobs.refresh()    # class access
    Analysis().jobs.refresh()  # instance access
- Maintains lazy initialization behavior

Schema.drop():
- Changed parameter from `force` to `prompt` for clarity
- prompt=True: show confirmation prompt
- prompt=False: drop without confirmation
- prompt=None (default): use dj.config['safemode'] setting
- More intuitive: prompt=False means "don't prompt"
  vs force=True which meant "force without prompt"

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Introduced unified `part_integrity` parameter for delete and drop operations
with policies for master-part integrity:

For delete():
- "enforce" (default): Error if parts would be deleted without masters
- "ignore": Allow deleting parts without masters (breaks integrity)
- "cascade": Also delete masters when parts are deleted

For drop():
- "enforce" (default): Error - drop master instead
- "ignore": Allow direct drop

This replaces the previous boolean parameters:
- force_parts=True → part_integrity="ignore"
- force_masters=True → part_integrity="cascade"
- Part.drop(force=True) → Part.drop(part_integrity="ignore")

The same parameter works consistently on both Table and Part classes,
providing a cleaner, more intuitive API.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@github-actions github-actions bot added the documentation Issues related to documentation label Jan 8, 2026
@dimitri-yatsenko dimitri-yatsenko changed the title fix: enable auto_mkdir for local filesystem in StorageBackend feat: dj.Top order inheritance, part_integrity parameter, and storage fixes Jan 8, 2026
@dimitri-yatsenko dimitri-yatsenko added this to the DataJoint 2.0 milestone Jan 8, 2026
@dimitri-yatsenko dimitri-yatsenko self-assigned this Jan 8, 2026
- Use MySQL service container for integration tests
- Start MinIO manually (requires server command)
- Set environment variables for external container mode
- Add separate unit-tests job for faster feedback
- Test against Python 3.10-3.13 matrix

Addresses #1272

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
@d-v-b
Copy link
Contributor

d-v-b commented Jan 8, 2026

why declare the mysql config in .github/workflows? I thought this was already handled by docker-compose

@dimitri-yatsenko
Copy link
Member Author

why declare the mysql config in .github/workflows? I thought this was already handled by docker-compose

@d-v-b Yes, thanks. You are right. I am experimenting with optimal workflow.

dimitri-yatsenko and others added 5 commits January 8, 2026 09:26
Use existing docker-compose.yaml instead of duplicating service
configuration in the workflow.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Use MySQL NOW() instead of Python datetime.now() for consistent timing
- Jobs refresh: use CURRENT_TIMESTAMP default for scheduled_time (delay=0)
- Jobs reserve: use MySQL NOW() for scheduled_time comparison and reserved_time
- Jobs complete/error: use MySQL NOW() for completed_time
- Stale/orphan timeout checks: use MySQL NOW() - INTERVAL syntax

This fixes timezone mismatches between Python and MySQL that caused
jobs to not be found in CI environments.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The scheduled_time column uses CURRENT_TIMESTAMP(3) with milliseconds,
so comparisons must also use NOW(3) for consistent precision.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Simplify by always computing scheduled_time with MySQL server time,
removing the special case for delay=0.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- name: Set up Python ${{matrix.py_ver}}

- name: Install system dependencies
run: sudo apt-get update && sudo apt-get install -y graphviz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to do this? pixi run -e test should install graphviz automatically

run: sudo apt-get update && sudo apt-get install -y graphviz

- name: Start services
run: docker compose up -d db minio --wait
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this? as long as docker is available, pytest orchestrates the container lifecycle

python-version: ${{ matrix.py_ver }}

- name: Install dependencies
run: pip install -e ".[test]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we never need to run pip install because we are using pixi

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I am still learning pixi's role. Let me check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the purpose of pixi is to handle all our dependencies, including pypi dependencies and the one conda dependency (graphviz). But the story would be the same if we had no conda deps and were just using uv or hatch. We never need to use pip install any more because we have tools that are faster than pip for resolving dependencies and downloading them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much! This is so much better!

dimitri-yatsenko and others added 4 commits January 8, 2026 10:18
- Use setup-pixi action instead of manual setup
- Graphviz installed automatically via pixi conda dependency
- Testcontainers manages MySQL/MinIO containers automatically
- No manual pip install needed, pixi handles dependencies

Addresses reviewer feedback on PR #1312.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Make pixi the recommended development setup (matches CI)
- Add DOCKER_HOST note for macOS Docker Desktop users
- Keep pip as alternative for users who prefer it
- Update pre-commit commands to use pixi

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Allow lock file updates in CI since the lock file may be stale.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The test feature was not including s3fs and other test dependencies
because the feature-specific pypi-dependencies were missing.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@dimitri-yatsenko
Copy link
Member Author

Addressed all review comments:

  1. Graphviz - Now using setup-pixi which installs graphviz automatically via pixi's conda dependency
  2. Docker compose - Removed. Testcontainers manages MySQL/MinIO container lifecycle automatically
  3. Pip install - Removed. Using pixi run -e test instead

The workflow is now simplified to just setup-pixi + pixi run -e test test-cov.

Also updated:

  • README to make pixi the primary developer workflow (pip as alternative)
  • Added DOCKER_HOST note for macOS Docker Desktop users
  • Fixed pixi feature pypi-dependencies to include test extras (s3fs, etc.)

dimitri-yatsenko and others added 2 commits January 8, 2026 11:53
- Add mypy configuration to pyproject.toml
- Add mypy hook to pre-commit with type stubs
- Start with lenient settings, strict checking for content_registry
- All other modules excluded until fully typed (gradual adoption)

Addresses #1266.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Run unit tests locally before commit to catch issues before CI.
Addresses feedback from @drewyangdev on #1211.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@dimitri-yatsenko dimitri-yatsenko merged commit 46333e0 into pre/v2.0 Jan 8, 2026
5 of 6 checks passed
@dimitri-yatsenko dimitri-yatsenko deleted the docs-2.0-migration branch January 8, 2026 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Issues related to documentation enhancement Indicates new improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants