-
Notifications
You must be signed in to change notification settings - Fork 93
feat: dj.Top order inheritance, part_integrity parameter, and storage fixes #1312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The fsspec local filesystem needs auto_mkdir=True to automatically create parent directories when writing files. This is required for Zarr 3.x which doesn't create directories when opening arrays with mode='w' through an FSMap. Co-Authored-By: Claude Opus 4.5 <[email protected]>
When chaining dj.Top restrictions, the second Top can now inherit
the ordering from the first by using order_by=None:
(Table & dj.Top(100, "score DESC")) & dj.Top(10, order_by=None)
# Returns top 10 of top 100, ordered by score descending
This fixes an issue where preview (to_arrays with limit) would
override user-specified ordering with the default KEY ordering.
Changes:
- Top class now accepts order_by=None to mean "inherit"
- Added Top.merge() method for combining Tops with compatible ordering
- restrict() now merges Tops when order_by=None or identical
- Creates subquery only when orderings differ
Merge behavior:
- limit = min(existing_limit, new_limit)
- offset = existing_offset + new_offset
- order_by = preserved from existing Top
Co-Authored-By: Claude Opus 4.5 <[email protected]>
AutoPopulate.jobs: - Changed from @Property to descriptor pattern - Now accessible on both class and instance: Analysis.jobs.refresh() # class access Analysis().jobs.refresh() # instance access - Maintains lazy initialization behavior Schema.drop(): - Changed parameter from `force` to `prompt` for clarity - prompt=True: show confirmation prompt - prompt=False: drop without confirmation - prompt=None (default): use dj.config['safemode'] setting - More intuitive: prompt=False means "don't prompt" vs force=True which meant "force without prompt" Co-Authored-By: Claude Opus 4.5 <[email protected]>
Introduced unified `part_integrity` parameter for delete and drop operations with policies for master-part integrity: For delete(): - "enforce" (default): Error if parts would be deleted without masters - "ignore": Allow deleting parts without masters (breaks integrity) - "cascade": Also delete masters when parts are deleted For drop(): - "enforce" (default): Error - drop master instead - "ignore": Allow direct drop This replaces the previous boolean parameters: - force_parts=True → part_integrity="ignore" - force_masters=True → part_integrity="cascade" - Part.drop(force=True) → Part.drop(part_integrity="ignore") The same parameter works consistently on both Table and Part classes, providing a cleaner, more intuitive API. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Use MySQL service container for integration tests - Start MinIO manually (requires server command) - Set environment variables for external container mode - Add separate unit-tests job for faster feedback - Test against Python 3.10-3.13 matrix Addresses #1272 Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
why declare the mysql config in .github/workflows? I thought this was already handled by docker-compose |
@d-v-b Yes, thanks. You are right. I am experimenting with optimal workflow. |
Use existing docker-compose.yaml instead of duplicating service configuration in the workflow. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Use MySQL NOW() instead of Python datetime.now() for consistent timing - Jobs refresh: use CURRENT_TIMESTAMP default for scheduled_time (delay=0) - Jobs reserve: use MySQL NOW() for scheduled_time comparison and reserved_time - Jobs complete/error: use MySQL NOW() for completed_time - Stale/orphan timeout checks: use MySQL NOW() - INTERVAL syntax This fixes timezone mismatches between Python and MySQL that caused jobs to not be found in CI environments. Co-Authored-By: Claude Opus 4.5 <[email protected]>
The scheduled_time column uses CURRENT_TIMESTAMP(3) with milliseconds, so comparisons must also use NOW(3) for consistent precision. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Simplify by always computing scheduled_time with MySQL server time, removing the special case for delay=0. Co-Authored-By: Claude Opus 4.5 <[email protected]>
.github/workflows/test.yaml
Outdated
| - name: Set up Python ${{matrix.py_ver}} | ||
|
|
||
| - name: Install system dependencies | ||
| run: sudo apt-get update && sudo apt-get install -y graphviz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to do this? pixi run -e test should install graphviz automatically
.github/workflows/test.yaml
Outdated
| run: sudo apt-get update && sudo apt-get install -y graphviz | ||
|
|
||
| - name: Start services | ||
| run: docker compose up -d db minio --wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this? as long as docker is available, pytest orchestrates the container lifecycle
.github/workflows/test.yaml
Outdated
| python-version: ${{ matrix.py_ver }} | ||
|
|
||
| - name: Install dependencies | ||
| run: pip install -e ".[test]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we never need to run pip install because we are using pixi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am still learning pixi's role. Let me check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the purpose of pixi is to handle all our dependencies, including pypi dependencies and the one conda dependency (graphviz). But the story would be the same if we had no conda deps and were just using uv or hatch. We never need to use pip install any more because we have tools that are faster than pip for resolving dependencies and downloading them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much! This is so much better!
- Use setup-pixi action instead of manual setup - Graphviz installed automatically via pixi conda dependency - Testcontainers manages MySQL/MinIO containers automatically - No manual pip install needed, pixi handles dependencies Addresses reviewer feedback on PR #1312. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Make pixi the recommended development setup (matches CI) - Add DOCKER_HOST note for macOS Docker Desktop users - Keep pip as alternative for users who prefer it - Update pre-commit commands to use pixi Co-Authored-By: Claude Opus 4.5 <[email protected]>
Allow lock file updates in CI since the lock file may be stale. Co-Authored-By: Claude Opus 4.5 <[email protected]>
The test feature was not including s3fs and other test dependencies because the feature-specific pypi-dependencies were missing. Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
Addressed all review comments:
The workflow is now simplified to just setup-pixi + Also updated:
|
- Add mypy configuration to pyproject.toml - Add mypy hook to pre-commit with type stubs - Start with lenient settings, strict checking for content_registry - All other modules excluded until fully typed (gradual adoption) Addresses #1266. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Run unit tests locally before commit to catch issues before CI. Addresses feedback from @drewyangdev on #1211. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Summary
This PR includes fixes and enhancements for DataJoint 2.0:
1. Fix: enable auto_mkdir for local filesystem in StorageBackend
Fixes an issue where
staged_insert1with<object@>fields fails when using Zarr 3.x on local filesystem storage.2. Fix: dj.Top order_by inheritance (fixes #1242)
When chaining
dj.Toprestrictions, the second Top can now inherit the ordering from the first by usingorder_by=None:3. New
part_integrityparameter for delete/dropUnified parameter for master-part integrity in delete and drop operations:
For
delete():"enforce""ignore""cascade"For
drop():"enforce""ignore"This replaces the previous parameters:
force_parts=True→part_integrity="ignore"force_masters=True→part_integrity="cascade"Part.drop(force=True)→Part.drop(part_integrity="ignore")4. API improvements
AutoPopulate.jobs: Now accessible on both class and instance
Schema.drop(): Changed
forcetopromptfor clarity5. GitHub Actions CI (fixes #1272)
Updated workflow to run tests automatically on PRs:
6. Jobs 2.0: Use MySQL server time consistently
Fixed timezone mismatches between Python and MySQL that caused jobs not to be found in CI:
NOW(3)for all scheduling comparisons (matchesCURRENT_TIMESTAMP(3)precision)CURRENT_TIMESTAMPdefault forscheduled_timewhen delay=0reserved_time,completed_timeTest plan
Top.merge()(11 tests)Related Issues & PRs
dj.Toporders the preview withorder_by#1242 (preview doesn't respect dj.Top ordering)super.deletekwargs toPart.delete#1276 (Can't passsuper.deletekwargs toPart.delete)🤖 Generated with Claude Code