Merged

17 commits
9f28b10
fix: enable auto_mkdir for local filesystem in StorageBackend
dimitri-yatsenko Jan 8, 2026
9737dee
fix(Top): allow order_by=None to inherit existing ordering (#1242)
dimitri-yatsenko Jan 8, 2026
c2a2eae
refactor: improve API consistency for jobs and schema.drop
dimitri-yatsenko Jan 8, 2026
cabdb74
refactor(delete): replace force_parts/force_masters with part_integrity
dimitri-yatsenko Jan 8, 2026
dacf4ac
ci: add MySQL/MinIO services to GitHub Actions workflow
dimitri-yatsenko Jan 8, 2026
ae5fd68
style: format user_tables.py
dimitri-yatsenko Jan 8, 2026
fbc4cad
ci: use docker-compose for test services
dimitri-yatsenko Jan 8, 2026
d778e4f
ci: install graphviz for ERD tests
dimitri-yatsenko Jan 8, 2026
4f4a924
fix(jobs): use MySQL server time consistently for all scheduling
dimitri-yatsenko Jan 8, 2026
344de9b
fix(jobs): use NOW(3) to match CURRENT_TIMESTAMP(3) precision
dimitri-yatsenko Jan 8, 2026
2100487
refactor(jobs): always use NOW(3) + INTERVAL for scheduled_time
dimitri-yatsenko Jan 8, 2026
1fdfb3e
ci: use pixi for CI workflow
dimitri-yatsenko Jan 8, 2026
307983a
docs: update developer guide to use pixi as primary toolchain
dimitri-yatsenko Jan 8, 2026
272fcb5
ci: disable locked mode for pixi install
dimitri-yatsenko Jan 8, 2026
b8645f8
fix(pixi): add test extras to feature-specific pypi-dependencies
dimitri-yatsenko Jan 8, 2026
27391c7
feat: add mypy type checking to pre-commit
dimitri-yatsenko Jan 8, 2026
f195110
feat: add unit tests to pre-commit hooks
dimitri-yatsenko Jan 8, 2026
86 changes: 69 additions & 17 deletions .github/workflows/test.yaml
@@ -1,37 +1,89 @@
name: Test

on:
push:
branches:
- "**" # every branch
- "!gh-pages" # exclude gh-pages branch
- "!stage*" # exclude branches beginning with stage
- "**"
- "!gh-pages"
- "!stage*"
paths:
- "src/datajoint"
- "tests"
- "src/datajoint/**"
- "tests/**"
- "pyproject.toml"
- "docker-compose.yaml"
- ".github/workflows/test.yaml"
pull_request:
branches:
- "**" # every branch
- "!gh-pages" # exclude gh-pages branch
- "!stage*" # exclude branches beginning with stage
- "**"
- "!gh-pages"
- "!stage*"
paths:
- "src/datajoint"
- "tests"
- "src/datajoint/**"
- "tests/**"
- "pyproject.toml"
- "docker-compose.yaml"
- ".github/workflows/test.yaml"

jobs:
test:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
py_ver: ["3.10", "3.11", "3.12", "3.13"]
mysql_ver: ["8.0"]

steps:
- uses: actions/checkout@v4
- name: Set up Python ${{matrix.py_ver}}

- name: Install system dependencies
run: sudo apt-get update && sudo apt-get install -y graphviz
Contributor:
why do we need to do this? pixi run -e test should install graphviz automatically

- name: Start services
run: docker compose up -d db minio --wait
Contributor:
why do we need this? as long as docker is available, pytest orchestrates the container lifecycle

env:
MYSQL_VER: ${{ matrix.mysql_ver }}

- name: Set up Python ${{ matrix.py_ver }}
uses: actions/setup-python@v5
with:
python-version: ${{matrix.py_ver}}
- name: Integration test
python-version: ${{ matrix.py_ver }}

- name: Install dependencies
run: pip install -e ".[test]"
Contributor:
we never need to run pip install because we are using pixi

Member Author:
Okay, I am still learning pixi's role. Let me check.

Contributor:
the purpose of pixi is to handle all our dependencies, including pypi dependencies and the one conda dependency (graphviz). But the story would be the same if we had no conda deps and were just using uv or hatch. We never need to use pip install any more because we have tools that are faster than pip for resolving dependencies and downloading them.

Member Author:
Thanks so much! This is so much better!


- name: Run tests
env:
MYSQL_VER: ${{matrix.mysql_ver}}
run: |
pip install -e ".[test]"
pytest --cov-report term-missing --cov=datajoint tests
DJ_USE_EXTERNAL_CONTAINERS: "1"
DJ_HOST: 127.0.0.1
DJ_PORT: 3306
DJ_USER: root
DJ_PASS: password
S3_ENDPOINT: 127.0.0.1:9000
S3_ACCESS_KEY: datajoint
S3_SECRET_KEY: datajoint
run: pytest --cov-report term-missing --cov=datajoint tests -v

- name: Stop services
if: always()
run: docker compose down

# Unit tests run without containers (faster feedback)
unit-tests:
runs-on: ubuntu-latest
strategy:
matrix:
py_ver: ["3.11"]
steps:
- uses: actions/checkout@v4

- name: Set up Python ${{ matrix.py_ver }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.py_ver }}

- name: Install dependencies
run: pip install -e ".[test]"

- name: Run unit tests
run: pytest tests/unit -v
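
For context, the `DJ_*` variables exported above are what the integration tests use to reach the compose services. A minimal sketch of a connection made from those variables, assuming the classic `dj.conn(host=..., user=..., password=...)` signature (the 2.0 connection API may differ; this is not part of the diff):

```python
import os

import datajoint as dj

# Read the same variables the workflow exports for the "Run tests" step.
host = f"{os.environ.get('DJ_HOST', '127.0.0.1')}:{os.environ.get('DJ_PORT', '3306')}"
user = os.environ.get("DJ_USER", "root")
password = os.environ.get("DJ_PASS", "password")

connection = dj.conn(host=host, user=user, password=password)
print(connection.is_connected)
```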
117 changes: 117 additions & 0 deletions RELEASE_MEMO.md
@@ -0,0 +1,117 @@
# DataJoint 2.0 Release Memo

## PyPI Release Process

### Steps

1. **Run "Manual Draft Release" workflow** on GitHub Actions
2. **Edit the draft release**:
- Change release name to `Release 2.0.0`
- Change tag to `v2.0.0`
3. **Publish the release**
4. Automation will:
- Update `version.py` to `2.0.0`
- Build and publish to PyPI
- Create PR to merge version update back to master

### Version Note

The release drafter computes the version from the previous tag (`v0.14.6`), so it would generate `0.14.7` or `0.15.0`. You must **manually edit** the release name to include `2.0.0`.

The regex on line 42 of `post_draft_release_published.yaml` extracts the version from the release name:
```bash
VERSION=$(echo "${{ github.event.release.name }}" | grep -oP '\d+\.\d+\.\d+')
```
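
The same extraction can be sanity-checked locally; a small Python equivalent of the `grep -oP` call above (illustrative only, not part of the workflow):

```python
import re

release_name = "Release 2.0.0"
# Equivalent of grep -oP '\d+\.\d+\.\d+': take the first semver-like match.
version = re.search(r"\d+\.\d+\.\d+", release_name).group(0)
assert version == "2.0.0"
```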

---

## Conda-Forge Release Process

DataJoint has a [conda-forge feedstock](https://github.com/conda-forge/datajoint-feedstock).

### How Conda-Forge Updates Work

Conda-forge has **automated bots** that detect new PyPI releases and create PRs automatically:

1. **You publish to PyPI** (via the GitHub release workflow)
2. **regro-cf-autotick-bot** detects the new version within ~24 hours
3. **Bot creates a PR** to the feedstock with updated version and hash
4. **Maintainers review and merge** (you're listed as a maintainer)
5. **Package builds automatically** for all platforms

### Manual Update (if bot doesn't trigger)

If the bot doesn't create a PR, manually update the feedstock:

1. **Fork** [conda-forge/datajoint-feedstock](https://github.com/conda-forge/datajoint-feedstock)

2. **Edit `recipe/meta.yaml`**:
```yaml
{% set version = "2.0.0" %}

package:
name: datajoint
version: {{ version }}

source:
url: https://pypi.io/packages/source/d/datajoint/datajoint-{{ version }}.tar.gz
sha256: <NEW_SHA256_HASH>

build:
number: 0 # Reset to 0 for new version
```

3. **Get the SHA256 hash**:
```bash
curl -sL https://pypi.org/pypi/datajoint/2.0.0/json | jq -r '.urls[] | select(.packagetype=="sdist") | .digests.sha256'
```

4. **Update license** (important for 2.0!):
```yaml
about:
license: Apache-2.0 # Changed from LGPL-2.1-only
license_file: LICENSE
```

5. **Submit PR** to the feedstock

### Action Items for 2.0 Release

1. **First**: Publish to PyPI via GitHub release (name it "Release 2.0.0")
2. **Wait**: ~24 hours for conda-forge bot to detect
3. **Check**: [datajoint-feedstock PRs](https://github.com/conda-forge/datajoint-feedstock/pulls) for auto-PR
4. **Review**: Ensure license changed from LGPL to Apache-2.0
5. **Merge**: As maintainer, approve and merge the PR

### Timeline

| Step | When |
|------|------|
| PyPI release | Day 0 |
| Bot detects & creates PR | Day 0-1 |
| Review & merge PR | Day 1-2 |
| Conda-forge package available | Day 1-2 |

### Verification

After release:
```bash
conda search datajoint -c conda-forge
# Should show 2.0.0
```

---

## Maintainers

- @datajointbot
- @dimitri-yatsenko
- @drewyangdev
- @guzman-raphael
- @ttngu207

## Links

- [datajoint-feedstock on GitHub](https://github.com/conda-forge/datajoint-feedstock)
- [datajoint on Anaconda.org](https://anaconda.org/conda-forge/datajoint)
- [datajoint on PyPI](https://pypi.org/project/datajoint/)
6 changes: 3 additions & 3 deletions docs/src/archive/manipulation/delete.md
@@ -26,6 +26,6 @@ Entities in a [part table](../design/tables/master-part.md) are usually removed as a
consequence of deleting the master table.

To enforce this workflow, calling `delete` directly on a part table produces an error.
In some cases, it may be necessary to override this behavior.
To remove entities from a part table without calling `delete` master, use the argument `force_parts=True`.
To include the corresponding entries in the master table, use the argument `force_masters=True`.
In some cases, it may be necessary to override this behavior using the `part_integrity` parameter:
- `part_integrity="ignore"`: Remove entities from a part table without deleting from master (breaks integrity).
- `part_integrity="cascade"`: Delete from parts and also cascade up to delete the corresponding master entries.
53 changes: 33 additions & 20 deletions src/datajoint/autopopulate.py
Expand Up @@ -93,27 +93,40 @@ class AutoPopulate:
_allow_insert = False
_jobs = None

@property
def jobs(self) -> Job:
"""
Access the job table for this auto-populated table.
class _JobsDescriptor:
"""Descriptor allowing jobs access on both class and instance."""

The job table (``~~table_name``) is created lazily on first access.
It tracks job status, priority, scheduling, and error information
for distributed populate operations.
def __get__(self, obj, objtype=None):
"""
Access the job table for this auto-populated table.

Returns
-------
Job
Job management object for this table.
"""
if self._jobs is None:
from .jobs import Job
The job table (``~~table_name``) is created lazily on first access.
It tracks job status, priority, scheduling, and error information
for distributed populate operations.

Can be accessed on either the class or an instance::

# Both work equivalently
Analysis.jobs.refresh()
Analysis().jobs.refresh()

Returns
-------
Job
Job management object for this table.
"""
if obj is None:
# Accessed on class - instantiate first
obj = objtype()
if obj._jobs is None:
from .jobs import Job

obj._jobs = Job(obj)
if not obj._jobs.is_declared:
obj._jobs.declare()
return obj._jobs

self._jobs = Job(self)
if not self._jobs.is_declared:
self._jobs.declare()
return self._jobs
jobs: Job = _JobsDescriptor()

def _declare_check(self, primary_key: list[str], fk_attribute_map: dict[str, tuple[str, str]]) -> None:
"""
Expand Down Expand Up @@ -474,8 +487,8 @@ def handler(signum, frame):
if refresh:
self.jobs.refresh(*restrictions, priority=priority)

# Fetch pending jobs ordered by priority
pending_query = self.jobs.pending & "scheduled_time <= NOW()"
# Fetch pending jobs ordered by priority (use NOW(3) to match CURRENT_TIMESTAMP(3) precision)
pending_query = self.jobs.pending & "scheduled_time <= NOW(3)"
if priority is not None:
pending_query = pending_query & f"priority <= {priority}"

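
The key mechanism in the refactor above is the descriptor protocol: `__get__` receives `obj=None` when `jobs` is accessed on the class, so the descriptor instantiates the table first. A stripped-down, self-contained sketch of that pattern with simplified names (a string stands in for the real `Job` object):

```python
class _LazyJobs:
    """Simplified stand-in for the _JobsDescriptor shown in the diff."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class (e.g. Analysis.jobs): instantiate first
            # so class and instance access behave the same way.
            obj = objtype()
        if obj._jobs is None:
            # Created lazily on first access; the real code builds Job(obj)
            # and declares the job table if needed.
            obj._jobs = f"jobs table for {type(obj).__name__}"
        return obj._jobs


class Analysis:
    _jobs = None
    jobs = _LazyJobs()


assert Analysis.jobs == Analysis().jobs == "jobs table for Analysis"
```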
61 changes: 51 additions & 10 deletions src/datajoint/condition.py
@@ -107,32 +107,73 @@ class Top:
----------
limit : int, optional
Maximum number of rows to return. Default 1.
order_by : str or list[str], optional
Attributes to order by. ``"KEY"`` for primary key. Default ``"KEY"``.
order_by : str or list[str] or None, optional
Attributes to order by. ``"KEY"`` for primary key order.
``None`` means inherit ordering from an existing Top (or default to KEY).
Default ``"KEY"``.
offset : int, optional
Number of rows to skip. Default 0.

Examples
--------
>>> query & dj.Top(5) # Top 5 by primary key
>>> query & dj.Top(10, 'score DESC') # Top 10 by score descending
>>> query & dj.Top(10, order_by=None) # Top 10, inherit existing order
>>> query & dj.Top(5, offset=10) # Skip 10, take 5
"""

limit: int | None = 1
order_by: str | list[str] = "KEY"
order_by: str | list[str] | None = "KEY"
offset: int = 0

def __post_init__(self) -> None:
self.order_by = self.order_by or ["KEY"]
self.offset = self.offset or 0

if self.limit is not None and not isinstance(self.limit, int):
raise TypeError("Top limit must be an integer")
if not isinstance(self.order_by, (str, collections.abc.Sequence)) or not all(
isinstance(r, str) for r in self.order_by
):
raise TypeError("Top order_by attributes must all be strings")
if self.order_by is not None:
if not isinstance(self.order_by, (str, collections.abc.Sequence)) or not all(
isinstance(r, str) for r in self.order_by
):
raise TypeError("Top order_by attributes must all be strings")
if isinstance(self.order_by, str):
self.order_by = [self.order_by]
if not isinstance(self.offset, int):
raise TypeError("The offset argument must be an integer")
if self.offset and self.limit is None:
self.limit = 999999999999 # arbitrary large number to allow query
if isinstance(self.order_by, str):
self.order_by = [self.order_by]

def merge(self, other: "Top") -> "Top":
"""
Merge another Top into this one (when other inherits ordering).

Used when ``other.order_by`` is None or matches ``self.order_by``.

Parameters
----------
other : Top
The Top to merge. Its order_by should be None or equal to self.order_by.

Returns
-------
Top
New Top with merged limit/offset and preserved ordering.
"""
# Compute effective limit (minimum of defined limits)
if self.limit is None and other.limit is None:
new_limit = None
elif self.limit is None:
new_limit = other.limit
elif other.limit is None:
new_limit = self.limit
else:
new_limit = min(self.limit, other.limit)

return Top(
limit=new_limit,
order_by=self.order_by, # preserve existing ordering
offset=self.offset + other.offset, # offsets add
)


class Not:
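
The merge semantics added above can be exercised directly; a short sketch of how two stacked Top restrictions combine when the second one inherits the ordering, following the merge logic in this hunk:

```python
import datajoint as dj

first = dj.Top(limit=10, order_by="score DESC", offset=5)
second = dj.Top(limit=3, order_by=None, offset=2)  # order_by=None inherits ordering

merged = first.merge(second)
assert merged.limit == 3                   # minimum of the defined limits
assert merged.order_by == ["score DESC"]   # ordering preserved from the first Top
assert merged.offset == 7                  # offsets add
```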