@dimitri-yatsenko commented Jan 8, 2026

Summary

This PR overhauls DataJoint's virtual schema infrastructure and improves the CLI for schema exploration. It also adds empty-insert support and CI/tooling updates.

Virtual Schema Infrastructure (#1307)

New Schema Introspection API:

  • Schema.get_table(name) - Direct table access with automatic tier prefix detection
  • Schema.__getitem__ - Bracket notation: schema['TableName']
  • Schema.__iter__ - Iterate over all tables in dependency order
  • Schema.__contains__ - Check table existence: 'TableName' in schema
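The four methods above map onto Python's standard container protocol. The sketch below is a toy stand-in, not DataJoint's implementation: `ToySchema` and its dict-backed storage are hypothetical, and insertion order merely stands in for dependency order.

```python
from collections.abc import Iterator


class ToySchema:
    """Hypothetical stand-in for a schema object (not DataJoint code)."""

    def __init__(self, tables: dict[str, object]) -> None:
        # Assumption: insertion order plays the role of dependency order.
        self._tables = dict(tables)

    def get_table(self, name: str) -> object:
        """Return the table registered under `name` or raise KeyError."""
        try:
            return self._tables[name]
        except KeyError:
            raise KeyError(f"Table {name!r} not found") from None

    def __getitem__(self, name: str) -> object:
        # Bracket notation delegates to get_table().
        return self.get_table(name)

    def __contains__(self, name: object) -> bool:
        # Enables: 'TableName' in schema
        return name in self._tables

    def __iter__(self) -> Iterator[object]:
        # Enables: for table in schema
        return iter(self._tables.values())


schema = ToySchema({"Subject": "subject-table", "Session": "session-table"})
```

Routing `__getitem__` through `get_table()` keeps a single lookup path, so any name resolution logic only has to live in one place.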

New Entry Points:

  • dj.virtual_schema(schema_name) - Access an existing database schema; the schema name is also used as the module name
  • dj.VirtualModule(alias, schema_name) - Create virtual modules with custom names

Internal Fixes:

  • Fix gc.py to call get_table() instead of the non-existent spawn_table()
  • Handle tier prefixes (#, _, __) automatically in table lookups
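The prefix handling can be pictured as "exact match first, then try each tier prefix", as a later commit in this PR describes. The following is a minimal sketch under stated assumptions: `find_table_name` and `to_snake_case` are hypothetical names, the CamelCase-to-snake_case rule is assumed, and the set of stored names is made up for illustration.

```python
from __future__ import annotations

import re

# Tier prefixes as listed in this PR:
# Manual has none, Lookup "#", Imported "_", Computed "__".
TIER_PREFIXES = ("", "#", "_", "__")


def to_snake_case(name: str) -> str:
    """Convert 'SessionTrial' to 'session_trial' (assumed naming rule)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def find_table_name(name: str, existing: set[str]) -> str | None:
    """Return the stored table name that matches `name`, or None."""
    if name in existing:  # an exact match wins
        return name
    base = to_snake_case(name)
    for prefix in TIER_PREFIXES:
        candidate = prefix + base
        if candidate in existing:
            return candidate
    return None


# Hypothetical stored table names, one per tier.
tables = {"session", "#subject", "_experiment", "__analysis"}
```

With these names, `find_table_name("experiment", tables)` resolves to the Imported table and `find_table_name("Subject", tables)` to the Lookup table, mirroring the examples given in the commit message.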

CLI Improvements

The dj command-line interface now provides a proper interactive REPL for schema exploration:

Bug Fixes:

  • Fix -h conflict: remove the -h shorthand for --host, which was overriding --help
  • Fix namespace issue: explicitly include dj in the REPL namespace

Enhancements:

  • Add module-level docstring with usage examples
  • Convert function docstring to NumPy style
  • Add error handling for invalid schema format (schema:alias validation)
  • Improve banner with version and tab-completion hint
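As an illustration of the -h fix and the schema:alias validation, here is a sketch built on argparse, which reserves -h/--help by default. It is not the actual CLI source: `parse_schema_spec`, `build_parser`, and the `schemas` destination name are hypothetical, and only the `--host` and `-s` flags come from this PR.

```python
from __future__ import annotations

import argparse


def parse_schema_spec(spec: str) -> tuple[str, str]:
    """Split 'schema:alias' into two parts, rejecting malformed input."""
    schema_name, sep, alias = spec.partition(":")
    if not sep or not schema_name or not alias.isidentifier():
        raise argparse.ArgumentTypeError(
            f"invalid schema spec {spec!r}; expected 'schema:alias'"
        )
    return schema_name, alias


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dj")
    # Long form only: "-h" stays reserved for argparse's built-in --help.
    parser.add_argument("--host", help="database host")
    parser.add_argument(
        "-s",
        dest="schemas",  # hypothetical destination name
        action="append",
        type=parse_schema_spec,
        default=[],
        metavar="SCHEMA:ALIAS",
        help="schema to load under the given alias (repeatable)",
    )
    return parser


args = build_parser().parse_args(
    ["--host", "localhost", "-s", "my_lab:lab", "-s", "my_analysis:analysis"]
)
```

Requiring the alias to be a valid Python identifier means it can be used directly as a variable name in the REPL. One plausible way to address the namespace issue is to build an explicit dict such as `{"dj": dj, alias: module}` and pass it to `code.interact(local=...)` rather than relying on `locals()`.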

Usage:

# Start REPL with schemas loaded
dj -s my_lab:lab -s my_analysis:analysis

# In REPL
>>> lab.Subject.to_dicts()
>>> dj.Diagram(lab.schema)

Empty Insert Support (#1280)

Tables with all-default attributes now accept empty inserts:

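import datajoint as dj

schema = dj.Schema("my_schema")  # placeholder schema name

@schema
class EventLog(dj.Manual):
    definition = """
    id: int auto_increment
    ---
    dt=CURRENT_TIMESTAMP : datetime
    """

EventLog().insert1({})  # Now works: every attribute has a default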

CI/Tooling (#1271)

  • Use pixi for CI workflow with testcontainers
  • Add mypy type checking to pre-commit
  • Add unit tests to pre-commit hooks
  • Update pre-commit to reference ruff (not flake8)
  • Fix GitHub Actions labeler to preserve manual labels

Other Fixes

Breaking Changes

  • Removed create_virtual_module - use dj.virtual_schema() or dj.VirtualModule() instead
  • Removed specs/ folder (migrated to datajoint-docs)

Test Plan

  • Virtual schema tests added (test_virtual_module.py)
  • Schema introspection tests: get_table, __getitem__, __iter__, __contains__
  • Empty insert tests added (test_insert.py::TestEmptyInsert)
  • dj.Top ordering test updated
  • CI passes

Closes

🤖 Generated with Claude Code

dimitri-yatsenko and others added 3 commits January 8, 2026 13:48
Update test_top_restriction_with_keywords to verify that dj.Top
properly preserves ordering in fetch results. Use secondary sort
by 'id' to ensure deterministic results when there are ties.
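The tie-breaking idea can be shown without a database. This plain-Python sketch does not use dj.Top; the `top` helper and the sample rows are hypothetical and only illustrate why a secondary sort key makes a limited result deterministic.

```python
# Three rows tie on "score", so a sort on score alone leaves their
# relative order dependent on input order.
rows = [
    {"id": 3, "score": 10},
    {"id": 1, "score": 10},
    {"id": 2, "score": 7},
    {"id": 4, "score": 10},
]


def top(rows, limit):
    """Return the first `limit` rows by score (descending), then id."""
    # Primary key: score descending; secondary key: id ascending.
    ordered = sorted(rows, key=lambda r: (-r["score"], r["id"]))
    return ordered[:limit]


top_two = top(rows, 2)
```

Because `id` is unique, the combined key is a total order, so the same two rows come back regardless of how the input happens to be arranged.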

Fixes #1205

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add Schema.get_table() for direct table access
- Add Schema.__getitem__ for bracket notation: schema['TableName']
- Add Schema.__iter__ to iterate over all tables
- Add Schema.__contains__ for 'TableName' in schema
- Add dj.virtual_schema() as clean entry point
- Remove create_virtual_module (breaking change)
- Fix gc.py to use get_table() instead of spawn_table()
- Remove specs/ folder (moved to datajoint-docs)
- Add comprehensive tests for virtual schema infrastructure

Fixes #1307

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The pre-commit config has been modernized to use ruff instead of
flake8. Update the SKIP example comment accordingly.

Closes #1271

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@github-actions bot added the enhancement label Jan 8, 2026

dimitri-yatsenko and others added 2 commits January 8, 2026 14:45
- Add type annotations to errors.py (suggest method)
- Add type annotations to hash.py (key_hash, uuid_from_buffer)
- Enable strict mypy checking for these modules
- Now 3 modules under strict checking: content_registry, errors, hash

Increases type coverage incrementally following gradual adoption strategy.

Related #1266

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Change sync-labels from true to false in PR labeler workflow.
This prevents the GitHub Actions labeler from removing manually
added labels like "breaking" when they don't match the automatic
labeling rules.

With sync-labels: true, the action removes any labels not matched
by the configuration. With sync-labels: false, it only adds labels
based on patterns and preserves manually added labels.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@dimitri-yatsenko added the breaking and documentation labels Jan 8, 2026
@dimitri-yatsenko self-assigned this Jan 8, 2026

Update PyPI keywords to reflect DataJoint 2.0 positioning and
modern data engineering terminology:

Added:
- data-engineering, data-pipelines, workflow-management
- data-integrity, reproducibility, declarative
- object-storage, schema-management, data-lineage
- scientific-computing, research-software
- postgresql (upcoming support)

Removed:
- Generic terms: database, automated, automation, compute, data
- Redundant terms: pipeline, workflow, scientific, science, research
- Domain-specific: bioinformatics (kept neuroscience as primary)

Updated GitHub repository topics to match (18 topics total).

Focuses on searchable terms, 2.0 features, and differentiators.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@github-actions bot removed the documentation and breaking labels Jan 8, 2026

dimitri-yatsenko and others added 4 commits January 8, 2026 15:14
The get_table(), __getitem__, and __contains__ methods now auto-detect
table tier prefixes (Manual: none, Lookup: #, Imported: _, Computed: __).

This allows users to access tables by their base name without knowing
the tier prefix:
  - schema.get_table("experiment") finds "_experiment" (Imported)
  - schema["Subject"] finds "#subject" (Lookup)
  - "Experiment" in schema returns True

Added _find_table_name() helper that checks exact match first, then
tries each tier prefix.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Replace deprecated fetch() calls with to_dicts() in test_virtual_module.py:
- test_virtual_schema_tables_are_queryable: use lab.Experiment().to_dicts()
- test_getitem_is_queryable: use table.to_dicts()

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The create_virtual_module function was removed in 2.0. Update the CLI
to use dj.virtual_schema() for loading schemas via the -s flag.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

VirtualModule allows specifying both module name and schema name,
while virtual_schema() uses schema name for both. The CLI needs
custom module names for the -s flag, so use VirtualModule directly.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@dimitri-yatsenko added the documentation and breaking labels Jan 8, 2026
@dimitri-yatsenko added this to the DataJoint 2.0 milestone Jan 8, 2026

- Remove -h shorthand for --host (conflicts with argparse --help)
- Add module-level docstring with usage examples
- Improve function docstring with NumPy style
- Add explicit error handling for invalid schema format
- Improve banner message with version and usage hint
- Use modern type hints (list[str] | None)
- Fix locals() issue: explicitly include dj in REPL namespace

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@github-actions bot removed the documentation and breaking labels Jan 8, 2026

dimitri-yatsenko and others added 3 commits January 8, 2026 20:05
- Replace -h shorthand with --host (removed to avoid -h/--help conflict)
- Use separate arguments instead of concatenated form
- Use prefix variable for schema name consistency
- Fix assertion string matching

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@dimitri-yatsenko added the documentation and breaking labels Jan 9, 2026
@dimitri-yatsenko merged commit c1b36f0 into pre/v2.0 Jan 9, 2026
7 of 8 checks passed
@dimitri-yatsenko deleted the virtual-modules branch January 9, 2026 02:27
Labels

breaking, documentation, enhancement
