Commit 164e7cc

Implement Custom Types -- AttributeType (#1289)
* Introduce AttributeType system to replace AttributeAdapter

This commit introduces a modern, extensible custom type system for DataJoint:

**New Features:**
- AttributeType base class with encode()/decode() methods
- Global type registry with @register_type decorator
- Entry point discovery for third-party type packages (datajoint.types)
- Type chaining: dtype can reference another custom type
- Automatic validation via validate() method before encoding
- resolve_dtype() for resolving chained types

**API Changes:**
- New: dj.AttributeType, dj.register_type, dj.list_types
- AttributeAdapter is now deprecated (backward-compatible wrapper)
- Feature flag DJ_SUPPORT_ADAPTED_TYPES is no longer required

**Entry Point Specification:**
Third-party packages can declare types in pyproject.toml:

[project.entry-points."datajoint.types"]
zarr_array = "dj_zarr:ZarrArrayType"

**Migration Path:**
Old AttributeAdapter subclasses continue to work but emit DeprecationWarning. Migrate to AttributeType with encode/decode.

* Update documentation for new AttributeType system

- Rewrite customtype.md with comprehensive documentation:
  - Overview of encode/decode pattern
  - Required components (type_name, dtype, encode, decode)
  - Type registration with @dj.register_type decorator
  - Validation with validate() method
  - Storage types (dtype options)
  - Type chaining for composable types
  - Key parameter for context-aware encoding
  - Entry point packages for distribution
  - Complete neuroscience example
  - Migration guide from AttributeAdapter
  - Best practices
- Update attributes.md to reference custom types

* Apply ruff-format fixes to AttributeType implementation

* Add DJBlobType and migration utilities for blob columns

Introduces `<djblob>` as an explicit AttributeType for DataJoint's native blob serialization, allowing users to be explicit about serialization behavior in table definitions.

Key changes:
- Add DJBlobType class with `serializes=True` flag to indicate it handles its own serialization (avoiding double pack/unpack)
- Update table.py and fetch.py to respect the `serializes` flag, skipping blob.pack/unpack when adapter handles serialization
- Add `dj.migrate` module with utilities for migrating existing schemas to use explicit `<djblob>` type declarations
- Add tests for DJBlobType functionality
- Document `<djblob>` type and migration procedure

The migration is metadata-only - blob data format is unchanged. Existing `longblob` columns continue to work with implicit serialization for backward compatibility.

* Clarify migration handles all blob type variants

* Fix ruff linter errors: add migrate to __all__, remove unused import

* Remove serializes flag; longblob is now raw bytes

Simplified design:
- Plain longblob columns store/return raw bytes (no serialization)
- <djblob> type handles serialization via encode/decode
- Legacy AttributeAdapter handles blob pack/unpack internally for backward compatibility

This eliminates the need for the serializes flag by making blob serialization the responsibility of the adapter/type, not the framework. Migration to <djblob> is now required for existing schemas that rely on implicit serialization.

* Remove unused blob imports from fetch.py and table.py

* Update docs: use <djblob> for serialized data, longblob for raw bytes

* Add storage types redesign spec

Design document for reimplementing blob, attach, filepath, and object types as a coherent AttributeType system. Separates storage location (@store) from encoding behavior.

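The encode/decode pattern, the global registry, and type chaining described in the first commits above can be sketched in plain Python. This is a minimal illustration and not DataJoint's implementation: the registry dict, the `key=None` keyword, the helper functions `resolve_chain`, `encode_value`, `decode_value`, and the two example types are all assumptions made for the example.

```python
import json

# Hypothetical stand-in for the global type registry (not DataJoint's code).
_REGISTRY = {}


def register_type(cls):
    """Class decorator: register an instance under the class's type_name."""
    _REGISTRY[cls.type_name] = cls()
    return cls


class AttributeType:
    """Minimal base class mirroring the encode/decode/validate pattern."""

    type_name = None  # name used inside angle brackets, e.g. <edgelist>
    dtype = None      # storage type, or another "<custom>" type (chaining)

    def validate(self, value):
        """Raise if the value cannot be encoded; default accepts anything."""

    def encode(self, value, *, key=None):
        raise NotImplementedError

    def decode(self, stored, *, key=None):
        raise NotImplementedError


@register_type
class JsonBytesType(AttributeType):
    type_name = "jsonbytes"
    dtype = "longblob"  # terminal storage type

    def encode(self, value, *, key=None):
        return json.dumps(value, sort_keys=True).encode("utf-8")

    def decode(self, stored, *, key=None):
        return json.loads(stored.decode("utf-8"))


@register_type
class EdgeListType(AttributeType):
    type_name = "edgelist"
    dtype = "<jsonbytes>"  # chained: stored via another custom type

    def validate(self, value):
        if not all(len(edge) == 2 for edge in value):
            raise ValueError("each edge must be a pair of nodes")

    def encode(self, value, *, key=None):
        return [list(edge) for edge in value]

    def decode(self, stored, *, key=None):
        return [tuple(edge) for edge in stored]


def resolve_chain(type_name):
    """Follow dtype references until a non-bracketed storage type is reached."""
    chain = []
    dtype = f"<{type_name}>"
    while dtype.startswith("<") and dtype.endswith(">"):
        attr_type = _REGISTRY[dtype[1:-1]]
        chain.append(attr_type)
        dtype = attr_type.dtype
    return dtype, chain


def encode_value(type_name, value):
    """Validate and encode from the outermost type to the innermost."""
    _, chain = resolve_chain(type_name)
    for attr_type in chain:
        attr_type.validate(value)
        value = attr_type.encode(value)
    return value


def decode_value(type_name, stored):
    """Decode from the innermost type back out to the outermost."""
    _, chain = resolve_chain(type_name)
    for attr_type in reversed(chain):
        stored = attr_type.decode(stored)
    return stored
```

Under these assumptions, `encode_value("edgelist", [(1, 2), (2, 3)])` yields bytes suitable for a `longblob` column, and `decode_value` reverses the chain to return the original list of tuples.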
* Update storage types spec with OAS integration approach

- Clarify OAS (object type) as distinct system
- Propose storing blob@store/attach@store in OAS _external/ folder
- Content-addressed deduplication via hash stored in varchar(64)
- Propose <ref@store> to replace filepath@store
- Add open questions and implementation phases

Co-authored-by: dimitri-yatsenko <[email protected]>

* Unify external storage under OAS with content-addressed region

- All external storage uses OAS infrastructure
- Path-addressed: regular object@store (existing)
- Content-addressed: _content/ folder for <djblob@store>, <attach@store>
- ContentRegistry table for reference counting and GC
- ObjectRef returned for all external types (lazy access)
- Deduplication via SHA256 content hash

Co-authored-by: dimitri-yatsenko <[email protected]>

* Make <djblob@store> and <attach@store> return values transparently

- <djblob@store> returns Python object (fetched and deserialized)
- <attach@store> returns local file path (downloaded automatically)
- Only object@store returns ObjectRef for explicit lazy access
- External storage is transparent - @store only affects where, not how

Co-authored-by: dimitri-yatsenko <[email protected]>

* Introduce layered storage architecture with content core type

Three-layer architecture:
1. MySQL types: longblob, varchar, etc.
2. Core DataJoint types: object, content (and @store variants)
3. AttributeTypes: <djblob>, <xblob>, <attach>, <xattach>

New core type `content` for content-addressed storage:
- Accepts bytes, returns bytes
- Handles hashing, deduplication, and GC registration
- AttributeTypes like <xblob> build serialization on top

Naming convention:
- <djblob> = internal serialized (database)
- <xblob> = external serialized (content-addressed)
- <attach> = internal file
- <xattach> = external file

Co-authored-by: dimitri-yatsenko <[email protected]>

* Add parameterized AttributeTypes and content vs object comparison

- content type is single-blob only (no folders)
- Parameterized syntax: <type@param> passes param to dtype
- Add content vs object comparison table
- Clarify when to use each type

Co-authored-by: dimitri-yatsenko <[email protected]>

* Make content storage per-project and add migration utility

- Content-addressed storage is now per-project (not per-schema)
- Deduplication works across all schemas in a project
- ContentRegistry is project-level (e.g., {project}_content database)
- GC scans all schemas in project for references
- Add migration utility for legacy ~external_* per-schema stores
- Document migration from binary(16) UUID to char(64) SHA256 hash

Co-authored-by: dimitri-yatsenko <[email protected]>

* Add filepath as third OAS region with ObjectRef interface

Three OAS storage regions:
1. object: {schema}/{table}/{pk}/ - PK-addressed, DataJoint controls
2. content: _content/{hash} - content-addressed, deduplicated
3. filepath: _files/{user-path} - user-addressed, user controls

Upgraded filepath@store:
- Returns ObjectRef (lazy) instead of copying files
- Supports streaming via ref.open()
- Supports folders (like object)
- Stores checksum in JSON column for verification
- No more automatic copy to local stage

Co-authored-by: dimitri-yatsenko <[email protected]>

* Redesign filepath as URI reference tracker and add json core type

filepath changes:
- No longer an OAS region - tracks external URIs anywhere
- Supports any fsspec-compatible URI (s3://, https://, gs://, etc.)
- Returns ObjectRef for lazy access via fsspec
- No integrity guarantees (external resources may change)
- Uses json core type for storage

json core type:
- Cross-database compatible (MySQL JSON, PostgreSQL JSONB)
- Used by filepath and object types

Two OAS regions remain:
- object: PK-addressed, DataJoint controlled
- content: hash-addressed, deduplicated

Co-authored-by: dimitri-yatsenko <[email protected]>

* Simplify filepath to filepath@store with relative paths for portability

- Remove general URI tracker concept from filepath
- filepath@store now requires a store parameter and uses relative paths
- Key benefit: portability across environments by changing store config
- For arbitrary URLs, recommend using varchar (simpler, more transparent)
- Add comparison table for filepath@store vs varchar use cases
- Update all diagrams and tables to reflect the change

Co-authored-by: dimitri-yatsenko <[email protected]>

* Simplify to two-layer architecture: database types + AttributeTypes

- Remove "core types" concept - all storage types are now AttributeTypes
- Built-in AttributeTypes (object, content, filepath@store) use json dtype
- JSON stores metadata: path, hash, store name, size, etc.
- User-defined AttributeTypes can compose built-in ones (e.g., <xblob> uses content)
- Clearer separation: database types (json, longblob) vs AttributeTypes (encode/decode)

Co-authored-by: dimitri-yatsenko <[email protected]>

* Add three-layer type architecture with core DataJoint types

Layer 1: Native database types (FLOAT, TINYINT, etc.) - backend-specific, discouraged
Layer 2: Core DataJoint types (float32, uint8, bool, json) - standardized, scientist-friendly
Layer 3: AttributeTypes (object, content, <djblob>, etc.) - encode/decode, composable

Core types provide:
- Consistent interface across MySQL and PostgreSQL
- Scientist-friendly names (float32 vs FLOAT, uint8 vs TINYINT UNSIGNED)
- Automatic backend translation

Co-authored-by: dimitri-yatsenko <[email protected]>

* Use angle brackets for all AttributeTypes in definitions

All AttributeTypes (Layer 3) now use angle bracket syntax in table definitions:
- Core types (Layer 2): int32, float64, varchar(255) - no brackets
- AttributeTypes (Layer 3): <object>, <djblob>, <filepath@main> - angle brackets

This clear visual distinction helps users immediately identify:
- Core types: direct database mapping
- AttributeTypes: encode/decode transformation

Co-authored-by: dimitri-yatsenko <[email protected]>

* Add implementation plan for storage types redesign

Seven-phase implementation plan covering:
- Phase 1: Core type system foundation (type mappings, store parameters)
- Phase 2: Content-addressed storage (<content> type, ContentRegistry)
- Phase 3: User-defined AttributeTypes (<xblob>, <attach>, <xattach>, <filepath>)
- Phase 4: Insert and fetch integration (type composition)
- Phase 5: Garbage collection (project-wide GC scanner)
- Phase 6: Migration utilities (legacy external stores)
- Phase 7: Documentation and testing

Estimated effort: 24-32 days across all phases

Co-authored-by: dimitri-yatsenko <[email protected]>

* Implement Phase 1: Core type system with store parameter support

Phase 1.1 - Core type mappings already complete in declare.py

Phase 1.2 - Enhanced AttributeType with store parameter support:
- Added parse_type_spec() to parse "<type@store>" into (type_name, store_name)
- Updated get_type() to handle parameterized types
- Updated is_type_registered() to ignore store parameters
- Updated resolve_dtype() to propagate store through type chains
  - Returns (final_dtype, type_chain, store_name) tuple
  - Store from outer type overrides inner type's store

Phase 1.3 - Updated heading and declaration parsing:
- Updated get_adapter() to return (adapter, store_name) tuple
- Updated substitute_special_type() to capture store from ADAPTED types
- Store parameter is now properly passed through type resolution

Co-authored-by: dimitri-yatsenko <[email protected]>

* Remove legacy AttributeAdapter support, update tests for AttributeType

- Remove AttributeAdapter class and context-based lookup from attribute_adapter.py
- Simplify attribute_adapter.py to compatibility shim that re-exports from attribute_type
- Remove AttributeAdapter from package exports in __init__.py
- Update tests/schema_adapted.py to use @dj.register_type decorator
- Update tests/test_adapted_attributes.py to work with globally registered types
- Remove test_attribute_adapter_deprecated test from test_attribute_type.py

Types are now registered globally via @dj.register_type decorator, eliminating the need for context-based adapter lookup.

Co-authored-by: dimitri-yatsenko <[email protected]>

* Simplify core type system: remove SERIALIZED_TYPES, clarify blob semantics

Core types (uuid, json, blob) now map directly to native database types without any implicit serialization. Serialization is handled by AttributeTypes like <djblob> via encode()/decode() methods.

Changes:
- Rename SERIALIZED_TYPES to BINARY_TYPES in declare.py (clearer naming)
- Update check for default values in compile_attribute()
- Clarify in spec that core blob types store raw bytes

Co-authored-by: dimitri-yatsenko <[email protected]>

* Simplify type system: only core types and AttributeTypes

Major simplification of the type system to two categories:
1. Core DataJoint types (no brackets): float32, uuid, bool, json, blob, etc.
2. AttributeTypes (angle brackets): <djblob>, <object>, <attach>, etc.

Changes:
- declare.py: Remove EXTERNAL_TYPES, BINARY_TYPES; simplify to CORE_TYPE_ALIASES + ADAPTED
- heading.py: Remove is_attachment, is_filepath, is_object, is_external flags
- fetch.py: Simplify _get() to only handle uuid, json, blob, and adapters
- table.py: Simplify __make_placeholder() to only handle uuid, json, blob, numeric
- preview.py: Remove special object field handling (will be AttributeType)
- staged_insert.py: Update object type check to use adapter

All special handling (attach, filepath, object, external storage) will be implemented as built-in AttributeTypes in subsequent phases.

Co-authored-by: dimitri-yatsenko <[email protected]>

* Define complete core type system with blob→longblob mapping

Core DataJoint types (fully supported, recorded in :type: comments):
- Numeric: float32, float64, int64, uint64, int32, uint32, int16, uint16, int8, uint8
- Boolean: bool
- UUID: uuid → binary(16)
- JSON: json
- Binary: blob → longblob
- Temporal: date, datetime
- String: char(n), varchar(n)
- Enumeration: enum(...)

Changes:
- declare.py: Define CORE_TYPES with (pattern, sql_mapping) pairs
- declare.py: Add warning for non-standard native type usage
- heading.py: Update to use CORE_TYPE_NAMES
- storage-types-spec.md: Update documentation to reflect core types

Native database types (text, mediumint, etc.) pass through with a warning about non-standard usage.

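The idea of a table of (pattern, sql_mapping) pairs with a warning for native types can be illustrated as follows. This is a sketch under assumptions, not the contents of declare.py: only the uuid, blob, float32, and uint8 mappings are stated in the commit message; the int32 mapping, the regular expressions, the warning text, and the function name `to_sql_type` are invented for the example.

```python
import re
import warnings

# Hypothetical (pattern, sql_mapping) pairs keyed by core type name.
# A mapping of None means the declared type passes through unchanged.
CORE_TYPES = {
    "float32": (r"float32$", "float"),
    "uint8": (r"uint8$", "tinyint unsigned"),
    "int32": (r"int32$", "int"),  # assumed mapping, not from the commit
    "uuid": (r"uuid$", "binary(16)"),
    "blob": (r"blob$", "longblob"),
    "varchar": (r"varchar\s*\(\s*\d+\s*\)$", None),
}


def to_sql_type(declared):
    """Translate a declared type to SQL; warn when it is not a core type."""
    spec = declared.strip().lower()
    for pattern, sql in CORE_TYPES.values():
        # re.match anchors at the start, so "longblob" does not match "blob$".
        if re.match(pattern, spec):
            return spec if sql is None else sql
    warnings.warn(
        f"Native type '{declared}' is not a core DataJoint type",
        stacklevel=2,
    )
    return spec
```

With this table, a core type such as `uuid` is translated to its SQL form, while a native type such as `mediumint` is returned unchanged after emitting a warning.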
Co-authored-by: dimitri-yatsenko <[email protected]>

* Implement Phase 2: Content-Addressed Storage

Add content-addressed storage with deduplication for the <content> and <xblob> AttributeTypes.

New files:
- content_registry.py: Content storage utilities
  - compute_content_hash(): SHA256 hashing
  - build_content_path(): Hierarchical path generation (_content/xx/yy/hash)
  - put_content(): Store with deduplication
  - get_content(): Retrieve with hash verification
  - content_exists(), delete_content(), get_content_size()

New built-in AttributeTypes in attribute_type.py:
- ContentType (<content>): Content-addressed storage for raw bytes
  - dtype = "json" (stores metadata: hash, store, size)
  - Automatic deduplication via SHA256 hashing
- XBlobType (<xblob>): Serialized blobs with external storage
  - dtype = "<content>" (composition with ContentType)
  - Combines djblob serialization with content-addressed storage

Updated insert/fetch for type chain support:
- table.py: Apply encoder chain from outermost to innermost
- fetch.py: Apply decoder chain from innermost to outermost
- Both pass store_name through the chain for external storage

Example usage:
data : <content@mystore> # Raw bytes, deduplicated
array : <xblob@mystore> # Serialized objects, deduplicated

Co-authored-by: dimitri-yatsenko <[email protected]>

* Apply ruff-format to content_registry.py

Co-authored-by: dimitri-yatsenko <[email protected]>

* Remove legacy compatibility shims: attribute_adapter.py, bypass_serialization

Breaking changes:
- Remove attribute_adapter.py entirely (hard deprecate)
- Remove bypass_serialization flag from blob.py - blobs always serialize now
- Remove unused 'database' field from Attribute in heading.py

Import get_adapter from attribute_type instead of attribute_adapter.

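The content-addressed layout listed under Phase 2 above (SHA256 hashing, a `_content/xx/yy/hash` path, deduplicated writes, verified reads) can be sketched against a local directory. The function names mirror those in the commit message, but the signatures, the `root` argument, the returned metadata dict, and the reading of `xx`/`yy` as the first two character pairs of the hex digest are assumptions for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_content_hash(data: bytes) -> str:
    """Return the SHA256 hex digest of the content."""
    return hashlib.sha256(data).hexdigest()


def build_content_path(content_hash: str) -> str:
    """Build a fan-out path; xx and yy are assumed to be digest prefixes."""
    return f"_content/{content_hash[:2]}/{content_hash[2:4]}/{content_hash}"


def put_content(root: Path, data: bytes) -> dict:
    """Write content once per unique hash and return its metadata."""
    content_hash = compute_content_hash(data)
    path = root / build_content_path(content_hash)
    if not path.exists():  # identical content is stored only once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return {"hash": content_hash, "size": len(data)}


def get_content(root: Path, content_hash: str) -> bytes:
    """Read content back and verify that it still matches its hash."""
    data = (root / build_content_path(content_hash)).read_bytes()
    if compute_content_hash(data) != content_hash:
        raise ValueError(f"hash mismatch for {content_hash}")
    return data


if __name__ == "__main__":
    store_root = Path(tempfile.mkdtemp())
    first = put_content(store_root, b"abc")
    second = put_content(store_root, b"abc")  # deduplicated, no second file
    print(first == second, get_content(store_root, first["hash"]))
```

Because the path is derived only from the content hash, two inserts of the same bytes resolve to the same file, which is what makes deduplication a side effect of the layout rather than a separate bookkeeping step.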
Co-authored-by: dimitri-yatsenko <[email protected]>

* Update implementation plan to reflect actual implementation

- Document function-based content storage (not registry class)
- Add implementation status table
- Explain design decision: functions vs database table
- Update Phase 5 GC design for scanning approach
- Document removed/deprecated items

Co-authored-by: dimitri-yatsenko <[email protected]>

* Move built-in AttributeTypes to separate builtin_types.py module

- Create builtin_types.py with DJBlobType, ContentType, XBlobType
- Types serve as examples for users creating custom types
- Module docstring includes example of defining a custom GraphType
- Add get_adapter() function to attribute_type.py for compatibility
- Auto-register built-in types via import at module load

Co-authored-by: dimitri-yatsenko <[email protected]>

* Implement ObjectType for path-addressed storage

Add <object> type for files and folders (Zarr, HDF5, etc.):
- Path derived from primary key: {schema}/{table}/objects/{pk}/{field}_{token}
- Supports bytes, files, and directories
- Returns ObjectRef for lazy fsspec-based access
- No deduplication (unlike <content>)

Update implementation plan with Phase 2b documenting ObjectType.

Co-authored-by: dimitri-yatsenko <[email protected]>

* Remove migration phase from implementation plan

Migration utilities are out of scope for now. This is a breaking change version - users will need to recreate tables with new types.

Co-authored-by: dimitri-yatsenko <[email protected]>

* Add staged insert documentation to implementation plan

- Document staged_insert.py for direct object storage writes
- Add flow comparison: normal insert vs staged insert
- Include staged_insert.py in critical files summary

Co-authored-by: dimitri-yatsenko <[email protected]>

* Implement Phase 3: AttachType, XAttachType, FilepathType

Add remaining built-in AttributeTypes:
- <attach>: Internal file attachment stored in longblob
- <xattach>: External file attachment via <content> with deduplication
- <filepath@store>: Reference to existing file (no copy, returns ObjectRef)

Update implementation plan to mark Phase 3 complete.

Co-authored-by: dimitri-yatsenko <[email protected]>

* Implement Phase 5 (GC) and Phase 6 (Tests)

Add garbage collection module (gc.py) for content-addressed storage:
- scan_references() to find content hashes in schemas
- list_stored_content() to enumerate _content/ directory
- scan() for orphan detection without deletion
- collect() for orphan removal with dry_run option
- format_stats() for human-readable output

Add test files:
- test_content_storage.py for content_registry.py functions
- test_type_composition.py for type chain encoding/decoding
- test_gc.py for garbage collection

Update implementation plan to mark all phases complete.

Co-authored-by: dimitri-yatsenko <[email protected]>

* Add object type garbage collection support

Extend gc.py to handle both storage patterns:
- Content-addressed storage: <content>, <xblob>, <xattach>
- Path-addressed storage: <object>

New functions added:
- _uses_object_storage() - detect object type attributes
- _extract_object_refs() - extract path refs from JSON
- scan_object_references() - scan schemas for object paths
- list_stored_objects() - list all objects in storage
- delete_object() - delete object directory tree

Updated scan() and collect() to handle both storage types, with combined and per-type statistics in the output.

Updated tests for new statistics format.

Co-authored-by: dimitri-yatsenko <[email protected]>

* Move EXTERNAL_TABLE_ROOT to external.py (deprecated)

External tables are deprecated in favor of the new storage type system. Move the constant to external.py where it's used, keeping declare.py clean.

Co-authored-by: dimitri-yatsenko <[email protected]>

* Remove deprecated external.py module

External tables (~external_*) are deprecated in favor of the new AttributeType-based storage system. The new types (<xblob>, <content>, <object>) store data directly to storage via StorageBackend without tracking tables.

- Remove src/datajoint/external.py entirely
- Remove ExternalMapping from schemas.py
- Remove external table pre-declaration from table.py

Co-authored-by: dimitri-yatsenko <[email protected]>

* Replace ClassProperty with metaclass properties

Python 3.10+ doesn't have a built-in class property decorator (the @classmethod + @property chaining was deprecated in 3.11). The modern approach is to define properties on the metaclass, which automatically makes them work at the class level.

- Move connection, table_name, full_table_name properties to TableMeta
- Create PartMeta subclass with overridden properties for Part tables
- Remove ClassProperty class from utils.py

Co-authored-by: dimitri-yatsenko <[email protected]>

* Simplify test infrastructure to use docker-compose services

Replace pytest-managed Docker containers with external docker-compose services. This removes complexity, improves reliability, and allows running tests both from the host machine and inside the devcontainer.

- Remove docker container lifecycle management from conftest.py
- Add pixi tasks for running tests (services-up, test, test-cov)
- Expose MySQL and MinIO ports in docker-compose.yaml for host access
- Simplify devcontainer to extend the main docker-compose.yaml
- Remove docker dependency from test requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix table_name and uuid type resolution bugs

- Fix Table.table_name property to delegate to metaclass for UserTable subclasses (table_name was returning None instead of computed name)
- Fix heading type loading to preserve database type for core types (uuid, etc.) instead of overwriting with alias from comment
- Add original_type field to Attribute for storing the alias while keeping the actual SQL type in type field
- Fix tests: remove obsolete test_external.py, update resolve_dtype tests to expect 3 return values, update type alias tests to use CORE_TYPE_SQL
- Update pyproject.toml pytest_env to use D: prefix for default-only vars

Test results improved from 174 passed/284 errors to 381 passed/62 errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Use <djblob> for automatic serialization, fix is_blob detection

Type system changes:
- Core type `blob` stores raw bytes without serialization
- Built-in type `<djblob>` handles automatic serialization/deserialization
- Update jobs table to use <djblob> for key and error_stack columns
- Remove enable_python_native_blobs config check (always enabled)

Bug fixes:
- Fix is_blob detection to include NATIVE_BLOB types (longblob, mediumblob, etc.)
- Fix original_type fallback when None
- Fix test_type_aliases to use lowercase keys for CORE_TYPE_SQL lookup
- Allow None context for built-in types in heading initialization
- Update native type warning message wording

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix settings tests and config loading

- Update settings access tests to check type instead of specific value (safemode is set to False by conftest fixtures)
- Fix config.load() to handle nested JSON dicts in addition to flat dot-notation keys

Test results: 417 passed (was 414)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix adapted_attributes tests for new type system

- Update GraphType and LayoutToFilepathType to use <djblob> dtype (old filepath@store syntax no longer supported)
- Fix local_schema and schema_virtual_module fixtures to pass connection
- Remove unused imports

Test results: 421 passed, 58 errors, 13 failed (was 417/62/13)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix test failures and update to new type system

Source code fixes:
- Add download_path setting and squeeze handling in fetch.py
- Add filename collision handling in AttachType and XAttachType
- Fix is_blob detection to check both BLOB and NATIVE_BLOB patterns
- Fix FilepathType.validate to accept Path objects
- Add proper error message for undecorated tables

Test infrastructure updates:
- Update schema_external.py to use new <xblob@store>, <xattach@store>, <filepath@store> syntax
- Update all test tables to use <djblob> instead of longblob for serialization
- Configure object_storage.stores in conftest.py fixtures
- Remove obsolete test_admin.py (set_password was removed)
- Fix connection passing in various tests to avoid credential prompts
- Fix test_query_caching to handle existing directories

README:
- Add Developer Guide section with setup, test, and pre-commit instructions

Test results: 408 passed, 2 skipped (macOS multiprocessing limitation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix object type and remove legacy external tables

- Add save_template() method to Config for creating datajoint.json templates
- Add default_store setting to ObjectStorageSettings
- Fix get_store_backend() to use default object storage when no store specified
- Fix StorageBackend._full_path() to prepend location for all protocols
- Fix StorageBackend.open() to create parent directories for write mode
- Fix ObjectType to support tuple (extension, data) format for streams
- Fix ObjectType to pass through pre-computed metadata for staged inserts
- Fix staged_insert.py path handling (use relative paths consistently)
- Fix table.py __make_placeholder to handle None values for adapter types
- Update schema_object.py to use <object> syntax (angle brackets required)
- Remove legacy external table support (Table.external property)
- Remove legacy external tests (test_filepath, test_external_class, test_s3)
- Add tests for save_template() method

Test results: 471 passed, 2 skipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Remove legacy log table and bump version to 2.0.0a1

- Remove Log class from table.py
- Remove _log property and all _log() calls from Table class
- Remove log property from Schema class
- Remove ~log table special handling in heading.py
- Remove test_log.py
- Bump version from 0.14.6 to 2.0.0a1
- Remove version length assertion (was only for log table compatibility)

The log table was an outdated approach to event logging. Modern systems should use standard Python logging, external log aggregation services, or database audit logs instead.

Test results: 470 passed, 2 skipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix Table.describe() to show core types instead of native types

When a column is declared with a core type (like uuid, int32, float64), describe() now displays the original core type name instead of the underlying database type (e.g., shows "uuid" instead of "binary(16)").

Uses the Attribute.original_type field which stores the core type alias.

Bump version to 2.0.0a2.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix config precedence: environment variables now override config files

Following the 12-Factor App methodology, environment variables now take precedence over config file values. This is the standard DevOps practice for deployments where secrets and environment-specific settings should be injected via environment variables (Docker, Kubernetes, CI/CD).

Priority order (highest to lowest):
1. Environment variables (DJ_*)
2. Secrets files (.secrets/)
3. Config file (datajoint.json)
4. Defaults

Added ENV_VAR_MAPPING to track which settings have env var overrides. The _update_from_flat_dict() method now skips file values when the corresponding env var is set.

Added test_env_var_overrides_config_file to verify the new behavior.

Bump version to 2.0.0a3.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix test compatibility and remove deprecated s3.py

- Add test optional dependency to pyproject.toml for docker-compose
- Remove deprecated minio-based s3.py client (using fsspec/s3fs now)
- Replace minio test fixtures with s3fs
- Fix Path.walk() for Python 3.10 compatibility (use os.walk)
- Use introspection instead of try/except TypeError for encoder params
- Make test_settings tests environment-agnostic (localhost vs docker)

All 473 tests passing.

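The priority order in the config precedence fix above (environment variables, then secrets files, then the config file, then defaults) can be sketched by layering sources from lowest to highest priority. This is not the project's settings code, which the message says skips file values inside `_update_from_flat_dict()`; the setting keys, the one-file-per-key layout of the secrets directory, and the function names here are assumptions. Only the `DJ_HOST`, `DJ_USER`, and `DJ_PASS` variable names appear in the commit itself.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical defaults and env-var mapping (keys are assumptions).
DEFAULTS = {"database.host": "localhost", "database.user": None, "database.password": None}
ENV_VAR_MAPPING = {
    "database.host": "DJ_HOST",
    "database.user": "DJ_USER",
    "database.password": "DJ_PASS",
}


def load_settings(config_file=None, secrets_dir=None, environ=None):
    """Layer sources so that later (higher-priority) ones overwrite earlier."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)  # 4. defaults
    if config_file is not None and Path(config_file).is_file():
        settings.update(json.loads(Path(config_file).read_text()))  # 3. file
    if secrets_dir is not None and Path(secrets_dir).is_dir():
        for key in list(settings):  # 2. secrets: one file per setting key
            secret = Path(secrets_dir) / key
            if secret.is_file():
                settings[key] = secret.read_text().strip()
    for key, env_name in ENV_VAR_MAPPING.items():  # 1. environment
        if env_name in environ:
            settings[key] = environ[env_name]
    return settings


def demo():
    """Build a throwaway config file and secrets directory, then load them."""
    with tempfile.TemporaryDirectory() as tmp:
        config_file = Path(tmp) / "datajoint.json"
        config_file.write_text(
            json.dumps({"database.host": "file-host", "database.user": "file-user"})
        )
        secrets_dir = Path(tmp) / ".secrets"
        secrets_dir.mkdir()
        (secrets_dir / "database.user").write_text("secret-user\n")
        return load_settings(config_file, secrets_dir, environ={"DJ_HOST": "env-host"})
```

In `demo()`, the host comes from the environment even though the file sets it, the user comes from the secrets directory rather than the file, and the password falls back to its default.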
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Remove dead code from Table class

Remove unused methods that were superseded by AttributeType system:
- _process_object_value (replaced by ObjectType.encode)
- _build_object_url (only used by _process_object_value)
- get_object_storage (only used by _process_object_value)
- object_storage property (wrapper for get_object_storage)

Also removed unused imports: mimetypes, datetime, timezone, StorageBackend, build_object_path, verify_or_create_store_metadata

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Simplify test setup and reorganize test structure

Test organization:
- Split tests into unit/ and integration/ folders
- Unit tests (test_attribute_type, test_hash, test_settings) run without Docker
- Integration tests require MySQL and MinIO services
- Update imports to use absolute paths (from tests.schema import ...)

Configuration simplification:
- Change pytest_env defaults to localhost (was Docker hostnames)
- Simplify pixi tasks (env vars now use defaults)
- Update devcontainer to set Docker-specific env vars
- Update docker-compose comments

Tests now run with just:
pip install -e ".[test]"
docker compose up -d db minio
pytest tests/

Unit tests only:
pytest tests/unit/

Bump version to 2.0.0a5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix pyparsing deprecation warnings

Update to new pyparsing API (snake_case):
- setResultsName -> set_results_name
- delimitedList -> DelimitedList
- parseString -> parse_string
- parseAll -> parse_all
- endQuoteChar -> end_quote_char
- unquoteResults -> unquote_results

Reduces test warnings from 5854 to 3.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Fix pydantic model_fields deprecation warning

Access model_fields from class instead of instance:
- self.model_fields -> type(self).model_fields

Reduces test warnings from 3 to 1 (remaining is intentional user warning).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
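The "Replace ClassProperty with metaclass properties" change in the message above relies on a general Python mechanism: a property defined on a metaclass is readable directly on the class. The sketch below illustrates that mechanism only. The class names `TableMeta` and `PartMeta` come from the commit message, but the naming rule, the `database` and `master` attributes, and the `Session`/`Trial` classes are assumptions made for the example.

```python
import re


class TableMeta(type):
    """Metaclass whose properties are available on the class itself."""

    @property
    def table_name(cls):
        # Illustrative naming rule: CamelCase class name -> snake_case.
        return re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()

    @property
    def full_table_name(cls):
        return f"`{cls.database}`.`{cls.table_name}`"


class PartMeta(TableMeta):
    """Overrides table_name for part tables (prefix with the master name)."""

    @property
    def table_name(cls):
        own_name = TableMeta.table_name.fget(cls)
        return f"{cls.master.table_name}__{own_name}"


class Table(metaclass=TableMeta):
    database = "lab"  # hypothetical schema name

    @property
    def table_name(self):
        # Instance access delegates to the metaclass property.
        return type(self).table_name


class Session(Table):
    pass


class Trial(Table, metaclass=PartMeta):
    master = Session


if __name__ == "__main__":
    print(Session.table_name, Session().table_name, Trial.full_table_name)
```

Class-level access such as `Session.table_name` resolves through the metaclass, while the instance-level property on `Table` forwards to it, so both spellings return the same computed name without a custom descriptor class.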
1 parent 895259f commit 164e7cc

105 files changed: +6640 -2761 lines changed


.devcontainer/Dockerfile

Lines changed: 4 additions & 3 deletions
@@ -8,6 +8,7 @@ RUN \
 pip uninstall datajoint -y

 USER root
-ENV DJ_HOST db
-ENV DJ_USER root
-ENV DJ_PASS password
+ENV DJ_HOST=db
+ENV DJ_USER=root
+ENV DJ_PASS=password
+ENV S3_ENDPOINT=minio:9000

.devcontainer/devcontainer.json

Lines changed: 4 additions & 5 deletions
@@ -1,7 +1,6 @@
 {
-"image": "mcr.microsoft.com/devcontainers/typescript-node:0-18",
-"features": {
-"ghcr.io/devcontainers/features/docker-in-docker:2": {}
-},
+"dockerComposeFile": ["../docker-compose.yaml", "docker-compose.yml"],
+"service": "app",
+"workspaceFolder": "/src",
 "postCreateCommand": "curl -fsSL https://pixi.sh/install.sh | bash && echo 'export PATH=\"$HOME/.pixi/bin:$PATH\"' >> ~/.bashrc"
-}
+}

.devcontainer/docker-compose.yml

Lines changed: 4 additions & 20 deletions
@@ -1,30 +1,14 @@
+# Devcontainer overrides for the app service from ../docker-compose.yaml
+# Inherits db and minio services automatically
 services:
-  # Update this to the name of the service you want to work with in your docker-compose.yml file
   app:
-    # Uncomment if you want to override the service's Dockerfile to one in the .devcontainer
-    # folder. Note that the path of the Dockerfile and context is relative to the *primary*
-    # docker-compose.yml file (the first in the devcontainer.json "dockerComposeFile"
-    # array). The sample below assumes your primary file is in the root of your project.
     container_name: datajoint-python-devcontainer
-    image: datajoint/datajoint-python-devcontainer:${PY_VER:-3.11}-${DISTRO:-bookworm}
     build:
-      context: .
+      context: ..
       dockerfile: .devcontainer/Dockerfile
       args:
         - PY_VER=${PY_VER:-3.11}
         - DISTRO=${DISTRO:-bookworm}
-
-    volumes:
-      # Update this to wherever you want VS Code to mount the folder of your project
-      - ..:/workspaces:cached
-
-    # Uncomment the next four lines if you will use a ptrace-based debugger like C++, Go, and Rust.
-    # cap_add:
-    # - SYS_PTRACE
-    # security_opt:
-    # - seccomp:unconfined
-
     user: root
-
-    # Overrides default command so things don't shut down after the process ends.
+    # Keep container running for devcontainer
     command: /bin/sh -c "while sleep 1000; do :; done"

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -187,3 +187,4 @@ dj_local_conf.json
 !.vscode/launch.json
 # pixi environments
 .pixi
+_content/

README.md

Lines changed: 70 additions & 0 deletions

@@ -141,3 +141,73 @@ DataJoint (<https://datajoint.com>).
 - [Contribution Guidelines](https://docs.datajoint.com/about/contribute/)
 
 - [Developer Guide](https://docs.datajoint.com/core/datajoint-python/latest/develop/)
+
+## Developer Guide
+
+### Prerequisites
+
+- [Docker](https://docs.docker.com/get-docker/) for MySQL and MinIO services
+- Python 3.10+
+
+### Running Tests
+
+Tests are organized into `unit/` (no external services) and `integration/` (requires MySQL + MinIO):
+
+```bash
+# Install dependencies
+pip install -e ".[test]"
+
+# Run unit tests only (fast, no Docker needed)
+pytest tests/unit/
+
+# Start MySQL and MinIO for integration tests
+docker compose up -d db minio
+
+# Run all tests
+pytest tests/
+
+# Run specific test file
+pytest tests/integration/test_blob.py -v
+
+# Stop services when done
+docker compose down
+```
+
+### Alternative: Full Docker
+
+Run tests entirely in Docker (no local Python needed):
+
+```bash
+docker compose --profile test up djtest --build
+```
+
+### Alternative: Using pixi
+
+[pixi](https://pixi.sh) users can run tests with automatic service management:
+
+```bash
+pixi install # First time setup
+pixi run test # Starts services and runs tests
+pixi run services-down # Stop services
+```
+
+### Pre-commit Hooks
+
+```bash
+pre-commit install # Install hooks (first time)
+pre-commit run --all-files # Run all checks
+```
+
+### Environment Variables
+
+Tests use these defaults (configured in `pyproject.toml`):
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `DJ_HOST` | `localhost` | MySQL hostname |
+| `DJ_PORT` | `3306` | MySQL port |
+| `DJ_USER` | `root` | MySQL username |
+| `DJ_PASS` | `password` | MySQL password |
+| `S3_ENDPOINT` | `localhost:9000` | MinIO endpoint |
+
+For Docker-based testing (devcontainer, djtest), set `DJ_HOST=db` and `S3_ENDPOINT=minio:9000`.
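The default-versus-override behavior described in the README's environment variable table can be sketched in a few lines. This is only an illustration under stated assumptions: `TEST_DEFAULTS` and `resolve_test_settings` are hypothetical names, and the sketch does not reproduce how the `pyproject.toml` configuration or DataJoint itself reads these variables.

```python
import os

# Defaults copied from the README table in this diff; the names below are hypothetical.
TEST_DEFAULTS = {
    "DJ_HOST": "localhost",
    "DJ_PORT": "3306",
    "DJ_USER": "root",
    "DJ_PASS": "password",
    "S3_ENDPOINT": "localhost:9000",
}


def resolve_test_settings(environ=None):
    """Return effective settings: values set in the environment override the defaults."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in TEST_DEFAULTS.items()}
```

Calling `resolve_test_settings({"DJ_HOST": "db", "S3_ENDPOINT": "minio:9000"})` mirrors the Docker-based case from the README: the two hostnames change while the port and credentials keep their defaults.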

docker-compose.yaml

Lines changed: 12 additions & 11 deletions

@@ -1,15 +1,19 @@
 # Development environment with MySQL and MinIO services
-# To run tests: pytest --cov-report term-missing --cov=datajoint tests
+#
+# Quick start:
+# docker compose up -d db minio # Start services
+# pytest tests/ # Run tests (uses localhost defaults)
+#
+# Full Docker testing:
+# docker compose --profile test up djtest --build
 services:
 db:
 image: datajoint/mysql:${MYSQL_VER:-8.0}
 environment:
 - MYSQL_ROOT_PASSWORD=${DJ_PASS:-password}
 command: mysqld --default-authentication-plugin=mysql_native_password
-# ports:
-# - "3306:3306"
-# volumes:
-# - ./mysql/data:/var/lib/mysql
+ports:
+- "3306:3306"
 healthcheck:
 test: [ "CMD", "mysqladmin", "ping", "-h", "localhost" ]
 timeout: 30s
@@ -20,18 +24,15 @@ services:
 environment:
 - MINIO_ACCESS_KEY=datajoint
 - MINIO_SECRET_KEY=datajoint
-# ports:
-# - "9000:9000"
-# volumes:
-# - ./minio/config:/root/.minio
-# - ./minio/data:/data
+ports:
+- "9000:9000"
 command: server --address ":9000" /data
 healthcheck:
 test:
 - "CMD"
 - "curl"
 - "--fail"
-- "http://minio:9000/minio/health/live"
+- "http://localhost:9000/minio/health/live"
 timeout: 30s
 retries: 5
 interval: 15s
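Because this change publishes ports 3306 and 9000 to the host, a locally run test session can check that the services are reachable before it starts. The following is a rough sketch of such a check, assuming a plain TCP connection is a good enough signal; `wait_for_port` is a hypothetical helper, and it does not replicate the compose healthchecks, which use `mysqladmin ping` and `curl --fail`.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the defaults from this commit, `wait_for_port("localhost", 3306)` and `wait_for_port("localhost", 9000)` would cover MySQL and MinIO respectively. An open port does not guarantee the service has finished initializing, so this complements the healthchecks rather than replacing them.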

docs/src/compute/key-source.md

Lines changed: 1 addition & 1 deletion

@@ -45,7 +45,7 @@ definition = """
 -> Recording
 ---
 sample_rate : float
-eeg_data : longblob
+eeg_data : <djblob>
 """
 key_source = Recording & 'recording_type = "EEG"'
 ```

docs/src/compute/make.md

Lines changed: 2 additions & 2 deletions

@@ -152,7 +152,7 @@ class ImageAnalysis(dj.Computed):
 # Complex image analysis results
 -> Image
 ---
-analysis_result : longblob
+analysis_result : <djblob>
 processing_time : float
 """
 
@@ -188,7 +188,7 @@ class ImageAnalysis(dj.Computed):
 # Complex image analysis results
 -> Image
 ---
-analysis_result : longblob
+analysis_result : <djblob>
 processing_time : float
 """
 

docs/src/compute/populate.md

Lines changed: 3 additions & 3 deletions

@@ -40,7 +40,7 @@ class FilteredImage(dj.Computed):
 # Filtered image
 -> Image
 ---
-filtered_image : longblob
+filtered_image : <djblob>
 """
 
 def make(self, key):
@@ -196,7 +196,7 @@ class ImageAnalysis(dj.Computed):
 # Complex image analysis results
 -> Image
 ---
-analysis_result : longblob
+analysis_result : <djblob>
 processing_time : float
 """
 
@@ -230,7 +230,7 @@ class ImageAnalysis(dj.Computed):
 # Complex image analysis results
 -> Image
 ---
-analysis_result : longblob
+analysis_result : <djblob>
 processing_time : float
 """
 

docs/src/design/integrity.md

Lines changed: 1 addition & 1 deletion

@@ -142,7 +142,7 @@ definition = """
 -> EEGRecording
 channel_idx : int
 ---
-channel_data : longblob
+channel_data : <djblob>
 """
 ```
 ![doc_1-many](../images/doc_1-many.png){: style="align:center"}
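The documentation hunks above all make the same substitution: an attribute declared as `longblob` becomes `<djblob>`, in line with the commit message's statement that plain `longblob` now stores raw bytes while `<djblob>` handles serialization. A small sketch of that textual rewrite on a definition string follows. `use_djblob` is a hypothetical helper written for illustration; it is not part of the `dj.migrate` module mentioned in the commit message (whose API is not shown on this page), and it only handles the plain `longblob` spelling.

```python
import re

# Matches `longblob` in the type position of a "name : type" definition line.
# Only the plain `longblob` spelling is handled (assumption for illustration).
_BLOB_TYPE = re.compile(r"^([ \t]*\w+[ \t]*:[ \t]*)longblob\b", re.MULTILINE)


def use_djblob(definition):
    """Return the definition text with `longblob` attribute types replaced by `<djblob>`."""
    return _BLOB_TYPE.sub(r"\g<1><djblob>", definition)


# Definition text taken from the integrity.md hunk above.
definition = """
-> EEGRecording
channel_idx : int
---
channel_data : longblob
"""
```

Running `use_djblob(definition)` changes only the `channel_data` line; foreign-key references, the `---` separator, and other attribute types pass through untouched. This edits text only and says nothing about migrating columns in an existing database.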
