
Commit 3c2d805

Merge branch 'docs-2.0-migration' into empty_insert
2 parents 736860e + 9207d83 commit 3c2d805

12 files changed (+721, -1183 lines)


README.md

Lines changed: 14 additions & 9 deletions
@@ -1,16 +1,21 @@
 # DataJoint for Python
 
-DataJoint is an open-source Python framework for building scientific data pipelines.
-It implements the **Relational Workflow Model**—a paradigm that extends relational
-databases with native support for computational workflows.
+DataJoint is a framework for scientific data pipelines that introduces the **Relational Workflow Model**—a paradigm where your database schema is an executable specification of your workflow.
 
-**Key Features:**
+Traditional databases store data but don't understand how it was computed. DataJoint extends relational databases with native workflow semantics:
 
-- **Declarative schema design** — Define tables and relationships in Python
-- **Automatic dependency tracking** — Foreign keys encode workflow dependencies
-- **Built-in computation** — Imported and Computed tables run automatically
-- **Data integrity** — Referential integrity and transaction support
-- **Reproducibility** — Immutable data with full provenance
+- **Tables represent workflow steps** — Each table is a step in your pipeline where entities are created
+- **Foreign keys encode dependencies** — Parent tables must be populated before child tables
+- **Computations are declarative** — Define *what* to compute; DataJoint determines *when* and tracks *what's done*
+- **Results are immutable** — Computed results preserve full provenance and reproducibility
+
+### Object-Augmented Schemas
+
+Scientific data includes both structured metadata and large data objects (time series, images, movies, neural recordings, gene sequences). DataJoint solves this with **Object-Augmented Schemas (OAS)**—a unified architecture where relational tables and object storage are managed as one system with identical guarantees for integrity, transactions, and lifecycle.
+
+### DataJoint 2.0
+
+**DataJoint 2.0** solidifies these core concepts with a modernized API, improved type system, and enhanced object storage integration. Existing users can refer to the [Migration Guide](https://docs.datajoint.com/migration/) for upgrading from earlier versions.
 
 **Documentation:** https://docs.datajoint.com
 
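The new README text describes the Relational Workflow Model in prose. As a concrete illustration, the following is a minimal sketch of a two-step pipeline in the classic DataJoint Python API (`dj.Schema`, `dj.Manual`, `dj.Computed`, `populate()`); the schema name, table names, and computation are hypothetical, and the modernized 2.0 API documented by this commit may spell these differently.

```python
import datajoint as dj

# Hypothetical schema name; requires a configured database connection.
schema = dj.Schema("tutorial_pipeline")


@schema
class Session(dj.Manual):
    """A manually entered workflow step: one row per recording session."""
    definition = """
    session_id : int
    ---
    session_note : varchar(255)
    """


@schema
class SpikeRate(dj.Computed):
    """A computed step; the foreign key (-> Session) encodes the dependency."""
    definition = """
    -> Session
    ---
    mean_rate : float
    """

    def make(self, key):
        # Placeholder computation; a real make() would load the session's data.
        self.insert1(dict(key, mean_rate=42.0))


# populate() runs make() only for Session rows not yet present in SpikeRate.
SpikeRate.populate()
```

Each call to `populate()` advances the workflow only where its prerequisites exist, which is the sense in which the schema acts as an executable specification.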

src/datajoint/codecs.py

Lines changed: 74 additions & 0 deletions
@@ -502,6 +502,80 @@ def lookup_codec(codec_spec: str) -> tuple[Codec, str | None]:
     raise DataJointError(f"Codec <{type_name}> is not registered. " "Define a Codec subclass with name='{type_name}'.")
 
 
+# =============================================================================
+# Decode Helper
+# =============================================================================
+
+
+def decode_attribute(attr, data, squeeze: bool = False):
+    """
+    Decode raw database value using attribute's codec or native type handling.
+
+    This is the central decode function used by all fetch methods. It handles:
+    - Codec chains (e.g., <blob@store> → <hash> → bytes)
+    - Native type conversions (JSON, UUID)
+    - External storage downloads (via config["download_path"])
+
+    Args:
+        attr: Attribute from the table's heading.
+        data: Raw value fetched from the database.
+        squeeze: If True, remove singleton dimensions from numpy arrays.
+
+    Returns:
+        Decoded Python value.
+    """
+    import json
+    import uuid as uuid_module
+
+    import numpy as np
+
+    if data is None:
+        return None
+
+    if attr.codec:
+        # Get store if present for external storage
+        store = getattr(attr, "store", None)
+        if store is not None:
+            dtype_spec = f"<{attr.codec.name}@{store}>"
+        else:
+            dtype_spec = f"<{attr.codec.name}>"
+
+        final_dtype, type_chain, _ = resolve_dtype(dtype_spec)
+
+        # Process the final storage type (what's in the database)
+        if final_dtype.lower() == "json":
+            data = json.loads(data)
+        elif final_dtype.lower() in ("longblob", "blob", "mediumblob", "tinyblob"):
+            pass  # Blob data is already bytes
+        elif final_dtype.lower() == "binary(16)":
+            data = uuid_module.UUID(bytes=data)
+
+        # Apply decoders in reverse order: innermost first, then outermost
+        for codec in reversed(type_chain):
+            data = codec.decode(data, key=None)
+
+        # Squeeze arrays if requested
+        if squeeze and isinstance(data, np.ndarray):
+            data = data.squeeze()
+
+        return data
+
+    # No codec - handle native types
+    if attr.json:
+        return json.loads(data)
+
+    if attr.uuid:
+        return uuid_module.UUID(bytes=data)
+
+    if attr.is_blob:
+        return data  # Raw bytes
+
+    # Native types - pass through unchanged
+    return data
+
+
 # =============================================================================
 # Auto-register built-in codecs
 # =============================================================================
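A possible call site for the new helper, shown only as a usage sketch: `decode_row` below is a hypothetical function, not part of this commit, and it assumes a heading object exposes an `attributes` mapping from names to attribute objects carrying `.codec`, `.json`, `.uuid`, and `.is_blob` (as `decode_attribute` expects). The import path is inferred from this file's location at `src/datajoint/codecs.py`.

```python
from datajoint.codecs import decode_attribute  # path inferred from this commit


def decode_row(heading, raw_row, squeeze=False):
    """Decode every raw database value in a fetched row.

    Assumes `heading.attributes` maps attribute names to attribute objects
    of the kind decode_attribute expects; raw_row maps names to raw values.
    """
    return {
        name: decode_attribute(attr, raw_row[name], squeeze=squeeze)
        for name, attr in heading.attributes.items()
    }
```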
