Commit d38a670

refactor: remove text from core types, keep as native passthrough (#1318)
`text` is no longer a core DataJoint type. It remains available as a native SQL passthrough type (with portability warning).

Rationale:
- Core types should encourage structured, bounded data
- varchar(n) covers most legitimate text needs with explicit bounds
- json handles structured text better
- <object> is better for large/unbounded text (files, sequences, docs)
- text behavior varies across databases, hurting portability

Changes:
- Remove `text` from CORE_TYPES in declare.py
- Update NATIVE_TEXT pattern to match plain `text` (in addition to tinytext, mediumtext, longtext)
- Update archive docs to note text is native-only

Users who need unlimited text can:
- Use varchar(n) with generous limit
- Use json for structured content
- Use <object> for large text files
- Use native text types with portability warning

Co-authored-by: Claude Opus 4.5 <[email protected]>
1 parent 5f2847c commit d38a670

3 files changed: +7 additions, -5 deletions


docs/src/archive/design/tables/attributes.md

Lines changed: 3 additions & 1 deletion
@@ -34,10 +34,12 @@ Use these portable, scientist-friendly types for cross-database compatibility.
 
 - `char(n)`: fixed-length string of exactly *n* characters.
 - `varchar(n)`: variable-length string up to *n* characters.
-- `text`: unlimited-length text for long-form content (notes, descriptions, abstracts).
 - `enum(...)`: one of several enumerated values, e.g., `enum("low", "medium", "high")`.
 Do not use enums in primary keys due to difficulty changing definitions.
 
+> **Note:** For unlimited text, use `varchar` with a generous limit, `json` for structured content,
+> or `<object>` for large text files. Native SQL `text` types are supported but not portable.
+
 **Encoding policy:** All strings use UTF-8 encoding (`utf8mb4` in MySQL, `UTF8` in PostgreSQL).
 Character encoding and collation are database-level configuration, not part of type definitions.
 Comparisons are case-sensitive by default.
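The note recommends `varchar` with a generous limit in place of `text`. The sketch below shows one way to choose that limit when migrating an existing `text` attribute. `suggest_varchar_limit`, its `headroom` factor, and the `None` return are hypothetical choices for illustration and are not part of DataJoint. The cap assumes MySQL's 65,535-byte row limit and up to 4 bytes per `utf8mb4` character.

```python
import math

# Hypothetical cap: MySQL limits a row to 65,535 bytes and utf8mb4 may use up to
# 4 bytes per character, so one utf8mb4 VARCHAR column tops out near 16,383 chars.
MYSQL_UTF8MB4_VARCHAR_MAX = 65_535 // 4


def suggest_varchar_limit(values, headroom=2.0):
    """Propose n for varchar(n) from existing strings (hypothetical helper).

    Lengths are counted in characters (Python code points), which is how
    varchar(n) is measured under utf8mb4 / UTF8. Returns None when the
    suggestion exceeds the utf8mb4 VARCHAR ceiling, which hints that json
    or <object> may suit the data better.
    """
    longest = max((len(v) for v in values), default=0)
    limit = max(1, math.ceil(longest * headroom))
    return limit if limit <= MYSQL_UTF8MB4_VARCHAR_MAX else None
```

A `headroom` of 2.0 doubles the longest observed value. The right margin depends on how the field is expected to grow.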

docs/src/archive/design/tables/storage-types-spec.md

Lines changed: 3 additions & 1 deletion
@@ -75,7 +75,9 @@ MySQL and PostgreSQL backends. Users should prefer these over native database ty
 |-----------|-------------|-------|------------|
 | `char(n)` | Fixed-length | `CHAR(n)` | `CHAR(n)` |
 | `varchar(n)` | Variable-length | `VARCHAR(n)` | `VARCHAR(n)` |
-| `text` | Unlimited text | `TEXT` | `TEXT` |
+
+> **Note:** Native SQL `text` types (`text`, `tinytext`, `mediumtext`, `longtext`) are supported
+> but not portable. Prefer `varchar(n)`, `json`, or `<object>` for portable schemas.
 
 **Encoding:** All strings use UTF-8 (`utf8mb4` in MySQL, `UTF8` in PostgreSQL).
 See [Encoding and Collation Policy](#encoding-and-collation-policy) for details.
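The removed table row described `text` as "Unlimited text." In MySQL, however, every native text type has a byte ceiling, and PostgreSQL has a single `text` type with no `tiny`/`medium`/`long` variants. The sketch below uses a hypothetical helper that is not a DataJoint API. It picks the smallest MySQL variant that can hold a value. Because the limits count UTF-8 bytes rather than characters, multibyte characters reduce the effective capacity.

```python
# Byte ceilings of MySQL's native text types (limits are bytes, not characters).
MYSQL_TEXT_MAX_BYTES = {
    "tinytext": 2**8 - 1,     # 255 bytes
    "text": 2**16 - 1,        # 65,535 bytes
    "mediumtext": 2**24 - 1,  # 16,777,215 bytes
    "longtext": 2**32 - 1,    # 4,294,967,295 bytes
}


def smallest_mysql_text_type(value: str) -> str:
    """Return the smallest MySQL text type whose byte limit holds `value` as UTF-8."""
    n_bytes = len(value.encode("utf-8"))
    # Dicts preserve insertion order, so types are tried from smallest to largest.
    for name, max_bytes in MYSQL_TEXT_MAX_BYTES.items():
        if n_bytes <= max_bytes:
            return name
    raise ValueError(f"{n_bytes} bytes exceeds every MySQL text type")
```

For example, 128 copies of `é` (2 bytes each in UTF-8) already overflow `tinytext`, even though that is only 128 characters.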

src/datajoint/declare.py

Lines changed: 1 addition & 3 deletions
@@ -45,8 +45,6 @@
 # String types (with parameters)
 "char": (r"char\s*\(\d+\)$", None),
 "varchar": (r"varchar\s*\(\d+\)$", None),
-# Unlimited text
-"text": (r"text$", None),
 # Enumeration
 "enum": (r"enum\s*\(.+\)$", None),
 # Fixed-point decimal
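With the `text` entry removed, a plain `text` declaration no longer matches any core type and falls through to the native patterns. The sketch below illustrates that lookup on a hypothetical subset of `CORE_TYPES`. It assumes the patterns are tried in order with `re.match`, which this diff does not confirm. `match_core_type` is an illustrative name, not a DataJoint function.

```python
import re

# Illustrative subset of CORE_TYPES after this commit. The patterns are copied
# from the diff, but the real table has more entries.
CORE_TYPES_SUBSET = {
    "char": (r"char\s*\(\d+\)$", None),
    "varchar": (r"varchar\s*\(\d+\)$", None),
    "enum": (r"enum\s*\(.+\)$", None),
}


def match_core_type(type_str):
    """Return the first core type whose pattern matches type_str, else None.

    Assumes re.match (anchored at the start of the string); how declare.py
    actually applies these patterns is not shown in this diff.
    """
    for name, (pattern, _sql) in CORE_TYPES_SUBSET.items():
        if re.match(pattern, type_str):
            return name
    return None
```

Because `re.match` anchors at the start, `varchar(255)` is not caught by the `char` pattern and reaches the `varchar` entry.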
@@ -78,7 +76,7 @@
 STRING=r"(var)?char\s*\(.+\)$", # Catches char/varchar not matched by core types
 TEMPORAL=r"(time|timestamp|year)(\s*\(.+\))?$", # time, timestamp, year (not date/datetime)
 NATIVE_BLOB=r"(tiny|small|medium|long)blob$", # Specific blob variants
-NATIVE_TEXT=r"(tiny|small|medium|long)text$", # Text variants (use plain 'text' instead)
+NATIVE_TEXT=r"(tiny|small|medium|long)?text$", # Native text types (not portable)
 # Codecs use angle brackets
 CODEC=r"<.+>$",
 ).items()
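The only behavioral change in `NATIVE_TEXT` is the `?`, which makes the size prefix optional, so plain `text` is now recognized as a native type. The sketch below compares the old and new patterns. It also adds a hypothetical `warn_if_native_text` helper that mirrors the portability warning described in the commit message. DataJoint's actual warning text and mechanism are not shown in this diff.

```python
import re
import warnings

OLD_NATIVE_TEXT = r"(tiny|small|medium|long)text$"
NEW_NATIVE_TEXT = r"(tiny|small|medium|long)?text$"


def native_text_matches(type_str):
    """Return (matched_before, matched_after) for the old and new NATIVE_TEXT patterns."""
    return (
        re.match(OLD_NATIVE_TEXT, type_str) is not None,
        re.match(NEW_NATIVE_TEXT, type_str) is not None,
    )


def warn_if_native_text(type_str):
    """Hypothetical check: emit a portability warning for native text types."""
    if re.match(NEW_NATIVE_TEXT, type_str):
        warnings.warn(
            f"'{type_str}' is a native SQL type and may not be portable across backends",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

For `tinytext`, `mediumtext`, and `longtext`, both patterns agree. Plain `text` is the input whose result changes from unmatched to matched.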
