Commit 85ee0f8

Add extend() method as semantic alias for left join
A.extend(B) is equivalent to A.join(B, left=True) but expresses clearer intent: extending an entity set with additional attributes rather than combining two entity sets.

- Add extend() method to QueryExpression
- Add 'extend' to supported_class_attrs for class-level access
- Update pk-rules-spec.md to document extend as actual API
- Add integration tests for valid and invalid extend cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
1 parent ba81cb8 commit 85ee0f8

4 files changed: +115 -0 lines changed


docs/src/design/pk-rules-spec.md

Lines changed: 42 additions & 0 deletions
@@ -38,6 +38,7 @@ In the examples below, `*` marks primary key attributes:
 | `A - B` (anti-restriction) | PK(A) — preserved from left operand |
 | `A.proj(...)` (projection) | PK(A) — preserved from left operand |
 | `A.aggr(B, ...)` (aggregation) | PK(A) — preserved from left operand |
+| `A.extend(B)` (extension) | PK(A) — requires A → B |
 | `A * B` (join) | Depends on functional dependencies (see below) |
 
 ### Join Primary Key Rule
@@ -203,6 +204,47 @@ The following attributes from the right operand's primary key are not determined
 the left operand: ['z']. Use an inner join or restructure the query.
 ```
 
+### Conceptual Note: Left Join as Extension
+
+When `A → B`, the left join `A.join(B, left=True)` is conceptually distinct from the general join operator `A * B`. It is better understood as an **extension** operation rather than a join:
+
+| Aspect | General Join (A * B) | Left Join when A → B |
+|--------|---------------------|----------------------|
+| Conceptual model | Cartesian product restricted to matching rows | Extend A with attributes from B |
+| Row count | May increase, decrease, or stay same | Always equals len(A) |
+| Primary key | Depends on functional dependencies | Always PK(A) |
+| Relation to projection | Different operation | Variation of projection |
+
+**The extension perspective:**
+
+The operation `A.join(B, left=True)` when `A → B` is closer to **projection** than to **join**:
+- It adds new attributes to A (like `A.proj(..., new_attr=...)`)
+- It preserves all rows of A
+- It preserves A's primary key
+- It lacks the Cartesian product aspect that defines joins
+
+DataJoint provides an explicit `extend()` method for this pattern:
+
+```python
+# These are equivalent when A → B:
+A.join(B, left=True)
+A.extend(B)  # clearer intent: extend A with B's attributes
+```
+
+The `extend()` method:
+- Requires `A → B` (raises `DataJointError` otherwise)
+- Does not expose `allow_nullable_pk` (that's an internal mechanism)
+- Expresses the semantic intent: "add B's attributes to A's entities"
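
The `A → B` requirement can be sketched in plain Python, assuming (as the `extend()` docstring in this commit states) that it means every attribute in B's primary key exists in A. The helper names `undetermined_attributes` and `determines`, and the `Trial`/`Session` attribute lists, are hypothetical and chosen only for illustration; this is not DataJoint's own check.

```python
def undetermined_attributes(left_attributes, right_primary_key):
    """Return the right operand's PK attributes that are absent from the left operand."""
    left = set(left_attributes)
    return [name for name in right_primary_key if name not in left]


def determines(left_attributes, right_primary_key):
    """True when the left operand contains every attribute of the right PK (A → B)."""
    return not undetermined_attributes(left_attributes, right_primary_key)


# Hypothetical headings: Trial carries session_id, Session does not carry trial_num.
trial_attributes = ["session_id", "trial_num", "outcome"]
session_attributes = ["session_id", "date"]
trial_pk = ["session_id", "trial_num"]
session_pk = ["session_id"]

# Trial → Session holds, so extending Trial with Session would be allowed.
trial_determines_session = determines(trial_attributes, session_pk)

# Session → Trial fails; these are the attributes an error message could list.
missing_for_session = undetermined_attributes(session_attributes, trial_pk)
```

Under these assumptions, `trial_determines_session` is true and `missing_for_session` contains only `trial_num`, which mirrors the kind of attribute list shown in the error message earlier in this spec.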
+
+**Relationship to aggregation:**
+
+A similar argument applies to `A.aggr(B, ...)`:
+- It preserves A's primary key
+- It adds computed attributes derived from B
+- It's conceptually a variation of projection with grouping
+
+Both `A.join(B, left=True)` (when A → B) and `A.aggr(B, ...)` can be viewed as **projection-like operations** that extend A's attributes while preserving its entity identity.
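
The row-count claim can be illustrated with rows modelled as plain dictionaries. This is a minimal sketch, not DataJoint code: `extend_rows` and `count_rows` are hypothetical helpers, the sample data is invented, and a left row with no match simply receives `None` for the new attributes.

```python
def extend_rows(left_rows, right_rows, right_pk, new_attributes):
    """Left-join sketch: attach new_attributes from the matching right row, if any."""
    # Index right rows by primary key; a PK value identifies at most one row.
    index = {tuple(row[k] for k in right_pk): row for row in right_rows}
    extended = []
    for row in left_rows:
        match = index.get(tuple(row[k] for k in right_pk))
        new_values = {name: (match[name] if match else None) for name in new_attributes}
        extended.append({**row, **new_values})
    return extended


def count_rows(left_rows, right_rows, left_pk):
    """Aggregation sketch: attach the number of matching right rows to each left row."""
    counted = []
    for row in left_rows:
        n = sum(all(r[k] == row[k] for k in left_pk) for r in right_rows)
        counted.append({**row, "n": n})
    return counted


sessions = [
    {"session_id": 1, "date": "2024-01-05"},
    {"session_id": 2, "date": "2024-01-06"},
]
trials = [
    {"session_id": 1, "trial_num": 1},
    {"session_id": 1, "trial_num": 2},
    {"session_id": 3, "trial_num": 1},  # no matching session in this toy data
]

# Extension: one output row per trial, each gaining a 'date' attribute.
trials_with_date = extend_rows(trials, sessions, ["session_id"], ["date"])

# Aggregation: one output row per session, each gaining a computed 'n'.
sessions_with_counts = count_rows(sessions, trials, ["session_id"])
```

In both helpers the output has exactly one row per left-hand row and keeps the left-hand keys untouched, which is the sense in which the text calls these operations projection-like.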
+
 ### Bypassing the Left Join Constraint
 
 For special cases where the user takes responsibility for handling the potentially nullable primary key, the constraint can be bypassed using `allow_nullable_pk=True`:

src/datajoint/expression.py

Lines changed: 39 additions & 0 deletions
@@ -360,6 +360,45 @@ def join(self, other, semantic_check=True, left=False, allow_nullable_pk=False):
         assert len(result.support) == len(result._left) + 1
         return result
 
+    def extend(self, other, semantic_check=True):
+        """
+        Extend self with attributes from other.
+
+        The extend operation adds attributes from `other` to `self` while preserving
+        self's entity identity. It is semantically equivalent to `self.join(other, left=True)`
+        but expresses a clearer intent: extending an entity set with additional attributes
+        rather than combining two entity sets.
+
+        Requirements:
+            self → other: Every attribute in other's primary key must exist in self.
+            This ensures:
+            - All rows of self are preserved (no filtering)
+            - Self's primary key remains the result's primary key (no NULL PKs)
+            - The operation is a true extension, not a Cartesian product
+
+        Conceptual model:
+            Unlike a general join (Cartesian product restricted by matching attributes),
+            extend is closer to projection—it adds new attributes to existing entities
+            without changing which entities are in the result.
+
+        Example:
+            # Trial determines Session (session_id, Session's PK, is in Trial)
+            # But Session does NOT determine Trial (trial_num not in Session)
+
+            # Valid: extend trials with session info
+            Trial.extend(Session)  # Adds 'date' from Session to each Trial
+
+            # Invalid: Session cannot be extended with Trial
+            Session.extend(Trial)  # Error: trial_num not in Session
+
+        :param other: QueryExpression whose attributes will extend self
+        :param semantic_check: If True (default), require homologous namesakes.
+            If False, match on all namesakes without lineage checking.
+        :return: Extended QueryExpression with self's PK and combined attributes
+        :raises DataJointError: If self does not determine other
+        """
+        return self.join(other, semantic_check=semantic_check, left=True)
+
     def __add__(self, other):
         """union e.g. ``q1 + q2``."""
         return Union.create(self, other)

src/datajoint/user_tables.py

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@
     "proj",
     "aggr",
     "join",
+    "extend",
     "fetch",
     "fetch1",
     "head",
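
The commit message says this entry enables class-level access, which is what lets the tests call `Item.extend(Topic)` without instantiating `Item`. One way such forwarding can work is a metaclass that intercepts listed attribute names and looks them up on a fresh instance. The sketch below is an assumption about the general mechanism, not a copy of DataJoint's code: `ForwardingMeta`, `ToyTable`, and `SUPPORTED_CLASS_ATTRS` are hypothetical names, and the toy `join` only returns a description of the call.

```python
# Hypothetical stand-in for the list of names forwarded from class to instance.
SUPPORTED_CLASS_ATTRS = {"join", "extend"}


class ForwardingMeta(type):
    """Metaclass sketch: listed attributes accessed on the class resolve on an instance."""

    def __getattribute__(cls, name):
        if name in SUPPORTED_CLASS_ATTRS:
            # Instantiate the class and return the bound method of that instance.
            return getattr(cls(), name)
        return super().__getattribute__(name)


class ToyTable(metaclass=ForwardingMeta):
    def join(self, other, left=False):
        # Toy behavior: describe the call instead of building a query.
        return ("join", type(self).__name__, other.__name__, left)

    def extend(self, other):
        # Same delegation as in the commit: extend is join with left=True.
        return self.join(other, left=True)


class Item(ToyTable):
    pass


class Topic(ToyTable):
    pass


# Class-level call: no explicit Item() needed because "extend" is in the set.
result = Item.extend(Topic)
```

With this arrangement, a name missing from the set would resolve to the plain function on the class, and calling it as `Item.extend(Topic)` would bind `Topic` to `self` and fail for lack of an `other` argument.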

tests/integration/test_aggr_regressions.py

Lines changed: 33 additions & 0 deletions
@@ -147,3 +147,36 @@ def test_left_join_valid(schema_uuid):
     assert len(q) == len(qf)
     # All Items should have matching Topics since they were populated from Topics
     assert len(q) == len(Item())
+
+
+def test_extend_valid(schema_uuid):
+    """extend() is an alias for join(left=True) when A → B."""
+    # Clean up from previous tests
+    Item().delete_quick()
+    Topic().delete_quick()
+
+    Topic().add("alice")
+    Item.populate()
+    # Item → Topic (topic_id is in Item), so extend is valid
+    q_extend = Item.extend(Topic)
+    q_left_join = Item.join(Topic, left=True)
+    # Should produce identical results
+    assert len(q_extend) == len(q_left_join)
+    assert set(q_extend.heading.names) == set(q_left_join.heading.names)
+    assert q_extend.primary_key == q_left_join.primary_key
+
+
+def test_extend_invalid_raises_error(schema_uuid):
+    """extend() requires A → B. Topic ↛ Item, so this should raise an error."""
+    from datajoint.errors import DataJointError
+
+    # Clean up from previous tests
+    Item().delete_quick()
+    Topic().delete_quick()
+
+    Topic().add("bob")
+    Item.populate()
+    # Topic ↛ Item (item_id not in Topic), so extend should fail
+    with pytest.raises(DataJointError) as exc_info:
+        Topic.extend(Item)
+    assert "left operand to determine" in str(exc_info.value).lower()
