Review and flesh out Relations design

_Written by Claude Code, Opus 4.6. Result of reading all NeoWiki docs, ADRs, context files, and PHP/TS domain models, prompted by @JeroenDeDauw._

---

## How Relations Work Today

A Relation is one of the four **value types** a Statement can hold (alongside string, number, boolean). When a Subject has a Statement like `"CEO": { "type": "relation", "value": [{ "id": "r...", "target": "s..." }] }`, that's a Relation.

Each Relation instance has:
- A **RelationId** (15-char nanoid starting with `r`)
- A **target** SubjectId
- **Properties** — an untyped `array<string, mixed>` bag of scalar values
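
As a rough illustration, the shape described above could be written down in TypeScript as follows. The type names, the optional `properties` field, and the placeholder IDs are assumptions made for this sketch, not the actual NeoWiki domain model.

```typescript
// Hypothetical type names; only the field meanings come from the list above.
type SubjectId = string;  // shown as "s..." in the example JSON
type RelationId = string; // 15-char nanoid starting with "r"
type Scalar = string | number | boolean;

interface Relation {
  id: RelationId;
  target: SubjectId;
  // Untyped bag of scalar values, mirroring array<string, mixed>.
  // Assumed optional here, since the example JSON omits it.
  properties?: Record<string, Scalar>;
}

interface RelationStatement {
  type: "relation";
  value: Relation[];
}

// Checks only what is stated above: 15 characters, the first one being "r".
function looksLikeRelationId(id: string): boolean {
  return id.length === 15 && id.charAt(0) === "r";
}

// Placeholder IDs, invented for the example.
const ceoStatement: RelationStatement = {
  type: "relation",
  value: [{ id: "r1234567890abcd", target: "s1234567890abcd", properties: { since: 2019 } }],
};
```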

The **Schema** defines relation-typed properties via `RelationProperty`, which adds:
- `relation` — a RelationType name (e.g., "Has CEO"), used as the Neo4j relationship label
- `targetSchema` — the Schema the target must follow (e.g., "Person")
- `multiple` — whether multiple targets are allowed
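
A minimal sketch of such a definition, with the three fields taken from the list above. The interface name, the `type` discriminator, and the helper function are assumptions for illustration only.

```typescript
// Field names follow the list above; everything else is hypothetical.
interface RelationPropertyDefinition {
  type: "relation";
  relation: string;     // RelationType name, e.g. "Has CEO"
  targetSchema: string; // Schema the target must follow, e.g. "Person"
  multiple: boolean;    // whether multiple targets are allowed
}

// The only cardinality rule available today: more than one target requires `multiple`.
function acceptsTargetCount(definition: RelationPropertyDefinition, targetCount: number): boolean {
  return definition.multiple || targetCount <= 1;
}

const hasCeo: RelationPropertyDefinition = {
  type: "relation",
  relation: "Has CEO",
  targetSchema: "Person",
  multiple: false,
};
```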

In **Neo4j**, relations become actual graph edges: `(:Subject)-[:Has CEO {id, ...properties}]->(:Subject)`. Everything else (string/number/boolean statements) becomes node properties.
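
The split between edges and node properties can be sketched as a small function. This is an illustration under assumed type names and an assumed lookup from property name to RelationType; it is not the actual Neo4j updater.

```typescript
type Scalar = string | number | boolean;

interface ScalarStatement {
  type: "string" | "number" | "boolean";
  value: Scalar;
}

interface RelationStatement {
  type: "relation";
  value: { id: string; target: string; properties?: Record<string, Scalar> }[];
}

type Statement = ScalarStatement | RelationStatement;

interface Edge {
  relationType: string; // becomes the Neo4j relationship type
  id: string;
  target: string;
  properties: Record<string, Scalar>;
}

// `relationTypes` maps a property name (e.g. "CEO") to its RelationType (e.g. "Has CEO").
// This lookup is an assumption standing in for the Schema.
function splitForGraph(
  statements: Record<string, Statement>,
  relationTypes: Record<string, string>
): { nodeProperties: Record<string, Scalar>; edges: Edge[] } {
  const nodeProperties: Record<string, Scalar> = {};
  const edges: Edge[] = [];
  for (const name of Object.keys(statements)) {
    const statement = statements[name];
    if (statement.type === "relation") {
      // Each relation instance becomes one edge.
      for (const relation of statement.value) {
        edges.push({
          relationType: relationTypes[name] || name,
          id: relation.id,
          target: relation.target,
          properties: relation.properties || {},
        });
      }
    } else {
      // Scalar statements become properties on the Subject's node.
      nodeProperties[name] = statement.value;
    }
  }
  return { nodeProperties, edges };
}
```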

---

## Issues and Design Questions

### 1. Relation Properties need schema-level definitions

`RelationProperties` is `array<string, mixed>` — anything goes. There's no schema defining what properties a relation can or should have, no validation, and no way for API consumers or tooling to discover what properties are meaningful for a given relation type.

Relation properties correspond to Wikibase's "qualifiers" — metadata on a specific relationship instance. The classic example: Company → Person via "Has CEO" with properties `{ role: "CEO", since: 2019 }`. For cultural heritage use cases, these could include provenance, temporal qualifiers, confidence levels, and source references.

**The question**: Should relation properties get their own schema definitions (like mini property definitions within the relation's Property Definition)? This would make the data model self-describing, enable validation, and provide the structure needed for any future UI or API consumer to work with them meaningfully.
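
To make the question concrete, here is one possible shape for such definitions, together with the validation they would enable. Every name in this sketch (`RelationPropertyDef`, `validateRelationProperties`, the `required` flag) is hypothetical and only meant to show what "mini property definitions" could look like.

```typescript
type Scalar = string | number | boolean;
type ScalarType = "string" | "number" | "boolean";

// Hypothetical mini property definition for a single relation property.
interface RelationPropertyDef {
  type: ScalarType;
  required?: boolean;
}

type RelationPropertySchema = Record<string, RelationPropertyDef>;

// Returns a list of human-readable problems; an empty list means the bag is valid.
function validateRelationProperties(
  schema: RelationPropertySchema,
  properties: Record<string, Scalar>
): string[] {
  const errors: string[] = [];
  for (const name of Object.keys(schema)) {
    const definition = schema[name];
    const value = properties[name];
    if (value === undefined) {
      if (definition.required) {
        errors.push(`missing required property "${name}"`);
      }
    } else if (typeof value !== definition.type) {
      errors.push(`property "${name}" should be ${definition.type}, got ${typeof value}`);
    }
  }
  for (const name of Object.keys(properties)) {
    if (!Object.prototype.hasOwnProperty.call(schema, name)) {
      errors.push(`unknown property "${name}"`);
    }
  }
  return errors;
}

// Definitions for the "Has CEO" example above.
const hasCeoProperties: RelationPropertySchema = {
  role: { type: "string", required: true },
  since: { type: "number" },
};
```

With definitions like these, `validateRelationProperties(hasCeoProperties, { role: "CEO", since: 2019 })` reports no problems, while a bag with a missing `role` or a string-valued `since` would be flagged.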

### 2. Target Schema is mandatory and singular

`targetSchema` is required on every relation property. You can't create a relation that points to "any Subject" or to "Subjects of type Person OR Organization."

Use cases where this is restrictive:
- "Related to" — a generic association to any Subject
- Cultural heritage: An artwork might relate to a Person (artist), Organization (museum), Place (location), Event (exhibition) — you'd need four separate relation properties instead of one "Related to"
- "References" or "See also" — pointing to any Subject type

**The question**: Should `targetSchema` be optional (allow any schema) or support an array of schemas? Making it optional would cover polymorphic use cases without breaking existing typed relations, which benefit from the constraint for validation and UI filtering.
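
Both options can be expressed in a single type. The sketch below assumes that an absent `targetSchema` means "any schema" and that an array means "any of these"; neither convention exists in the current model.

```typescript
// undefined: any Subject; string: exactly one Schema; string[]: any of the listed Schemas.
type TargetSchemaConstraint = string | string[] | undefined;

function targetSchemaAllows(constraint: TargetSchemaConstraint, actualSchema: string): boolean {
  if (constraint === undefined) {
    return true; // generic relation such as "Related to"
  }
  if (typeof constraint === "string") {
    return constraint === actualSchema; // today's behaviour
  }
  return constraint.indexOf(actualSchema) !== -1; // polymorphic relation
}
```

Existing typed relations keep working unchanged under this reading, since a single string still behaves exactly as it does today.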

### 3. Relations are one-directional

If Company X "Has CEO" Person Y, this is stored in Company X's JSON. Person Y's data has no record of this relationship.

In Neo4j you can traverse edges in both directions, so Cypher queries work fine. But Person Y's infobox won't show "Is CEO of Company X" unless you separately create an inverse relation (such as "Is CEO of") on Person Y, which means bidirectional data must be maintained manually.

One-directional storage is the right call — storing relations on both sides would create consistency nightmares. But **displaying inverse relations** will need dedicated design, likely via Cypher queries and Views rather than data model changes.
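
To illustrate why inverse display can stay a query concern, the sketch below derives incoming relations from one-directional data alone. It scans an in-memory list, which is an assumption made to keep the example self-contained; in practice this would more likely be a Cypher query against the graph. All type names are hypothetical.

```typescript
interface StoredRelation {
  relationType: string;
  target: string;
}

interface StoredSubject {
  id: string;
  label: string;
  relations: StoredRelation[]; // outgoing only, as stored today
}

interface IncomingRelation {
  relationType: string;
  sourceId: string;
  sourceLabel: string;
}

// Finds every relation pointing at `targetId` without storing anything on the target.
function incomingRelations(subjects: StoredSubject[], targetId: string): IncomingRelation[] {
  const result: IncomingRelation[] = [];
  for (const subject of subjects) {
    for (const relation of subject.relations) {
      if (relation.target === targetId) {
        result.push({
          relationType: relation.relationType,
          sourceId: subject.id,
          sourceLabel: subject.label,
        });
      }
    }
  }
  return result;
}
```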

### 4. The conceptual fit of Relations-as-Values

Relations are modeled as a value type within the Statement system. This is clean in the sense that everything about a Subject is in its statements. But it creates a conceptual tension:

- Statements are naturally "property → value" pairs: `"Founded in" → 2019`
- Relations are naturally "connections between things": Company → Person

The system handles this by treating them identically at the storage/schema level, then bifurcating at the Neo4j level (relations become edges, scalars become properties). This works, but it means the Statement abstraction is doing double duty — it's both a property-value store and a relationship store.

This isn't necessarily wrong — Wikibase does essentially the same thing (Items as values of claims). But it's worth being aware of the conceptual weight it places on "Statement."

### 5. What happens to dangling references?

The Neo4j updater does `MERGE (target {id: $targetId})` — if the target Subject doesn't exist yet (or was deleted), Neo4j creates a ghost node with just an `id`. The GraphModel doc says deleted Subjects with incoming relations keep their node around.

This is reasonable for graph consistency, but:
- There's no mechanism to **clean up** fully orphaned ghost nodes
- There's no **UI indication** when a relation target no longer exists
- Import scenarios could create many dangling references if data is imported in the wrong order
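
A check for dangling references could look roughly like this. The sketch works on a simplified, hypothetical view of the data (each Subject reduced to its ID and its relation targets) and does not reflect how the Neo4j updater or any import code is structured.

```typescript
// Simplified, assumed view: a Subject and the IDs its relations point to.
interface SubjectWithTargets {
  id: string;
  targets: string[];
}

// Returns each target ID that no known Subject has, once, in order of first appearance.
function findDanglingTargets(subjects: SubjectWithTargets[]): string[] {
  const known: Record<string, boolean> = {};
  for (const subject of subjects) {
    known[subject.id] = true;
  }
  const dangling: string[] = [];
  for (const subject of subjects) {
    for (const target of subject.targets) {
      if (known[target] !== true && dangling.indexOf(target) === -1) {
        dangling.push(target);
      }
    }
  }
  return dangling;
}
```

Running a check like this over a complete import batch, rather than per Subject, would avoid false positives caused purely by import order.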

### 6. No constraints on relation cardinality or uniqueness

The only constraint is `multiple: true/false`. There's no:
- Minimum/maximum number of relations
- Uniqueness constraint (can you have two relations to the same target?)
- Constraint that a target Subject must actually exist (referential integrity)

This is probably fine for the current stage, but it is worth recording as future work.
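
If such constraints were added later, they might take a shape like the following. The field names `min`, `max`, and `uniqueTargets` are invented for this sketch; referential integrity is left out because it needs access to the set of existing Subjects.

```typescript
// Hypothetical constraint fields; none of these exist in the current Schema format.
interface RelationConstraints {
  min?: number;
  max?: number;
  uniqueTargets?: boolean;
}

// Returns a list of violations for the given relation targets; empty means valid.
function checkRelationConstraints(constraints: RelationConstraints, targets: string[]): string[] {
  const errors: string[] = [];
  if (constraints.min !== undefined && targets.length < constraints.min) {
    errors.push(`expected at least ${constraints.min} relation(s), got ${targets.length}`);
  }
  if (constraints.max !== undefined && targets.length > constraints.max) {
    errors.push(`expected at most ${constraints.max} relation(s), got ${targets.length}`);
  }
  if (constraints.uniqueTargets) {
    // A target is a duplicate if it already appeared earlier in the list.
    const duplicates = targets.filter((target, index) => targets.indexOf(target) !== index);
    if (duplicates.length > 0) {
      errors.push(`duplicate targets: ${duplicates.join(", ")}`);
    }
  }
  return errors;
}
```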

---

## Use Cases to Consider

1. **Simple typed links**: Book → Author (Person). This works well today.
2. **Qualified relationships**: Person → Organization with role and time period. Needs relation properties to be properly designed.
3. **Cultural heritage provenance**: Artwork → Source with confidence, date, provenance chain. This is a rich qualifier use case.
4. **Cross-wiki relations**: A Subject in Wiki A references a Subject in Wiki B. The current SubjectId-only model doesn't support this — it would need a wiki identifier.
5. **Inverse display**: Showing all incoming relations on a Subject (e.g., "Books by this Author"). Needs query/view support, not data model changes.
6. **Generic/polymorphic relations**: "Related to" any type of Subject. Blocked by mandatory `targetSchema`.
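
For the cross-wiki case (use case 4), one conceivable shape is a target reference that carries an optional wiki identifier. The `SubjectRef` type, the `wiki` field, and the example identifiers are assumptions for illustration; the current model stores only a SubjectId.

```typescript
// Hypothetical: a relation target that may point into another wiki.
interface SubjectRef {
  wiki?: string; // absent means "the local wiki", which keeps existing data valid
  id: string;
}

function isLocalRef(ref: SubjectRef, localWiki: string): boolean {
  return ref.wiki === undefined || ref.wiki === localWiki;
}
```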

---

## What Seems Right

- Relations as a value type within Statements — keeps the model uniform
- RelationIds — necessary for stable identity and Neo4j MERGE operations
- One-directional storage — the right call, inverse display should be a query concern
- Target schema constraint for typed relations — provides useful validation and UI filtering
- Separate Property Name and Relation Type — UI labels and graph edge labels serve different purposes

## What Needs Work

1. **Relation properties need schema-level definitions** before they can be meaningfully used. The current untyped bag makes them opaque to API consumers and tooling.
2. **Target Schema could be optional or support multiple schemas** to enable polymorphic relations.
