Written by Claude Code, Opus 4.6. Result of reading all NeoWiki docs, ADRs, context files, and PHP/TS domain models, prompted by @JeroenDeDauw.
How Relations Work Today
A Relation is one of the four value types a Statement can hold (alongside string, number, boolean). When a Subject has a Statement like "CEO": { "type": "relation", "value": [{ "id": "r...", "target": "s..." }] }, that's a Relation.
Each Relation instance has:
- A RelationId (15-char nanoid starting with r)
- A target SubjectId
- Properties: an untyped array<string, mixed> bag of scalar values
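The shape described above could be sketched in TypeScript roughly as follows. This is an illustration under assumptions, not NeoWiki's actual model: the names `RelationValue`, `RelationStatement`, and `looksLikeRelationId` are hypothetical, the `properties` key is assumed, and the IDs are made up. The check covers only what is stated here (15 characters, leading `r`), since the permitted nanoid alphabet is not specified.

```typescript
// Hypothetical types mirroring the JSON example above; not NeoWiki's actual classes.
interface RelationValue {
  id: string; // RelationId: 15 characters, starting with "r"
  target: string; // SubjectId of the target Subject
  properties?: { [name: string]: unknown }; // untyped bag (key name is an assumption)
}

interface RelationStatement {
  type: "relation";
  value: RelationValue[];
}

// Checks only the two facts stated in the text: length 15 and a leading "r".
// The allowed alphabet is not validated because it is not specified here.
function looksLikeRelationId(id: string): boolean {
  return id.length === 15 && id.charAt(0) === "r";
}

// Made-up example data.
const ceoStatement: RelationStatement = {
  type: "relation",
  value: [
    { id: "r1demo5abc9xyz2", target: "s1demo5abc9xyz3", properties: { since: 2019 } },
  ],
};
```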
The Schema defines relation-typed properties via RelationProperty, which adds:
- relation: a RelationType name (e.g., "Has CEO"), used as the Neo4j relationship type
- targetSchema: the Schema the target must follow (e.g., "Person")
- multiple: whether multiple targets are allowed
In Neo4j, relations become actual graph edges: (:Subject)-[:Has CEO {id, ...properties}]->(:Subject). Everything else (string/number/boolean statements) becomes node properties.
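To make the split concrete, here is a minimal in-memory sketch of how one Subject's statements could be divided into node properties and edges. It is not the actual Neo4j updater: `toGraph`, `GraphEdge`, `EdgeInput`, and `SubjectStatement` are hypothetical names, scalar statements are assumed to hold a single value, and the `relationTypes` lookup stands in for the Schema's RelationProperty.relation.

```typescript
// Hypothetical, simplified shapes; the real updater writes to Neo4j, not plain objects.
type Scalar = string | number | boolean;

interface EdgeInput {
  id: string;
  target: string;
  properties?: { [name: string]: Scalar };
}

type SubjectStatement =
  | { type: "string" | "number" | "boolean"; value: Scalar }
  | { type: "relation"; value: EdgeInput[] };

interface GraphEdge {
  relationType: string; // RelationType name from the schema, e.g. "Has CEO"
  source: string;
  target: string;
  properties: { [name: string]: Scalar }; // relation properties plus the relation id
}

function toGraph(
  subjectId: string,
  statements: { [propertyName: string]: SubjectStatement },
  relationTypes: { [propertyName: string]: string } // property name -> RelationType name
): { nodeProperties: { [name: string]: Scalar }; edges: GraphEdge[] } {
  const nodeProperties: { [name: string]: Scalar } = {};
  const edges: GraphEdge[] = [];

  for (const propertyName of Object.keys(statements)) {
    const statement = statements[propertyName];
    if (statement.type === "relation") {
      // Each Relation becomes its own edge.
      for (const relation of statement.value) {
        const properties: { [name: string]: Scalar } = {};
        const extra = relation.properties || {};
        for (const key of Object.keys(extra)) {
          properties[key] = extra[key];
        }
        properties["id"] = relation.id;
        edges.push({
          relationType: relationTypes[propertyName],
          source: subjectId,
          target: relation.target,
          properties,
        });
      }
    } else {
      // Scalars stay on the node.
      nodeProperties[propertyName] = statement.value;
    }
  }
  return { nodeProperties, edges };
}

// Made-up example data.
const companyGraph = toGraph(
  "s1demo5abc9xyz3",
  {
    "Founded in": { type: "number", value: 2019 },
    "CEO": {
      type: "relation",
      value: [{ id: "r1demo5abc9xyz2", target: "s2demo5abc9xyz4", properties: { since: 2019 } }],
    },
  },
  { "CEO": "Has CEO" }
);
```

In this example `companyGraph.nodeProperties` holds only "Founded in", while the CEO statement yields a single "Has CEO" edge.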
Issues and Design Questions
1. Relation Properties need schema-level definitions
RelationProperties is array<string, mixed> — anything goes. There's no schema defining what properties a relation can or should have, no validation, and no way for API consumers or tooling to discover what properties are meaningful for a given relation type.
Relation properties correspond to Wikibase's "qualifiers" — metadata on a specific relationship instance. The classic example: Company → Person via "Has CEO" with properties { role: "CEO", since: 2019 }. For cultural heritage use cases, these could include provenance, temporal qualifiers, confidence levels, and source references.
The question: Should relation properties get their own schema definitions (like mini property definitions within the relation's Property Definition)? This would make the data model self-describing, enable validation, and provide the structure needed for any future UI or API consumer to work with them meaningfully.
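One possible shape for such definitions, sketched in TypeScript. Everything here is hypothetical (`RelationPropertyDefinition`, `RelationPropertySchema`, `validateRelationProperties`, the `required` flag); it only illustrates how a definition per property would enable validation and discovery.

```typescript
// Hypothetical: one possible shape for schema-level relation property definitions.
type RelationPropertyType = "string" | "number" | "boolean";

interface RelationPropertyDefinition {
  type: RelationPropertyType;
  required?: boolean;
}

type RelationPropertySchema = { [name: string]: RelationPropertyDefinition };

// Returns a list of human-readable problems; an empty list means the bag is valid.
function validateRelationProperties(
  schema: RelationPropertySchema,
  properties: { [name: string]: unknown }
): string[] {
  const problems: string[] = [];

  for (const name of Object.keys(schema)) {
    const definition = schema[name];
    if (!Object.prototype.hasOwnProperty.call(properties, name)) {
      if (definition.required) {
        problems.push(`missing required property "${name}"`);
      }
      continue;
    }
    if (typeof properties[name] !== definition.type) {
      problems.push(`property "${name}" should be of type ${definition.type}`);
    }
  }

  // Properties the schema does not know about are reported rather than silently kept.
  for (const name of Object.keys(properties)) {
    if (!Object.prototype.hasOwnProperty.call(schema, name)) {
      problems.push(`unknown property "${name}"`);
    }
  }
  return problems;
}

// Made-up definitions for the "Has CEO" example.
const hasCeoProperties: RelationPropertySchema = {
  role: { type: "string", required: true },
  since: { type: "number" },
};
```

With these definitions, `{ role: "CEO", since: 2019 }` passes, while `{ since: "2019" }` is reported for both the missing role and the mistyped year.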
2. Target Schema is mandatory and singular
targetSchema is required on every relation property. You can't create a relation that points to "any Subject" or to "Subjects of type Person OR Organization."
Use cases where this is restrictive:
- "Related to" — a generic association to any Subject
- Cultural heritage: An artwork might relate to a Person (artist), Organization (museum), Place (location), Event (exhibition) — you'd need four separate relation properties instead of one "Related to"
- "References" or "See also" — pointing to any Subject type
The question: Should targetSchema be optional (allow any schema) or support an array of schemas? Making it optional would cover polymorphic use cases without breaking existing typed relations, which benefit from the constraint for validation and UI filtering.
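Both options can be expressed with a single check. The sketch below is hypothetical (`TargetSchemaConstraint` and `targetSchemaAllows` are illustrative names, not existing NeoWiki code) and assumes schemas are compared by name.

```typescript
// Hypothetical widening of targetSchema:
// absent = any schema, string = exactly one, array = any of several.
type TargetSchemaConstraint = undefined | string | string[];

function targetSchemaAllows(constraint: TargetSchemaConstraint, actualSchema: string): boolean {
  if (constraint === undefined) {
    return true; // polymorphic relation such as "Related to"
  }
  if (typeof constraint === "string") {
    return constraint === actualSchema; // today's behavior: a single required schema
  }
  return constraint.indexOf(actualSchema) !== -1; // one of several allowed schemas
}
```

Existing typed relations keep their behavior under this sketch, because a plain string constraint is still checked exactly.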
3. Relations are one-directional
If Company X "Has CEO" Person Y, this is stored in Company X's JSON. Person Y's data has no record of this relationship.
In Neo4j you can traverse both directions, so Cypher queries work fine. But Person Y's infobox won't show "Is CEO of Company X" unless you separately create a "Works at" relation on Person Y, meaning bidirectional data must be maintained manually.
One-directional storage is the right call — storing relations on both sides would create consistency nightmares. But displaying inverse relations will need dedicated design, likely via Cypher queries and Views rather than data model changes.
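The idea of treating inverse display as a query concern can be illustrated in memory. In the sketch below, `StoredRelation` and `incomingRelations` are hypothetical names and the data is made up; in NeoWiki the lookup would run against Neo4j rather than an array.

```typescript
// Hypothetical in-memory stand-in for a graph query.
// A roughly equivalent Cypher query might look like:
//   MATCH (source:Subject)-[r]->(target:Subject {id: $subjectId})
//   RETURN source.id, type(r)
interface StoredRelation {
  source: string; // Subject whose JSON holds the relation
  relationType: string; // e.g. "Has CEO"
  target: string;
}

// Finds relations pointing at a Subject without storing anything on the target side.
function incomingRelations(all: StoredRelation[], subjectId: string): StoredRelation[] {
  return all.filter((relation) => relation.target === subjectId);
}

// Made-up example data: only the companies store the relations.
const storedRelations: StoredRelation[] = [
  { source: "CompanyX", relationType: "Has CEO", target: "PersonY" },
  { source: "CompanyZ", relationType: "Has board member", target: "PersonY" },
  { source: "CompanyX", relationType: "Has CEO", target: "PersonQ" },
];

const incomingForPersonY = incomingRelations(storedRelations, "PersonY");
```

Person Y's data stays untouched; the two incoming relations are derived at read time.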
4. The conceptual fit of Relations-as-Values
Relations are modeled as a value type within the Statement system. This is clean in the sense that everything about a Subject is in its statements. But it creates a conceptual tension:
- Statements are naturally "property → value" pairs: "Founded in" → 2019
- Relations are naturally "connections between things": Company → Person
The system handles this by treating them identically at the storage/schema level, then bifurcating at the Neo4j level (relations become edges, scalars become properties). This works, but it means the Statement abstraction is doing double duty — it's both a property-value store and a relationship store.
This isn't necessarily wrong — Wikibase does essentially the same thing (Items as values of claims). But it's worth being aware of the conceptual weight it places on "Statement."
5. What happens to dangling references?
The Neo4j updater does MERGE (target {id: $targetId}) — if the target Subject doesn't exist yet (or was deleted), Neo4j creates a ghost node with just an id. The GraphModel doc says deleted Subjects with incoming relations keep their node around.
This is reasonable for graph consistency, but:
- There's no mechanism to clean up fully orphaned ghost nodes
- There's no UI indication when a relation target no longer exists
- Import scenarios could create many dangling references if data is imported in the wrong order
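For the import scenario, a pre-import check could list the targets that would end up as ghost nodes. This is a hypothetical sketch (`SubjectRecord` and `findDanglingTargets` do not exist in NeoWiki) that works on a flattened view of the import batch.

```typescript
// Hypothetical pre-import check: lists relation targets that are not among the known Subjects.
interface SubjectRecord {
  id: string;
  relationTargets: string[]; // SubjectIds referenced by this Subject's relations
}

function findDanglingTargets(subjects: SubjectRecord[]): string[] {
  const known: { [id: string]: boolean } = {};
  for (const subject of subjects) {
    known[subject.id] = true;
  }

  const dangling: string[] = [];
  for (const subject of subjects) {
    for (const target of subject.relationTargets) {
      const isKnown = Object.prototype.hasOwnProperty.call(known, target);
      // Report each missing target once, in first-seen order.
      if (!isKnown && dangling.indexOf(target) === -1) {
        dangling.push(target);
      }
    }
  }
  return dangling;
}

// Made-up batch: "subjectC" is referenced twice but never defined.
const importBatch: SubjectRecord[] = [
  { id: "subjectA", relationTargets: ["subjectB", "subjectC"] },
  { id: "subjectB", relationTargets: ["subjectC"] },
];

const danglingTargets = findDanglingTargets(importBatch);
```

The same check, run against the full set of existing Subjects, would also identify ghost nodes that are candidates for cleanup.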
6. No constraints on relation cardinality or uniqueness
The only constraint is multiple: true/false. There's no:
- Minimum/maximum number of relations
- Uniqueness constraint (can you have two relations to the same target?)
- Constraint that a target Subject must actually exist (referential integrity)
For the current stage this is probably fine, but worth noting as future work.
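If such constraints were added later, they might look something like the sketch below. The fields `minCount`, `maxCount`, and `uniqueTargets` and the function `checkRelationConstraints` are hypothetical; today's multiple: false would correspond to `maxCount: 1` under this reading.

```typescript
// Hypothetical constraint fields; today only `multiple` exists.
interface RelationConstraints {
  minCount?: number;
  maxCount?: number;
  uniqueTargets?: boolean;
}

// Returns a list of problems for one relation property; an empty list means all constraints hold.
function checkRelationConstraints(targets: string[], constraints: RelationConstraints): string[] {
  const problems: string[] = [];

  if (constraints.minCount !== undefined && targets.length < constraints.minCount) {
    problems.push(`expected at least ${constraints.minCount} relation(s), got ${targets.length}`);
  }
  if (constraints.maxCount !== undefined && targets.length > constraints.maxCount) {
    problems.push(`expected at most ${constraints.maxCount} relation(s), got ${targets.length}`);
  }
  if (constraints.uniqueTargets) {
    // A target is a duplicate if it already appeared earlier in the list.
    const duplicates = targets.filter((target, index) => targets.indexOf(target) !== index);
    if (duplicates.length > 0) {
      problems.push(`duplicate target(s): ${duplicates.join(", ")}`);
    }
  }
  return problems;
}
```

Referential integrity (the target must exist) is left out here, since it needs access to the set of known Subjects rather than just the relation list.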
Use Cases to Consider
- Simple typed links: Book → Author (Person). This works well today.
- Qualified relationships: Person → Organization with role and time period. Needs relation properties to be properly designed.
- Cultural heritage provenance: Artwork → Source with confidence, date, provenance chain. This is a rich qualifier use case.
- Cross-wiki relations: A Subject in Wiki A references a Subject in Wiki B. The current SubjectId-only model doesn't support this — it would need a wiki identifier.
- Inverse display: Showing all incoming relations on a Subject (e.g., "Books by this Author"). Needs query/view support, not data model changes.
- Generic/polymorphic relations: "Related to" any type of Subject. Blocked by mandatory targetSchema.
What Seems Right
- Relations as a value type within Statements — keeps the model uniform
- RelationIds — necessary for stable identity and Neo4j MERGE operations
- One-directional storage — the right call, inverse display should be a query concern
- Target schema constraint for typed relations — provides useful validation and UI filtering
- Separate Property Name and Relation Type — UI labels and graph edge labels serve different purposes
What Needs Work
- Relation properties need schema-level definitions before they can be meaningfully used. The current untyped bag makes them opaque to API consumers and tooling.
- Target Schema could be optional or support multiple schemas to enable polymorphic relations.