Path representation and API ergonomics for gap rendering and beyond #213

@johannesmutter

The node gap implementation surfaced several related friction points. Most trace back to the same root cause: paths are bare Array<string|number> values with no caching, no structural API, and redundant serialization at every use site.

The problem

Paths are created, serialized, and compared constantly:

  • Gap computation converts paths to .join('.') strings (for Map keys), .join('-') strings (for CSS anchor names), and back via .split('.'). In a 500-node document this is thousands of small string allocations per render cycle.
  • CSS anchor names like --g-page_abc123-body-42-buttons-0-gap-before are long. These are emitted as inline style attributes on every node gap and gap marker, and referenced in anchor() calls inside CSS formulas. Long custom-property names increase memory for inline styles and make DevTools inspection harder (though they also aid debuggability: page_abc123-body-42 is self-describing in a way that a short hash wouldn't be).
  • Structural queries require ad-hoc string parsing: "is this a root-level child?" means counting dots; "get the parent path" means slice(0, -1) (new allocation); "get the child index" means parseInt(path.at(-1)).
  • Schema traversal requires materializing nodes just to find their type and properties. collect_nested_array_gaps calls session.get() + doc.nodes[] lookups + schema property iteration just to enumerate which child properties are node_array typed.
  • Coarse reactivity granularity: session.doc is $state.raw(), so reading it (via session.get(), session.inspect(), or session.doc.nodes[...]) subscribes to doc-level reference changes — i.e., any transaction triggers a re-run. There are no per-path or per-property subscriptions (that's the whole point of $state.raw), but code inside $effect that only cares about one array's child count still re-evaluates on every unrelated edit elsewhere in the document.
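To make the churn concrete, here is a small sketch of the patterns listed above, applied to one hypothetical path (the segment values are invented for illustration). It also shows a side effect of the string round-trip: numeric segments come back as strings.

```javascript
// Hypothetical path, shaped like the Array<string|number> values described above.
const path = ['page_1', 'body', 3];

// Each use site serializes again; every call allocates a new string.
const map_key = path.join('.');        // "page_1.body.3"
const anchor_suffix = path.join('-');  // "page_1-body-3"

// Going back via split() allocates a new array and loses the number type.
const round_tripped = map_key.split('.'); // ["page_1", "body", "3"]

// Structural queries on the bare array or string.
const parent = path.slice(0, -1);                       // new array on every call
const child_index = parseInt(String(path.at(-1)), 10);  // 3
const dot_count = map_key.split('.').length - 1;        // "counting dots" to judge depth
```

Note that `round_tripped[2]` is the string `"3"`, so any code that compares segments after a round-trip has to normalize types first.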

Suggested improvements

1. First-class Path type

An immutable value type with cached serializations:

class Path {
  get str()      // "page_1.body.3" → cached
  get css_name() // "page_1-body-3" → cached
  get depth()    // 3
  get index()    // last segment as number, or null
  child(segment) // Path → Path (could intern common children)
  get parent()   // cached, zero allocation
  equals(other)
  starts_with(other)
  toString()     // returns str for string contexts; a Map key needs path.str or an interned instance
}

This would eliminate the hot-path .join() / .split() churn. The trade-off is an extra abstraction.

The current path arrays are transparent and easy to inspect. A Path class would need to be debugger-friendly (good toString(), maybe a custom Chrome DevTools formatter).
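One way the interface above could be filled in is sketched below. This is an illustration under assumptions, not a proposed final design: the static `root` and `from()` helpers are hypothetical additions, and children are interned in a per-parent Map that is never pruned, which a real implementation would need to revisit.

```javascript
// Sketch only: an immutable path value with lazily cached serializations.
class Path {
  #segments;
  #parent;
  #str = null;
  #css_name = null;
  #children = new Map(); // interned children, keyed by segment (never pruned here)

  // Prefer Path.from() or child(); calling the constructor directly bypasses interning.
  constructor(segments, parent = null) {
    this.#segments = Object.freeze([...segments]);
    this.#parent = parent;
  }

  static root = new Path([]);

  // Hypothetical helper: build (or look up) a Path from a plain segment array.
  static from(segments) {
    let path = Path.root;
    for (const segment of segments) path = path.child(segment);
    return path;
  }

  child(segment) {
    let child = this.#children.get(segment);
    if (!child) {
      child = new Path([...this.#segments, segment], this);
      this.#children.set(segment, child);
    }
    return child;
  }

  get parent() { return this.#parent; } // null for the root path
  get depth() { return this.#segments.length; }
  get index() {
    const last = this.#segments.at(-1);
    return typeof last === 'number' ? last : null;
  }
  get str() { return (this.#str ??= this.#segments.join('.')); }
  get css_name() { return (this.#css_name ??= this.#segments.join('-')); }

  equals(other) { return this === other || this.str === other.str; }
  starts_with(other) {
    return other.depth <= this.depth &&
      other.#segments.every((segment, i) => segment === this.#segments[i]);
  }
  toString() { return this.str; }
}

const button_path = Path.from(['page_1', 'body', 3]);
console.log(button_path.str, button_path.css_name, button_path.parent.str);
```

Because `Map` compares object keys by identity rather than calling `toString()`, interning is what lets a `Path` instance serve directly as a Map key in this sketch; without interning, callers would key on `path.str` instead.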

2. Selective re-computation

Since doc is $state.raw(), there's no fine-grained reactivity to exploit — every transaction swaps the entire doc reference. Gap computation already mitigates this with per-path PathGapData signals that only notify subscribers (the NodeGapMarkers components) when a specific path's gaps actually change.

Further improvements could include:

  • A lightweight diff between old and new doc after a transaction, to skip recomputing gaps for arrays that didn't change.
  • A session.child_count(array_path) that returns just the length and can be compared cheaply, so the gap computation can bail out early when the count (not the contents) is unchanged.
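The diff idea could look roughly like the sketch below. It rests on an assumption that is not confirmed here: that transactions replace only the node objects they touch and reuse the references of untouched nodes, so a reference comparison is enough. `changed_node_ids` and the example documents are hypothetical.

```javascript
// Sketch, assuming doc.nodes maps node ids to node objects and that
// untouched nodes keep the same object reference across transactions.
function changed_node_ids(old_doc, new_doc) {
  const changed = new Set();
  for (const [id, node] of Object.entries(new_doc.nodes)) {
    if (old_doc.nodes[id] !== node) changed.add(id); // added or replaced
  }
  for (const id of Object.keys(old_doc.nodes)) {
    if (!(id in new_doc.nodes)) changed.add(id); // removed
  }
  return changed;
}

// Hypothetical documents: only page_1 is replaced by the "transaction".
const intro = { type: 'text', content: 'Hello' };
const before = { nodes: { page_1: { type: 'page', body: ['intro'] }, intro } };
const after = { nodes: { page_1: { type: 'page', body: ['intro', 'outro'] }, intro, outro: { type: 'text', content: 'Bye' } } };

console.log([...changed_node_ids(before, after)]);
```

Gap computation could then skip any array whose owning node id is not in the returned set. If the assumption about reference reuse does not hold, the comparison would have to be structural and the saving would shrink accordingly.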

3. Schema queries without live nodes

Something like session.schema_for(node_type) or session.property_defs(node_type) so code can enumerate a node type's node_array properties from the schema alone, without first fetching a live node instance.
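A minimal sketch of such a query follows. The schema shape used here (`properties` keyed by name, each with a `type` field) is an assumption made for illustration, as are the standalone function signatures; the real schema format may differ.

```javascript
// Hypothetical schema shape: node type -> { properties: { name -> { type } } }.
const example_schema = {
  page: { properties: { title: { type: 'string' }, body: { type: 'node_array' } } },
  story: { properties: { buttons: { type: 'node_array' }, image: { type: 'string' } } },
  text: { properties: { content: { type: 'string' } } },
};

// Returns the property definitions for a node type, or an empty object if unknown.
function property_defs(schema, node_type) {
  return schema[node_type]?.properties ?? {};
}

// Enumerates the node_array properties of a node type from the schema alone.
function node_array_properties(schema, node_type) {
  return Object.entries(property_defs(schema, node_type))
    .filter(([, def]) => def.type === 'node_array')
    .map(([name]) => name);
}

console.log(node_array_properties(example_schema, 'page'));
```

With something like this, collect_nested_array_gaps would only need a node's type to know which child properties to descend into.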

4. Efficient child count

session.child_count(array_path) returns just the length, with a dependency only on the count, not the array contents. Useful when code needs N+1 gaps for N children and shouldn't re-run when children reorder but count stays the same.
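The count-only dependency can be sketched without any framework, as below. `ChildCountStore` and its methods are hypothetical names, and the sketch assumes something calls `update()` for each affected array after a transaction.

```javascript
// Sketch: subscribers are notified only when an array's length changes.
class ChildCountStore {
  #counts = new Map();    // array path string -> last known length
  #listeners = new Map(); // array path string -> Set of callbacks

  child_count(path_str) {
    return this.#counts.get(path_str) ?? 0;
  }

  subscribe(path_str, callback) {
    let callbacks = this.#listeners.get(path_str);
    if (!callbacks) {
      callbacks = new Set();
      this.#listeners.set(path_str, callbacks);
    }
    callbacks.add(callback);
    return () => callbacks.delete(callback); // unsubscribe
  }

  // Call after a transaction with the array's current contents.
  // Returns true if the count changed and subscribers were notified.
  update(path_str, children) {
    const next = children.length;
    if (this.#counts.get(path_str) === next) return false; // reorder or edit: no notification
    this.#counts.set(path_str, next);
    for (const callback of this.#listeners.get(path_str) ?? []) callback(next);
    return true;
  }
}

const counts = new ChildCountStore();
const seen = [];
counts.subscribe('page_1.body', (count) => seen.push(count));
counts.update('page_1.body', ['a', 'b']); // notifies with 2
counts.update('page_1.body', ['b', 'a']); // reorder, same count: silent
counts.update('page_1.body', ['a']);      // notifies with 1
```

Code that renders N+1 gaps for N children would subscribe here and stay idle while children are reordered or edited in place.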

On CSS anchor name length

The long anchor names deserve nuance. They encode the full document path (--g-page_abc-body-42-buttons-0-gap-before) which:

  • Pro: self-describing, invaluable when debugging CSS anchor positioning in DevTools
  • Pro: globally unique without extra bookkeeping
  • Con: emitted at least twice per element (once in anchor-name: and once in each anchor() reference), so memory for inline styles scales with both document size and path depth
  • Con: long custom-property names may affect memory usage, though this needs to be tested

A possible middle ground: keep human-readable names in dev mode and switch to shorter hashed names in production builds. Alternatively, just accept the verbosity as a reasonable trade-off for debuggability.
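The dev/production split could be sketched as follows. Note the swap: instead of hashing, this sketch interns each readable name and hands out a short sequential id, which avoids collisions without extra checks. `create_anchor_namer` and its `dev` option are hypothetical.

```javascript
// Sketch: readable anchor names in dev, short interned ids otherwise.
function create_anchor_namer({ dev }) {
  const short_ids = new Map(); // readable name -> short id (production only)

  return function anchor_name(path, suffix) {
    const readable = `${path.join('-')}-${suffix}`;
    if (dev) return `--g-${readable}`;

    let id = short_ids.get(readable);
    if (id === undefined) {
      id = short_ids.size.toString(36); // "0", "1", ..., "a", "b", ...
      short_ids.set(readable, id);
    }
    return `--g-${id}`;
  };
}

const example_path = ['page_abc', 'body', 42, 'buttons', 0];
const dev_name = create_anchor_namer({ dev: true })(example_path, 'gap-before');
const prod_namer = create_anchor_namer({ dev: false });
const prod_name = prod_namer(example_path, 'gap-before');
console.log(dev_name, prod_name);
```

The lookup table still holds each readable name once, so any saving would come from the shorter strings repeated across inline styles and anchor() references, which is the part that still needs measuring.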
