@jahorton
Contributor

@jahorton jahorton commented Nov 13, 2025

This PR aims to start an internal doc on the role of SearchSpace, SearchPath, and SearchCluster in the correction-search process.

At present, I don't claim it to be complete by any measure. But, "something" is better than "nothing" here, and this provides a chance to get some eyes on things early in order to determine what works as an explanation and what doesn't. Feedback appreciated, even while in draft mode.

Build-bot: skip
Test-bot: skip

@keymanapp-test-bot

User Test Results

Test specification and instructions

User tests are not required

@keymanapp-test-bot keymanapp-test-bot bot changed the title docs(web): starts internal doc on SearchSpace design, requirements, and analysis docs(web): starts internal doc on SearchSpace design, requirements, and analysis 🚂 Nov 13, 2025
@keymanapp-test-bot keymanapp-test-bot bot added this to the A19S16 milestone Nov 13, 2025
@keyman-server keyman-server modified the milestones: A19S16, A19S17 Nov 22, 2025
Member

@mcdurdin mcdurdin left a comment

This helps a lot in understanding the SearchSpace, SearchPath, SearchCluster types.

But I do have some questions and suggestions:

  • It would help to describe the shape of these types (i.e. properties, methods, and particularly how Path differs from Cluster).
  • Include a concrete example at the top, of a short series of key events + resulting SearchSpace types to illustrate how the types are used. This should be a common case rather than a pathological one!
  • Even after reading, I am not really clear why SearchCluster exists; why do SearchPaths need grouping and how does this help?
  • I am a little unclear on the names of the types - Space vs Path. Why is a Path an implementation of a Space?
  • I am unclear how a SearchPath can 'extend' a SearchSpace given a SearchSpace is just an interface without implementation? Isn't the relationship between SearchPath and SearchSpace 'implements'?
  • I guess a Cluster could be called a PathCluster or a PathGroup to clarify the relationship?
  • It seems like a large part of the reason for these types is fat fingering at word boundaries. Is that right? It's never explicitly stated, just obliquely when defining the problem.

Formatting nit: we generally wrap our .md files at 80 chars

@jahorton
Contributor Author

jahorton commented Dec 1, 2025

After much reading and searching, I've landed on this: https://en.wikipedia.org/wiki/Modular_decomposition

The new target form for the correction-search graph aligns pretty well with what is described there (after parsing all the formalization). To break it down:

  • Upon receiving a new keystroke, a new set of edges and destination nodes for those edges is constructed. The "destination nodes" are then conceptually grouped into modules.
    • On one hand, all transforms for the current keystroke's input may also be used to define a graph module - no transitions on the search-graph will consider duplication of the keystroke.
      • 'delete' edits of the keystroke are included within this outer "module".
      • 'insert' edits that apply after the keystroke's direct effects are also included within this outer "module".
    • On the other hand, we build partitions of this module such that there is only one outbound virtual node for each, which indicates the total length of the token and which keystrokes (and portions thereof) comprise the token. Different (module) partitions of the "outer module" end at different virtual nodes.
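The partitioning step above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `KeystrokeEdge` and `partitionByResultLength` are hypothetical names, and the "virtual node" is simplified to the resulting token length alone, ignoring which keystrokes (and portions thereof) comprise the token.

```typescript
// Hypothetical edge shape for illustration; not the real Keyman types.
interface KeystrokeEdge {
  sourceLength: number; // token length at the edge's source node
  insert: string;       // text the edge appends to the token
  deleteLeft: number;   // characters the edge removes before inserting
}

// Groups one keystroke's edges into partitions of the "outer module".
// Assumption: each partition is keyed only by the resulting token length,
// standing in for the single outbound "virtual node" of that partition.
function partitionByResultLength(
  edges: KeystrokeEdge[]
): Record<number, KeystrokeEdge[]> {
  const modules: Record<number, KeystrokeEdge[]> = {};
  for (const edge of edges) {
    const resultLength =
      edge.sourceLength - edge.deleteLeft + edge.insert.length;
    if (!modules[resultLength]) {
      modules[resultLength] = [];
    }
    modules[resultLength].push(edge);
  }
  return modules;
}

// One keystroke seen from a 3-character token: two single-character
// substitutions, a 'delete' edit (keystroke ignored), and a two-character
// output.
const partitions = partitionByResultLength([
  { sourceLength: 3, insert: 'a', deleteLeft: 0 },
  { sourceLength: 3, insert: 's', deleteLeft: 0 },
  { sourceLength: 3, insert: '', deleteLeft: 0 },
  { sourceLength: 3, insert: 'th', deleteLeft: 0 }
]);
```

Here the two single-character edges share one partition, while the 'delete' edit and the two-character output each end at a different virtual node.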

Also of note: https://en.wikipedia.org/wiki/Quotient_graph (which is referenced by the prior link)

  • Short version: it's a graph built out of modules comprising another graph, recognizing the connectivity amongst the modules.
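As a rough sketch of that idea, assuming a directed graph given as an edge list and a hypothetical `moduleOf` lookup that assigns each node to its module, the quotient graph's edges could be derived like this:

```typescript
type EdgeList = Array<[string, string]>;

// Builds the edges of a quotient graph: one node per module, with an edge
// between two distinct modules whenever any of their members are connected.
// `moduleOf` is a hypothetical node-to-module assignment for illustration.
function quotientEdges(
  edges: EdgeList,
  moduleOf: Record<string, string>
): EdgeList {
  const seen: Record<string, boolean> = {};
  const result: EdgeList = [];
  for (const [from, to] of edges) {
    const a = moduleOf[from];
    const b = moduleOf[to];
    if (a === b) {
      continue; // edges inside a module vanish in the quotient graph
    }
    const key = JSON.stringify([a, b]);
    if (!seen[key]) {
      seen[key] = true;
      result.push([a, b]);
    }
  }
  return result;
}

// Two nodes in module A both lead to module B; the quotient keeps one edge.
const quotient = quotientEdges(
  [['root', 'a1'], ['root', 'a2'], ['a1', 'b1'], ['a2', 'b1'], ['a1', 'a2']],
  { root: 'R', a1: 'A', a2: 'A', b1: 'B' }
);
// quotient is [['R', 'A'], ['A', 'B']]
```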

We can also build paths on the quotient graph, starting from the root node until the final module(s) added by the incoming keystroke, to somewhat formalize what the current SearchSpace classes are representing.
  • If the final transition on the quotient graph to a single "virtual node" only passes through a single keystroke-level module, a SearchPath instance is used to represent that virtual node.
  • When multiple such keystroke-level modules terminate the paths to a single "virtual node", this is represented by constructing SearchPath instances for each such keystroke-level module path, then constructing a SearchCluster from that to represent the virtual node.
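The selection rule in these two cases might look something like the sketch below. The `*Sketch` types and `representVirtualNode` are hypothetical stand-ins; the real SearchSpace, SearchPath, and SearchCluster types carry far more state than shown here.

```typescript
// Hypothetical stand-in for the SearchSpace interface.
interface SearchSpaceSketch {
  readonly kind: 'path' | 'cluster';
}

// Represents a virtual node reached through one keystroke-level module.
class SearchPathSketch implements SearchSpaceSketch {
  readonly kind: 'path' = 'path';
  readonly moduleId: string;
  constructor(moduleId: string) {
    this.moduleId = moduleId;
  }
}

// Represents a virtual node reached through several keystroke-level modules.
class SearchClusterSketch implements SearchSpaceSketch {
  readonly kind: 'cluster' = 'cluster';
  readonly paths: SearchPathSketch[];
  constructor(paths: SearchPathSketch[]) {
    this.paths = paths;
  }
}

// Picks the representation for one virtual node, given the ids of the
// keystroke-level modules whose paths terminate at it.
function representVirtualNode(moduleIds: string[]): SearchSpaceSketch {
  if (moduleIds.length === 0) {
    throw new Error('a virtual node needs at least one incoming module');
  }
  const paths = moduleIds.map((id) => new SearchPathSketch(id));
  return paths.length === 1 ? paths[0] : new SearchClusterSketch(paths);
}

const single = representVirtualNode(['k3:whole']);
const merged = representVirtualNode(['k3:whole', 'k2+k3:split']);
```

In this sketch `single.kind` is `'path'` and `merged.kind` is `'cluster'`; the module ids are made-up labels.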

Obvious remaining clarifications needed:

  • The "outer module" - the module superset of all keystroke-level modules for a single keystroke.
    • Is not really relevant outside of formalization.
  • "Keystroke-level module" - a quotient-graph, path-terminating module representing a subset of the keystroke's input effects, all of which target the same single "virtual node".
  • "virtual node" - yep, I keep referring back to that.
  • The quotient-graph path, which, in combination with the "virtual nodes" alluded to above, lines up well with the current SearchSpace interface and implementing types.

@jahorton jahorton force-pushed the docs/web/add-search-space-doc branch from cb06ed9 to dc1cd3b Compare January 9, 2026 15:04
@jahorton jahorton changed the base branch from feat/web/cluster-splitting-and-merging to refactor/web/correction-heuristic-and-thresholding January 9, 2026 15:04
@github-actions github-actions bot added docs and removed docs labels Jan 9, 2026
@jahorton jahorton marked this pull request as ready for review January 14, 2026 21:25
@jahorton
Contributor Author

I've now hoisted this PR much further up in the current chain, allowing it to serve somewhat as a design / implementation doc.

Member

@mcdurdin mcdurdin left a comment

RSLGTM; I think the document needs more of an introduction but don't want the internal documentation to block merge

Comment on lines +9 to +16
There is one major, notable simplifying assumption in the current
text-correction design: we assume that each keystroke's `Transform` is 100%
independent from the `Transform` selected for every other keystroke. This
assumption is, of course, invalid: the output of keystroke A may selectively
establish the context needed for a Keyman keyboard rule matched by one or more
keys in keystroke B. Efforts to address this limitation are considered
out-of-scope at this time and will be addressed later in a future epic -
epic/true-correction - documented as issue #14709.
Member

This assumption is, of course, invalid: the output of keystroke A may selectively
establish the context needed for a Keyman keyboard rule matched by one or more
keys in keystroke B.

This sentence hurts my brain. How can keystroke B have more than one key? That doesn't make sense. I guess what you are saying is that key events are not independent because rule matching depends on the output of previously executed rules?
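
A toy illustration of that dependency, as a sketch only: the rule below is invented for the example (it is not from any real keyboard), and `TransformSketch` only borrows the `insert` / `deleteLeft` shape. The point is that a transform computed for the second key event against the actually typed context gives a different result than re-running the rule against a corrected context.

```typescript
interface TransformSketch {
  insert: string;
  deleteLeft: number;
}

// Hypothetical rule: '^' after 'a' replaces it with 'â'; otherwise the key
// just outputs '^'.
function applyCircumflexKey(context: string): TransformSketch {
  if (context.charAt(context.length - 1) === 'a') {
    return { insert: 'â', deleteLeft: 1 };
  }
  return { insert: '^', deleteLeft: 0 };
}

function applyTransform(context: string, t: TransformSketch): string {
  return context.slice(0, context.length - t.deleteLeft) + t.insert;
}

// The user typed "ca" then '^', but correction considers 's' for the 'a'.
const typedTransform = applyCircumflexKey('ca');

// Independence assumption: reuse the transform computed for "ca".
const assumedIndependent = applyTransform('cs', typedTransform); // "câ"

// What the keyboard would have produced had "cs" really been the context.
const rerunOnCorrection = applyTransform('cs', applyCircumflexKey('cs')); // "cs^"
```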

@keyman-server keyman-server modified the milestones: A19S20, A19S21 Jan 16, 2026
