Text parser 2
This is a sort of scrap book for the work on text-parser2.
Research into using a universal AST (Abstract Syntax Tree) to represent a document. A complex language parser can provide a rich and complete AST while a simplistic parser can simply build a flat tree.
Current work and results are sometimes updated here: https://gist.github.com/764184
Outline:
- The text parser runs in the embedded nodejs runtime thread.
- Kod can send a "text edited" message to the parser at any given time.
- The parser can send an "ast changed" message to Kod at any given time.
Kod sending a "text edited" message to the parser system:
[ui -> parser] "text edited" {
document: [KDocument],
modifiedRange: [123, 4],
changeDelta: 4
}
- document is a shared object which represents the document whose text was altered. This document can be queried for:
  - language type
  - modification timestamp
  - complete source text
- modifiedRange represents the character range in the source before the edit occurred. E.g. removing "re" from "fred" results in the range [1,2] ("edit started at position 1 and extended 2 characters").
- changeDelta tells how many characters were added or removed. In the above case of removing "re" from "fred", the value is -2 ("two characters were removed").
This information should be sufficient to calculate the (probably domain-specific) effective extent of the edit. The parser should also be able to consult the AST to find which node(s) of the tree were affected by the edit and need to be replaced or re-evaluated.
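As a rough illustration of how the two values fit together, the sketch below maps a "text edited" message to the range the edit occupies in the new source. The helper name editedRangeInNewSource is hypothetical (it is not part of Kod), and it assumes ranges are [location, length] pairs as in the "fred" example.

```javascript
// Hypothetical helper, not part of Kod: maps the pre-edit range and the
// change delta to the range the edit occupies in the *new* source text.
// Assumes ranges are [location, length] pairs.
function editedRangeInNewSource(modifiedRange, changeDelta) {
  const [location, length] = modifiedRange;
  // The edit starts at the same location; its length grows or shrinks by the delta.
  return [location, Math.max(0, length + changeDelta)];
}

// Removing "re" from "fred": the two characters at position 1 are gone.
console.log(editedRangeInNewSource([1, 2], -2)); // [ 1, 0 ]
// Inserting "xy" at position 3: nothing was replaced, two characters were added.
console.log(editedRangeInNewSource([3, 0], 2)); // [ 3, 2 ]
```

A language-specific parser would typically widen this range to the nearest semantically complete unit before re-parsing.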
At this point in time, the parser should know the following:
- Which node in the AST needs to be replaced
- The full semantically complete range of characters in the new source which needs to be parsed
Next, the parser simply parses the substring of the new source into a partial AST and finally replaces the old node with the new root node from the partial AST.
Picture yourself a tree... Imagine the AST is a tree in nature and the edit is a scratched branch on that tree. We need to cut off that branch from the tree, but we want to do as little damage as possible, so we cut off the branch as far out as possible. Then we consult our local god to create a new branch without scratches and insert it where we cut off the damaged branch.
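The cut-and-replace step could look roughly like the sketch below. Everything in it is an assumption made for illustration: the node shape { kind, range: [location, length], children }, the function names, and the reparse callback standing in for a language-specific parser. It picks the deepest node that encloses the edit, re-parses only that node's text in the new source, and shifts the nodes that follow it.

```javascript
// Sketch under assumptions: nodes look like { kind, range: [location, length], children },
// and `reparse(text, location)` is a hypothetical language-specific parser entry point.

// Path from the root down to the deepest node whose range contains `range`.
function pathToEnclosingNode(root, [location, length]) {
  const path = [root];
  for (;;) {
    const children = path[path.length - 1].children || [];
    const child = children.find(({ range: [l, n] }) =>
      l <= location && location + length <= l + n);
    if (!child) return path;
    path.push(child);
  }
}

// Moves a whole subtree by `delta` characters.
function shiftSubtree(node, delta) {
  node.range[0] += delta;
  (node.children || []).forEach((c) => shiftSubtree(c, delta));
}

function applyEdit(root, newSource, modifiedRange, changeDelta, reparse) {
  const path = pathToEnclosingNode(root, modifiedRange);
  const damaged = path[path.length - 1];
  // The damaged node keeps its start; its length changes by the delta.
  const [location, length] = damaged.range;
  const newLength = length + changeDelta;
  const fresh = reparse(newSource.slice(location, location + newLength), location);
  if (path.length === 1) return fresh; // the edit touched the root itself
  // Ancestors grow or shrink; siblings after the edit move by the delta.
  for (let i = 0; i < path.length - 1; i++) {
    const parent = path[i];
    parent.range[1] += changeDelta;
    const index = parent.children.indexOf(path[i + 1]);
    parent.children.slice(index + 1).forEach((c) => shiftSubtree(c, changeDelta));
  }
  const parent = path[path.length - 2];
  parent.children[parent.children.indexOf(damaged)] = fresh;
  return root;
}

// Usage: "a bb c" becomes "a bxb c" (one character inserted at position 3).
const tree = { kind: "text", range: [0, 6], children: [
  { kind: "word", range: [0, 1] },
  { kind: "word", range: [2, 2] },
  { kind: "word", range: [5, 1] },
] };
const reparseWord = (text, location) => ({ kind: "word", range: [location, text.length], text });
const updated = applyEdit(tree, "a bxb c", [3, 0], 1, reparseWord);
console.log(updated.children.map((c) => c.range)); // [ [ 0, 1 ], [ 2, 3 ], [ 6, 1 ] ]
```

Choosing the deepest enclosing node is the simplest reading of "as far out as possible"; a real language parser may have to step back up the path when the edit changes how the surrounding text should be parsed.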
Finally, the parser sends a message to Kod to inform "the UI" that the semantics of [document] has changed:
[parser -> ui] "ast changed" {
document: [KDocument],
affectedRange: [120, 12],
ast: [ast]
}
- document is a shared object representing the document in question.
- affectedRange represents a character range (in the current source text space) which the update affected.
- ast is the new/current AST (TODO: maybe the AST could be a persistent property of the document instead, so we don't need to pass it along all the time?)
When the UI receives an "ast changed" message, displayed text in the affectedRange will have its style updated to visually reflect the containing element's semantic meaning (i.e. its kind, relation, etc). The UI can do other things as well, like updating a visual tree representation of the AST in which the user can click to jump between the visual tree and the source text.
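On the receiving side, the restyling step might be sketched as follows. The node shape { kind, range: [location, length], children }, the function names, and the applyStyle callback are all assumptions for illustration; Kod's real UI code is not shown here.

```javascript
// Sketch only: node shape and function names are assumptions, not Kod's API.

// Collects { kind, range } for every leaf node overlapping [location, length].
function styleRunsInRange(ast, [location, length]) {
  const end = location + length;
  const runs = [];
  (function visit(node) {
    const [l, n] = node.range;
    if (l + n <= location || l >= end) return; // no overlap with the affected range
    if (!node.children || node.children.length === 0) {
      runs.push({ kind: node.kind, range: node.range });
    } else {
      node.children.forEach(visit);
    }
  })(ast);
  return runs;
}

// Hypothetical handler for an "ast changed" message. `applyStyle(kind, range)`
// stands in for whatever the UI does to restyle a run of text.
function onAstChanged({ affectedRange, ast }, applyStyle) {
  const runs = styleRunsInRange(ast, affectedRange);
  runs.forEach(({ kind, range }) => applyStyle(kind, range));
  return runs.length;
}
```

Only nodes that overlap affectedRange are visited down to their leaves, so the cost of restyling follows the size of the edit instead of the size of the document.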
In the Mathematica frontend, a user can press a key [Ctrl+.] to select (highlight) the token the cursor is on. When the user presses the key again, the selection expands to the next smallest enclosing semantic unit, and each further press extends it again.
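A feature like this could presumably be driven by the AST. The sketch below is not how Mathematica does it; it only shows one way to walk an AST outward from the cursor, assuming nodes of the shape { kind, range: [location, length], children } and hypothetical function names.

```javascript
// Sketch only: node shape and function names are assumptions.

// Ranges of all nodes enclosing `position`, innermost first.
function selectionChain(root, position) {
  const chain = [];
  let node = root;
  while (node) {
    chain.unshift(node.range);
    node = (node.children || []).find(({ range: [l, n] }) =>
      l <= position && position < l + n);
  }
  return chain;
}

// Next larger range in the chain that still contains the current selection.
// Returns the current selection unchanged when nothing larger exists.
function expandSelection(chain, current) {
  const [location, length] = current;
  const next = chain.find(([l, n]) =>
    n > length && l <= location && location + length <= l + n);
  return next || current;
}

// Usage with a hand-built AST for the source "f(a + b)", cursor on "b".
const callAst = { kind: "call", range: [0, 8], children: [
  { kind: "identifier", range: [0, 1] },
  { kind: "arguments", range: [2, 5], children: [
    { kind: "identifier", range: [2, 1] },
    { kind: "operator", range: [4, 1] },
    { kind: "identifier", range: [6, 1] },
  ] },
] };
const chain = selectionChain(callAst, 6);
let selection = [6, 0];                          // plain cursor, nothing selected
selection = expandSelection(chain, selection);   // [6, 1]  "b"
selection = expandSelection(chain, selection);   // [2, 5]  "a + b"
selection = expandSelection(chain, selection);   // [0, 8]  "f(a + b)"
```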
