Text parser 2

This is a sort of scrapbook for the work on text-parser2.

Research approaches based on AST/TST

Research into using a universal AST (Abstract Syntax Tree) to represent a document. A complex language parser can provide a rich and complete AST, while a simplistic parser can simply build a flat tree.
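Here is a minimal sketch of what such a universal AST could look like. It assumes every parser emits nodes of one shape, `{ kind, start, length, children }`, with character offsets into the complete source text. The node shape, `makeNode`, and the line-splitting `parseFlat` are hypothetical illustrations and are not part of Kod. `parseFlat` plays the role of the simplistic parser that only builds a flat tree.

```javascript
// Hypothetical universal node shape shared by every parser. `start` and
// `length` are character offsets into the complete source text.
function makeNode(kind, start, length, children = []) {
  return { kind, start, length, children };
}

// Hypothetical simplistic parser: the root gets one child per line and
// nothing deeper, i.e. a flat tree.
function parseFlat(source) {
  const root = makeNode("document", 0, source.length);
  let lineStart = 0;
  for (let i = 0; i <= source.length; i++) {
    // Treat the end of the text like a final newline so the last line is kept.
    if (i === source.length || source[i] === "\n") {
      root.children.push(makeNode("line", lineStart, i - lineStart));
      lineStart = i + 1;
    }
  }
  return root;
}

const tree = parseFlat("var a = 1;\nvar b = 2;");
console.log(tree.children.map((n) => [n.kind, n.start, n.length]));
// prints [ [ 'line', 0, 10 ], [ 'line', 11, 10 ] ]
```

A richer language parser would emit the same node shape, only nested more deeply. The rest of the pipeline therefore does not need to know which kind of parser produced the tree.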

Example implementation of a patching AST parser for JavaScript

Current work and results are sometimes updated here: https://gist.github.com/764184

Outline:

The text parser runs in the embedded Node.js runtime thread.
Kod can send a "text edited" message to the parser at any given time.
The parser can send an "ast changed" message to Kod at any given time.
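Because either side may send a message at any time, the flow can be modelled as a rough sketch with Node's built-in `EventEmitter`. The `bridge` object and the plain-object `document` are assumptions. They stand in for however Kod actually passes messages between the UI and the Node.js thread. Only the message names and fields come from the protocol on this page.

```javascript
const { EventEmitter } = require("events");

// Hypothetical stand-in for the UI <-> parser bridge. Either side may emit
// at any time; only the message names and fields follow this page.
const bridge = new EventEmitter();

// Parser side: answer each "text edited" asynchronously.
bridge.on("text edited", (msg) => {
  setImmediate(() => {
    // A real parser would re-parse here and report the range it rebuilt;
    // this sketch just echoes the edited range and sends no AST.
    bridge.emit("ast changed", {
      document: msg.document,
      affectedRange: msg.modifiedRange,
      ast: null,
    });
  });
});

// UI side: restyle whatever the parser says was affected.
bridge.on("ast changed", (msg) => {
  console.log("restyle", msg.document.name, msg.affectedRange);
});

bridge.emit("text edited", {
  document: { name: "example.js" }, // placeholder for a KDocument
  modifiedRange: [1, 2],
  changeDelta: -2,
});
```

The reply is deferred with `setImmediate`. As a result, the UI side never blocks on the parser and simply handles "ast changed" whenever it arrives.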

Informal protocol

Kod sending a "text edited" message to the parser system:

[ui -> parser] "text edited" {
                  document: [KDocument],
                  modifiedRange: [123, 4],
                  changeDelta: 4
               }

document is a shared object representing the document whose text was altered. This document can be queried for:
- language type
- modification timestamp
- complete source text
modifiedRange represents the character range in the source before the edit occurred. E.g. removing "re" from "fred" results in the range [1,2] ("edit started at position 1 and extended 2 characters").
changeDelta tells how many characters were added or removed. In the above case of removing "re" from "fred", the value is -2 ("two characters were removed").
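To make the two fields concrete, here is a hypothetical helper, `describeEdit`. It derives both fields from the source text before and after a single contiguous edit by stripping the common prefix and suffix. Kod knows the edit directly and would not need to diff, so this only illustrates what the fields mean.

```javascript
// Hypothetical helper: derive the "text edited" fields from the text before
// and after one contiguous edit (common prefix/suffix stripping).
function describeEdit(oldText, newText) {
  const maxPrefix = Math.min(oldText.length, newText.length);
  let prefix = 0;
  while (prefix < maxPrefix && oldText[prefix] === newText[prefix]) {
    prefix++;
  }

  // The common suffix may not overlap the common prefix in either string.
  const maxSuffix = maxPrefix - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    oldText[oldText.length - 1 - suffix] === newText[newText.length - 1 - suffix]
  ) {
    suffix++;
  }

  return {
    // [start, length] of the replaced characters, in the *old* source.
    modifiedRange: [prefix, oldText.length - prefix - suffix],
    // Positive when characters were added, negative when removed.
    changeDelta: newText.length - oldText.length,
  };
}

console.log(describeEdit("fred", "fd"));
// prints { modifiedRange: [ 1, 2 ], changeDelta: -2 }
```

An edit inside a run of repeated characters can be described by several equally valid ranges. For example, removing one "a" from "aaa" could be reported at any of three positions, and this helper reports the rightmost one.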

This information should be sufficient to calculate the (probably domain-specific) effective extent of the edit. The parser should also be able to consult the AST on which node(s) of the tree were affected by the edit and need to be replaced or re-evaluated.
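One simple way to consult the AST is to descend from the root for as long as a single child still covers the whole edited range. This sketch assumes nodes look like `{ kind, start, length, children }` with offsets into the old source. The function name `findAffectedNode` and the example tree are hypothetical.

```javascript
// Hypothetical node shape: { kind, start, length, children }, with offsets
// into the source *before* the edit. Returns the deepest node that fully
// contains [editStart, editStart + editLength); assumes the root does.
function findAffectedNode(node, editStart, editLength) {
  const editEnd = editStart + editLength;
  for (const child of node.children) {
    const childEnd = child.start + child.length;
    // A zero-length insertion on a boundary between two siblings goes to the
    // first one that matches; a real parser may need a language-specific rule.
    if (child.start <= editStart && editEnd <= childEnd) {
      return findAffectedNode(child, editStart, editLength);
    }
  }
  return node; // no single child covers the whole edit
}

// Example tree for "var a = 1;\nvar b = 2;".
const ast = {
  kind: "program", start: 0, length: 21, children: [
    { kind: "statement", start: 0, length: 10, children: [
      { kind: "identifier", start: 4, length: 1, children: [] },
    ] },
    { kind: "statement", start: 11, length: 10, children: [] },
  ],
};

console.log(findAffectedNode(ast, 4, 1).kind); // prints identifier
console.log(findAffectedNode(ast, 8, 5).kind); // prints program (spans both statements)
```

An edit that straddles two siblings falls back to their common parent. A domain-specific parser may want to widen the result further, e.g. to the enclosing statement, because an edit can change how the surrounding text tokenizes.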

At this point in time, the parser should know the following:

- Which node in the AST needs to be replaced
- The full, semantically complete range of characters in the new source that needs to be parsed

Next, the parser simply parses that substring of the new source into a partial AST and finally replaces the old node with the root node of the partial AST.
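The re-parse and splice can be sketched under a few assumptions. The sketch uses the hypothetical `{ kind, start, length, children }` node shape and a toy line/word parser in place of a real language parser. It also assumes the target node fully contains the edit. After the splice, every ancestor of the replaced node grows or shrinks by `changeDelta`, every node after it moves by `changeDelta`, and nodes before it are untouched. `reparse`, `patchTree`, `parseLine`, and `parseDocument` are illustrative names only.

```javascript
// Nodes are { kind, start, length, children } with offsets into the full source.
function shift(node, delta) {
  node.start += delta;
  node.children.forEach((child) => shift(child, delta));
}

// Hypothetical toy parser: one "line" node with a child per word.
function parseLine(text) {
  const words = [];
  const re = /\S+/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    words.push({ kind: "word", start: m.index, length: m[0].length, children: [] });
  }
  return { kind: "line", start: 0, length: text.length, children: words };
}

function parseDocument(source) {
  const root = { kind: "document", start: 0, length: source.length, children: [] };
  let offset = 0;
  for (const line of source.split("\n")) {
    const node = parseLine(line);
    shift(node, offset);
    root.children.push(node);
    offset += line.length + 1; // skip the newline
  }
  return root;
}

// Walk down to `target`: ancestors grow by delta, later nodes move by delta.
function patchTree(node, target, replacement, delta) {
  const targetEnd = target.start + target.length;
  node.children = node.children.map((child) => {
    if (child === target) return replacement;
    const childEnd = child.start + child.length;
    if (child.start <= target.start && targetEnd <= childEnd) {
      patchTree(child, target, replacement, delta); // ancestor of target
    } else if (child.start >= targetEnd) {
      shift(child, delta); // entirely after the edit
    }
    return child;
  });
  node.length += delta; // this node contains target
}

// Re-parse only the target's text in the new source and splice it in.
function reparse(ast, target, newSource, delta, parse) {
  const text = newSource.slice(target.start, target.start + target.length + delta);
  const partial = parse(text);
  shift(partial, target.start); // partial offsets were relative to `text`
  if (ast === target) return partial;
  patchTree(ast, target, partial, delta);
  return ast;
}

const ast = parseDocument("var a = 1;\nvar b = 2;");
// Edit "a" -> "abc": modifiedRange [4, 1], changeDelta +2.
const newSource = "var abc = 1;\nvar b = 2;";
reparse(ast, ast.children[0], newSource, 2, parseLine);
console.log(ast.length, ast.children[1].start, ast.children[0].children[1].length);
// prints 23 13 3
```

A useful sanity check is that the patched tree is identical to a full re-parse of the new source, even though only the first line was actually re-parsed.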

Picture yourself a tree... Imagine the AST is a real tree and the edit is a scratched branch on that tree. We need to cut off that branch, but we want to do as little damage as possible, so we cut it off as far out as possible. Then we consult our local god to create a new branch without scratches and insert it where we cut off the damaged one.

Finally, the parser sends a message to Kod to inform "the UI" that the semantics of [document] have changed:

[parser -> ui] "ast changed" {
                  document: [KDocument],
                  affectedRange: [120, 12],
                  ast: [ast]
               }

document is a shared object representing the document in question.
affectedRange represents a character range (in the current source text space) which the update affected.
ast is the new/current AST (TODO: maybe the ast could be a persistent property of the document instead so we don't need to pass it along all the time?)

When the UI receives an "ast changed" message, the displayed text in affectedRange will have its style updated to visually reflect the semantic meaning of its containing element (i.e. its kind, relation, etc). The UI can do other things as well, like updating a visual tree representation of the AST in which the user can click to jump between the visual tree and the source text.
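Here is a rough sketch of the UI side. It assumes leaf nodes carry the token kind and that styling is a simple kind-to-style lookup. The `STYLES` table and `styleRuns` are hypothetical. The function collects a style run for every leaf that overlaps `affectedRange`, so only that part of the document is restyled.

```javascript
// Hypothetical kind -> style table; Kod's real styling is not described here.
const STYLES = { keyword: "bold", identifier: "plain", number: "blue", string: "green" };

// Collect [start, length, style] runs for leaf nodes overlapping the range.
// Nodes are { kind, start, length, children } in current-source offsets.
function styleRuns(node, [rangeStart, rangeLength], runs = []) {
  const rangeEnd = rangeStart + rangeLength;
  const nodeEnd = node.start + node.length;
  if (nodeEnd <= rangeStart || node.start >= rangeEnd) return runs; // no overlap
  if (node.children.length === 0) {
    // Whole leaves are restyled, even if they stick out of the range.
    runs.push([node.start, node.length, STYLES[node.kind] || "plain"]);
  } else {
    node.children.forEach((child) => styleRuns(child, [rangeStart, rangeLength], runs));
  }
  return runs;
}

// Example subtree for "var x = 42;".
const ast = {
  kind: "statement", start: 0, length: 11, children: [
    { kind: "keyword", start: 0, length: 3, children: [] },
    { kind: "identifier", start: 4, length: 1, children: [] },
    { kind: "number", start: 8, length: 2, children: [] },
  ],
};
console.log(styleRuns(ast, [4, 7]));
// prints [ [ 4, 1, 'plain' ], [ 8, 2, 'blue' ] ]
```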

Ideas

Mathematica-style semantic expansion of the cursor

In Mathematica frontend, a user can press a key [Ctrl+.], and the token the cursor is on will be selected (highlighted), when user presses the key again, the selection expands to highlight the next smallest semantic unit. When the key is pressed again, it extends further.

-- http://xahlee.org/emacs/syntax_tree_walk.html
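With an AST available, this expansion can be sketched as a search for the deepest node that contains the current selection and is strictly larger than it. A cursor is a selection of length 0, so the first press selects the innermost token. `expandSelection` and the example tree (for the source "var x = 42;\nvar y = 43;") are hypothetical. The selection uses the same [start, length] convention as the ranges in the protocol above.

```javascript
// Returns the [start, length] of the smallest node that contains the
// selection and is strictly larger than it, or null if none is larger.
// Nodes are hypothetical { kind, start, length, children } objects.
function expandSelection(node, [selStart, selLength]) {
  const selEnd = selStart + selLength;
  if (!(node.start <= selStart && selEnd <= node.start + node.length)) return null;
  for (const child of node.children) {
    const inner = expandSelection(child, [selStart, selLength]);
    if (inner) return inner; // a deeper node already grows the selection
  }
  // No descendant is larger than the selection; try this node itself.
  return node.length > selLength ? [node.start, node.length] : null;
}

const ast = {
  kind: "program", start: 0, length: 23, children: [
    { kind: "statement", start: 0, length: 11, children: [
      { kind: "keyword", start: 0, length: 3, children: [] },
      { kind: "declarator", start: 4, length: 6, children: [
        { kind: "identifier", start: 4, length: 1, children: [] },
        { kind: "number", start: 8, length: 2, children: [] },
      ] },
    ] },
    { kind: "statement", start: 12, length: 11, children: [] },
  ],
};

// Simulate repeated presses of the expand key, starting with the cursor on "x".
let sel = [4, 0];
while (sel) {
  console.log(sel); // [ 4, 0 ], [ 4, 1 ], [ 4, 6 ], [ 0, 11 ], [ 0, 23 ]
  sel = expandSelection(ast, sel);
}
```

Nodes whose range equals the current selection are skipped. As a result, a wrapper node that adds no text never costs the user an extra key press.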
