
Commit d78b048

tweak to readme
1 parent e09a422 commit d78b048

File tree

1 file changed: +8 −5 lines changed


README.md

Lines changed: 8 additions & 5 deletions
```diff
@@ -16,15 +16,15 @@ This repository is a step towards what we hope will be a universal code formatte
 When looking at code, programmers can easily pick out formatting patterns for various constructs such as how `if` statements and array initializers are laid out. Rule-based formatting systems allow us to specify these input to output patterns. The key idea with our approach is to mimic what programmers do during the act of entering code or formatting. No matter how complicated the formatting structure is for a particular input phrase, formatting always boils down to the following four canonical operations:
 
 1. *nl*: Inject newline
-2. *ws*: Inject whitespace
+2. *sp*: Inject space character
 3. *align*: Align current token with some previous token
 4. *indent*: Indent current token from some previous token
 
 The first operation predicates the other three operations in that injecting a newline triggers an alignment or indentation. Not injecting a newline triggers injection of 0 or more spaces.
 
-The basic formatting engine works as follows. At each token in an input sentence, decide which of the canonical operations to perform then emit the current token. Repeat until all tokens have been emitted.
+The basic formatting engine works as follows. At each token in an input sentence, decide which of the canonical operations to perform, then emit the current token. Repeat until all tokens have been emitted. It's important to note that predictions for previous tokens affect predictions for the current token. For example, inserting a newline after a `{` might force a newline later, right before the matching `}`.
 
-To make this approach work, we need a model that maps context information about the current token to one or more canonical operations in {*nl*, *ws*, *align*, *indent*}. To create a formatter for a given language *L*, `CodeBuff` takes as input:
+To make this approach work, we need a model that maps context information about the current token to one or more canonical operations in {*nl*, *sp*, *align*, *indent*}. To create a formatter for a given language *L*, `CodeBuff` takes as input:
 
 1. A grammar for *L*
 2. A set of input files written in *L*
```
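The per-token engine loop described above can be sketched as follows. This is a minimal illustration with invented names and a hard-coded stand-in predictor, not CodeBuff's actual (Java, model-driven) implementation:

```python
# Illustrative sketch of the per-token formatting loop: at each token,
# pick a canonical operation (nl or sp), then emit the token.
# The predict function is a hard-coded stand-in for the learned model.

def predict(prev_tokens, token):
    """Choose a canonical operation for the current token: ("nl", n) or ("sp", n)."""
    if prev_tokens and prev_tokens[-1] in (";", "{"):
        return ("nl", 1)          # newline after a statement or opening brace
    if not prev_tokens:
        return ("sp", 0)          # nothing precedes the first token
    return ("sp", 1)              # otherwise inject a single space

def format_tokens(tokens):
    out, seen, indent = [], [], 0
    for token in tokens:
        op, n = predict(seen, token)
        if op == "nl":
            # Injecting a newline triggers an indent/align decision.
            if seen[-1] == "{":
                indent += 1
            if token == "}":
                indent -= 1
            out.append("\n" * n + "    " * indent)
        else:
            out.append(" " * n)
        out.append(token)
        seen.append(token)
    return "".join(out)

print(format_tokens(["if", "(", "x", ")", "{", "x", "=", "1", ";", "}"]))
```

Note how decisions feed forward: the newline injected after `{` is what causes the next token to receive an indent, mirroring the dependence between predictions described above.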
```diff
@@ -51,12 +51,15 @@ For a given token and parse tree context (relative to current token), we predict
 For efficiency, we use just two classifiers, one for predicting injection of newlines/spaces and one for predicting alignment/indentation. The result of prediction is a tuple:
 
-**predict<sub>ws</sub>**(*context*) &isin; {(newline, *n*), (whitespace, *n*), none}
+*ws* = **predict<sub>ws</sub>**(*context*) &isin; {(newline, *n*), (whitespace, *n*), none}
 
-**predict<sub>align</sub>**(*context*) &isin; {(align, *delta*, *index*), (indent, *delta*), indent, none}
+*alignment* = **predict<sub>align</sub>**(*context*) &isin; {(align, *delta*, *index*), (indent, *delta*), indent, none}
+
+As with auto-regression on a signal, prediction feeds off of prior decisions about newlines and alignment. We must even base the second decision, alignment, on the result of the first prediction, as in a decision tree. Before predicting alignment, we compute "is first token on line" from the *ws* > 0 result, and must also compute whether a matching symbol exists and is on a different line.
 
 ### Features
 
+
 matching symbols and list elements
```
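The two-stage, auto-regressive prediction added in this commit can be sketched roughly as below. The classifiers are hard-coded stand-ins and all names (`predict_ws`, `predict_align`, the context keys) are invented for illustration, not CodeBuff's actual API:

```python
# Sketch of the two-stage prediction: the alignment classifier consumes
# a feature ("is first token on line") derived from the whitespace
# prediction, so the second decision feeds off of the first.

def predict_ws(context):
    """First stage: returns ("newline", n), ("sp", n), or None."""
    return ("newline", 1) if context.get("prev") == "{" else ("sp", 1)

def predict_align(context, ws):
    """Second stage: only meaningful for the first token on a line."""
    first_on_line = ws is not None and ws[0] == "newline"
    if not first_on_line:
        return None
    if context.get("matching_symbol_on_other_line"):
        return ("align", 0, 0)    # (align, delta, index)
    return ("indent", 1)          # (indent, delta)

ws = predict_ws({"prev": "{"})
alignment = predict_align({"prev": "{", "matching_symbol_on_other_line": False}, ws)
```

The chaining of `ws` into `predict_align` is the decision-tree-like dependence the commit message describes: alignment is never predicted for a token that stays on the same line.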