This repository was archived by the owner on Apr 1, 2025. It is now read-only.

Commit 31fc1c0

Merge pull request #30 from lpmi-13/typofix
Fix simple typos and standardize formatting in places
Authored by Patrick Thomson
2 parents 38198bc + 0ecf334, commit 31fc1c0

4 files changed, 13 additions and 13 deletions

docs/coding-style.md

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ foo = Heap.lookup thing
 ```

 Unlike many Haskell projects, we rely in places on variable shadowing (especially in open-recursive functions).
-Avoid variable shadowing if possible, as it can lead to unintuitive error messages; you are free to disable shadowing on in a per-file basis with `{-# OPTIONS_GHC -Wshadow #-}`
+Avoid variable shadowing if possible, as it can lead to unintuitive error messages; you are free to disable shadowing on a per-file basis with `{-# OPTIONS_GHC -Wshadow #-}`

 # Functions

docs/grammar-development-guide.md

Lines changed: 10 additions & 10 deletions
@@ -105,7 +105,7 @@ Here are some things that might help:

 - **Inline:** Adding a rule to the `inline` array strips out the whole node as though it has been removed from the grammar. Each occurrence of this rule in the grammar is replaced with a copy of its definition. Similar to making something hidden, this makes your AST more compact. Inlining doesn't create these nodes at runtime, whereas making something hidden acknowledges the node at runtime but hides it from the AST.

-- **Make `seq` visible and `choice` hidden** Sequences typically have meaning. Choices are just containers that point to other things.
+- **Make `seq` visible and `choice` hidden:** Sequences typically have meaning. Choices are just containers that point to other things.

 - **Making things hidden:** Preceding a rule with an underscore (`_rule`) allows you to omit displaying a rule in the AST. This allows you to make a tree more compact.

@@ -117,7 +117,7 @@ Here are some guidelines to determine what approach to take when removing superf

 2. **Add it to the inline array.** If the rule is used more than once and its definition is not simple, make it `inline`. If this does not cause parsing problems, this is the best approach, because it will avoid intermediate node allocations and parsing operations at runtime. One possible side-effect of `inline` is that is sometimes makes the parser much larger in terms of number of states. To evaluate whether this has happened, it’s worth looking at the `STATE_COUNT` in `parser.c` before and after. If the state count goes way up, it may not be worth adding the rule to `inline` since more states mean more one-time memory footprint for the parser. If it goes up a few percent (or goes down), it’s fine to add.

-3. **Mark it hidden**. If `inline` causes conflicts or drastically increases the size of the parse table, it's better to mark it as hidden. This is often useful when two two nodes can not exist without one another. For example, `class_body_declaration` was a child of `class_body` and occurred together 100% of the time. Similarly, `type_arguments` can not exist independent of its child node, `type_argument`. In both cases, it makes sense to hide the former.
+3. **Mark it hidden.** If `inline` causes conflicts or drastically increases the size of the parse table, it's better to mark it as hidden. This is often useful when two nodes can not exist without one another. For example, `class_body_declaration` was a child of `class_body` and occurred together 100% of the time. Similarly, `type_arguments` can not exist independent of its child node, `type_argument`. In both cases, it makes sense to hide the former.
 ```diff
 (generic_type
 (type_identifier)
@@ -134,7 +134,7 @@ Once you have developed a significant portion of the grammar, find a file from a
 Use [a script like this](https://github.com/tree-sitter/tree-sitter-java/blob/master/script/parse-examples.rb) is one way to mass test a large repo quickly.

 ### Sequence your work
-Most languages have a long-tail of features that are not frequently utilized in the wild. When supporting a language, our aim is always to be able to parse 100% of a language (or ideally more, since the intent is to support multiple versions). However, this does necessarily all in one go. A good way to do this is to develop the structure and documentation necessary to support open source contribution.
+Most languages have a long-tail of features that are not frequently utilized in the wild. When supporting a language, our aim is always to be able to parse 100% of a language (or ideally more, since the intent is to support multiple versions). However, this doesn't necessarily happen all in one go. A good way to do this is to develop the structure and documentation necessary to support open source contribution.

 ### Handling conflicts

@@ -148,23 +148,23 @@ Conflicts may arise due to ambiguities in the grammar. This is when the parser c
 - `commaSep1` - creates a repeating sequence of 1 or more tokens separated by a comma
 - `sep1`- creates a repeating sequence of 0 or more tokens separated by the specified delimiter

-- **Specify associativity and/or precedence.** Another way of resolving a conflict is through associativity and precedence. Specifying precedence allows us to prioritize productions in the grammar. If there are two or more ways to proceed, the production with the higher precedence will get preference. Left and right associativity can also be used to reflect how to proceed. For instance, a left-associative evaluation is `(a Q b) Q c` vs. a right-associative evaluation would render `a Q (b Q c)`. In this way, associativity changes the meaning of the expression. Resolving conflicts this way is a compile time solution as opposed to the "Add a conflict" section below which means the parser will try deal with the ambiguity at runtime.
+- **Specify associativity and/or precedence.** Another way of resolving a conflict is through associativity and precedence. Specifying precedence allows us to prioritize productions in the grammar. If there are two or more ways to proceed, the production with the higher precedence will get preference. Left and right associativity can also be used to reflect how to proceed. For instance, a left-associative evaluation is `(a Q b) Q c` vs. a right-associative evaluation would render `a Q (b Q c)`. In this way, associativity changes the meaning of the expression. Resolving conflicts this way is a compile time solution as opposed to the "Add a conflict" section below which means the parser will try to deal with the ambiguity at runtime.

 - **Add a conflict.** Adding conflicts allows the parser to pursue multiple paths in parallel, and decide which one to proceed with further along the process. Adding a conflict for one rule prevents the parser from recursively descending.

 _Workflow:_
-1. Add a conflict to the `conflicts` if there are 2 rules conflicting (to test that the conflict is the problem and gets the right parse output)
-2. Try `prec.left` or `prec.right` based on the options (if that’s not clear, then try both `prec.left` and `prec.right` and compare their outputs)
+1. Add a conflict to the `conflicts` if there are 2 rules conflicting (to test that the conflict is the problem and gets the right parse output).
+2. Try `prec.left` or `prec.right` based on the options (if that’s not clear, then try both `prec.left` and `prec.right` and compare their outputs).
 3. Look at adding a precedence number, usually `1` or `+1`, based on the rule you want to succeed first.
 4. Make sure there aren’t duplicate paths to get to the same rule from sibling rules (like having `_literal` in both `_statement` and `_expression`).
-And then once things are working in the tree output looks good, remove the conflict rule and try to solve it with associativity or precedence only. This helps confirm the solution before expending too much time adjusting precedence.
+And then once things are working and the tree output looks good, remove the conflict rule and try to solve it with associativity or precedence only. This helps confirm the solution before expending too much time adjusting precedence.

 ### Debugging errors

-Tree-sitter's error-handling is great, but sometimes works too well and hides helpful info that help to understand why errors are happening. The following tips can help detect where errors are occurring.
+Tree-sitter's error-handling is great, but sometimes works too well and hides helpful info that helps to understand why errors are happening. The following tips can help detect where errors are occurring.

 - **Narrow down your problem space.** Triangulate the error by starting with a simple example and progressively adding complexity to better understand where the parser is having trouble.
 - **Consult the spec.** Eliminate the possibility of typos or oversights in your logic by looking at the definition of your rule in the spec.
-- **Run your code.** Execute your test code to see verify it is valid. Use errors (if any) to get additional information about where the problem may lie.
+- **Run your code.** Execute your test code to verify it is valid. Use errors (if any) to get additional information about where the problem may lie.
 - **Use visual debug output.** Analyze the forks and look at individual production rules to hone in on the problem.
-- **Test all permutations of a particular language construct** This will help you find the edges of your language and ensure your grammar supports them.
+- **Test all permutations of a particular language construct.** This will help you find the edges of your language and ensure your grammar supports them.

docs/program-analysis.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ The following is a brief guide to working with the definitional interpreters and

 _Helpers:_
 - `parseFile`: parses one file.
-- `evaluateLanguageProject` takes a list of files and evaluates them usually under concrete semantics.
+- `evaluateLanguageProject`: takes a list of files and evaluates them usually under concrete semantics.
 - `callGraphLanguageProject`: uses the same mechanism for evaluating, but uses abstract semantics.
 - `typeCheckLanguageFile`: allows us to evaluate under type checking semantics.

docs/why-haskell.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Haskell is a pleasure to work in everyday. It's both productive and eye-opening.
 - *Editor tooling* is sub-par (especially compared to language communities like Java and C#) and finicky - we often end up just compiling in a separate terminal.
 - *Edges of the type system*. We often find ourselves working at the edges of Haskell's incredible type system, wishing for dependent types or reaching for complex workarounds like the [Advanced Overlap][] techniques designed by Oleg Kiselyov & Simon Peyton Jones.
 - *Infra glue*. Haskell is very competent at standard infrastructure functionality like running a webserver, but it isn't the focus of the language community so you're often left writing your own libraries and components when you need to plug in to modern infrastructure.
-- *Lazy evaluation* isn't always want you want and can have performance problems and make some debugging activities incredibly frustrating. We use the `StrictData` language extension to combat some of these difficulties.
+- *Lazy evaluation* isn't always what you want and can have performance problems and make some debugging activities incredibly frustrating. We use the `StrictData` language extension to combat some of these difficulties.
 - *Haskell has a reputation for being difficult to learn.* Some of that is well deserved, but half of it has more to do with how many of us first learned imperative programming and the switch to a functional paradigm takes some patience. Haskell also leverages a much more mathematically rigorous set of abstractions which likely aren't as familiar to web developers. We have, however, had very good luck on-boarding new team members with a wide range of previous experience and the quality of learning Haskell resources has really improved.

 At this point, we are pretty firmly attached to Haskell's language features to enable many of the objectives of this project: abstract interpretation, graph analysis, effect analysis, code writing, AST matching, etc. Could you implement Semantic in another programming language? Certainly. An early prototype of the semantic diff portion of the project was done in Swift, but it quickly became unwieldy and even the first rough Haskell prototype was considerably more performant. Since adopting Haskell, we've had no trouble plugging into the rest of GitHub's infrastructure: running as a command line tool, a web server (HTTP/JSON), and now a Twirp RPC server. We've been an early adopter of Kubernetes and Moda and now ~~gRPC~~ Twirp at GitHub, often shipping our application on these new infrastructure components well ahead of other teams. We've managed our own build systems, quickly adopted new technologies like Docker, shipped in Enterprise, and much, much more in the short lifespan of the project. We've yet to be constrained by our language choice. If anything, we are amazed daily at Semantic's ability to abstract and represent the syntax and evaluation semantics of half a dozen (and counting) programming languages while keeping all the benefits of a strong static type system. If we'd chosen a more "popular" language it's likely we'd be mired in hundreds of thousands of lines of code and complaining about our tech debt, application performance, and the burden of adding any more languages. As it stands today, we've got 20k lines of Haskell code and some incredible program analysis capabilities at our disposal with little fear of adding more languages or supporting the changing needs of GitHub.
