You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Apr 1, 2025. It is now read-only.
Copy file name to clipboardExpand all lines: docs/coding-style.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -76,7 +76,7 @@ foo = Heap.lookup thing
76
76
```
77
77
78
78
Unlike many Haskell projects, we rely in places on variable shadowing (especially in open-recursive functions).
79
-
Avoid variable shadowing if possible, as it can lead to unintuitive error messages; you are free to disable shadowing on in a per-file basis with `{-# OPTIONS_GHC -Wshadow #-}`
79
+
Avoid variable shadowing if possible, as it can lead to unintuitive error messages; you are free to disable shadowing on a per-file basis with `{-# OPTIONS_GHC -Wshadow #-}`
Copy file name to clipboardExpand all lines: docs/grammar-development-guide.md
+10-10Lines changed: 10 additions & 10 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -105,7 +105,7 @@ Here are some things that might help:
105
105
106
106
-**Inline:** Adding a rule to the `inline` array strips out the whole node as though it has been removed from the grammar. Each occurrence of this rule in the grammar is replaced with a copy of its definition. Similar to making something hidden, this makes your AST more compact. Inlining doesn't create these nodes at runtime, whereas making something hidden acknowledges the node at runtime but hides it from the AST.
107
107
108
-
-**Make `seq` visible and `choice` hidden** Sequences typically have meaning. Choices are just containers that point to other things.
108
+
-**Make `seq` visible and `choice` hidden:** Sequences typically have meaning. Choices are just containers that point to other things.
109
109
110
110
-**Making things hidden:** Preceding a rule with an underscore (`_rule`) allows you to omit displaying a rule in the AST. This allows you to make a tree more compact.
111
111
@@ -117,7 +117,7 @@ Here are some guidelines to determine what approach to take when removing superf
117
117
118
118
2.**Add it to the inline array.** If the rule is used more than once and its definition is not simple, make it `inline`. If this does not cause parsing problems, this is the best approach, because it will avoid intermediate node allocations and parsing operations at runtime. One possible side-effect of `inline` is that is sometimes makes the parser much larger in terms of number of states. To evaluate whether this has happened, it’s worth looking at the `STATE_COUNT` in `parser.c` before and after. If the state count goes way up, it may not be worth adding the rule to `inline` since more states mean more one-time memory footprint for the parser. If it goes up a few percent (or goes down), it’s fine to add.
119
119
120
-
3.**Mark it hidden**. If `inline` causes conflicts or drastically increases the size of the parse table, it's better to mark it as hidden. This is often useful when two two nodes can not exist without one another. For example, `class_body_declaration` was a child of `class_body` and occurred together 100% of the time. Similarly, `type_arguments` can not exist independent of its child node, `type_argument`. In both cases, it makes sense to hide the former.
120
+
3.**Mark it hidden.** If `inline` causes conflicts or drastically increases the size of the parse table, it's better to mark it as hidden. This is often useful when two nodes can not exist without one another. For example, `class_body_declaration` was a child of `class_body` and occurred together 100% of the time. Similarly, `type_arguments` can not exist independent of its child node, `type_argument`. In both cases, it makes sense to hide the former.
121
121
```diff
122
122
(generic_type
123
123
(type_identifier)
@@ -134,7 +134,7 @@ Once you have developed a significant portion of the grammar, find a file from a
134
134
Use [a script like this](https://github.com/tree-sitter/tree-sitter-java/blob/master/script/parse-examples.rb) is one way to mass test a large repo quickly.
135
135
136
136
### Sequence your work
137
-
Most languages have a long-tail of features that are not frequently utilized in the wild. When supporting a language, our aim is always to be able to parse 100% of a language (or ideally more, since the intent is to support multiple versions). However, this does necessarily all in one go. A good way to do this is to develop the structure and documentation necessary to support open source contribution.
137
+
Most languages have a long-tail of features that are not frequently utilized in the wild. When supporting a language, our aim is always to be able to parse 100% of a language (or ideally more, since the intent is to support multiple versions). However, this doesn't necessarily happen all in one go. A good way to do this is to develop the structure and documentation necessary to support open source contribution.
138
138
139
139
### Handling conflicts
140
140
@@ -148,23 +148,23 @@ Conflicts may arise due to ambiguities in the grammar. This is when the parser c
148
148
- `commaSep1` - creates a repeating sequence of 1 or more tokens separated by a comma
149
149
- `sep1`- creates a repeating sequence of 0 or more tokens separated by the specified delimiter
150
150
151
-
- **Specify associativity and/or precedence.** Another way of resolving a conflict is through associativity and precedence. Specifying precedence allows us to prioritize productions in the grammar. If there are two or more ways to proceed, the production with the higher precedence will get preference. Left and right associativity can also be used to reflect how to proceed. For instance, a left-associative evaluation is `(a Q b) Q c` vs. a right-associative evaluation would render `a Q (b Q c)`. In this way, associativity changes the meaning of the expression. Resolving conflicts this way is a compile time solution as opposed to the "Add a conflict" section below which means the parser will try deal with the ambiguity at runtime.
151
+
- **Specify associativity and/or precedence.** Another way of resolving a conflict is through associativity and precedence. Specifying precedence allows us to prioritize productions in the grammar. If there are two or more ways to proceed, the production with the higher precedence will get preference. Left and right associativity can also be used to reflect how to proceed. For instance, a left-associative evaluation is `(a Q b) Q c` vs. a right-associative evaluation would render `a Q (b Q c)`. In this way, associativity changes the meaning of the expression. Resolving conflicts this way is a compile time solution as opposed to the "Add a conflict" section below which means the parser will try to deal with the ambiguity at runtime.
152
152
153
153
- **Add a conflict.** Adding conflicts allows the parser to pursue multiple paths in parallel, and decide which one to proceed with further along the process. Adding a conflict for one rule prevents the parser from recursively descending.
154
154
155
155
_Workflow:_
156
-
1. Add a conflict to the `conflicts` if there are 2 rules conflicting (to test that the conflict is the problem and gets the right parse output)
157
-
2. Try `prec.left` or `prec.right` based on the options (if that’s not clear, then try both `prec.left` and `prec.right` and compare their outputs)
156
+
1. Add a conflict to the `conflicts` if there are 2 rules conflicting (to test that the conflict is the problem and gets the right parse output).
157
+
2. Try `prec.left` or `prec.right` based on the options (if that’s not clear, then try both `prec.left` and `prec.right` and compare their outputs).
158
158
3. Look at adding a precedence number, usually `1` or `+1`, based on the rule you want to succeed first.
159
159
4. Make sure there aren’t duplicate paths to get to the same rule from sibling rules (like having `_literal` in both `_statement` and `_expression`).
160
-
And then once things are working in the tree output looks good, remove the conflict rule and try to solve it with associativity or precedence only. This helps confirm the solution before expending too much time adjusting precedence.
160
+
And then once things are working and the tree output looks good, remove the conflict rule and try to solve it with associativity or precedence only. This helps confirm the solution before expending too much time adjusting precedence.
161
161
162
162
### Debugging errors
163
163
164
-
Tree-sitter's error-handling is great, but sometimes works too well and hides helpful info that help to understand why errors are happening. The following tips can help detect where errors are occurring.
164
+
Tree-sitter's error-handling is great, but sometimes works too well and hides helpful info that helps to understand why errors are happening. The following tips can help detect where errors are occurring.
165
165
166
166
- **Narrow down your problem space.** Triangulate the error by starting with a simple example and progressively adding complexity to better understand where the parser is having trouble.
167
167
- **Consult the spec.** Eliminate the possibility of typos or oversights in your logic by looking at the definition of your rule in the spec.
168
-
- **Run your code.** Execute your test code to see verify it is valid. Use errors (if any) to get additional information about where the problem may lie.
168
+
- **Run your code.** Execute your test code to verify it is valid. Use errors (if any) to get additional information about where the problem may lie.
169
169
- **Use visual debug output.** Analyze the forks and look at individual production rules to hone in on the problem.
170
-
- **Test all permutations of a particular language construct** This will help you find the edges of your language and ensure your grammar supports them.
170
+
- **Test all permutations of a particular language construct.** This will help you find the edges of your language and ensure your grammar supports them.
Copy file name to clipboardExpand all lines: docs/why-haskell.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -43,7 +43,7 @@ Haskell is a pleasure to work in everyday. It's both productive and eye-opening.
43
43
-*Editor tooling* is sub-par (especially compared to language communities like Java and C#) and finicky - we often end up just compiling in a separate terminal.
44
44
-*Edges of the type system*. We often find ourselves working at the edges of Haskell's incredible type system, wishing for dependent types or reaching for complex workarounds like the [Advanced Overlap][] techniques designed by Oleg Kiselyov & Simon Peyton Jones.
45
45
-*Infra glue*. Haskell is very competent at standard infrastructure functionality like running a webserver, but it isn't the focus of the language community so you're often left writing your own libraries and components when you need to plug in to modern infrastructure.
46
-
-*Lazy evaluation* isn't always want you want and can have performance problems and make some debugging activities incredibly frustrating. We use the `StrictData` language extension to combat some of these difficulties.
46
+
-*Lazy evaluation* isn't always what you want and can have performance problems and make some debugging activities incredibly frustrating. We use the `StrictData` language extension to combat some of these difficulties.
47
47
-*Haskell has a reputation for being difficult to learn.* Some of that is well deserved, but half of it has more to do with how many of us first learned imperative programming and the switch to a functional paradigm takes some patience. Haskell also leverages a much more mathematically rigorous set of abstractions which likely aren't as familiar to web developers. We have, however, had very good luck on-boarding new team members with a wide range of previous experience and the quality of learning Haskell resources has really improved.
48
48
49
49
At this point, we are pretty firmly attached to Haskell's language features to enable many of the objectives of this project: abstract interpretation, graph analysis, effect analysis, code writing, AST matching, etc. Could you implement Semantic in another programming language? Certainly. An early prototype of the semantic diff portion of the project was done in Swift, but it quickly became unwieldy and even the first rough Haskell prototype was considerably more performant. Since adopting Haskell, we've had no trouble plugging into the rest of GitHub's infrastructure: running as a command line tool, a web server (HTTP/JSON), and now a Twirp RPC server. We've been an early adopter of Kubernetes and Moda and now ~~gRPC~~ Twirp at GitHub, often shipping our application on these new infrastructure components well ahead of other teams. We've managed our own build systems, quickly adopted new technologies like Docker, shipped in Enterprise, and much, much more in the short lifespan of the project. We've yet to be constrained by our language choice. If anything, we are amazed daily at Semantic's ability to abstract and represent the syntax and evaluation semantics of half a dozen (and counting) programming languages while keeping all the benefits of a strong static type system. If we'd chosen a more "popular" language it's likely we'd be mired in hundreds of thousands of lines of code and complaining about our tech debt, application performance, and the burden of adding any more languages. As it stands today, we've got 20k lines of Haskell code and some incredible program analysis capabilities at our disposal with little fear of adding more languages or supporting the changing needs of GitHub.
0 commit comments