Lustre (LV6) grammar for tree-sitter

This repository contains a tree-sitter grammar that should parse Lustre (.lus) files, as well as its subset dialects (e.g. .ec).

The language bindings that have a centralized package repository (PyPI, crates.io, ...) have not been published yet. I'll do that when the grammar is somewhat complete. You should be able to use them using the git URL though, but I haven't tested.

Remark about encoding: Lustre uses plain ASCII encoding. Input bytes greater than 127 (i.e. outside of ASCII) aren't straight up rejected, but the only grammar rules that will allow them are comments. This gives rise to two important observations:

While tree-sitter supports both UTF-8 and UTF-16, using UTF-16 is nonsensical, as Lustre won't parse the file, since only UTF-8 is compatible with ASCII.
Using another ASCII-compatible encoding than UTF-8 will not work, as tree-sitter requires UTF-8 input to be… well… valid UTF-8. This is unfortunate, because some Lustre files in the original repository are encoded in ISO-8859-1. There is no solution to this problem besides re-encoding the file beforehand.
- If tree-sitter-lustre is used as part of another standalone tool specifically for Lustre, I suggest trying to parse the file as UTF-8, and if that fails, map all bytes with value >127 to a Unicode PUA, so that you'll be able to do the inverse operation if you want to retrieve the original bytes after parsing.

Scope

The goal of this grammar is to be strictly identical to the official one. I plan on using it in a tool that I want to be 100% compatible with the official compiler implementation, so divergences with the specification are not desirable. However, if having some laxity is useful to someone, we could discuss on defining multiple "dialects" of Lustre; the baseline — spec compliant — dialect, and a more tolerant one. Since the grammar DSL is plain old JS, it's very easy to wrap it in a function, with parameters to customize what is and what is not accepted depending on the use case. Then, we'd need to decide which of these dialects is actually published to package registries. I'd be OK with publishing the laxer one, because my tool will most likely use a fork of this repository anyway.

Forward compatibility

It is too early to offer any forward compatibility guarantees when it comes to how the nodes or their fields are named. Please use a pinned version if you rely on specific names or tree structures. The Lustre syntax doesn't evolve too much, so this TS implementation should stabilize over time.

Contributing

Contributions are very welcome. Here's ideas of what I'm interested in:

Fixes to the grammar
Quality assurance : unit tests, differential fuzz testing with the official parser
Queries : highlights.scm, tags.scm, locals.scm, etc.
- I don't use Neovim, so I personally may not have the best judgment
Anything that, in your opinion, improves the overall code quality and reduces the maintaining burden 🙃

Code structure

The rules in grammar.js are sorted to match the "eBNF groups" found in the PDF spec (see below). However, keep in mind that this grammar is not supposed to match the rules 1:1 ; the only goal is that the main rule matches the same files as the official compiler. To quote the tree-sitter book:

Tree-sitter's output is a concrete syntax tree; each node in the tree corresponds directly to a terminal or non-terminal symbol in the grammar. So to produce an easy-to-analyze tree, there should be a direct correspondence between the symbols in your grammar and the recognizable constructs in the language. This might seem obvious, but it is very different from the way that context-free grammars are often written in contexts like language specifications or Yacc/Bison parsers.

Our grammar is supposed to produce relatively simple trees — without too much (useless) nesting — to make it easier to read and/or process later. This requires a bit of human judgement.

Some nodes have been renamed for consistency, simplicity, and to try to adhere to existing naming conventions in tree-sitter grammars. Specifically, I've been taking a lot of inspiration from the Rust grammar for tree-sitter.

References

The official grammar specification
- It is unfortunately not sufficient by itself: it is outdated, it lacks precise information about how to lex some tokens (identifiers, literals...) and doesn't give information about the precedence of any operator
The source code — the most precise specification to exist

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
bindings		bindings
src		src
test/corpus		test/corpus
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
Cargo.toml		Cargo.toml
LICENSE		LICENSE
Makefile		Makefile
Package.swift		Package.swift
README.md		README.md
binding.gyp		binding.gyp
go.mod		go.mod
grammar.js		grammar.js
package.json		package.json
pyproject.toml		pyproject.toml
setup.py		setup.py
tree-sitter.json		tree-sitter.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lustre (LV6) grammar for tree-sitter

Scope

Forward compatibility

Contributing

Code structure

References

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Lustre (LV6) grammar for tree-sitter

Scope

Forward compatibility

Contributing

Code structure

References

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages