Serializing to S-Expression #28
Replies: 6 comments 6 replies
-
I would recommend either XML or JSON. These are well-adopted exchange formats, with fast readers and writers. In addition, I would recommend understanding the use cases for what you want to import and export. In the use case of displaying an indented text representation of a parse tree, you will want to display the rule names, not integers. If you use rule names instead of rule numbers in the parse tree serialization itself, it will slow the serialization down because you encode the name of each parse rule more than once. In the use case of "grepping" parse tree nodes, you will need information on line and column numbers (or index), and the name of the original source file. In Trash, one can also "cat" input text, in which case there is no file name; it's just "stdin". In general, I found for Trash that a parse tree alone is insufficient for tools that operate on parse trees. A lot is needed beyond the parse tree: the grammar, information produced by the Antlr tool, and the input string (and file name, if not stdin).
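To make the rule-name point concrete, here is a minimal sketch in Python using only the standard `json` module. The node layout (`r`, `c`, `t`, `l`, `col`), the `RULE_NAMES` table and the helper functions are all hypothetical names chosen for illustration, not an ANTLR or Trash format. The idea is to write the rule-name table once, let nodes refer to rules by index, and resolve names only when displaying the tree.

```python
import json

# Hypothetical rule table; in practice it would come from the generated parser.
RULE_NAMES = ["expr", "term", "atom"]


def rule(index, *children):
    """Rule node: stores the rule index, not the rule name."""
    return {"r": index, "c": list(children)}


def token(text, line, column):
    """Token node: keeps the line/column needed for grep-like tools."""
    return {"t": text, "l": line, "col": column}


def serialize(tree, source_name="stdin"):
    """Emit the source name and rule-name table once, then the tree."""
    return json.dumps({"source": source_name, "rules": RULE_NAMES, "tree": tree})


def dump_indented(node, rule_names, depth=0):
    """Build an indented text view, mapping rule indices back to names."""
    pad = "  " * depth
    if "t" in node:
        return [f"{pad}{node['t']!r} @{node['l']}:{node['col']}"]
    lines = [pad + rule_names[node["r"]]]
    for child in node["c"]:
        lines.extend(dump_indented(child, rule_names, depth + 1))
    return lines


# Hand-built tree for the input "1 + 2" (no real parser involved).
tree = rule(
    0,
    rule(1, rule(2, token("1", 1, 0))),
    token("+", 1, 2),
    rule(1, rule(2, token("2", 1, 4))),
)
payload = serialize(tree)
doc = json.loads(payload)
text = "\n".join(dump_indented(doc["tree"], doc["rules"]))
print(text)
```

Each rule name appears exactly once in the payload, while the indented view still shows names rather than integers.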
-
A binary format (protobuf, MessagePack) should be used for data exchange; otherwise, performance could degrade on big trees. I'm not sure whether it relies only on strings; binary data is very native. It makes sense to reinvestigate what kind of communication is meant. Also, ANTLR uses a binary format for its internal ATN representation. I don't see a reason why a text format should be used for other internal communications.
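As a rough illustration of the size difference, here is a small Python sketch that uses the standard `struct` module as a stand-in for a real binary format. It is not protobuf or MessagePack, and the record layout (rule index plus child count, both unsigned 16-bit, written in pre-order) is an assumption made up for this example.

```python
import json
import struct

# Hypothetical fixed-size record: rule index and child count, little-endian.
NODE = struct.Struct("<HH")


def encode(node, out):
    """Append the node and its subtree to the bytearray `out` in pre-order."""
    rule_index, children = node
    out += NODE.pack(rule_index, len(children))
    for child in children:
        encode(child, out)
    return out


def decode(buf, offset=0):
    """Read one node starting at `offset`; return (node, next_offset)."""
    rule_index, count = NODE.unpack_from(buf, offset)
    offset += NODE.size
    children = []
    for _ in range(count):
        child, offset = decode(buf, offset)
        children.append(child)
    return (rule_index, children), offset


# Hand-built tree of (rule_index, children) pairs; six nodes in total.
tree = (0, [(1, [(2, [])]), (1, [(2, []), (2, [])])])

binary = bytes(encode(tree, bytearray()))
as_json = json.dumps(tree).encode("utf-8")
restored, _ = decode(binary)

print(len(binary), len(as_json))
```

This only compares encoded sizes on a toy tree. It says nothing about read or write speed, which would need measuring on real parse trees.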
-
For "internal" communication (tool to generators, wrappers to runtime), S-expression seems a reasonable choice, as well as JSON (I will comment under @kaby76's post). For communication with external tools, I would say two things:
-
I have no problem with MessagePack. It seems to be supported in a lot of different languages. In any case, it's best not to reinvent the wheel. Usually, I am trying to analyze a parse with trtree and trxgrep. This analysis can take several iterations because I don't know what I am looking for and it's easy to make mistakes in the XPath expression. Originally, all the Trash tools like trtree and trxgrep re-parsed from scratch. For really slow grammars like html, redoing the parse was several orders of magnitude slower than JSON serialization. Just about any serialization would be faster than reparsing the input for the html grammar.
-
Actually, as we use Kotlin, I suggest using kotlinx.serialization because it provides the following benefits:
-
How about using Avro?
-
Hi @antlr/antlr5,
serialization is an important topic which impacts many aspects of antlr5, both internally and externally:
This raises the question of the serialization format(s). There are many to choose from, but one I bumped into recently is S-Expression (see https://en.wikipedia.org/wiki/S-expression), which happens to be the syntax used by the WebAssembly text format.
There could be a benefit in using S-Expression for the parse tree, and it would be worth digging into for serializing the grammar (might require specifying an @startrule, and could generate lots of duplicates...).
It would be great if we could have just one serialization format end to end...
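To give a feel for what an S-Expression parse tree could look like, here is a minimal Python sketch of a writer and a matching reader. The node shape (a `(name, children)` tuple for rules, a plain string for tokens) and the function names `to_sexpr` and `from_sexpr` are assumptions for illustration only; nothing here is an existing ANTLR format.

```python
def to_sexpr(node):
    """Rule nodes are (name, [children]); token nodes are plain strings."""
    if isinstance(node, str):
        escaped = node.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    name, children = node
    return "(" + " ".join([name] + [to_sexpr(c) for c in children]) + ")"


def from_sexpr(text):
    """Parse the output of to_sexpr back into the same tuple/str shape."""
    pos = 0

    def skip_spaces():
        nonlocal pos
        while text[pos].isspace():
            pos += 1

    def parse():
        nonlocal pos
        skip_spaces()
        if text[pos] == '"':
            pos += 1
            chars = []
            while text[pos] != '"':
                if text[pos] == "\\":
                    pos += 1  # take the escaped character literally
                chars.append(text[pos])
                pos += 1
            pos += 1  # consume the closing quote
            return "".join(chars)
        if text[pos] != "(":
            raise ValueError(f"unexpected character at offset {pos}")
        pos += 1
        start = pos
        while text[pos] not in " )":
            pos += 1
        name = text[start:pos]
        children = []
        while True:
            skip_spaces()
            if text[pos] == ")":
                pos += 1
                return (name, children)
            children.append(parse())

    return parse()


# Hand-built tree for the input "1 + 2" (no real parser involved).
tree = ("expr", [("term", [("atom", ["1"])]), "+", ("term", [("atom", ["2"])])])
sexpr = to_sexpr(tree)
print(sexpr)
```

Note that the rule name is repeated at every node in this layout, which is the duplication concern mentioned above; writing rule indices plus a separate name table would be one way around it.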