CST Node tree and whitespace #1042

cdietrich · 2023-05-05T15:18:36Z

cdietrich
May 5, 2023
Collaborator

Is there a reason why no hidden CST leaf nodes are created for whitespace?
If this is intentional, is there another way to access it?

Answered by msujew


montymxb · 2023-05-09T07:23:26Z

montymxb
May 9, 2023
Collaborator

Hi @cdietrich. I'll get back to you on this in more detail later today. In the meantime, you can check out discussion #971, which seems similar to your question here and hopefully helps a bit.

0 replies

montymxb · 2023-05-09T21:27:30Z

montymxb
May 9, 2023
Collaborator

Hi again. You should check out discussions #782 and #608, as these contain use cases for retaining and using whitespace tokens that would otherwise be discarded (for indent & dedent tokens). I'm not sure if this is what you have in mind, but it should still be helpful. Either way, if you're working with whitespace in a meaningful way, you'll probably be working with a customized TokenBuilder, which will give you fine-grained control over how to handle your custom token types.

As for why those tokens are not present, it is by design. In most cases, the DSLs that we're writing with Langium are whitespace-insensitive. Working with whitespace-sensitive DSLs is not such a common use case, so we intentionally discard whitespace tokens (but this case can still come up).

There are also some tests that document a custom TokenBuilder to achieve multi-mode lexing. They weren't written for tokens sourced from whitespace, but it would be reasonable to extend the approach to such a case.

1 reply

cdietrich May 10, 2023
Collaborator Author

@montymxb This is not about whitespace-aware languages. It is about whitespace in normal languages. Other hidden tokens, like comments, are retained in the CST.
So I basically have to use two CST leaf nodes and their offsets to calculate the whitespace in between.

msujew · 2023-05-10T11:32:17Z

msujew
May 10, 2023
Maintainer

@cdietrich The decision to omit whitespace tokens from the CST was a deliberate one, mostly because they are not necessary (you can calculate the information from the offsets of the preceding or succeeding node and the original document text) and because they are a huge memory sink: they effectively double the number of leaf CST nodes in a given document. Having run into performance issues due to this in the past with Xtext, we decided to omit all whitespace information.

You should be able to override the TokenBuilder to change that behavior though. I think that would yield the expected results.
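
For the offset-based calculation mentioned above, here is a minimal TypeScript sketch. It does not import Langium; `LeafSpan` and `whitespaceBetween` are hypothetical names introduced for illustration, and the sketch assumes each leaf exposes a start `offset` and an exclusive `end` index into the original document text.

```typescript
// Hypothetical stand-in for a CST leaf node; only the two fields used here.
// Assumption: `offset` is the start index and `end` is the exclusive end
// index of the leaf within the original document text.
interface LeafSpan {
    offset: number;
    end: number;
}

// Returns the whitespace between two leaves, or undefined if the leaves
// overlap or the gap contains anything other than whitespace.
function whitespaceBetween(text: string, previous: LeafSpan, next: LeafSpan): string | undefined {
    if (next.offset < previous.end) {
        return undefined;
    }
    const gap = text.slice(previous.end, next.offset);
    return /^\s*$/.test(gap) ? gap : undefined;
}

// Example document: "person", two spaces, "Alice", newline + tab, "age", ...
const documentText = 'person  Alice\n\tage 42';
const personKeyword: LeafSpan = { offset: 0, end: 6 };
const nameLeaf: LeafSpan = { offset: 8, end: 13 };
const ageKeyword: LeafSpan = { offset: 15, end: 18 };

console.log(JSON.stringify(whitespaceBetween(documentText, personKeyword, nameLeaf))); // "  "
console.log(JSON.stringify(whitespaceBetween(documentText, nameLeaf, ageKeyword)));    // "\n\t"
```

Since other hidden tokens such as comments are retained as leaf nodes, the gap between two adjacent leaves should contain only whitespace; the regular expression check is there as a guard in case the two leaves passed in are not actually adjacent.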

0 replies
