
lsp: auto-regenerate textmate grammar based on a project config #1757

Open
adam2am wants to merge 11 commits into DanielXMoore:main from adam2am:syntaxGeneration

Conversation

@adam2am
Contributor

@adam2am adam2am commented Jul 10, 2025

The Motivation:

Previously, enabling a feature like coffeeComment in civetconfig.json did not update the syntax highlighting. The linter error went away, but the highlighting stayed static: words inside a # comment were still colored as code instead of as a comment.

The Solution:

This PR implements a regeneration of a civet.json syntax file based on config (currently only for coffeeComment parse option), and a file watcher that monitors Civet config files. On any change, it automatically regenerates the TextMate grammar (if needed) while keeping previous behavior of restarting a language-server.

The only thing the user needs to do is reload the window; a toast appears with a Reload Window button.

Key Benefits

  • Instant Visual Feedback: What you configure is what you see. Changes to syntax-related flags in civetconfig.json are immediately reflected in the editor after a quick reload.
  • Future-Proof & Extensible: The new grammarList.civet provides a data-driven way to add new syntax features. Adding support for another config flag is now as simple as adding a new entry to that list.

How It Works:

  1. A FileSystemWatcher in extension.civet listens for changes to config files.
  2. On change, it calls the new regenerateTextmate command.
  3. regenerateTextmate reads the project's config, iterates through features defined in grammarList.civet, and surgically injects or removes rules from syntaxes/civet.json.
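
The inject-or-remove step in item 3 could look roughly like the sketch below. It is written in TypeScript for illustration only; the `GrammarFeature` shape, the `featureList` entry, and the simplified regex are assumptions and not the actual contents of grammarList.civet.

```typescript
// Hypothetical shapes: the real grammarList.civet entries may differ.
interface GrammarRule { name: string; match: string }
interface Grammar { patterns: GrammarRule[] }
interface GrammarFeature {
  option: string;    // parse option that toggles the rule, e.g. "coffeeComment"
  rule: GrammarRule; // TextMate rule to inject when the option is on
}

const featureList: GrammarFeature[] = [
  {
    option: "coffeeComment",
    // Simplified pattern for illustration; it ignores most edge cases.
    rule: { name: "comment.line.number-sign.civet", match: "(?<!\\S)#(?!#).*$" },
  },
];

function applyFeatures(
  grammar: Grammar,
  features: GrammarFeature[],
  parseOptions: Record<string, boolean>,
): Grammar {
  const managed = new Set(features.map((f) => f.rule.name));
  // Drop every managed rule first so repeated runs stay idempotent.
  const base = grammar.patterns.filter((p) => !managed.has(p.name));
  const enabled = features
    .filter((f) => parseOptions[f.option] === true)
    .map((f) => f.rule);
  // Injected rules go first so they take precedence over later patterns
  // when two matches start at the same position.
  return { ...grammar, patterns: [...enabled, ...base] };
}
```

Removing managed rules before re-adding the enabled ones means the function can run on every config change without accumulating duplicates.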

// Check if options have changed by comparing with cache (async)
cachedOpts .= {}
try
  await fsPromises.access cacheFile
Collaborator


We shouldn't call access before readFile. There's no guarantee that the file remains in the same state after the call to access and readFile will error anyway if the file is unreadable.

https://nodejs.org/api/fs.html#fspromisesaccesspath-mode
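
A minimal sketch of the suggested pattern, in TypeScript rather than Civet: read the file directly and treat any failure as "no cache". The helper name `readCachedOptions` and the JSON cache format are assumptions for illustration, not the PR's code.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical helper: read the cached options, treating a missing or
// unreadable cache file as "no cache" instead of probing with access() first.
async function readCachedOptions(cacheFile: string): Promise<Record<string, unknown>> {
  try {
    const text = await readFile(cacheFile, "utf8");
    return JSON.parse(text) as Record<string, unknown>;
  } catch {
    // Covers a missing file, a permission error, and malformed JSON alike;
    // the caller then regenerates from scratch.
    return {};
  }
}
```

Because there is a single file-system call, there is no window between a check and the read in which the file could change.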

STRd6
STRd6 previously approved these changes Jul 13, 2025
Collaborator

@STRd6 STRd6 left a comment


Thanks, looks like a good overall approach and it's nice to see the grammar getting some enhancements.

I'll play around with it and get it merged/deployed soon.

Edge-case matrix covered:
# comment + comment
code # trailing + comment
array.# length X not comment
Collaborator


I think this now parses as a comment after #1750

@STRd6 STRd6 self-requested a review July 13, 2025 21:59
@STRd6 STRd6 dismissed their stale review July 13, 2025 22:01

On further review it needs some changes.

@STRd6
Collaborator

STRd6 commented Jul 13, 2025

Hmm... actually looking a little closer this has some issues with multiple projects. Since it rewrites the grammar file and requires a restart of VSCode it won't work well if there are multiple projects with different configuration.

I was hoping VSCode had some kind of way of modifying the grammars but this may be overly complex for how many additional edge cases come up.

One approach we've been thinking about was using the TypeScript semantic tokens to provide more accurate syntax highlighting. It may be a bit more work but it should work in much broader cases as well.

@STRd6 STRd6 removed their request for review July 13, 2025 22:02
@adam2am
Contributor Author

adam2am commented Jul 14, 2025

> Hmm... actually looking a little closer this has some issues with multiple projects. Since it rewrites the grammar file and requires a restart of VSCode it won't work well if there are multiple projects with different configuration.
>
> I was hoping VSCode had some kind of way of modifying the grammars but this may be overly complex for how many additional edge cases come up.
>
> One approach we've been thinking about was using the TypeScript semantic tokens to provide more accurate syntax highlighting. It may be a bit more work but it should work in much broader cases as well.

Yeah, but isn't the TextMate grammar static once VS Code has loaded it?
I took inspiration from vuejs/vetur#210: they generate the grammar manually via a command and then reload the window.

I'm not sure it's even possible to conditionally add rules to a TextMate grammar and honor the parse options from the config without a window reload.
We could also drop the automatic detection and regeneration and instead only update the grammar manually via a command.

@STRd6
Collaborator

STRd6 commented Jul 14, 2025

Another possibility is that we can provide an alternative grammar that can be toggled in the file type setting. This way someone can choose the grammar based on their setting and we can add more as needed.

So for now civet and civet-coffeeComment.

We may even be able to reuse large parts of the grammar if they have some kind of include/import capability.

@adam2am
Contributor Author

adam2am commented Jul 14, 2025

> Another possibility is that we can provide an alternative grammar that can be toggled in the file type setting. This way someone can choose the grammar based on their setting and we can add more as needed.
>
> So for now civet and civet-coffeeComment.
>
> We may even be able to reuse large parts of the grammar if they have some kind of include/import capability.

Agreed, that could be an angle. My main concern is scalability: for now we'd ship two options (civet and civet-coffeeComment), but each additional on/off flag doubles the number of grammars, so three flags already mean 2×2×2 = 8 variants, which will be a maintenance headache. Unless coffeeComment is the only edge case, I'd lean toward a more flexible approach, even if it means the user needs to reload the window when they switch parse options in the config.
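
To make the growth concrete, here is a small TypeScript sketch that enumerates every on/off combination for a list of flags. Only coffeeComment comes from this thread; `flagB` and `flagC` are placeholder names, and the `civet-<flag>` naming scheme is an assumption based on the civet-coffeeComment suggestion above.

```typescript
// Enumerate every on/off combination of the given syntax-affecting flags.
function grammarVariants(flags: string[]): string[] {
  const variants: string[] = [];
  for (let mask = 0; mask < 1 << flags.length; mask++) {
    // Bit i of the mask decides whether flags[i] is enabled in this variant.
    const enabled = flags.filter((_, i) => (mask & (1 << i)) !== 0);
    variants.push(["civet", ...enabled].join("-"));
  }
  return variants;
}

console.log(grammarVariants(["coffeeComment"]));
// [ 'civet', 'civet-coffeeComment' ]
console.log(grammarVariants(["coffeeComment", "flagB", "flagC"]).length);
// 8
```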

@edemaine
Collaborator

edemaine commented Jul 15, 2025

(Sorry, wrote this earlier, but just sending now.)

I did a little research on the possibilities here. One potentially useful workaround is setTextDocumentLanguage; see StackOverflow. I think then we could provide multiple Civet "languages" that are identical except perhaps for the start context, according to config. With different contexts (coffee comments vs not, where most rules are identically included in both), we could even dynamically adjust according to "civet coffeeComments". That said, the number of states will grow exponentially with the number of toggles that matter, so I'm not sure this is a great idea.

Alternatively, perhaps we could maintain multiple rewritten versions of the grammar, like in this PR, only writing versions lazily as needed, which would enable a mixed codebase. We could even select which grammar based on global config and opening pragmas. And by being lazy, we wouldn't need to have all combinations, only those used.
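
The lazy idea can be sketched as a cache keyed by the set of enabled flags, so a variant is only generated the first time some file needs it. This is an in-memory TypeScript illustration under assumed names (`LazyGrammarCache`, the `generate` callback); a real version would presumably write each generated grammar to its own file.

```typescript
// Hypothetical cache of generated grammars, keyed by the sorted set of enabled flags.
class LazyGrammarCache<G> {
  private readonly cache = new Map<string, G>();
  private readonly generate: (flags: string[]) => G;

  constructor(generate: (flags: string[]) => G) {
    this.generate = generate;
  }

  get(enabledFlags: string[]): G {
    // Deduplicate and sort so that flag order does not create distinct keys.
    const flags = Array.from(new Set(enabledFlags)).sort();
    const key = flags.join("+");
    if (!this.cache.has(key)) {
      this.cache.set(key, this.generate(flags));
    }
    return this.cache.get(key) as G;
  }

  get size(): number {
    return this.cache.size;
  }
}
```

With this shape, the number of generated grammars is bounded by the combinations actually used in a workspace, not by all possible combinations.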

Some more related discussion here: microsoft/vscode#68647

@STRd6
Collaborator

STRd6 commented Jul 15, 2025

I'm fine with generating up to 5! but in practice the top 10-20 most popular settings should cover 95% of people's use cases.

@adam2am
Contributor Author

adam2am commented Jul 16, 2025

> I'm fine with generating up to 5! but in practice the top 10-20 most popular settings should cover 95% of people's use cases.

I'm okay with any of the approaches, whether it's a file-type setting, a command followed by a reload, or auto-detection followed by a reload. Lazy generation seems promising, but we can start with the file-type approach and iterate; if it doesn't scale, we can always pivot to a more flexible solution later.

My main question here is about the file type approach: can an extension easily change the language association for all relevant files in a workspace? Or would this require users to manually configure files.associations in their settings, which might be a usability hurdle?

@edemaine
Collaborator

> can an extension easily change the language association for all relevant files in a workspace?

I think the extension can set the language association for each file it's applied to, via setTextDocumentLanguage. So each instance can check the config and top "civet" pragma and set its language accordingly.
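
A rough TypeScript sketch of the per-file decision described here. It assumes the pragma is a leading string of the form "civet flag1 flag2" and that variant language ids follow the `civet-<flag>` naming used earlier in the thread; both are simplifications. In an extension, the returned id would be passed to `vscode.languages.setTextDocumentLanguage(document, id)`, which is left out so the snippet runs without the VS Code API.

```typescript
// Flags that have a dedicated grammar variant (only coffeeComment in this thread).
const VARIANT_FLAGS = ["coffeeComment"];

// Extract flag words from a leading directive such as: "civet coffeeComment"
function pragmaFlags(source: string): string[] {
  const match = /^\s*(["'])civet\s+([^"']*)\1/.exec(source);
  return match === null ? [] : match[2].split(/\s+/).filter((w) => w.length > 0);
}

// Pick a language id from the project config plus the file's own pragma.
function languageIdFor(source: string, configFlags: Record<string, boolean>): string {
  const fromPragma = pragmaFlags(source);
  const enabled = VARIANT_FLAGS.filter(
    (flag) => configFlags[flag] === true || fromPragma.indexOf(flag) !== -1,
  );
  return ["civet", ...enabled].join("-");
}

console.log(languageIdFor('"civet coffeeComment"\nx := 1', {}));
// civet-coffeeComment
console.log(languageIdFor("x := 1", {}));
// civet
```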

@adam2am
Contributor Author

adam2am commented Oct 22, 2025

@STRd6 @edemaine Hey, I managed to get this working dynamically via the semanticToken provider. We can just pass the parseOptions to it, and that's all the setup needed

Here's a repo where you can see it in action. A quick heads-up: the branch is a bit messy because I've bundled in other features I was testing (like a renameProvider for F2, automatic renames with dependency tracking, and syntax highlighting for any/string/number/type/enum, etc.).

https://github.com/adam2am/Civet-lspcheck/tree/battleground-renameHandler

I'll clean it up and split it into bite-sized PRs a little later
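
For readers unfamiliar with semantic tokens, the sketch below shows the general idea of letting parseOptions drive what gets reported. It is not the code from the linked branch: the comment detection only handles whole-line `#` comments, token type index 0 is assumed to map to "comment" in the provider's legend, and the output uses the relative five-integer encoding defined by the LSP semantic tokens specification.

```typescript
interface Token { line: number; start: number; length: number; type: number }

// Simplification: only whole-line "#" comments are detected, and only when
// the coffeeComment parse option is on.
function commentTokens(lines: string[], parseOptions: { coffeeComment?: boolean }): Token[] {
  if (parseOptions.coffeeComment !== true) return [];
  const tokens: Token[] = [];
  lines.forEach((text, line) => {
    const m = /^(\s*)#(?!#)/.exec(text);
    if (m !== null) {
      const start = m[1].length;
      tokens.push({ line, start, length: text.length - start, type: 0 });
    }
  });
  return tokens;
}

// Five integers per token: deltaLine, deltaStart, length, tokenType, tokenModifiers.
// Positions are relative to the previous token.
function encode(tokens: Token[]): number[] {
  const data: number[] = [];
  let prevLine = 0;
  let prevStart = 0;
  for (const t of tokens) {
    const deltaLine = t.line - prevLine;
    const deltaStart = deltaLine === 0 ? t.start - prevStart : t.start;
    data.push(deltaLine, deltaStart, t.length, t.type, 0);
    prevLine = t.line;
    prevStart = t.start;
  }
  return data;
}

const lines = ["# a", "x := 1", "  # bb"];
console.log(encode(commentTokens(lines, { coffeeComment: true })));
// [ 0, 0, 3, 0, 0, 2, 2, 4, 0, 0 ]
```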

@RedCMD

RedCMD commented Oct 23, 2025

> enabling a feature like coffeeComment in civetconfig.json would not visually update the syntax highlighting

what sort of syntax highlighting differences are there with the config changes?

> Yeah, but isn’t textmate grammar static on vscode load?

TextMate grammars are only loaded when needed
so you can modify it all you want until the user opens a file with your language

> Hey, I managed to get this working dynamically via the semanticToken provider

sounds good
just keep in mind semantic highlighting should only be used as an enhancement, for when TextMate falls short, not a replacement
as the TextMate highlighting still has control of many features like IntelliSense, comments and brackets
and has a quicker response time

@adam2am
Contributor Author

adam2am commented Oct 23, 2025

> what sort of syntax highlighting differences are there with the config changes?

The coffeeComment parse option enables CoffeeScript-style # comments.

Edge-case matrix:
  # comment               + comment
  code  # trailing        + comment
  array.# length          X not a comment
  object.#private         X not a comment
  @[index %% #]           X not a comment
  floor # / 2             X not a comment
  "#{interpolate}"        X not a comment
  ### ###                 handled by separate block-comment rule

When generating Civet, some LLMs confuse it with CoffeeScript and emit this style of comment instead of JavaScript-style comments.

@edemaine
Copy link
Copy Markdown
Collaborator

@adam2am Semantic provider sounds exciting! This may solve #40 as well.
