lsp: Override virtual file system #793

Merged
jaraonthe-dot-net merged 7 commits into master from feature/lsp-virtual-file-system
Mar 16, 2026

@jaraonthe-dot-net
Contributor

The language server provides its own VirtualFileSystem in order to intercept reads of files that are currently managed by the LSP client.

Additionally, parsing dependencies between files are tracked in order to publish diagnostics for dependent files too.

Updated LICENSE.md with lsp4j information.
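
The interception described above can be pictured roughly as follows. This is only a minimal sketch, not the code from this PR: `SketchFileSystem`, `DiskSketchFileSystem`, `OverlaySketchFileSystem` and their methods are hypothetical names, and the compiler's real VirtualFileSystem interface may look quite different.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the compiler's file-system abstraction. */
interface SketchFileSystem {
  String read(Path path);
}

/** Reads straight from disk. */
final class DiskSketchFileSystem implements SketchFileSystem {
  @Override
  public String read(Path path) {
    try {
      return Files.readString(path, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

/** Serves client-managed documents from memory and everything else from the fallback. */
final class OverlaySketchFileSystem implements SketchFileSystem {
  private final SketchFileSystem fallback;
  private final Map<Path, String> openDocuments = new ConcurrentHashMap<>();

  OverlaySketchFileSystem(SketchFileSystem fallback) {
    this.fallback = fallback;
  }

  /** Called when the client opens or edits a document. */
  void didOpenOrChange(Path path, String text) {
    openDocuments.put(path.toAbsolutePath().normalize(), text);
  }

  /** Called when the client closes a document; reads fall back to disk again. */
  void didClose(Path path) {
    openDocuments.remove(path.toAbsolutePath().normalize());
  }

  @Override
  public String read(Path path) {
    Path key = path.toAbsolutePath().normalize();
    String managed = openDocuments.get(key);
    return managed != null ? managed : fallback.read(key);
  }
}
```

Normalizing the path on both the write and the read side is one way to make sure that an import written as a relative path still hits the in-memory copy.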

@jaraonthe-dot-net jaraonthe-dot-net self-assigned this Mar 7, 2026
@jaraonthe-dot-net jaraonthe-dot-net added the enhancement (New feature or request) and lsp (Language Server Protocol related) labels Mar 7, 2026
@jaraonthe-dot-net jaraonthe-dot-net force-pushed the feature/lsp-virtual-file-system branch from 2c83aa4 to e36b938 Compare March 7, 2026 16:17
@flofriday
Contributor

I really like the dependent updates, that seems like a really useful feature.

However the VFS implementation seems more complex than I thought would have been necessary when I designed the abstraction.
I had in mind that the LSP maintains one single instance of the VFS (inherited from the DiskVFS) but additionally updates a List of Files that are owned and modified by the client.
Why does each document have to hold its own FileSystem?

@jaraonthe-dot-net
Contributor Author

jaraonthe-dot-net commented Mar 8, 2026

> Why does each document have to hold its own FileSystem?

This is to figure out which files that file depends on: The VFS records which files are read, and that is the basis for the data in the DependencyMap. I saw no other way to get hold of this information (especially if the Parser produces Diagnostics instead of an AST).
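
To illustrate the idea of deriving dependencies from recorded reads, here is a small sketch. `RecordingReader` and `DependencySketch` are hypothetical names introduced for this example; the actual DependencyMap in the PR may be structured differently.

```java
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Wraps a read function and remembers every path that was requested. */
final class RecordingReader {
  private final Function<Path, String> delegate;
  private final Set<Path> readPaths = ConcurrentHashMap.newKeySet();

  RecordingReader(Function<Path, String> delegate) {
    this.delegate = delegate;
  }

  String read(Path path) {
    readPaths.add(path);
    return delegate.apply(path);
  }

  /** Returns an immutable copy of all paths read so far. */
  Set<Path> readPaths() {
    return Set.copyOf(readPaths);
  }
}

/** Maps each file to the set of files that were read while compiling it. */
final class DependencySketch {
  private final Map<Path, Set<Path>> dependenciesOf = new ConcurrentHashMap<>();

  /** Replaces the recorded dependencies of {@code file} with the latest reads. */
  void update(Path file, Set<Path> reads) {
    Set<Path> deps = new LinkedHashSet<>(reads);
    deps.remove(file); // a file does not depend on itself
    dependenciesOf.put(file, Set.copyOf(deps));
  }

  /** Returns all files whose last compilation read {@code changed}. */
  Set<Path> dependentsOf(Path changed) {
    Set<Path> result = new LinkedHashSet<>();
    dependenciesOf.forEach((file, deps) -> {
      if (deps.contains(changed)) {
        result.add(file);
      }
    });
    return result;
  }
}
```

Because the recording happens at the file-system level, this works even when a compilation ends in diagnostics rather than a usable AST: whatever was read before the failure is still known.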

@flofriday
Contributor

I'm a bit reluctant to merge this because the complexity in the LSP increased substantially, and the impact on performance is probably not negligible.

In your implementation you first compile/check the file that's open and then all other open files that depend on it. For large base files (like rv3264im), parsing and checking takes 300ms (on my machine without a native build), but with these changes we now parse and check it 5 more times as part of the dependents if all files that depend on it are also open, resulting in 1.8s total (at least).

To be fair the ergonomics of it could be worth it. Also we could parallelize it so that it doesn't block new updates on the base file (I'm not sure if that's already done).

I fear there are also some shortcomings in the implementation that make it even more complex (if I understand it correctly, I'm not totally sure).

1) Diagnostics in dependencies aren't published
Imagine two files A -> B where A imports B. Now the user only opens A, which cannot be compiled because of errors in B. In this case the user won't see any errors until they open B, but they also won't get any support for A and might be confused.

2) The importer can break the importee
In the import statement you can override macros inside the imported file, which means that the file might compile to something different depending on how you import it. So at the moment, if someone overrides a macro incorrectly, the diagnostic would land inside the imported file, but we only check the file on its own and so we will never see that error.
This is potentially quite niche, but this language feature also prevents us from caching the AST of the imported file to improve performance.

@flofriday
Contributor

flofriday commented Mar 10, 2026

Again, I'm still not too sure about the dependency implementation/approach, but in case you feel really passionate about it, I started the review.

During the review I had an idea about a different approach for the VFS.
What if we create a new VFS for each compilation that takes a snapshot of all files that are currently owned by the client at the start of the compilation and falls back to the disk otherwise?
This way the VFS would no longer have to hold a reference to the TextDocumentService and there would also be less potential for race conditions.
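
The per-compilation snapshot idea could look roughly like this. It is a sketch under assumptions: `CompilationSnapshotFs` is a hypothetical name, and the disk fallback is modelled as a plain function instead of the real DiskVFS.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Function;

/** Immutable view of the client-owned files, taken once per compilation. */
final class CompilationSnapshotFs {
  private final Map<Path, String> clientFiles;
  private final Function<Path, String> disk;

  CompilationSnapshotFs(Map<Path, String> openDocuments, Function<Path, String> disk) {
    // Map.copyOf creates an unmodifiable copy, so later edits are not visible here.
    this.clientFiles = Map.copyOf(openDocuments);
    this.disk = disk;
  }

  /** Returns the snapshotted client content, or the disk content if the client does not own the file. */
  String read(Path path) {
    String text = clientFiles.get(path);
    return text != null ? text : disk.apply(path);
  }
}
```

Since the copy is taken in the constructor, the object needs no reference back to the document service and can be handed to any thread without further synchronization.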

@jaraonthe-dot-net jaraonthe-dot-net force-pushed the feature/lsp-virtual-file-system branch from e36b938 to 86815e2 Compare March 12, 2026 20:36
@jaraonthe-dot-net
Contributor Author

Please note my new Fixup commit that addresses several of your smaller points.


> In your implementation you first compile/check the file that's open and then all other open files that depend on it. For large base files (like rv3264im), parsing and checking takes 300ms (on my machine without a native build), but with these changes we now parse and check it 5 more times as part of the dependents if all files that depend on it are also open, resulting in 1.8s total (at least).

In my testing it was substantially faster than that: Pushing diagnostics for all 9 files that (directly or indirectly) depend on rv3264im.vadl took less than 300ms. But even if it takes longer than that: The user is not actively focused on these other files at that moment, so a bit of delay in updating their diagnostics shouldn't be a problem.
But I took your suggestion anyway and parallelized this now - works like a charm.
Notes for testing: 1) VSCode shows this very nicely as it colors all open files that it has errors for in red (both in the Explorer and on the Editor tabs). 2) Log output of the server shows which thread the output occurred in, thus indicating the level of parallelism.
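
For readers unfamiliar with the pattern, parallel publishing for dependents might be sketched like this. `DependentDiagnosticsSketch` and the `checkAndPublish` callback are hypothetical; the sketch only shows the scheduling, not how the PR actually wires it into VadlTextDocumentService.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

final class DependentDiagnosticsSketch {
  /**
   * Checks the edited file on the calling thread, then schedules every open
   * dependent on the executor. The returned future completes once all
   * dependents have been processed.
   */
  static CompletableFuture<Void> publishAll(
      String editedFile,
      List<String> openDependents,
      Consumer<String> checkAndPublish,
      Executor executor) {
    checkAndPublish.accept(editedFile);
    CompletableFuture<?>[] tasks = openDependents.stream()
        .map(file -> CompletableFuture.runAsync(() -> checkAndPublish.accept(file), executor))
        .toArray(CompletableFuture[]::new);
    return CompletableFuture.allOf(tasks);
  }
}
```

The caller does not have to wait on the returned future, so a new edit to the base file is not blocked while the dependents are still being checked.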


> 1) Diagnostics in dependencies aren't published
> Imagine two files A -> B where A imports B. Now the user only opens A, which cannot be compiled because of errors in B. In this case the user won't see any errors until they open B, but they also won't get any support for A and might be confused.

You're right, that's weird behavior for the user. I've added this improvement: If the compiler reports a diagnostic in an imported file, then a new LSP diagnostic is placed right at the top of the current file (the one we're preparing diagnostics for) stating in which file an error occurred. It's not a perfect solution but better than nothing; it can certainly be improved (e.g. by including a code action to jump to the file in question), but that is out-of-scope for this PR.
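
A simplified picture of that behavior, assuming a hypothetical `SketchDiagnostic` type in place of the real compiler and lsp4j diagnostic classes, and an invented message text:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified diagnostic: owning file, 0-based line, message. */
final class SketchDiagnostic {
  final String file;
  final int line;
  final String message;

  SketchDiagnostic(String file, int line, String message) {
    this.file = file;
    this.line = line;
    this.message = message;
  }
}

final class ImportedErrorsSketch {
  /**
   * Keeps the diagnostics that belong to currentFile and replaces all others
   * with one hint per foreign file, placed on the first line of currentFile.
   */
  static List<SketchDiagnostic> forFile(String currentFile, List<SketchDiagnostic> all) {
    List<SketchDiagnostic> own = new ArrayList<>();
    Set<String> foreignFiles = new LinkedHashSet<>();
    for (SketchDiagnostic d : all) {
      if (d.file.equals(currentFile)) {
        own.add(d);
      } else {
        foreignFiles.add(d.file);
      }
    }
    List<SketchDiagnostic> result = new ArrayList<>();
    for (String foreign : foreignFiles) {
      result.add(new SketchDiagnostic(currentFile, 0, "Error in imported file: " + foreign));
    }
    result.addAll(own);
    return result;
  }
}
```

Collecting the foreign files in a set means that several errors in the same imported file produce only one hint at the top of the current file.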

> 2) The importer can break the importee
> In the import statement you can override macros inside the imported file, which means that the file might compile to something different depending on how you import it. So at the moment, if someone overrides a macro incorrectly, the diagnostic would land inside the imported file, but we only check the file on its own and so we will never see that error. This is potentially quite niche, but this language feature also prevents us from caching the AST of the imported file to improve performance.

As you say, this is a niche problem; I've never seen an example for this mechanism. It is something to tackle in another PR/issue if it is actually a problem - but maybe my solution for 1) above also helps here?


> During the review I had an idea about a different approach for the VFS. What if we create a new VFS for each compilation that takes a snapshot of all files that are currently owned by the client at the start of the compilation and falls back to the disk otherwise? This way the VFS would no longer have to hold a reference to the TextDocumentService and there would also be less potential for race conditions.

That is somewhat similar to my approach of tying the VFS to the Document Snapshot. Instead of having a snapshot for a file that is based on that file's version, you want to create a new Snapshot for a set of files. I believe that is overkill. I think that race conditions are not a problem:
If a user edits files A and B (where A depends on B) in such quick succession that B's edits are recorded while the server is still busy compiling A's new version, which version of B should be used in that compilation? The old or the new version? There is no clear argument for which version is better.
Your approach aims to freeze file state at one moment in time, ideally so that the old version of B would be used in compiling A (i.e. the version that was current at the moment that the edit in A happened) (but as race conditions go, this would not be guaranteed, as the edit of B may be recorded before its state is retrieved from the server's internal Document store for the Compilation Snapshot). In my solution it may be more likely that the newest version of B is used, but maybe that is even preferable over compiling an outdated file state. In the end, LSP handles each file independently, there is no connection whatsoever between the versions of different files, and so no rule on which version of another file to use.

All in all, I wouldn't worry too much about which version of an imported file is used in compilation. Eventually, the latest edit (in a series of fast-paced edits) would override all others, because it either is an edit of a dependent file, thus it takes the latest changes of its dependencies into account, or it is an edit of a dependency, upon which diagnostics are recomputed for all dependent files.
The only potential problem I see is that the DependencyMap may not be up-to-date, but let's fix this problem if it really turns out to be one.

As for the reference to TextDocumentService: This allows fetching the most current version of a document from the Document store. Are you okay with leaving this reference in?

TODO: I just noticed that the VFS will also provide the most current content for the Document it is attached to, instead of the Snapshot's version. I will fix that tomorrow.

I believe I have addressed all your concerns, please let me know if you're happy to go ahead with my approach.

@jaraonthe-dot-net
Contributor Author

Oh, one more thing: I've added the first test class to the vadl-lsp module, but it seems to me that the CI didn't run these tests - how to configure that?

@flofriday
Contributor

Ok, the performance sounds good, so let's move forward with this approach, but let's still try to get complexity down. I think this will pay off massively in the long run.

One more thing: I don't see how the whole snapshot system makes any sense at the moment, because the contents of the snapshots aren't used at all for compiling; instead the newest version will be used.

I can see why the race conditions might not be a good enough reason to implement the whole-world-snapshot-VFS™, but I think this is mostly about reducing complexity. Creating it should be pretty cheap because snapshots are cached and, even though many files are open, only a few (most likely one) will have changed.
With that, reasoning about the state should be easier, as every check/diagnostic rendering or general interaction with the compiler always runs on a single immutable world that isn't shared with potential other threads of the LSP. Also, for testing and debugging this should make it much clearer which data the LSP operates on.

If you can, please try this approach and let's see if my assumption is correct that this also improves the readability of the program.

@jaraonthe-dot-net jaraonthe-dot-net force-pushed the feature/lsp-virtual-file-system branch from 4b94007 to 4922a60 Compare March 13, 2026 16:53
@jaraonthe-dot-net
Contributor Author

> One more thing: I don't see how the whole snapshot system makes any sense at the moment, because the contents of the snapshots aren't used at all for compiling; instead the newest version will be used.

Yes, as I've mentioned above, that was a mistake on my part. Please see my Fixup2 commit. (Also note that Snapshots are used in semanticTokensFull().)


> I can see why the race conditions might not be a good enough reason to implement the whole-world-snapshot-VFS™, but I think this is mostly about reducing complexity. Creating it should be pretty cheap because snapshots are cached and, even though many files are open, only a few (most likely one) will have changed. With that, reasoning about the state should be easier, as every check/diagnostic rendering or general interaction with the compiler always runs on a single immutable world that isn't shared with potential other threads of the LSP. Also, for testing and debugging this should make it much clearer which data the LSP operates on.

I don't see any advantages at the moment. Yes, the language server is highly parallelized, but in practice everything is driven by user input - i.e. files are edited one after another, at human speed, not at the same time. I don't think there's gonna be a problem with multiple threads interfering with / deadlocking each other.

> If you can, please try this approach and let's see if my assumption is correct that this also improves the readability of the program.

I'm not sure how there would be readability improvements. The difference from the currently implemented Snapshot is that several files would be snapshotted instead of one, but in any case the VadlTextDocumentService has to work with some sort of snapshot, so there's no difference there. All the complexity regarding other files is neatly tucked away in the VFS now.

The reason for introducing per-Document Snapshots is so that version, text, and textLines are in sync (for things like checking if version is outdated or transforming positions to UTF16 positions for LSP) - this is actually already a simplification from the old implementation. So far I see no reason to include other files in that snapshot, and so I didn't.
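
A per-document snapshot of that kind could be sketched as below. `DocumentSnapshotSketch` is a hypothetical name, and the sketch assumes that the compiler reports columns in Unicode code points, which may not match the real VADL frontend.

```java
import java.util.List;

/** Immutable per-document snapshot: version and lines always belong together. */
final class DocumentSnapshotSketch {
  final String uri;
  final int version;
  final List<String> textLines;

  DocumentSnapshotSketch(String uri, int version, String text) {
    this.uri = uri;
    this.version = version;
    // "\\R" matches any line break; limit -1 keeps trailing empty lines.
    this.textLines = List.of(text.split("\\R", -1));
  }

  /** True if a newer version of the document is already known. */
  boolean isOutdated(int latestVersion) {
    return version < latestVersion;
  }

  /** Converts a 0-based code-point column into a 0-based UTF-16 column. */
  int toUtf16Column(int line, int codePointColumn) {
    return textLines.get(line).offsetByCodePoints(0, codePointColumn);
  }
}
```

Because the lines are derived from the text in the constructor and never change afterwards, a position computed against the snapshot always refers to the same version that was compiled.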

I propose to postpone multi-file snapshots for now, and only implement that if we actually need it.

I would very much like to move forward with merging this.

@jaraonthe-dot-net
Contributor Author

@flofriday and I had a talk about this PR. We agreed to:

  • No longer use a mutable Document; instead use immutable snapshots only. Rationale: With a very high likelihood a document snapshot is created for every document version anyway.
  • Don't work with the latest document version in the LspVirtualFileSystem, but one fixed version for each document. Rationale: We cannot be sure that the VADL Parser doesn't open and read the same file more than once, so the current approach could lead to inconsistencies.

Additionally, both points make it easier to reason about state in this multi-threaded environment - immutable data is much more straight-forward than mutable data.
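
One possible way to guarantee a single fixed version per file within a compilation is to pin the content on first read. The sketch below is an illustration only: `PinnedReadsSketch` is a hypothetical name, and the merged implementation may instead resolve the snapshots up front.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Pins the content of each file on first read, so repeated reads stay consistent. */
final class PinnedReadsSketch {
  private final Function<Path, String> latestContent;
  private final Map<Path, String> pinned = new ConcurrentHashMap<>();

  PinnedReadsSketch(Function<Path, String> latestContent) {
    this.latestContent = latestContent;
  }

  /** Returns the content seen on the first read of this path, even if the file changed since. */
  String read(Path path) {
    return pinned.computeIfAbsent(path, latestContent);
  }
}
```

If the parser opens the same file twice during one compilation, both reads return identical text, which avoids the inconsistency mentioned in the second point.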

@jaraonthe-dot-net jaraonthe-dot-net force-pushed the feature/lsp-virtual-file-system branch from 50b1ff8 to 887d0e4 Compare March 16, 2026 16:20
@jaraonthe-dot-net
Contributor Author

@flofriday You can finish your review now.

Contributor

@flofriday flofriday left a comment

This is way better now, just minor, non-blocking things.

jaraonthe-dot-net and others added 7 commits March 16, 2026 22:21
The language server provides its own VirtualFileSystem in order to intercept reads of files that are currently managed by the LSP client.

Additionally, parsing dependencies between files are tracked in order to publish diagnostics for dependent files too.

Updated LICENSE.md with lsp4j information.
- VadlTextDocumentService:
  - Use Frontend.compileToAst()
  - Report if there are errors in imported files
  - Publish diagnostics for dependent files in parallel
- Add DependencyMap Test
- VadlTextDocumentService: Do NOT use Frontend.compileToAst()
  - We want to use the snapshot's text for the parsed file instead of whatever the current version is.
- Immutable Document (i.e. always snapshot)
- Use snapshots of files in VFS instead of most recent version
- Document:
  - Do not store text (it's used very rarely)
  - Do not create new ArrayList if fixed-size list is fine
  - throw IllegalStateException instead of RuntimeException
@jaraonthe-dot-net jaraonthe-dot-net force-pushed the feature/lsp-virtual-file-system branch from 4932e60 to 522e626 Compare March 16, 2026 21:21
@jaraonthe-dot-net jaraonthe-dot-net enabled auto-merge (squash) March 16, 2026 22:27
@jaraonthe-dot-net
Contributor Author

@flofriday Please approve again, it's just minor changes.

@jaraonthe-dot-net jaraonthe-dot-net merged commit 208e4b4 into master Mar 16, 2026
7 checks passed
@jaraonthe-dot-net jaraonthe-dot-net deleted the feature/lsp-virtual-file-system branch March 16, 2026 23:07