Skip to content

Conversation

@toinehartman
Copy link
Member

@toinehartman toinehartman commented Jul 11, 2025

Implementing the formatting and rangeFormatting LSP APIs. Both use the same contribution, where for (whole-file) formatting _range == _input@\loc.

Design:

data LanguageService = formatting(list[TextEdit](Focus _focus, FormattingOptions _opts) formattingService);

data FormattingOptions(
        int tabSize = 4
      , bool insertSpaces = true
      , bool trimTrailingWhitespace = false
      , bool insertFinalNewline = false
      , bool trimFinalNewlines = false
) = formattingOptions();

Closes #130.

@toinehartman toinehartman force-pushed the feature/lsp-formatting branch from 79abb8f to f0930d1 Compare July 11, 2025 09:07
@toinehartman toinehartman requested a review from DavyLandman July 14, 2025 17:19
@sonarqubecloud
Copy link

Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is a bit of a challenging API, especially if we want to make this more user-friendly for DSL users.

I have the feeling we should involve @jurgenvinju who has been on the topic of formatting/pretty-printing for quite some time already. I suspect that we're missing some opportunities of matching this up with something already in the standard library, or something that we might should add to the standard library first.

@DavyLandman
Copy link
Member

This PR is missing the adjacent: rangeFormatting functionality.

@toinehartman toinehartman force-pushed the feature/lsp-formatting branch 3 times, most recently from 4b58c98 to 4aca661 Compare August 18, 2025 14:50
Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going in the right direction, but I'm not sure what I should have been reviewing, the implementation or the design?

@@ -0,0 +1,157 @@
module util::Format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about moving a version of this to stdlib, I don't want us to have custom vs code only std lib like modules.

So as you suggested, some might be nice to migrate there? I don't think all of them are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we merge this PR, we'll have to get rid of this module. So that would be a reason to wait with merging this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code or something that does the same belongs in Box2Text. This has the same abstraction level, and also this is the same code for all DSLs. It should reside in a reusable generic component like Box2Text.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want quick reuse, we could move these functions to Box2Text and call them in the top-level format function. Later some of the features could be fused into the layout algorithm to avoid revisiting the string again and again.

@toinehartman toinehartman force-pushed the feature/lsp-formatting branch from 2b7fb47 to e05f700 Compare August 25, 2025 15:02
@toinehartman toinehartman changed the base branch from main to migrate-dap August 25, 2025 15:02
@toinehartman toinehartman force-pushed the feature/lsp-formatting branch 2 times, most recently from 9e7fe1a to 523c2fe Compare August 26, 2025 09:41
Base automatically changed from migrate-dap to main August 27, 2025 09:25
@toinehartman toinehartman force-pushed the feature/lsp-formatting branch from 523c2fe to ab448fc Compare August 27, 2025 12:06
@toinehartman toinehartman marked this pull request as ready for review August 29, 2025 12:35
@jurgenvinju
Copy link
Member

Is there an opportunity to use the Focus abstraction for this contribution? Especially for the range alternative.

That way language engineers don't have to go search for the right trees anymore, and they could easily support only a few top-level types for starters, and they could easily recover the required indentation level from the parent tree's layout siblings.

@toinehartman
Copy link
Member Author

toinehartman commented Aug 29, 2025

Is there an opportunity to use the Focus abstraction for this contribution? Especially for the range alternative.

@jurgenvinju I thought about this, but the tricky part there is that a range does not necessarily correspond to an exact tree. Even if we give a focus tree that ends at the largest tree encapsulating the range, there would still be the need to filter the edits so they are not outside of the given range.

@toinehartman toinehartman force-pushed the feature/lsp-formatting branch 2 times, most recently from 7a9d722 to 01d9853 Compare September 1, 2025 12:05
@toinehartman toinehartman marked this pull request as ready for review September 15, 2025 11:25
@sonarqubecloud
Copy link

</dependencies>
<configuration>
<systemPropertyVariables>
<forkMode>always</forkMode>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this change in the pom.xml has to do with this PR?

final ILanguageContributions contribs = contributions(uri);

// call the `formatting` implementation of the relevant language contribution
var fileState = getFile(uri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: fileState can be inlined in the next line.

// just a range
var start = Locations.toRascalPosition(uri, range.getStart(), columns);
var end = Locations.toRascalPosition(uri, range.getEnd(), columns);
// compute the focus list at the end of the range
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is not accurate?

Comment on lines +786 to +787
var optionsType = tf.abstractDataType(typeStore, "FormattingOptions");
var consType = tf.constructor(typeStore, optionsType, "formattingOptions");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be stored in a field, and only looked-up at the start?

logger.trace("Common focus suffix length: {}", commonSuffix.length());
// The range spans multiple subtrees. The easy way out is not to focus farther down than
// their smallest common subtree (i.e. `commonSuffix`) - let's see if we can do any better.
if (TreeAdapter.isList((ITree) commonSuffix.get(0))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this is quite clean 👍

Comment on lines +224 to +241
final var selected = elements.stream()
.map(ITree.class::cast)
.dropWhile(t -> {
final var l = TreeAdapter.getLocation(t);
// only include layout if the element before it is selected as well
return TreeAdapter.isLayout(t)
? rightOfBegin(l, startLine, startColumn)
: rightOfEnd(l, startLine, startColumn);
})
.takeWhile(t -> {
final var l = TreeAdapter.getLocation(t);
// only include layout if the element after it is selected as well
return TreeAdapter.isLayout(t)
? rightOfEnd(l, endLine, endColumn)
: rightOfBegin(l, endLine, endColumn);
})
.collect(VF.listWriter());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternative:

var result = VF.listWriter();
boolean inside = false;
for (var e : elements) {
    var t = (ITree)e;
    var l = TreeAdapter.getLocation(t);
    var isLayout = TreeAdapter.isLayout(t);
    if (!inside) {
       inside = isLayout ? rightOfBegin(l, startLine, startColumn) :  rightOfEnd(l, startLine, startColumn)
    } else if (isLayout ? rightOfEnd(l, endLine, endColumn) : rightOfBegin(l, endLine, endColumn)) {
         break;
    }
    if (inside) {
        result.add(t); 
    }
}
var selected = result.done();

not sure if better, was just wondering if it got more compact by turning it into a loop.

Comment on lines -1 to +4
begin
declare
input : natural,
output : natural,
repnr : natural,
rep : natural;

begin
declare
input : natural, output : natural, repnr : natural, rep : natural
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this change break the UI test that have some line numbers from this example?

also, shouldn't we commit the unformatted version, and then run a test that triggers the formatter and see that indeed it worked?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an error/incompleteness in the Pico toBox function. It should pretty print like the original does. The reason is that the default heuristics of toBox pick HOV layout for comma-separated lists.

@@ -0,0 +1,157 @@
module util::Format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we merge this PR, we'll have to get rid of this module. So that would be a reason to wait with merging this PR.

* The optional `prepareRename` service argument discovers places in the editor where a ((util::LanguageServer::rename)) is possible. If renameing the location is not supported, it should throw an exception.
* The ((didRenameFiles)) service collects ((DocumentEdit))s corresponding to renamed files (e.g. to rename a class when the class file was renamed). The IDE applies the edits after moving the files. It might fail and report why in diagnostics.
* The ((selectionRange)) service discovers selections around a cursor, that a user might want to select. It expects the list of source locations to be in ascending order of size (each location should be contained by the next) - similar to ((Focus)) trees.
* The ((formatting)) service determines what edits to do to format (part of) a file. The `range` parameter determines what part of the file to format. For whole-file formatting, `_tree.top == range`. ((FormattingOptions)) influence how formatting treats whitespace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a problem with tree.top since that drops the comments at the start & end of the file. shouldn't tree == range? So including the start part?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no range parameter anymore since we have a Focus list now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed outdated now

import org.rascalmpl.values.parsetrees.TreeAdapter;
import org.rascalmpl.vscode.lsp.util.RascalServices;

public class TreeSearchTests {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great that we have some actual tests for this piece of code 👍🏼

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

str original = "<input[-1]>";
box = toBox(input[-1]);
box = visit (box) { case i:I(_) => i[is=opts.tabSize] }
formatted = format(box);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code would look better if the options were passed directly to format and it would implement them.

For me the code drops under a certain level of abstraction (under the syntax-directed level) with all the string-level operations like replaceLast, perLine etc, while format and box2text which indeed take care of these details normally.

list[TextEdit] picoFormattingService(Focus input, FormattingOptions opts) {
str original = "<input[-1]>";
box = toBox(input[-1]);
box = visit (box) { case i:I(_) => i[is=opts.tabSize] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good example for users on how to parameterize the indentation level. The reason is that not all I boxes will correspond to the default indentation level in a typical language. Some are different.

So indentation level should be a parameter to toBox where the spec writer can decide where to use it and where not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to discover this now and here. A few minor additions to toBox should help. We have to think hard how to make passing the parameters simple and easy, because toBox is highly recursive and we might skip accidentally to pass it on..


// instead of computing all edits and filtering, we can be more efficient by only formatting certain trees.
loc range = input[0]@\loc;
filteredEdits = [e | e <- edits, isContainedIn(e.range, range)];
Copy link
Member

@jurgenvinju jurgenvinju Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some troubles with this filtering, for the Pico example and in general. The idea would be that the Focus decides already which part to format and which part to leave alone. So here we take the range of the smallest focus while we format the entire file.. but that doesn't seem natural (also it could break parsing).

Filtering the edits on a much larger tree is expensive but it could also be wrong. Maybe you only wrote a formatter for statements and not for expressions, so you selected the smallest statement in the focus, then filtering only the edits in the range will possibly even break the code (make it not parseable due to connecting parts which used to be separated by whitespace or even worse).

So I think we should not demonstrate filtering of edits here. Instead we should only call layoutDiff on the focused element, and think about recovering indentation of a nested tree automatically with layoutDiff (treeDiff already has this feature so layoutDiff could have it too).


list[TextEdit] picoFormattingService(Focus input, FormattingOptions opts) {
str original = "<input[-1]>";
box = toBox(input[-1]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we formatting the whole file?

}
@synopsis{Determine the most-used newline character in a string.}
str mostUsedNewline(str input, list[str] lineseps = newLineCharacters, str(list[str]) tieBreaker = getFirstFrom) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we hide this under the hood as one of the formatting options?

* `trimFinalNewlines`; if `true`, and the file ends in one or more new lines, remove them.
Note: formatting with `insertFinalNewline && trimFinalNewlines` is expected to return a file that ends in a single newline.
}
data FormattingOptions(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add newlineCharacter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although we don't want people to write conditional code based on windows vs unix or something... should we hide this too under Box2Text functonality?

Copy link
Member

@jurgenvinju jurgenvinju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should align a bit more with toBox and format (Box2Text) and map the LSP parameters to their implementations. This way we can hide the string manipulation details from the Pico formatter contrubuton.

@toinehartman toinehartman marked this pull request as draft September 17, 2025 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose LSP formatter (pretty-printer) api via LanguageService

6 participants