Commit 7ae652e

Revamp the way tokens and comments are built into pieces. (#1336)
I recently ran into bugs where a line comment after some AST node will cause the node to split incorrectly. A simple example is:

```
var x = 1 + 2; // comment
```

Currently, the formatter splits that to:

```
var x = 1 + 2; // comment
```

It does that because the Piece tree it creates doesn't line up with the AST node boundaries. In particular, the current design appends tokens and comments to a preceding piece. So in this example, the piece tree looks like:

```
Var(
  `var`
  Assign(
    `x =`
    Infix(
      `1 +`
      `2; // comment`
    )
  )
)
```

Note how the `;` and line comment are attached as part of the RHS of the `+`. That's why the formatter thinks the line comment's newline is inside the `+` expression and forces it to split.

We could fix this specific bug by making ExpressionStatement treat the `;` as a separate piece, but I suspect that we will be playing whack-a-mole if we keep the current design. Instead, this unfortunately giant PR revamps the piece API. It has a couple of intermingled changes:

### Split pieces at all AST boundaries

Whenever a `visit___()` returns, an implicit split is inserted so that no single `TextPiece` contains tokens from a parent and child AST node. This directly fixes the above bug and all similar bugs in that category.

Note that while we split the tokens into separate pieces, that doesn't mean they may split in the "line splitting" sense. The TextPieces go into AdjacentPiece objects that don't insert actual splits between the pieces. This means this change shouldn't significantly impact the performance of line splitting.

It's just about ensuring that the nesting structure of the piece tree mirrors the nesting structure of the AST. That way, when a newline in a child piece node invalidates an outer piece, that invalidation respects the original syntax.
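
The invalidation described above can be illustrated with a minimal sketch. `Node`, `Text`, and `Group` are hypothetical stand-ins, not the formatter's real piece classes, and the shape of the "new" tree is an assumption about how the statement might look once the `;` and comment no longer live inside the infix piece.

```dart
// Hypothetical stand-ins for pieces; the real dart_style classes differ.
abstract class Node {
  bool get containsNewline;
}

class Text extends Node {
  final String text;
  final bool endsWithLineComment;
  Text(this.text, {this.endsWithLineComment = false});

  // A line comment always forces a newline after it.
  @override
  bool get containsNewline => endsWithLineComment;
}

class Group extends Node {
  final String name;
  final List<Node> children;
  Group(this.name, this.children);

  // A group is forced to split if any descendant contains a newline.
  @override
  bool get containsNewline => children.any((child) => child.containsNewline);
}

void main() {
  // Old shape: `;` and the comment are glued onto the infix operand.
  var oldInfix = Group('Infix', [
    Text('1 +'),
    Text('2; // comment', endsWithLineComment: true),
  ]);

  // Assumed new shape: the infix piece holds only its own tokens.
  var newInfix = Group('Infix', [Text('1 +'), Text('2')]);
  var newStatement = Group('Statement', [
    Group('Var', [
      Text('var'),
      Group('Assign', [Text('x ='), newInfix]),
    ]),
    Text(';'),
    Text('// comment', endsWithLineComment: true),
  ]);

  print(oldInfix.containsNewline); // true: the infix is forced to split
  print(newInfix.containsNewline); // false: the infix is left alone
  print(newStatement.containsNewline); // true: only the statement sees it
}
```

In this toy model the comment's newline still propagates upward, but it only reaches nodes that syntactically contain the comment.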

### Revamp the API for creating pieces

The previous API had a DSL-like "push" API where the pieces created by PieceWriter were stored internally and exposed by a fairly confusing `give()`/`take()`/`split()` API. That was necessary because any given `visit___()` method might not be *able* to return a Piece for its node if that node just concatenated its tokens into some surrounding piece.

With the previous change where every AST node corresponds to a piece, we have that option. So this PR also makes that change. Every `visit___()` method is now required to return a piece. Likewise, all of the `create___()` methods in PieceFactory return the pieces they create. This avoids the need for a weird `take()` API.

### Add an AdjacentBuilder and buildPiece() API

Getting rid of the implicit storage and dataflow for pieces is good for being able to easily reason about how the piece tree gets created out of child pieces. But it can come at the cost of making code that creates pieces very verbose, with lots of local variables and `List<Piece>` objects to store the intermediate pieces being built.

To make that nicer, I wrote an AdjacentBuilder class with an imperative API for building an AdjacentPiece out of a series of tokens, nodes, and spaces. This API closely mirrors the original DSL-like API, except now you know exactly what object the nodes and tokens are pushing their pieces into.

To make that even nicer, I added a `buildPiece()` method that takes a callback, invokes it with a new AdjacentBuilder, and returns the built result. This gets most code for building pieces fairly close to the original push-based API but with hopefully clearer, more explicit dataflow.

I'm really sorry for the giant size of this PR. If you want, I can try to break it into a series of smaller commits (but likely still one PR), but doing so is pretty challenging given how intertwined these changes are. It's hard to change the return type of the visit methods without also getting rid of the implicit dataflow, and at that point, almost all the changes are there.

Also, I added more tests to cover the cases around comments that were broken.
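
The `buildPiece()` pattern described in the message can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the classes are toy stand-ins with the same names as the real ones, the `text()` helper is hypothetical, and the exact signature of the real `buildPiece()` is assumed rather than taken from the source.

```dart
// Toy stand-ins; the real dart_style classes carry much more state.
abstract class Piece {}

class TextPiece extends Piece {
  final String text;
  TextPiece(this.text);

  @override
  String toString() => text;
}

class SpacePiece extends Piece {
  @override
  String toString() => ' ';
}

class AdjacentPiece extends Piece {
  final List<Piece> pieces;
  AdjacentPiece(this.pieces);

  @override
  String toString() => pieces.join();
}

class AdjacentBuilder {
  final List<Piece> _pieces = [];

  void add(Piece piece) => _pieces.add(piece);

  // Hypothetical helper standing in for the real token() method.
  void text(String text) => add(TextPiece(text));

  void space() => add(SpacePiece());

  Piece build() {
    assert(_pieces.isNotEmpty);
    // Copy the list so clearing the builder doesn't empty the result.
    var result =
        _pieces.length == 1 ? _pieces.first : AdjacentPiece(List.of(_pieces));
    _pieces.clear();
    return result;
  }
}

// Assumed shape: create a builder, let the callback fill it, return the piece.
Piece buildPiece(void Function(AdjacentBuilder) buildCallback) {
  var builder = AdjacentBuilder();
  buildCallback(builder);
  return builder.build();
}

void main() {
  var piece = buildPiece((b) {
    b.text('return');
    b.space();
    b.text('x');
    b.text(';');
  });
  print(piece); // return x;
}
```

The callback keeps the call site close to the old push-style code, while the builder it writes into is an explicit parameter rather than hidden state.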
1 parent 214f289 commit 7ae652e

22 files changed, with 1609 additions and 1318 deletions.

lib/src/back_end/code_writer.dart

Lines changed: 33 additions & 24 deletions
@@ -1,6 +1,8 @@
 // Copyright (c) 2023, the Dart project authors. Please see the AUTHORS file
 // for details. All rights reserved. Use of this source code is governed by a
 // BSD-style license that can be found in the LICENSE file.
+import 'dart:math';
+
 import '../piece/piece.dart';
 import 'solution.dart';

@@ -34,11 +36,11 @@ class CodeWriter {
   /// it as pending. This ensures that we don't write trailing whitespace,
   /// avoids writing spaces at the beginning of lines, and allows collapsing
   /// multiple redundant newlines.
-  _Whitespace _pendingWhitespace = _Whitespace.none;
+  Whitespace _pendingWhitespace = Whitespace.none;

   /// The number of spaces of indentation that should be begin the next line
-  /// when [_pendingWhitespace] is [_Whitespace.newline] or
-  /// [_Whitespace.blankLine].
+  /// when [_pendingWhitespace] is [Whitespace.newline] or
+  /// [Whitespace.blankLine].
   int _pendingIndent = 0;

   /// The cost of the currently chosen line splits.
@@ -194,10 +196,7 @@ class CodeWriter {

   /// Writes a single space to the output.
   void space() {
-    // If a newline is already pending, then ignore the space.
-    if (_pendingWhitespace == _Whitespace.none) {
-      _pendingWhitespace = _Whitespace.space;
-    }
+    whitespace(Whitespace.space);
   }

   /// Inserts a line split in the output.
@@ -209,16 +208,16 @@ class CodeWriter {
   void newline({bool blank = false, int? indent}) {
     if (indent != null) setIndent(indent);

-    handleNewline();
+    whitespace(blank ? Whitespace.blankLine : Whitespace.newline);
+  }

-    // Collapse redundant newlines.
-    if (blank) {
-      _pendingWhitespace = _Whitespace.blankLine;
-    } else if (_pendingWhitespace != _Whitespace.blankLine) {
-      _pendingWhitespace = _Whitespace.newline;
+  void whitespace(Whitespace whitespace) {
+    if (whitespace case Whitespace.newline || Whitespace.blankLine) {
+      handleNewline();
+      _pendingIndent = _options.indent;
     }

-    _pendingIndent = _options.indent;
+    _pendingWhitespace = _pendingWhitespace.collapse(whitespace);
   }

   /// Sets whether newlines are allowed to occur from this point on for the
@@ -286,24 +285,24 @@ class CodeWriter {
   /// count of the written text, including whitespace.
   void _flushWhitespace() {
     switch (_pendingWhitespace) {
-      case _Whitespace.none:
+      case Whitespace.none:
         break; // Nothing to do.

-      case _Whitespace.newline:
-      case _Whitespace.blankLine:
+      case Whitespace.newline:
+      case Whitespace.blankLine:
         _finishLine();
         _buffer.writeln();
-        if (_pendingWhitespace == _Whitespace.blankLine) _buffer.writeln();
+        if (_pendingWhitespace == Whitespace.blankLine) _buffer.writeln();

         _column = _pendingIndent;
         _buffer.write(' ' * _column);

-      case _Whitespace.space:
+      case Whitespace.space:
         _buffer.write(' ');
         _column++;
     }

-    _pendingWhitespace = _Whitespace.none;
+    _pendingWhitespace = Whitespace.none;
   }

   void _finishLine() {
@@ -328,18 +327,28 @@ class CodeWriter {
 }

 /// Different kinds of pending whitespace that have been requested.
-enum _Whitespace {
+///
+/// Note that the order of values in the enum is significant: later ones have
+/// more whitespace than previous ones.
+enum Whitespace {
   /// No pending whitespace.
   none,

+  /// A single space.
+  space,
+
   /// A single newline.
   newline,

   /// Two newlines.
-  blankLine,
+  blankLine;

-  /// A single space.
-  space
+  /// Combines two pending whitespaces and returns the result.
+  ///
+  /// When two whitespaces overlap, they aren't both written: we don't want
+  /// two spaces or a newline followed by a space. Instead, the two whitespaces
+  /// are collapsed such that the largest one wins.
+  Whitespace collapse(Whitespace other) => values[max(index, other.index)];
 }

 /// The mutable state local to a single piece being formatted.
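
The effect of reordering the enum and adding `collapse()` can be checked with a small standalone sketch. The enum below is copied from the diff with its doc comments trimmed; the `main()` function is added here purely for illustration.

```dart
import 'dart:math';

// Copy of the enum from the diff, trimmed so it runs standalone.
enum Whitespace {
  none,
  space,
  newline,
  blankLine;

  // The value with the larger index (more whitespace) wins.
  Whitespace collapse(Whitespace other) => values[max(index, other.index)];
}

void main() {
  // A space requested while a newline is pending is dropped.
  print(Whitespace.newline.collapse(Whitespace.space)); // Whitespace.newline

  // A blank line replaces a pending newline.
  print(Whitespace.newline.collapse(Whitespace.blankLine)); // Whitespace.blankLine

  // A newline does not downgrade a pending blank line.
  print(Whitespace.blankLine.collapse(Whitespace.newline)); // Whitespace.blankLine

  // With nothing pending, the requested whitespace is kept.
  print(Whitespace.none.collapse(Whitespace.space)); // Whitespace.space
}
```

Because the result depends only on declaration order, the old special cases in `space()` and `newline()` reduce to a single `max` over enum indices.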
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+// Copyright (c) 2023, the Dart project authors. Please see the AUTHORS file
+// for details. All rights reserved. Use of this source code is governed by a
+// BSD-style license that can be found in the LICENSE file.
+import 'package:analyzer/dart/ast/ast.dart';
+import 'package:analyzer/dart/ast/token.dart';
+
+import '../piece/adjacent.dart';
+import '../piece/piece.dart';
+import 'piece_factory.dart';
+
+/// Incrementally builds an [AdjacentPiece].
+class AdjacentBuilder {
+  final PieceFactory _visitor;
+
+  /// The series of adjacent pieces.
+  final List<Piece> _pieces = [];
+
+  AdjacentBuilder(this._visitor);
+
+  /// Yields a new piece containing all of the pieces added to or created by
+  /// this builder. The caller must ensure it doesn't build an empty piece.
+  ///
+  /// Also clears the builder's list of pieces so that this builder can be
+  /// reused to build more pieces.
+  Piece build() {
+    assert(_pieces.isNotEmpty);
+
+    var result = _flattenPieces();
+    _pieces.clear();
+
+    return result;
+  }
+
+  /// Adds [piece] to this builder.
+  void add(Piece piece) {
+    _pieces.add(piece);
+  }
+
+  /// Emit [token], along with any comments and formatted whitespace that comes
+  /// before it.
+  ///
+  /// If [lexeme] is given, uses that for the token's lexeme instead of its own.
+  ///
+  /// Does nothing if [token] is `null`. If [spaceBefore] is `true`, writes a
+  /// space before the token, likewise with [spaceAfter].
+  void token(Token? token,
+      {bool spaceBefore = false, bool spaceAfter = false, String? lexeme}) {
+    if (token == null) return;
+
+    if (spaceBefore) space();
+    add(_visitor.pieces.tokenPiece(token, lexeme: lexeme));
+    if (spaceAfter) space();
+  }
+
+  /// Writes any comments that appear before [token], which will be discarded.
+  ///
+  /// Used to ensure comments before a discarded token are preserved.
+  void commentsBefore(Token? token) {
+    if (token == null) return;
+
+    var piece = _visitor.pieces.writeCommentsBefore(token);
+    if (piece != null) add(piece);
+  }
+
+  /// Writes an optional modifier that precedes other code.
+  void modifier(Token? keyword) {
+    token(keyword, spaceAfter: true);
+  }
+
+  /// Visits [node] if not `null` and adds the resulting [Piece] to this
+  /// builder.
+  void visit(AstNode? node,
+      {bool spaceBefore = false,
+      bool commaAfter = false,
+      bool spaceAfter = false}) {
+    if (node == null) return;
+
+    if (spaceBefore) space();
+    add(_visitor.nodePiece(node, commaAfter: commaAfter));
+    if (spaceAfter) space();
+  }
+
+  /// Appends a space before the previous piece and the next one.
+  void space() {
+    _pieces.add(SpacePiece());
+  }
+
+  /// Removes redundant [AdjacentPiece] wrappers from [_pieces].
+  Piece _flattenPieces() {
+    List<Piece> flattened = [];
+
+    void traverse(List<Piece> pieces) {
+      for (var piece in pieces) {
+        if (piece is AdjacentPiece) {
+          traverse(piece.pieces);
+        } else {
+          flattened.add(piece);
+        }
+      }
+    }
+
+    traverse(_pieces);
+
+    // If there's only one piece, don't wrap it in a pointless AdjacentPiece.
+    if (flattened.length == 1) return flattened[0];
+
+    return AdjacentPiece(flattened);
+  }
+}
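
The flattening step in `_flattenPieces()` can be tried in isolation with a minimal sketch. The `Piece`, `TextPiece`, and `AdjacentPiece` classes here are toy stand-ins for the real ones, and `flatten()` is a hypothetical free function that mirrors the logic of the private method above.

```dart
// Toy stand-ins; only the fields needed for flattening are modeled.
abstract class Piece {}

class TextPiece extends Piece {
  final String text;
  TextPiece(this.text);
}

class AdjacentPiece extends Piece {
  final List<Piece> pieces;
  AdjacentPiece(this.pieces);
}

// Hypothetical free-function version of _flattenPieces().
Piece flatten(List<Piece> pieces) {
  var flattened = <Piece>[];

  // Recursively unwrap nested AdjacentPieces, keeping leaf order.
  void traverse(List<Piece> pieces) {
    for (var piece in pieces) {
      if (piece is AdjacentPiece) {
        traverse(piece.pieces);
      } else {
        flattened.add(piece);
      }
    }
  }

  traverse(pieces);

  // A single piece needs no wrapper.
  if (flattened.length == 1) return flattened[0];
  return AdjacentPiece(flattened);
}

void main() {
  var nested = <Piece>[
    TextPiece('a'),
    AdjacentPiece([
      TextPiece('b'),
      AdjacentPiece([TextPiece('c')]),
    ]),
  ];

  var result = flatten(nested) as AdjacentPiece;
  print(result.pieces.length); // 3

  var single = flatten([
    AdjacentPiece([TextPiece('x')]),
  ]);
  print(single is TextPiece); // true
}
```

Flattening means that nested builders do not leave behind chains of single-purpose wrapper pieces in the resulting tree.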
