Skip to content

Commit 94f81dd

Browse files
authored
Format multi-line strings and string interpolation. (#1362)
Format multi-line strings and string interpolation. In the old style, the formatter has some special code to discard line splits that occur inside string interpolation expressions. That's largely for historical reasons because the formatter initially didn't support formatting of string interpolation expressions *at all* and I didn't want too much churn when adding support for formatting them. In the new style here, we don't do that: The contents of a string interpolation expression are split like any other expression. In practice, it doesn't matter much since users generally reorganize their code to avoid long strings and splits in string interpolation. This way leads to less special case code in the formatter. This change is somewhat large because I also reorganized how newlines inside lexemes are handled in general. Previously, TextPiece stored a list of "lines" to handle things like line comments preceding or following a token. But it was also possible for a single "line" string in that list to internally contain newline characters because of multi-line strings or block comments. But those internal newlines also need to force surrounding code to split, so there was this "_containsNewline" bit that had to be plumbed through and tracked. Even so, there were latent bugs where the column calculation in CodeWriter would be incorrect if a line contained internal newlines because it just used to the length of the entire "line" string. With this change, the "_lines" list in TextPiece really is a list of lines. We eagerly split any incoming lexeme into multiple lines before writing it to the TextPiece. I think the resulting code is simpler, it fixes the column calculation in CodeWriter, and it means the formatter will correctly normalize line endings even when they occur inside block comments or multiline strings. This was a good time to test the line ending code, so I copied those existing tests over from short_format_test.dart. I went ahead and copied all of the unit tests from that file, even the ones not related to line endings, since they're all working and passing now. This PR does *not* handle adjacent strings. Those have a decent amount of special handling not related to what's going on here, so I'll do those separately.
1 parent b1b0481 commit 94f81dd

File tree

12 files changed

+606
-73
lines changed

12 files changed

+606
-73
lines changed

lib/src/ast_extensions.dart

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -156,14 +156,6 @@ extension ExpressionExtensions on Expression {
156156
expression = expression.expression;
157157
}
158158

159-
// TODO(tall): We should also allow multi-line strings to be formatted
160-
// like block arguments, at least in some cases like:
161-
//
162-
// function('''
163-
// Lots of
164-
// text
165-
// ''');
166-
167159
// TODO(tall): Consider whether immediately-invoked function expressions
168160
// should be block argument candidates, like:
169161
//
@@ -177,16 +169,26 @@ extension ExpressionExtensions on Expression {
177169
parameters.parameters.canSplit(parameters.rightParenthesis) ||
178170
(body is BlockFunctionBody &&
179171
body.block.statements.canSplit(body.block.rightBracket)),
172+
173+
// Non-empty collection literals can block split.
180174
ListLiteral(:var elements, :var rightBracket) ||
181175
SetOrMapLiteral(:var elements, :var rightBracket) =>
182176
elements.canSplit(rightBracket),
183177
RecordLiteral(:var fields, :var rightParenthesis) =>
184178
fields.canSplit(rightParenthesis),
185179
SwitchExpression(:var cases, :var rightBracket) =>
186180
cases.canSplit(rightBracket),
181+
182+
// Function calls can block split if their argument lists can.
187183
InstanceCreationExpression(:var argumentList) ||
188184
MethodInvocation(:var argumentList) =>
189185
argumentList.arguments.canSplit(argumentList.rightParenthesis),
186+
187+
// Multi-line strings can.
188+
StringInterpolation(isMultiline: true) => true,
189+
SimpleStringLiteral(isMultiline: true) => true,
190+
191+
// Parenthesized expressions can if the inner one can.
190192
ParenthesizedExpression(:var expression) => expression.canBlockSplit,
191193
_ => false,
192194
};

lib/src/back_end/code_writer.dart

Lines changed: 32 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -128,22 +128,6 @@ class CodeWriter {
128128
isValid: !_hasInvalidNewline, overflow: _overflow, cost: _cost);
129129
}
130130

131-
/// Notes that a newline has been written.
132-
///
133-
/// If this occurs in a place where newlines are prohibited, then invalidates
134-
/// the solution.
135-
///
136-
/// This is called externally by [TextPiece] to let the writer know some of
137-
/// the raw text contains a newline, which can happen in multi-line block
138-
/// comments and multi-line string literals.
139-
void handleNewline() {
140-
if (!_options.allowNewlines) _hasInvalidNewline = true;
141-
142-
// Note that this piece contains a newline so that we can propagate that
143-
// up to containing pieces too.
144-
_options.hasNewline = true;
145-
}
146-
147131
/// Appends [text] to the output.
148132
///
149133
/// If [text] contains any internal newlines, the caller is responsible for
@@ -206,16 +190,30 @@ class CodeWriter {
206190
///
207191
/// If [indent] is given, set the indentation of the new line (and all
208192
/// subsequent lines) to that indentation relative to the containing piece.
209-
void newline({bool blank = false, int? indent}) {
193+
///
194+
/// If [flushLeft] is `true`, then the new line begins at column 1 and ignores
195+
/// any surrounding indentation. This is used for multi-line block comments
196+
/// and multi-line strings.
197+
void newline({bool blank = false, int? indent, bool flushLeft = false}) {
210198
if (indent != null) setIndent(indent);
211199

212-
whitespace(blank ? Whitespace.blankLine : Whitespace.newline);
200+
whitespace(blank ? Whitespace.blankLine : Whitespace.newline,
201+
flushLeft: flushLeft);
213202
}
214203

215-
void whitespace(Whitespace whitespace) {
204+
/// Queues [whitespace] to be written to the output.
205+
///
206+
/// If any non-whitespace is written after this call, then this whitespace
207+
/// will be written first. Also handles merging multiple kinds of whitespace
208+
/// intelligently together.
209+
///
210+
/// If [flushLeft] is `true`, then the new line begins at column 1 and ignores
211+
/// any surrounding indentation. This is used for multi-line block comments
212+
/// and multi-line strings.
213+
void whitespace(Whitespace whitespace, {bool flushLeft = false}) {
216214
if (whitespace case Whitespace.newline || Whitespace.blankLine) {
217-
handleNewline();
218-
_pendingIndent = _options.indent;
215+
_handleNewline();
216+
_pendingIndent = flushLeft ? 0 : _options.indent;
219217
}
220218

221219
_pendingWhitespace = _pendingWhitespace.collapse(whitespace);
@@ -248,9 +246,7 @@ class CodeWriter {
248246
var childOptions = _pieceOptions.removeLast();
249247

250248
// If the child [piece] contains a newline then this one transitively does.
251-
// TODO(tall): At some point, we may want to provide an API so that pieces
252-
// can block this from propagating outward.
253-
if (childOptions.hasNewline) handleNewline();
249+
if (childOptions.hasNewline) _handleNewline();
254250
}
255251

256252
/// Format [piece] if not null.
@@ -274,6 +270,18 @@ class CodeWriter {
274270
_selectionEnd = _buffer.length + end;
275271
}
276272

273+
/// Notes that a newline has been written.
274+
///
275+
/// If this occurs in a place where newlines are prohibited, then invalidates
276+
/// the solution.
277+
void _handleNewline() {
278+
if (!_options.allowNewlines) _hasInvalidNewline = true;
279+
280+
// Note that this piece contains a newline so that we can propagate that
281+
// up to containing pieces too.
282+
_options.hasNewline = true;
283+
}
284+
277285
/// Write any pending whitespace.
278286
///
279287
/// This is called before non-whitespace text is about to be written, or

lib/src/front_end/ast_node_visitor.dart

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1079,12 +1079,17 @@ class AstNodeVisitor extends ThrowingAstVisitor<Piece> with PieceFactory {
10791079

10801080
@override
10811081
Piece visitInterpolationExpression(InterpolationExpression node) {
1082-
throw UnimplementedError();
1082+
return buildPiece((b) {
1083+
b.token(node.leftBracket);
1084+
b.visit(node.expression);
1085+
b.token(node.rightBracket);
1086+
});
10831087
}
10841088

10851089
@override
10861090
Piece visitInterpolationString(InterpolationString node) {
1087-
throw UnimplementedError();
1091+
return pieces.stringLiteralPiece(node.contents,
1092+
isMultiline: (node.parent as StringInterpolation).isMultiline);
10881093
}
10891094

10901095
@override
@@ -1530,7 +1535,8 @@ class AstNodeVisitor extends ThrowingAstVisitor<Piece> with PieceFactory {
15301535

15311536
@override
15321537
Piece visitSimpleStringLiteral(SimpleStringLiteral node) {
1533-
return tokenPiece(node.literal);
1538+
return pieces.stringLiteralPiece(node.literal,
1539+
isMultiline: node.isMultiline);
15341540
}
15351541

15361542
@override
@@ -1543,7 +1549,11 @@ class AstNodeVisitor extends ThrowingAstVisitor<Piece> with PieceFactory {
15431549

15441550
@override
15451551
Piece visitStringInterpolation(StringInterpolation node) {
1546-
throw UnimplementedError();
1552+
return buildPiece((b) {
1553+
for (var element in node.elements) {
1554+
b.visit(element);
1555+
}
1556+
});
15471557
}
15481558

15491559
@override

lib/src/front_end/piece_writer.dart

Lines changed: 44 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,9 @@ import '../piece/piece.dart';
1111
import '../source_code.dart';
1212
import 'comment_writer.dart';
1313

14+
/// RegExp that matches any valid Dart line terminator.
15+
final _lineTerminatorPattern = RegExp(r'\r\n?|\n');
16+
1417
/// Builds [TextPiece]s for [Token]s and comments.
1518
///
1619
/// Handles updating selection markers and attaching comments to the tokens
@@ -69,6 +72,22 @@ class PieceWriter {
6972
return tokenPiece;
7073
}
7174

75+
/// Creates a piece for a simple or interpolated string [literal].
76+
///
77+
/// Handles splitting it into multiple lines in the resulting [TextPiece] if
78+
/// [isMultiline] is `true`.
79+
Piece stringLiteralPiece(Token literal, {required bool isMultiline}) {
80+
if (!isMultiline) return tokenPiece(literal);
81+
82+
if (!_writeCommentsBefore(literal)) {
83+
// We want this token to be in its own TextPiece, so if the comments
84+
// didn't already lead to ending the previous TextPiece than do so now.
85+
_currentText = TextPiece();
86+
}
87+
88+
return _writeMultiLine(literal.lexeme, offset: literal.offset);
89+
}
90+
7291
// TODO(tall): Much of the comment handling code in CommentWriter got moved
7392
// into here, so there isn't great separation of concerns anymore. Can we
7493
// organize this code better? Or just combine CommentWriter with this class
@@ -95,9 +114,7 @@ class PieceWriter {
95114
Piece writeComment(SourceComment comment) {
96115
_currentText = TextPiece();
97116

98-
_write(comment.text,
99-
offset: comment.offset, containsNewline: comment.text.contains('\n'));
100-
return _currentText;
117+
return _writeMultiLine(comment.text, offset: comment.offset);
101118
}
102119

103120
/// Writes all of the comments that appear between [token] and the previous
@@ -146,8 +163,7 @@ class PieceWriter {
146163
_currentText.newline();
147164
}
148165

149-
_write(comment.text,
150-
offset: comment.offset, containsNewline: comment.text.contains('\n'));
166+
_write(comment.text, offset: comment.offset);
151167
}
152168

153169
// Output a trailing newline after the last comment if it needs one.
@@ -180,14 +196,34 @@ class PieceWriter {
180196
_currentText = TextPiece();
181197
}
182198

183-
_write(lexeme ?? token.lexeme, offset: token.offset);
199+
lexeme ??= token.lexeme;
200+
201+
_write(lexeme, offset: token.offset);
202+
}
203+
204+
/// Writes multi-line [text] to the current [TextPiece].
205+
///
206+
/// Handles breaking [text] into lines and adding them to the [TextPiece].
207+
///
208+
/// The [offset] parameter is the offset in the original source code of the
209+
/// beginning of multi-line lexeme.
210+
Piece _writeMultiLine(String text, {required int offset}) {
211+
var lines = text.split(_lineTerminatorPattern);
212+
var currentOffset = offset;
213+
for (var i = 0; i < lines.length; i++) {
214+
if (i > 0) _currentText.newline(flushLeft: true);
215+
_write(lines[i], offset: currentOffset);
216+
currentOffset += lines[i].length;
217+
}
218+
219+
return _currentText;
184220
}
185221

186222
/// Writes [text] to the current [TextPiece].
187223
///
188224
/// If [offset] is given and it contains any selection markers, then attaches
189225
/// those markers to the [TextPiece].
190-
void _write(String text, {bool containsNewline = false, int? offset}) {
226+
void _write(String text, {int? offset}) {
191227
if (offset != null) {
192228
// If this text contains any of the selection endpoints, note their
193229
// relative locations in the text piece.
@@ -200,7 +236,7 @@ class PieceWriter {
200236
}
201237
}
202238

203-
_currentText.append(text, containsNewline: containsNewline);
239+
_currentText.append(text);
204240
}
205241

206242
/// Finishes writing and returns a [SourceCode] containing the final output

0 commit comments

Comments
 (0)