Commit 838b4a3

Process triple quoted string trivia during parsing
Triple quoted strings are de-indented with fairly complicated rules:

* Based on string content
* The position of interpolations across multiple tokens

Because indentation isn't part of the string data it should also ideally be excluded from the string content within the green tree. That is, it should be treated as separate whitespace trivia tokens. With this separation things like formatting should be much easier. The same reasoning goes for escaping newlines and following whitespace with backslashes in normal strings.

The downside of detecting string trivia during parsing is that string content is split over several tokens. Here we wrap these in the K"string" kind (as is already used for interpolations). The individual chunks can then be reassembled during Expr construction. A possible alternative might be to reuse the K"String" and K"CmdString" kinds for groups of string chunks (without interpolation).
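To make the chunk reassembly concrete, here is a minimal sketch of the idea. It is not the code from this commit: the `(kind, text)` tuple representation and the `reassemble` helper are hypothetical names chosen for illustration, and the real parser works on byte ranges rather than copied strings.

```julia
# Hypothetical chunk list for the children of a K"string" node, modelling
#
#     """
#         a
#         b
#         """
#
# after indentation has been split out as separate whitespace trivia.
chunks = [
    (:Whitespace, "    "),   # indentation trivia, not part of the string value
    (:String,     "a\n"),
    (:Whitespace, "    "),
    (:String,     "b\n"),
]

# Reassemble the string value by concatenating only the content chunks,
# skipping the whitespace trivia.
reassemble(chunks) = join(text for (kind, text) in chunks if kind == :String)

value = reassemble(chunks)
```

Under these assumptions the trivia stays available to tools such as formatters, while the string value is recovered by a simple filter and join.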
1 parent f683212 commit 838b4a3

7 files changed: +333 -343 lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -458,6 +458,18 @@ use a flattened structure:
   to be the module name after the `.` is parsed. But `$` can never be a valid
   module name in normal Julia code so this makes no sense.
 
+* Triple quoted `var"""##"""` identifiers are allowed. But it's not clear these
+  are required or desired given that they come with the complex triple-quoted
+  string deindentation rules.
+
+* Deindentation of triple quoted strings with mismatched whitespace is weird
+  when there's nothing but whitespace. For example, we have
+  `"\"\"\"\n \n \n \"\"\"" ==> "\n \n"` so the middle line of whitespace
+  here isn't dedented but the other two longer lines are?? Here it seems more
+  consistent that either (a) the middle line should be deindented completely,
+  or (b) all lines should be dedented only one character, as that's the
+  matching prefix.
+
 # Comparisons to other packages
 
 ### Official Julia compiler

src/parse_stream.jl

Lines changed: 31 additions & 5 deletions
@@ -130,8 +130,8 @@ end
 head(range::TaggedRange) = range.head
 kind(range::TaggedRange) = kind(range.head)
 flags(range::TaggedRange) = flags(range.head)
-first_byte(range::TaggedRange) = range.first_byte
-last_byte(range::TaggedRange) = range.last_byte
+first_byte(range::TaggedRange) = Int(range.first_byte)
+last_byte(range::TaggedRange) = Int(range.last_byte)
 span(range::TaggedRange) = 1 + last_byte(range) - first_byte(range)
 
 #-------------------------------------------------------------------------------
@@ -492,16 +492,42 @@ the kind or flags of a token in a way which would require unbounded lookahead
 in a recursive descent parser. Modifying the output with reset_node! is useful
 in those cases.
 """
-function reset_node!(stream::ParseStream, mark::ParseStreamPosition;
+function reset_node!(stream::ParseStream, pos::ParseStreamPosition;
                      kind=nothing, flags=nothing)
-    range = stream.ranges[mark.output_index]
+    range = stream.ranges[pos.output_index]
     k = isnothing(kind) ? (@__MODULE__).kind(range) : kind
     f = isnothing(flags) ? (@__MODULE__).flags(range) : flags
-    stream.ranges[mark.output_index] =
+    stream.ranges[pos.output_index] =
         TaggedRange(SyntaxHead(k, f), range.orig_kind,
                     first_byte(range), last_byte(range), range.start_mark)
 end
 
+"""
+Move `numbytes` from the range at output position `pos+1` to the output
+position `pos`. If the donor range becomes empty, mark it dead with
+K"TOMBSTONE" and return `true`, otherwise return `false`.
+
+Hack alert! This is used only for managing the complicated rules related to
+dedenting triple quoted strings.
+"""
+function steal_node_bytes!(stream::ParseStream, pos::ParseStreamPosition, numbytes)
+    i = pos.output_index
+    r1 = stream.ranges[i]
+    r2 = stream.ranges[i+1]
+    @assert span(r1) == 0
+    @assert numbytes <= span(r2)
+    fb2 = r2.first_byte + numbytes
+    rhs_empty = fb2 > last_byte(r2)
+    head2 = rhs_empty ? SyntaxHead(K"TOMBSTONE", EMPTY_FLAGS) : r2.head
+    stream.ranges[i] = TaggedRange(r1.head, r1.orig_kind,
+                                   r2.first_byte, fb2 - 1,
+                                   r1.start_mark)
+    stream.ranges[i+1] = TaggedRange(head2, r2.orig_kind,
+                                     fb2, r2.last_byte,
+                                     r2.start_mark)
+    return rhs_empty
+end
+
 function Base.position(stream::ParseStream)
     ParseStreamPosition(stream.next_byte, lastindex(stream.ranges))
 end
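
The byte-stealing arithmetic in `steal_node_bytes!` can be tried in isolation with a simplified model. The sketch below is an illustration only: `SimpleRange` and `steal_bytes!` are hypothetical stand-ins that keep just a kind label and the byte bounds, dropping the `orig_kind`, `start_mark`, and `SyntaxHead` fields of the real `TaggedRange`.

```julia
# Hypothetical, simplified stand-in for TaggedRange: a kind label plus
# inclusive byte bounds. An empty range has last_byte == first_byte - 1.
struct SimpleRange
    kind::Symbol
    first_byte::Int
    last_byte::Int
end

span(r::SimpleRange) = 1 + r.last_byte - r.first_byte

# Move `numbytes` from ranges[i+1] to the (empty) range at ranges[i].
# Returns true when the donor range becomes empty and is tombstoned.
function steal_bytes!(ranges::Vector{SimpleRange}, i::Int, numbytes::Int)
    r1 = ranges[i]
    r2 = ranges[i+1]
    @assert span(r1) == 0          # the receiver must start out empty
    @assert numbytes <= span(r2)   # cannot take more than the donor has
    fb2 = r2.first_byte + numbytes # new first byte of the donor
    rhs_empty = fb2 > r2.last_byte
    kind2 = rhs_empty ? :TOMBSTONE : r2.kind
    ranges[i]   = SimpleRange(r1.kind, r2.first_byte, fb2 - 1)
    ranges[i+1] = SimpleRange(kind2, fb2, r2.last_byte)
    return rhs_empty
end

# Case 1: an empty whitespace range takes 2 of the 6 bytes of a string chunk.
ranges = [SimpleRange(:Whitespace, 5, 4), SimpleRange(:String, 5, 10)]
emptied = steal_bytes!(ranges, 1, 2)

# Case 2: the receiver takes every byte, so the donor is tombstoned.
ranges2 = [SimpleRange(:Whitespace, 1, 0), SimpleRange(:String, 1, 3)]
emptied2 = steal_bytes!(ranges2, 1, 3)
```

In the first case the boundary between the two ranges shifts right by two bytes and the donor keeps its kind. In the second the donor ends up with zero span and is marked `:TOMBSTONE`, mirroring the `K"TOMBSTONE"` marking in the diff above.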
