Skip to content

Commit 621fd5d

Browse files
authored
Revert to v0 behaviour for string coercion (#223)
As in v0, strings are once again coerced to token nodes (instead of string nodes). Effective fixes for coerced strings: - character escaping handled automatically - string "a\nb\nc" is no longer interpreted as a string appearing on multiple lines. Bring some clarity to rewrite-clj.node/string-node's usage and expectations via its docstring. Alternative to PRs #216 and #217. Fixes #214
1 parent b52d58e commit 621fd5d

File tree

7 files changed

+175
-37
lines changed

7 files changed

+175
-37
lines changed

CHANGELOG.adoc

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,22 @@ For a list of breaking changes see link:#v1-breaking[breaking changes].
1616

1717
=== Unreleased
1818

19+
* Clojure string to rewrite-clj node coercion fixes
20+
https://github.com/clj-commons/rewrite-clj/issues/214[#214]
21+
(@borkdude/@lread collab)
22+
** `"a\n\b\r\nc"` is now preserved as is.
23+
It was being coerced to:
24+
+
25+
[source,clojure]
26+
----
27+
"a
28+
b
29+
c"
30+
----
31+
** escaped characters now coerce correctly.
32+
Previously we were only handling `\"`.
33+
Strings like `"\\s+"` now handled and preserved.
34+
1935
=== v1.1.46
2036

2137
* added new `rewrite-clj.zip` functions `of-string*` and `of-file*`, these are versions of `of-string` and `of-file` that do no auto-navigation

src/rewrite_clj/node.cljc

Lines changed: 48 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -657,7 +657,23 @@
657657
(defn string-node
658658
"Create node representing a string value where `lines` can be a sequence of strings or a single string.
659659
660-
When `lines` is a sequence, the resulting node will `tag` will be `:multi-line`, otherwise `:token`.
660+
When `lines` is a sequence, the resulting node `tag` will be `:multi-line`, otherwise `:token`.
661+
662+
`:multi-line` refers to a single string in your source that appears over multiple lines, for example:
663+
664+
```Clojure
665+
(def s \"foo
666+
bar
667+
baz\")
668+
```
669+
670+
It does not apply to a string that appears on a single line that includes escaped newlines, for example:
671+
672+
```Clojure
673+
(def s \"foo\\nbar\\n\\baz\")
674+
```
675+
676+
Naive examples (see example on escaping below):
661677
662678
```Clojure
663679
(require '[rewrite-clj.node :as n])
@@ -669,7 +685,31 @@
669685
(-> (n/string-node [\"line1\" \"\" \"line3\"])
670686
n/string)
671687
;; => \"\\\"line1\\n\\nline3\\\"\"
672-
```"
688+
```
689+
690+
This function was originally written to serve the rewrite-clj parser.
691+
Escaping and wrapping expectations are non-obvious.
692+
- characters within strings are assumed to be escaped
693+
- but the string should not wrapped with `\\\"`
694+
695+
Here's an example of conforming to these expectations for a string that has escape sequences.
696+
(Best to view this on cljdoc, docstring string escaping is confusing).
697+
698+
```Clojure
699+
(require '[clojure.string :as string])
700+
701+
(defn pr-str-unwrapped [s]
702+
(apply str (-> s pr-str next butlast)))
703+
704+
(-> \"hey \\\" man\"
705+
pr-str-unwrapped
706+
n/string-node
707+
n/string)
708+
;; => \"\\\"hey \\\\\\\" man\\\"\"
709+
```
710+
711+
To construct strings appearing on a single line, consider [[token-node]].
712+
It will handle escaping for you."
673713
[lines] (rewrite-clj.node.stringz/string-node lines))
674714

675715
;; DO NOT EDIT FILE, automatically imported from: rewrite-clj.node.quote
@@ -768,7 +808,12 @@
768808
769809
(-> (n/token-node 42) n/string)
770810
;; => \"42\"
771-
```"
811+
812+
(-> (n/token-node \"astring\") n/string)
813+
;; => \"\\\"astring\\\"\"
814+
```
815+
816+
To construct strings appearing over multiple lines, see [[string-node]]."
772817
([value] (rewrite-clj.node.token/token-node value))
773818
([value string-value] (rewrite-clj.node.token/token-node value string-value)))
774819

src/rewrite_clj/node/coercer.cljc

Lines changed: 10 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,6 @@
88
if one interface is derived from the other, the more derived is used,
99
else which one is used is unspecified"
1010
(:require
11-
[clojure.string :as string]
1211
#?@(:clj
1312
[[rewrite-clj.node.comment]
1413
[rewrite-clj.node.fn]
@@ -22,12 +21,13 @@
2221
[rewrite-clj.node.reader-macro :refer [reader-macro-node var-node]]
2322
[rewrite-clj.node.regex :refer [regex-node pattern-string-for-regex]]
2423
[rewrite-clj.node.seq :refer [vector-node list-node set-node map-node]]
25-
[rewrite-clj.node.stringz :refer [string-node]]
24+
[rewrite-clj.node.stringz]
2625
[rewrite-clj.node.token :refer [token-node]]
2726
[rewrite-clj.node.uneval]
2827
[rewrite-clj.node.whitespace :as ws]]
2928
:cljs
30-
[[rewrite-clj.node.comment :refer [CommentNode]]
29+
[[clojure.string :as string]
30+
[rewrite-clj.node.comment :refer [CommentNode]]
3131
[rewrite-clj.node.fn :refer [FnNode]]
3232
[rewrite-clj.node.forms :refer [FormsNode]]
3333
[rewrite-clj.node.integer :refer [IntNode]]
@@ -39,7 +39,7 @@
3939
[rewrite-clj.node.reader-macro :refer [ReaderNode ReaderMacroNode DerefNode reader-macro-node var-node]]
4040
[rewrite-clj.node.regex :refer [RegexNode regex-node pattern-string-for-regex]]
4141
[rewrite-clj.node.seq :refer [SeqNode vector-node list-node set-node map-node]]
42-
[rewrite-clj.node.stringz :refer [StringNode string-node]]
42+
[rewrite-clj.node.stringz :refer [StringNode]]
4343
[rewrite-clj.node.token :refer [TokenNode SymbolNode token-node]]
4444
[rewrite-clj.node.uneval :refer [UnevalNode]]
4545
[rewrite-clj.node.whitespace :refer [WhitespaceNode CommaNode NewlineNode] :as ws]]))
@@ -90,20 +90,6 @@
9090

9191
;; ## Helpers
9292

93-
(defn- split-to-lines
94-
"Slightly different than string/split-lines in that:
95-
- escape inline double quotes (to emulate the clojure reader)
96-
- includes all lines even if empty
97-
- behaves the same on clj and cljs"
98-
[s]
99-
(loop [s (string/escape s {\" "\\\""})
100-
lines []]
101-
(if-let [m (first (re-find #"(\r\n|\r|\n)" s))]
102-
(let [eol-ndx (string/index-of s m)]
103-
(recur (subs s (+ eol-ndx (count m)))
104-
(conj lines (subs s 0 eol-ndx))))
105-
(conj lines s))))
106-
10793
(defn node-with-meta
10894
[n value]
10995
(if #?(:clj (instance? clojure.lang.IMeta value)
@@ -149,8 +135,13 @@
149135

150136
(extend-protocol NodeCoerceable
151137
#?(:clj java.lang.String :cljs string)
138+
;; You might we should be coercing to a string-node here.
139+
;; We did this initially for rewrite-clj v1 but found it made more sense to revert to v0 behaviour.
140+
;; The string-node was created to serve the parser and expects the strings to be escaped in a particular way.
141+
;; The token-node has no such expectations and is therefore, currently, what we use for coercing strings.
142+
;; If this makes future internal changes awkward we'll revisit.
152143
(coerce [v]
153-
(string-node (split-to-lines v))))
144+
(token-node v)))
154145

155146
#?(:clj
156147
(extend-protocol NodeCoerceable

src/rewrite_clj/node/string.clj

Lines changed: 42 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,23 @@
1111
(defn string-node
1212
"Create node representing a string value where `lines` can be a sequence of strings or a single string.
1313
14-
When `lines` is a sequence, the resulting node will `tag` will be `:multi-line`, otherwise `:token`.
14+
When `lines` is a sequence, the resulting node `tag` will be `:multi-line`, otherwise `:token`.
15+
16+
`:multi-line` refers to a single string in your source that appears over multiple lines, for example:
17+
18+
```Clojure
19+
(def s \"foo
20+
bar
21+
baz\")
22+
```
23+
24+
It does not apply to a string that appears on a single line that includes escaped newlines, for example:
25+
26+
```Clojure
27+
(def s \"foo\\nbar\\n\\baz\")
28+
```
29+
30+
Naive examples (see example on escaping below):
1531
1632
```Clojure
1733
(require '[rewrite-clj.node :as n])
@@ -23,5 +39,29 @@
2339
(-> (n/string-node [\"line1\" \"\" \"line3\"])
2440
n/string)
2541
;; => \"\\\"line1\\n\\nline3\\\"\"
26-
```"
42+
```
43+
44+
This function was originally written to serve the rewrite-clj parser.
45+
Escaping and wrapping expectations are non-obvious.
46+
- characters within strings are assumed to be escaped
47+
- but the string should not wrapped with `\\\"`
48+
49+
Here's an example of conforming to these expectations for a string that has escape sequences.
50+
(Best to view this on cljdoc, docstring string escaping is confusing).
51+
52+
```Clojure
53+
(require '[clojure.string :as string])
54+
55+
(defn pr-str-unwrapped [s]
56+
(apply str (-> s pr-str next butlast)))
57+
58+
(-> \"hey \\\" man\"
59+
pr-str-unwrapped
60+
n/string-node
61+
n/string)
62+
;; => \"\\\"hey \\\\\\\" man\\\"\"
63+
```
64+
65+
To construct strings appearing on a single line, consider [[token-node]].
66+
It will handle escaping for you."
2767
[lines] (rewrite-clj.node.stringz/string-node lines))

src/rewrite_clj/node/stringz.cljc

Lines changed: 42 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,23 @@
4242
(defn string-node
4343
"Create node representing a string value where `lines` can be a sequence of strings or a single string.
4444
45-
When `lines` is a sequence, the resulting node will `tag` will be `:multi-line`, otherwise `:token`.
45+
When `lines` is a sequence, the resulting node `tag` will be `:multi-line`, otherwise `:token`.
46+
47+
`:multi-line` refers to a single string in your source that appears over multiple lines, for example:
48+
49+
```Clojure
50+
(def s \"foo
51+
bar
52+
baz\")
53+
```
54+
55+
It does not apply to a string that appears on a single line that includes escaped newlines, for example:
56+
57+
```Clojure
58+
(def s \"foo\\nbar\\n\\baz\")
59+
```
60+
61+
Naive examples (see example on escaping below):
4662
4763
```Clojure
4864
(require '[rewrite-clj.node :as n])
@@ -54,7 +70,31 @@
5470
(-> (n/string-node [\"line1\" \"\" \"line3\"])
5571
n/string)
5672
;; => \"\\\"line1\\n\\nline3\\\"\"
57-
```"
73+
```
74+
75+
This function was originally written to serve the rewrite-clj parser.
76+
Escaping and wrapping expectations are non-obvious.
77+
- characters within strings are assumed to be escaped
78+
- but the string should not wrapped with `\\\"`
79+
80+
Here's an example of conforming to these expectations for a string that has escape sequences.
81+
(Best to view this on cljdoc, docstring string escaping is confusing).
82+
83+
```Clojure
84+
(require '[clojure.string :as string])
85+
86+
(defn pr-str-unwrapped [s]
87+
(apply str (-> s pr-str next butlast)))
88+
89+
(-> \"hey \\\" man\"
90+
pr-str-unwrapped
91+
n/string-node
92+
n/string)
93+
;; => \"\\\"hey \\\\\\\" man\\\"\"
94+
```
95+
96+
To construct strings appearing on a single line, consider [[token-node]].
97+
It will handle escaping for you."
5898
[lines]
5999
(if (string? lines)
60100
(->StringNode [lines])

src/rewrite_clj/node/token.cljc

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,12 @@
7979
8080
(-> (n/token-node 42) n/string)
8181
;; => \"42\"
82-
```"
82+
83+
(-> (n/token-node \"astring\") n/string)
84+
;; => \"\\\"astring\\\"\"
85+
```
86+
87+
To construct strings appearing over multiple lines, see [[string-node]]."
8388
([value]
8489
(token-node value (pr-str value)))
8590
([value string-value]

test/rewrite_clj/node/coercer_test.cljc

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -41,14 +41,15 @@
4141
::1.5.1 :token :keyword
4242
:namespace/keyword :token :keyword
4343

44-
"" :token :string
45-
"hello, over there!" :token :string
46-
"multi\nline" :multi-line :string
47-
" " :token :string
48-
"\n" :multi-line :string
49-
"\n\n" :multi-line :string
50-
"," :token :string
51-
"inner\"quote" :token :string
44+
"" :token :token
45+
"hello, over there!" :token :token
46+
"multi\nline" :token :token
47+
" " :token :token
48+
"\n" :token :token
49+
"\n\n" :token :token
50+
"," :token :token
51+
"inner\"quote" :token :token
52+
"\\s+" :token :token
5253

5354
;; seqs
5455
[] :vector :seq
@@ -63,10 +64,10 @@
6364

6465
;; date
6566
#inst "2014-11-26T00:05:23" :token :token))
66-
(testing "multi-line string newline variants are normalized"
67+
(testing "multi-line string newline variants are preserved"
6768
(let [s "hey\nyou\rover\r\nthere"
6869
n (node/coerce s)]
69-
(is (= "hey\nyou\nover\nthere" (node/sexpr n)))))
70+
(is (= s (node/sexpr n)))))
7071
(testing "coerce string roundtrip"
7172
(is (= "\"hey \\\" man\"" (-> "hey \" man" node/coerce node/string)))))
7273

0 commit comments

Comments
 (0)