
Commit b4a6729

Scohen/encapsulate source file (#743)
* SourceFile experimental replacement

  This PR represents a suggestion on how elixir-ls can evolve the concept of the source file. The current implementation is a group of semi-related functions that operate on many different things, leading to logic that's spread around the codebase, with each user of SourceFile treating things differently. This module, however, always operates on the data structure it defines, and provides a consistent and simplified interface for its users. Among other things, it:

  * Builds changes with iolists, and doesn't need to do any reversing operations
  * Introduces a data structure representing a document, with O(1) random access
  * Introduces a line record that separates the ending from the text
  * Introduces a new line parser that is significantly faster and more efficient than SourceFile.lines_with_endings
  * Applies edits in a single loop over the document

* Added benchmarks and line parser test
* Optimized slice, removed dead code
* Experimental: changed text to to_string
* Made an experimental top-level package

  This will help isolate the experimental stuff from the mainline more cleanly, and also allows swapping of implementations via aliasing.

* Added benchmark for document access

  While playing around with the benchmark, I noticed that we weren't consolidating protocols. This had a dramatic effect on the performance of the fetch_line call.

* Converted document backing store to tuples
* Added benchmark to test the speeds of various backing stores for Documents
* Ported ASCII detection / conversion from protocol branch

  When parsing a file, we check if each line is ASCII or not. If it is, we can skip the conversions from utf-8 -> utf-16 -> utf-8 when we apply changes from the client. This means we reduce a ton of allocations and simplify our code. I also made a conversions module and some types representing Positions and Ranges. Don't read too much into them, because I have a more extensive data model that I'd like to discuss.
* Re-added Enum.at
* Fixed package name
* Performance improvements
* Changed document implementation to use tuples
* Enabled protocol consolidation outside of tests
* Changed LineParser to use an index-based approach
* Ported more improvements from the protocol branch
* Slightly simplified line parsing code
* Fixed utf conversion bugs

  LineParser was not correctly setting the ascii? flag. Also added an optimization to SourceFile to not convert the prefixes and suffixes to utf16 if the lines are ascii.

* Removed utf16 conversions during document changes

  Prior to this change, document tracking required doing utf-8 to utf-16 conversion on every change. Now all conversion is done on the positions that are sent to us. We still need to do conversions on non-ascii lines, but this should greatly cut down on the number of conversions that elixir-ls has to do. There was also an off-by-one error in the `conversions.ex` file where we looked up the wrong line to do conversions. When utf16 conversions were removed from `source_file.ex`, the bug became apparent with several unit tests failing. This also includes a small change to heuristically detect utf16 text.
* Removed protocol type
* Simplified prefix/suffix functions
* Fixed import
* Fixed benchmark off-by-one error
* Complied with out-of-spec unit tests

  Some of the Microsoft node tests deal with out-of-spec LSP messages. Prior to this change, I was returning an error if an out-of-spec position was encountered, but this change allows us to accept these messages and handle them like the node client does.

* Fixed merge error
* Revert "Complied with out-of-spec unit tests"

  This reverts commit e07e0af.

* Added additional tests
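The ASCII detection mentioned above hinges on one fact: for a line containing only ASCII bytes, UTF-8 byte offsets and UTF-16 code-unit offsets coincide, so no position conversion is needed. A minimal sketch of such a check (the `AsciiCheck` module name and `ascii?/1` helper are hypothetical, not the PR's actual implementation):

```elixir
defmodule AsciiCheck do
  # Walk the binary one byte at a time; bail out on the first byte >= 128.
  # For a pure-ASCII line, UTF-8 byte offsets equal UTF-16 code-unit
  # offsets, so utf-8 <-> utf-16 conversion can be skipped entirely.
  def ascii?(<<>>), do: true
  def ascii?(<<byte, rest::binary>>) when byte < 128, do: ascii?(rest)
  def ascii?(_), do: false
end
```

In practice a check like this runs once per line at parse time, and the result is cached on the line record so later edits can branch on it cheaply.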
1 parent 02d3b2e commit b4a6729

File tree

17 files changed

+1631
-3
lines changed

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
file_generator = StreamData.string(:alphanumeric, min_size: 10, max_size: 120)

line_endings = ["\r", "\n", "\r\n"]

generate_lines = fn line_count ->
  :alphanumeric
  |> StreamData.string(min_size: 10, max_size: 120)
  |> Enum.take(line_count)
end

Benchee.run(
  %{
    ":array.get(count - 1, array)" => fn %{array: array, count: count} ->
      :array.get(count - 1, array)
    end,
    "Enum.at(lines, count - 1)" => fn %{lines: lines, count: count} ->
      Enum.at(lines, count - 1)
    end,
    "list |> List.to_tuple() |> elem(count - 1)" => fn %{lines: lines, count: count} ->
      lines |> List.to_tuple() |> elem(count - 1)
    end,
    "tuple" => fn %{tuple: tuple, count: count} ->
      elem(tuple, count - 1)
    end
  },
  inputs:
    Map.new([80, 500, 1500], fn count ->
      lines = generate_lines.(count)

      {"#{count} lines",
       %{
         lines: lines,
         array: :array.from_list(lines),
         tuple: List.to_tuple(lines),
         count: count
       }}
    end)
)
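The benchmark above compares random access into list, :array, and tuple backing stores. The asymmetry it measures can be shown directly: `Enum.at/2` must traverse the list cell by cell, while `elem/2` on a tuple is constant-time indexed access. A small standalone illustration:

```elixir
# Why a tuple backing store wins for random line access:
# a list is a linked chain of cons cells, a tuple is a contiguous block.
lines = for n <- 1..1_500, do: "line #{n}"
tuple = List.to_tuple(lines)

# O(n): walks 1_499 cons cells to reach the last element.
"line 1500" = Enum.at(lines, 1_499)

# O(1): direct indexed access, independent of document size.
"line 1500" = elem(tuple, 1_499)
```

The trade-off is that tuples are rebuilt on update, which fits this design: the PR rebuilds the document from iodata after applying each batch of edits anyway, so reads dominate.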
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
alias ElixirLS.LanguageServer.Experimental.SourceFile.Document

file_generator = StreamData.string(:alphanumeric, min_size: 10, max_size: 120)

line_endings = ["\r", "\n", "\r\n"]

generate_lines = fn line_count ->
  :alphanumeric
  |> StreamData.string(min_size: 10, max_size: 120)
  |> Enum.take(line_count)
end

Benchee.run(
  %{
    "String.split |> Enum.at" => fn %{text: text, count: count} ->
      text
      |> String.split(line_endings)
      |> Enum.at(count - 1)
    end,
    "Enum.at" => fn %{lines: lines, count: count} ->
      Enum.at(lines, count - 1)
    end,
    "Document" => fn %{document: doc, count: count} ->
      {:ok, _} = Document.fetch_line(doc, count - 1)
    end,
    "Document.new |> Document.fetch_line" => fn %{text: text, count: count} ->
      text
      |> Document.new()
      |> Document.fetch_line(count)
    end
  },
  inputs:
    Map.new([80, 500, 1500], fn count ->
      lines = generate_lines.(count)
      text = Enum.join(lines, Enum.random(line_endings))

      {"#{count} lines",
       %{
         lines: lines,
         document: Document.new(text),
         text: text,
         count: count
       }}
    end)
)
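The commit message notes that missing protocol consolidation had a dramatic effect on `fetch_line` performance, and that consolidation was enabled outside of tests. A hypothetical `mix.exs` fragment showing the common way this is controlled (the project values here are placeholders, not elixir-ls's actual configuration):

```elixir
# Hypothetical mix.exs sketch: consolidate protocol dispatch in every
# env except :test, where tests that define protocol implementations
# dynamically would otherwise conflict with the consolidated modules.
def project do
  [
    app: :my_language_server,
    version: "0.1.0",
    elixir: "~> 1.13",
    consolidate_protocols: Mix.env() != :test
  ]
end
```

Without consolidation, every protocol dispatch (such as the `Enumerable` use in `apply_valid_edits` or the document's fetch path) pays a runtime lookup; consolidation compiles dispatch down to a direct match.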
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
alias ElixirLS.LanguageServer.SourceFile
alias ElixirLS.LanguageServer.Experimental.SourceFile.LineParser

line_endings = ["\r", "\n", "\r\n"]

generate_file = fn line_count ->
  :alphanumeric
  |> StreamData.string(min_size: 10, max_size: 120)
  |> Enum.take(line_count)
  |> Enum.join(Enum.random(line_endings))
end

large_file = generate_file.(500)

Benchee.run(
  %{
    "SourceFile.lines" => &SourceFile.lines/1,
    "SourceFile.lines_with_endings/1" => &SourceFile.lines_with_endings/1,
    "LineParser.parse" => &LineParser.parse(&1, 1)
  },
  inputs: %{
    "80 lines" => generate_file.(80),
    "500 lines" => generate_file.(500),
    "1500 lines" => generate_file.(1500)
  }
)
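The parser being benchmarked splits text into lines while preserving each line's ending separately from its text, handling `"\r\n"`, `"\r"`, and `"\n"`. A simplified, hypothetical sketch of that shape of pass (single binary traversal, `{text, ending}` pairs) — not the PR's actual `LineParser`, which also tracks line numbers and ASCII flags:

```elixir
defmodule SimpleLineParser do
  # Split text into {text, ending} pairs in one pass over the binary.
  def parse(text), do: do_parse(text, <<>>, [])

  # End of input: emit a trailing line only if there is pending text.
  defp do_parse(<<>>, <<>>, lines), do: Enum.reverse(lines)
  defp do_parse(<<>>, acc, lines), do: Enum.reverse([{acc, ""} | lines])

  # "\r\n" must be matched before "\r" so it isn't split in two.
  defp do_parse(<<"\r\n", rest::binary>>, acc, lines),
    do: do_parse(rest, <<>>, [{acc, "\r\n"} | lines])

  defp do_parse(<<"\r", rest::binary>>, acc, lines),
    do: do_parse(rest, <<>>, [{acc, "\r"} | lines])

  defp do_parse(<<"\n", rest::binary>>, acc, lines),
    do: do_parse(rest, <<>>, [{acc, "\n"} | lines])

  # Ordinary byte: append to the current line's accumulator.
  defp do_parse(<<byte, rest::binary>>, acc, lines),
    do: do_parse(rest, <<acc::binary, byte>>, lines)
end
```

Matching byte by byte is safe here even for multi-byte UTF-8 text, since the bytes are copied verbatim and none of the recognized endings can appear inside a multi-byte sequence.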
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
defmodule ElixirLS.LanguageServer.Experimental.Protocol.Types do
  defmodule Position do
    defstruct [:line, :character]

    def new(opts \\ []) do
      line = Keyword.get(opts, :line, 0)
      character = Keyword.get(opts, :character, 0)
      %__MODULE__{line: line, character: character}
    end
  end

  defmodule Range do
    defstruct [:start, :end]

    def new(opts \\ []) do
      start_pos = Keyword.get(opts, :start, Position.new())
      end_pos = Keyword.get(opts, :end, Position.new())
      %__MODULE__{start: start_pos, end: end_pos}
    end
  end
end
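These constructors default every omitted coordinate to the zero position, mirroring LSP's zero-based line/character coordinates. A self-contained demo of the same constructor bodies (copied from the diff above into a hypothetical `Demo.Types` namespace so it runs standalone):

```elixir
defmodule Demo.Types do
  defmodule Position do
    defstruct [:line, :character]

    def new(opts \\ []) do
      line = Keyword.get(opts, :line, 0)
      character = Keyword.get(opts, :character, 0)
      %__MODULE__{line: line, character: character}
    end
  end

  defmodule Range do
    defstruct [:start, :end]

    def new(opts \\ []) do
      start_pos = Keyword.get(opts, :start, Position.new())
      end_pos = Keyword.get(opts, :end, Position.new())
      %__MODULE__{start: start_pos, end: end_pos}
    end
  end
end

alias Demo.Types.{Position, Range}

# An omitted :end option falls back to the zero position.
range = Range.new(start: Position.new(line: 3, character: 0))
%Range{start: %Position{line: 3, character: 0}, end: %Position{line: 0, character: 0}} = range
```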
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
defmodule ElixirLS.LanguageServer.Experimental.SourceFile do
  alias ElixirLS.LanguageServer.Experimental.SourceFile.Conversions
  alias ElixirLS.LanguageServer.Experimental.SourceFile.Document
  alias ElixirLS.LanguageServer.Experimental.SourceFile.Position
  alias ElixirLS.LanguageServer.Experimental.SourceFile.Range
  alias ElixirLS.LanguageServer.SourceFile

  import ElixirLS.LanguageServer.Protocol, only: [range: 4]
  import ElixirLS.LanguageServer.Experimental.SourceFile.Line

  defstruct [:uri, :path, :version, dirty?: false, document: nil]

  @type t :: %__MODULE__{
          uri: String.t(),
          version: pos_integer(),
          dirty?: boolean,
          document: Document.t(),
          path: String.t()
        }

  @type version :: pos_integer()
  @type change_application_error :: {:error, {:invalid_range, map()}}

  # public

  @spec new(URI.t(), String.t(), pos_integer()) :: t
  def new(uri, text, version) do
    %__MODULE__{
      uri: uri,
      version: version,
      document: Document.new(text),
      path: SourceFile.Path.from_uri(uri)
    }
  end

  @spec mark_dirty(t) :: t
  def mark_dirty(%__MODULE__{} = source) do
    %__MODULE__{source | dirty?: true}
  end

  @spec mark_clean(t) :: t
  def mark_clean(%__MODULE__{} = source) do
    %__MODULE__{source | dirty?: false}
  end

  @spec fetch_text_at(t, pos_integer()) :: {:ok, String.t()} | :error
  def fetch_text_at(%__MODULE__{} = source, line_number) do
    with {:ok, line(text: text)} <- Document.fetch_line(source.document, line_number) do
      {:ok, text}
    else
      _ ->
        :error
    end
  end

  @spec apply_content_changes(t, pos_integer(), [map]) ::
          {:ok, t} | change_application_error()
  def apply_content_changes(%__MODULE__{version: current_version}, new_version, _)
      when new_version <= current_version do
    {:error, :invalid_version}
  end

  def apply_content_changes(%__MODULE__{} = source, _, []) do
    {:ok, source}
  end

  def apply_content_changes(%__MODULE__{} = source, version, changes) when is_list(changes) do
    result =
      Enum.reduce_while(changes, source, fn change, source ->
        case apply_change(source, change) do
          {:ok, new_source} ->
            {:cont, new_source}

          error ->
            {:halt, error}
        end
      end)

    case result do
      %__MODULE__{} = source ->
        source = mark_dirty(%__MODULE__{source | version: version})

        {:ok, source}

      error ->
        error
    end
  end

  def to_string(%__MODULE__{} = source) do
    source
    |> to_iodata()
    |> IO.iodata_to_binary()
  end

  # private

  defp line_count(%__MODULE__{} = source) do
    Document.size(source.document)
  end

  defp apply_change(
         %__MODULE__{} = source,
         %Range{start: %Position{} = start_pos, end: %Position{} = end_pos},
         new_text
       ) do
    start_line = start_pos.line

    new_lines_iodata =
      cond do
        start_line > line_count(source) ->
          append_to_end(source, new_text)

        start_line <= 0 ->
          prepend_to_beginning(source, new_text)

        true ->
          apply_valid_edits(source, new_text, start_pos, end_pos)
      end

    new_document =
      new_lines_iodata
      |> IO.iodata_to_binary()
      |> Document.new()

    {:ok, %__MODULE__{source | document: new_document}}
  end

  defp apply_change(
         %__MODULE__{} = source,
         %{
           "range" => range(start_line, start_char, end_line, end_char) = range,
           "text" => new_text
         }
       )
       when start_line >= 0 and start_char >= 0 and end_line >= 0 and end_char >= 0 do
    with {:ok, ex_range} <- Conversions.to_elixir(range, source) do
      apply_change(source, ex_range, new_text)
    else
      _ ->
        {:error, {:invalid_range, range}}
    end
  end

  defp apply_change(%__MODULE__{}, %{"range" => invalid_range}) do
    {:error, {:invalid_range, invalid_range}}
  end

  defp apply_change(
         %__MODULE__{} = source,
         %{"text" => new_text}
       ) do
    {:ok, %__MODULE__{source | document: Document.new(new_text)}}
  end

  defp append_to_end(%__MODULE__{} = source, edit_text) do
    [to_iodata(source), edit_text]
  end

  defp prepend_to_beginning(%__MODULE__{} = source, edit_text) do
    [edit_text, to_iodata(source)]
  end

  defp apply_valid_edits(%__MODULE__{} = source, edit_text, start_pos, end_pos) do
    Enum.reduce(source.document, [], fn line() = line, acc ->
      case edit_action(line, edit_text, start_pos, end_pos) do
        :drop ->
          acc

        {:append, io_data} ->
          [acc, io_data]
      end
    end)
  end

  defp edit_action(line() = line, edit_text, %Position{} = start_pos, %Position{} = end_pos) do
    %Position{line: start_line, character: start_char} = start_pos
    %Position{line: end_line, character: end_char} = end_pos

    line(line_number: line_number, text: text, ending: ending) = line

    cond do
      line_number < start_line ->
        {:append, [text, ending]}

      line_number > end_line ->
        {:append, [text, ending]}

      line_number == start_line && line_number == end_line ->
        prefix_text = utf8_prefix(text, start_char)
        suffix_text = utf8_suffix(text, end_char)

        {:append, [prefix_text, edit_text, suffix_text, ending]}

      line_number == start_line ->
        prefix_text = utf8_prefix(text, start_char)
        {:append, [prefix_text, edit_text]}

      line_number == end_line ->
        suffix_text = utf8_suffix(text, end_char)
        {:append, [suffix_text, ending]}

      true ->
        :drop
    end
  end

  defp utf8_prefix(text, start_index) do
    length = max(0, start_index)
    binary_part(text, 0, length)
  end

  defp utf8_suffix(text, start_index) do
    byte_count = byte_size(text)
    start_index = min(start_index, byte_count)
    length = byte_count - start_index
    binary_part(text, start_index, length)
  end

  defp to_iodata(%__MODULE__{} = source) do
    Document.to_iodata(source.document)
  end

  defp increment_version(%__MODULE__{} = source) do
    version =
      case source.version do
        v when is_integer(v) -> v + 1
        _ -> 1
      end

    %__MODULE__{source | version: version}
  end
end
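The heart of the edit loop is `edit_action/4`: lines before or after the edited range are kept whole, interior lines are dropped, and the boundary lines are spliced using byte-level prefix/suffix extraction. A hypothetical standalone sketch of that single-line splice (the `SpliceDemo` module is illustrative, not part of the PR), using the same `binary_part/3` clamping as `utf8_prefix/2` and `utf8_suffix/2`:

```elixir
defmodule SpliceDemo do
  # Splice new text into one line: keep the bytes before start_char,
  # insert the edit, keep the bytes from end_char onward, and re-attach
  # the original line ending. Offsets are clamped to the line's bounds.
  def splice(text, ending, start_char, end_char, edit_text) do
    prefix = binary_part(text, 0, max(0, start_char))
    start_index = min(end_char, byte_size(text))
    suffix = binary_part(text, start_index, byte_size(text) - start_index)
    IO.iodata_to_binary([prefix, edit_text, suffix, ending])
  end
end
```

Building the result as iodata and binarizing once at the end is the same trick the module uses at document scale: edits accumulate as nested lists, and `IO.iodata_to_binary/1` runs a single time in `apply_change/3`.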
