Commit e3c81b6

bors[bot] and matklad authored
Merge #2941
2941: Freshen Architecture.md document r=matklad a=matklad

Co-authored-by: Aleksey Kladov <[email protected]>
2 parents 2fb6af8 + 84dfbfb commit e3c81b6

2 files changed, +45 -38 lines changed

docs/dev/README.md

Lines changed: 4 additions & 0 deletions
@@ -106,6 +106,10 @@ communication, and `print!` would break it.
 If I need to fix something simultaneously in the server and in the client, I
 feel even more sad. I don't have a specific workflow for this case.
 
+Additionally, I use `cargo run --release -p ra_cli -- analysis-stats
+path/to/some/rust/crate` to run a batch analysis. This is primarily useful for
+performance optimizations, or for bug minimization.
+
 # Logging
 
 Logging is done by both rust-analyzer and VS Code, so it might be tricky to

docs/dev/architecture.md

Lines changed: 41 additions & 38 deletions
@@ -12,6 +12,9 @@ analyzer:
 
 https://www.youtube.com/playlist?list=PL85XCvVPmGQho7MZkdW-wtPtuJcFpzycE
 
+Note that the guide and videos are pretty dated; this document should
+generally be fresher.
+
 ## The Big Picture
 
 ![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png)
@@ -20,13 +23,12 @@ On the highest level, rust-analyzer is a thing which accepts input source code
 from the client and produces a structured semantic model of the code.
 
 More specifically, input data consists of a set of text files (`(PathBuf,
-String)` pairs) and information about project structure, captured in the so called
-`CrateGraph`. The crate graph specifies which files are crate roots, which cfg
-flags are specified for each crate (TODO: actually implement this) and what
-dependencies exist between the crates. The analyzer keeps all this input data in
-memory and never does any IO. Because the input data is source code, which
-typically measures in tens of megabytes at most, keeping all input data in
-memory is OK.
+String)` pairs) and information about project structure, captured in the so
+called `CrateGraph`. The crate graph specifies which files are crate roots,
+which cfg flags are specified for each crate and what dependencies exist between
+the crates. The analyzer keeps all this input data in memory and never does any
+IO. Because the input data is source code, which typically measures in tens of
+megabytes at most, keeping everything in memory is OK.
 
 A "structured semantic model" is basically an object-oriented representation of
 modules, functions and types which appear in the source code. This representation
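
The input model described in this hunk (file texts plus a crate graph, all held in memory) can be sketched roughly as follows. This is only an illustration: `FileId`, `CrateId`, `CrateData`, `CrateGraphSketch` and `AnalysisInput` are hypothetical names chosen for the example and are not the types rust-analyzer actually defines.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical identifier for a source file (assumption, not the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FileId(pub u32);

/// Hypothetical identifier for a crate in the graph.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CrateId(pub u32);

/// Per-crate facts: the root file, the cfg flags, and the dependencies.
#[derive(Debug)]
pub struct CrateData {
    pub root_file: FileId,
    pub cfg_flags: Vec<String>,
    pub dependencies: Vec<CrateId>,
}

/// A simplified stand-in for the crate graph described in the text.
#[derive(Debug, Default)]
pub struct CrateGraphSketch {
    crates: Vec<CrateData>,
}

impl CrateGraphSketch {
    /// Registers a crate and returns its id (its index in the vector).
    pub fn add_crate(&mut self, root_file: FileId, cfg_flags: &[&str]) -> CrateId {
        let id = CrateId(self.crates.len() as u32);
        self.crates.push(CrateData {
            root_file,
            cfg_flags: cfg_flags.iter().map(|flag| flag.to_string()).collect(),
            dependencies: Vec::new(),
        });
        id
    }

    /// Records that crate `from` depends on crate `to`.
    pub fn add_dep(&mut self, from: CrateId, to: CrateId) {
        self.crates[from.0 as usize].dependencies.push(to);
    }

    /// Answers "is this file a crate root?".
    pub fn is_crate_root(&self, file: FileId) -> bool {
        self.crates.iter().any(|krate| krate.root_file == file)
    }

    /// Lists the direct dependencies of a crate.
    pub fn dependencies(&self, krate: CrateId) -> &[CrateId] {
        &self.crates[krate.0 as usize].dependencies
    }
}

/// All analyzer input kept in memory: no IO happens after it is filled in.
#[derive(Debug, Default)]
pub struct AnalysisInput {
    pub files: HashMap<FileId, (PathBuf, String)>,
    pub crate_graph: CrateGraphSketch,
}
```

A client would fill `AnalysisInput` once from disk (or from editor buffers) and hand it over; everything else is then derived from this in-memory state.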
@@ -43,44 +45,50 @@ can be quickly updated for small modifications.
 ## Code generation
 
 Some of the components of this repository are generated through automatic
-processes. These are outlined below:
+processes. `cargo xtask codegen` runs all generation tasks. Generated code is
+committed to the git repository.
+
+In particular, `cargo xtask codegen` generates:
+
+1. [`syntax_kind/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_parser/src/syntax_kind/generated.rs)
+-- the set of terminals and non-terminals of the Rust grammar.
 
-- `cargo xtask codegen`: The kinds of tokens that are reused in several places, so a generator
-is used. We use `quote!` macro to generate the files listed below, based on
-the grammar described in [grammar.ron]:
-- [ast/generated.rs][ast generated]
-- [syntax_kind/generated.rs][syntax_kind generated]
+2. [`ast/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_syntax/src/ast/generated.rs)
+-- AST data structure.
 
-[grammar.ron]: ../../crates/ra_syntax/src/grammar.ron
-[ast generated]: ../../crates/ra_syntax/src/ast/generated.rs
-[syntax_kind generated]: ../../crates/ra_parser/src/syntax_kind/generated.rs
+3. [`doc_tests/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_assists/src/doc_tests/generated.rs),
+[`test_data/parser/inline`](https://github.com/rust-analyzer/rust-analyzer/tree/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_syntax/test_data/parser/inline)
+-- tests for assists and the parser.
+
+The source for 1 and 2 is in [`ast_src.rs`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/xtask/src/ast_src.rs).
 
 ## Code Walk-Through
 
 ### `crates/ra_syntax`, `crates/ra_parser`
 
 Rust syntax tree structure and parser. See
-[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes.
+[RFC](https://github.com/rust-lang/rfcs/pull/2256) and [./syntax.md](./syntax.md) for some design notes.
 
 - [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees.
 - `grammar` module is the actual parser. It is a hand-written recursive descent parser, which
 produces a sequence of events like "start node X", "finish node Y". It works similarly to [kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java),
 which is a good source of inspiration for dealing with syntax errors and incomplete input. Original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs)
 is what we use for the definition of the Rust language.
-- `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees.
-This is the thing that turns a flat list of events into a tree (see `EventProcessor`)
+- `TreeSink` and `TokenSource` traits bridge the tree-agnostic parser from `grammar` with `rowan` trees.
 - `ast` provides a type safe API on top of the raw `rowan` tree.
-- `grammar.ron` RON description of the grammar, which is used to
-generate `syntax_kinds` and `ast` modules, using `cargo xtask codegen` command.
-- `algo`: generic tree algorithms, including `walk` for O(1) stack
-space tree traversal (this is cool).
+- `ast_src` description of the grammar, which is used to generate `syntax_kinds`
+and `ast` modules, using `cargo xtask codegen` command.
 
 Tests for ra_syntax are mostly data-driven: `test_data/parser` contains subdirectories with a bunch of `.rs`
 (test vectors) and `.txt` files with corresponding syntax trees. During testing, we check
 `.rs` against `.txt`. If the `.txt` file is missing, it is created (this is how you update
 tests). Additionally, running `cargo xtask codegen` will walk the grammar module and collect
 all `// test test_name` comments into files inside `test_data/parser/inline` directory.
 
+Note
+[`api_walkthrough`](https://github.com/rust-analyzer/rust-analyzer/blob/2fb6af89eb794f775de60b82afe56b6f986c2a40/crates/ra_syntax/src/lib.rs#L190-L348)
+in particular: it shows off various methods of working with the syntax tree.
+
 See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which
 fixes a bug in the grammar.
 
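
The idea in this hunk of a tree-agnostic parser that emits "start node X" / "finish node Y" events, with a separate sink that turns them into a tree, can be illustrated with a small sketch. The `Event`, `Sink`, `Tree` and `TreeBuilder` names below are assumptions made up for this example; they are not the signatures of the real `TreeSink` trait or of `rowan`.

```rust
/// Events a tree-agnostic parser might emit (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(&'static str),
    Token(String),
    Finish,
}

/// Hypothetical sink interface: anything that can consume parser events.
pub trait Sink {
    fn start_node(&mut self, kind: &'static str);
    fn token(&mut self, text: &str);
    fn finish_node(&mut self);
}

/// A toy tree; the real project builds `rowan` trees instead.
#[derive(Debug, PartialEq)]
pub enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Token(String),
}

/// Builds a `Tree` by keeping a stack of nodes that are still open.
#[derive(Default)]
pub struct TreeBuilder {
    stack: Vec<(&'static str, Vec<Tree>)>,
    root: Option<Tree>,
}

impl Sink for TreeBuilder {
    fn start_node(&mut self, kind: &'static str) {
        self.stack.push((kind, Vec::new()));
    }

    fn token(&mut self, text: &str) {
        // Tokens outside of any open node are ignored in this sketch.
        if let Some((_, children)) = self.stack.last_mut() {
            children.push(Tree::Token(text.to_string()));
        }
    }

    fn finish_node(&mut self) {
        if let Some((kind, children)) = self.stack.pop() {
            let node = Tree::Node { kind, children };
            match self.stack.last_mut() {
                // Attach the finished node to its parent...
                Some((_, siblings)) => siblings.push(node),
                // ...or make it the root when no parent is open.
                None => self.root = Some(node),
            }
        }
    }
}

/// Replays a flat list of events into the sink and returns the root, if any.
pub fn build_tree(events: &[Event]) -> Option<Tree> {
    let mut builder = TreeBuilder::default();
    for event in events {
        match event {
            Event::Start(kind) => builder.start_node(*kind),
            Event::Token(text) => builder.token(text),
            Event::Finish => builder.finish_node(),
        }
    }
    builder.root
}
```

Because the parser only talks to the sink, the same grammar code could in principle feed a different tree representation by swapping the `Sink` implementation.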
@@ -94,18 +102,22 @@ defines most of the "input" queries: facts supplied by the client of the
 analyzer. Reading the docs of the `ra_db::input` module should be useful:
 everything else is strictly derived from those inputs.
 
-### `crates/ra_hir`
+### `crates/ra_hir*` crates
 
 HIR provides high-level "object oriented" access to Rust code.
 
 The principal difference between HIR and syntax trees is that HIR is bound to a
-particular crate instance. That is, it has cfg flags and features applied (in
-theory, in practice this is to be implemented). So, the relation between
-syntax and HIR is many-to-one. The `source_binder` module is responsible for
-guessing a HIR for a particular source position.
+particular crate instance. That is, it has cfg flags and features applied. So,
+the relation between syntax and HIR is many-to-one. The `source_binder` module
+is responsible for guessing a HIR for a particular source position.
 
 Underneath, HIR works on top of salsa, using a `HirDatabase` trait.
 
+`ra_hir_xxx` crates have a strong ECS flavor, in that they work with raw ids and
+directly query the database.
+
+The top-level `ra_hir` façade crate wraps ids into a more OO-flavored API.
+
 ### `crates/ra_ide`
 
 A stateful library for analyzing many Rust files as they change. `AnalysisHost`
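
The split described in this hunk between ECS-flavored code (raw ids plus direct database queries) and an OO-flavored façade that wraps those ids can be sketched as below. Everything here is a hypothetical stand-in: `Db` is a plain struct rather than a salsa database, and `FunctionId`, `FunctionData` and `Function` are illustrative names, not the real `ra_hir` API.

```rust
/// Raw id: just an index into the database, with no behavior of its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FunctionId(pub usize);

/// The data a query returns for a given id.
#[derive(Debug)]
pub struct FunctionData {
    pub name: String,
    pub param_count: usize,
}

/// Stand-in for the database (the real one is salsa-based).
#[derive(Debug, Default)]
pub struct Db {
    functions: Vec<FunctionData>,
}

impl Db {
    /// Stores a function and hands back its raw id.
    pub fn alloc_function(&mut self, name: &str, param_count: usize) -> FunctionId {
        let id = FunctionId(self.functions.len());
        self.functions.push(FunctionData { name: name.to_string(), param_count });
        id
    }

    /// ECS-style access: callers pass a raw id and get the data back.
    pub fn function_data(&self, id: FunctionId) -> &FunctionData {
        &self.functions[id.0]
    }
}

/// Façade type: wraps the id and exposes methods instead of raw queries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Function {
    id: FunctionId,
}

impl Function {
    pub fn from_id(id: FunctionId) -> Function {
        Function { id }
    }

    /// Looks up the name through the database on the caller's behalf.
    pub fn name<'a>(self, db: &'a Db) -> &'a str {
        &db.function_data(self.id).name
    }

    pub fn param_count(self, db: &Db) -> usize {
        db.function_data(self.id).param_count
    }
}
```

In this sketch the façade holds no data besides the id, so it stays cheap to copy, and every method still needs the database passed in.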
@@ -135,18 +147,9 @@ different from data on disk. This is more or less the single really
 platform-dependent component, so it lives in a separate repository and has an
 extensive cross-platform CI testing.
 
-### `crates/gen_lsp_server`
-
-A language server scaffold, exposing a synchronous crossbeam-channel based API.
-This crate handles protocol handshaking and parsing messages, while you
-control the message dispatch loop yourself.
-
-Run with `RUST_LOG=sync_lsp_server=debug` to see all the messages.
-
 ### `crates/ra_cli`
 
-A CLI interface to rust-analyzer.
-
+A CLI for rust-analyzer, mainly for testing.
 
 ## Testing Infrastructure
 