@@ -14,7 +14,7 @@ Recently, there have been efforts to get it back on track.

 This year, the [GSoC project idea](https://discourse.llvm.org/t/improve-documentation-parsing-in-clang/84513) had a simple premise: improve core functionality.

-##The Issues
+# The Project

 The project idea proposed three main areas of focus to improve documentation quality.

@@ -29,48 +29,105 @@ Lastly, having Markdown available to developers for documentation would be usefu
 Markdown provides the power of expression in an area that is technically dense.
 It can be used to highlight critical information and warnings.

-###The Architecture
+# The Architecture

 Here's a quick overview of Clang-Doc's architecture, which follows a map-reduce pattern:
 It seems fairly straightforward, but the architecture had a critical flaw.
-If a new C++ construct needed to be supported, it would be visited and serialized, but then support would have to be added to each backend individually.
-If you wanted to serialize something in YAML, you'd have to implement the Markdown logic separately.
-This imposed a very high maintenance cost for extending basic functionality, even if you just wanted to add something simple.
-It also easily led to generator disparity; a construct might be serialized in YAML, but not in Markdown.
-Testing was also in an awkward spot because it was unclear what format would be used to verify if the documentation output was acceptable.
+Unlike in LLVM, Clang-Doc doesn't have a framework like CodeGen that shares functionality across different targets.
+To document a `class`, every backend needs to independently implement logic to serialize the `class` into its target format.
+Each backend also has separate logic to write all of the documented entities to disk.
+There is also no IR where Infos can be preprocessed, which means that any organizational preprocessing done in a backend can't be shared.
+
+Here's the code for serializing the bases and virtual bases of a class in the HTML backend:
+You can see how differently the two backends need to handle these constructs, which makes it complicated to reach feature parity.
+The HTML tag creation being so tightly coupled to the documentation serialization also highlights a different problem: formatting issues are difficult to identify.
+
+This lack of generic handling imposed a very high maintenance cost for extending basic functionality.
+It also easily led to backend disparity; a construct might be serialized in YAML, but not in Markdown.
+Changes to how a documentation entity was handled would not be uniform across backends.

-## The Good: Mustache
+Testing was also in an awkward spot.
+If not all backends were guaranteed to generate the same documentation, which one could be trusted as the source of truth?
+YAML was originally meant to serve this role, but it suffered from feature disparity.
+It's a cumbersome process to implement support for a construct in YAML, verify it there, and then implement it again in HTML.
+There's a logical disconnect: what's serialized in YAML isn't guaranteed to be reflected in HTML, so what is the benefit of updating YAML if my documentation is shown through HTML?

+## The Good
+
+The good news is that Clang-Doc's recent improvements had brought in changes that could rectify these problems, with a bit more work.
 Last year's GSoC brought in great improvements that became the basis of my summer.
 First, last year's GSoC contributor landed a large performance improvement.
 I might not have been able to test Clang-Doc on Clang itself without it.

-Another contribution that was essential to my summer is the [Mustache template engine](https://mustache.github.io/) implementation in LLVM.
+The same contributor authored the [Mustache template engine](https://mustache.github.io/) implementation in LLVM.
 Mustache templates allow Clang-Doc to shift away from manually generating HTML tags and eliminate high maintenance burdens.
 Templates could also solve the feature parity problem by using JSON to feed templates.
+This was a huge part of my summer and allowed me to bring in great improvements that would make Clang-Doc more flexible and easier to contribute to.

 # Building a JSON Backend

-While familiarizing myself with the codebase during the Community Bonding Period, I quickly determined that implementing a JSON backend would be incredibly beneficial to the project and my summer plans.
+While studying the codebase during the Community Bonding Period, I determined that creating a separate JSON backend would be extremely helpful.
 A JSON backend presented two immediate benefits:

 1. We could use it to feed our Mustache HTML templates and any future templates.
 2. As the main feeder format, testing can be focused on the JSON output.

 The existing Mustache backend in Clang-Doc already contained logic to create JSON documents, but they were immediately discarded when the templates were rendered.
-This backend is extremely beneficial to Clang-Doc because it would completely eliminate any need for manual HTML tag generation, thus greatly reducing lines of code.
+This backend is extremely beneficial to Clang-Doc because it completely eliminated any need for manual HTML tag generation, thus greatly reducing lines of code.
 If the JSON and template rendering logic from the existing implementation were uncoupled, we could apply the same pattern to any format we'd want to support.
-For example, Markdown generation would be a similar case to HTML where templates would be used to automate the creation of all markup.
+Markdown generation would be a similar case where templates would be used to automate the creation of all markup syntax.
@@ -79,21 +136,25 @@ For example, Markdown generation would be a similar case to HTML where templates
 This diagram models the architecture that Clang-Doc would follow given a unified JSON backend.
 Note the similarities to Clang, where our frontend (the visitation/serialization) gathers all the information we need and emits an intermediate representation (JSON).
 The JSON is then fed to the desired templates to produce our documentation, similar to how IR is used for different LLVM backends.
-Following this pattern would reduce the logic maintenance to only the JSON generation; all the formatting for HTML, Markdown, etc. would exist in template files that are very simple to change.
+Following this pattern would reduce the logic maintenance to only the JSON generation; all the formatting for HTML, Markdown, etc. would exist in template files that are very simple to change, which neatly separates documentation logic from display/formatting logic.
+Also note how much more streamlined it is compared to the previous diagram, where serialization logic was separated among Clang-Doc's backends.
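As a rough illustration of that separation (not Clang-Doc's actual code), the sketch below uses a plain struct as a stand-in for the JSON document and two small functions as stand-ins for templates. `RecordIR`, `renderHTML`, and `renderMarkdown` are hypothetical names introduced only for this example, and escaping is left out for brevity.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the JSON document produced by the frontend.
struct RecordIR {
  std::string Name;
  std::vector<std::string> Bases;
};

// Stand-in for an HTML template: it only formats what the IR already holds.
std::string renderHTML(const RecordIR &R) {
  std::string Out = "<h1>" + R.Name + "</h1><ul>";
  for (const std::string &B : R.Bases)
    Out += "<li>" + B + "</li>";
  return Out + "</ul>";
}

// Stand-in for a Markdown template consuming the very same IR.
std::string renderMarkdown(const RecordIR &R) {
  std::string Out = "# " + R.Name + "\n";
  for (const std::string &B : R.Bases)
    Out += "- " + B + "\n";
  return Out;
}
```

Supporting a new construct in this shape means adding one field to the intermediate form; each renderer then only decides how to display it.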

 Thus, I adapted the JSON logic from the Mustache backend and created a separate JSON backend.
 I also added tests to ensure the C++ constructs that Clang-Doc already supported were properly serialized in JSON.
 I didn't realize it at the time, but this would end up dramatically accelerating my pace of implementation.
+I was especially pleased with the timeframe of this feature since I had no plans at all to work on it when submitting my proposal.

 ## C++ Language Support and Testing

-After landing the JSON generator in about a week, I returned to my proposed schedule by implementing support for C++ constructs like friends.
+After landing the JSON generator in about a week, I returned to my proposed schedule by implementing support for missing C++ constructs.
 The new JSON generator allowed me to quickly implement and test these features because I didn't have to worry about HTML formatting or appearance.
 I could work with the assumption that as long as the information was properly serialized into JSON, it could be displayed well in HTML later.

 Testing is an area that the JSON backend brought a lot of clarity to.
-Clang-Doc didn't have a format where all the information we wanted, like ensuring we document that a variable is `const` or `volatile`, was validated.
+Clang-Doc didn't have a format where all the information we wanted to document was validated.
 At one time, YAML was meant to be that format, but it suffered from feature disparity since it wasn't relevant when something needed to be displayed in HTML.
+If we used HTML instead, there was a lot of other data (tags, indentation, classes, IDs) that would need to be validated alongside the construct.
+Testing the documentation and testing the displayed content are two different tasks.
 I ended up adding 14 different test files over the course of the summer to ensure test coverage.

 ### Pull Requests
@@ -124,7 +185,10 @@ The only logic operation that Mustache has to check if a field exists is an iter
 {{/Fields}}
 ```

-All of the logic to order them needs to be done in the serialization to JSON itself, so I overhauled our comment organization.
+the `<h3>` header would be duplicated for every iteration over `Fields`.
+If the header was outside of the iteration, then it would be displayed even if there weren't any elements in `Fields`.
+All of the logic to order them needs to be done in the serialization to JSON itself, so I had to overhaul our comment organization.
+
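One common way to work around this limitation in Mustache is to have the serializer emit an explicit boolean next to the list, so a template can wrap the header in `{{#HasFields}}...{{/HasFields}}` and render it exactly once. The sketch below assumes that approach; `serializeFields` and the `HasFields` key are hypothetical and may not match what Clang-Doc actually emits. Names are assumed not to need JSON escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical serializer: the "is there anything to show?" decision is made
// here, in the JSON, because the template language cannot express it.
std::string serializeFields(const std::vector<std::string> &Fields) {
  std::string Out = "{\"HasFields\":";
  Out += Fields.empty() ? "false" : "true";
  Out += ",\"Fields\":[";
  for (std::size_t I = 0; I < Fields.size(); ++I) {
    if (I != 0)
      Out += ",";
    Out += "{\"Name\":\"" + Fields[I] + "\"}";
  }
  Out += "]}";
  return Out;
}
```

With a flag like this, the header sits inside a section that renders once when the value is true and not at all when it is false, while the inner `Fields` section still iterates per element.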

 Previously, Clang-Doc's comments were organized exactly as in Clang's AST like the following:

 - FullComment
@@ -153,16 +217,20 @@ After this refactor was landed, I implemented support for the comments we had al

 ## Reaping the benefits of JSON

-This was an area where a JSON backend once again accelerated my progress.
-Without it, I would've written the same JSON logic but would've had to written tests to check for the comments in HTML.
-This would've been incredibly cumbersome since I would've had to:
+This was an area where the JSON backend really accelerated my progress.
+Without it, I would've written the same JSON logic but written tests for HTML output.
+This meant that I would've had to:

 1. Add the appropriate templating language to allow the comments to render.
 2. Add the correct HTML tags to allow the test to pass.

 As I mentioned, comments weren't being generated well in HTML anyways, so I could've run into more annoyances if I had to follow that workflow.
+Instead, I could just write some really simple JSON.

 ### Pull Requests
+
+Here are the pull requests I made during this phase of the project:
+
 - [add namespace references to VarInfo](https://github.com/llvm/llvm-project/pull/146964)
 - [fix ASan complaints from passing RepositoryURL as reference](https://github.com/llvm/llvm-project/pull/148923)
 - [integrate JSON as the source for Mustache templates](https://github.com/llvm/llvm-project/pull/149589)
@@ -179,7 +247,7 @@ Markdown was the most speculative aspect of the project.
 It wasn't clear whether we'd try to integrate a solution into Clang itself or whether we'd keep it in clang-tools-extra.

 ## A JavaScript Solution
-The first option I explored was suggested by my mentor, which was a JavaScript library called [Markdown-Tag](https://github.com/MarketingPipeline/Markdown-Tag)
+The first option I explored was suggested by my mentor, which was a JavaScript library called [Markdown-Tag](https://github.com/MarketingPipeline/Markdown-Tag).
 This would've been really convenient since all it requires is an HTML tag to enable rendering, so any comment text in a template can be easily rendered.
 Unfortunately, it requires all HTML to be sanitized, which defeats the purpose of a ready-made solution for us.
 We would have to parse any potential HTML in comments anyways.
@@ -198,17 +266,49 @@ During my summer, I would stumble into places where I would think "This could be
 Thus, there were a few patches where I dedicated time to general refactors to improve code reuse and hopefully make the lives of future contributors much easier than what I had to go through.
 In fact, one of my best refactors was of the JSON generator that I wrote, which my mentor noted had a lot of areas for great code reuse.
 Future me was extremely thankful for the easy-to-use functions I had added.
-I also refactored some of the bitcode reader/writer code so that less copy-pasting would be involved in the future.

-Another significant feature that I hadn't planned was name mangling.
-Clang-Doc suffered from a bug where template specializations would be serialized to the same file as their described class because they had the same name.
+## Bitcode Refactor
+The bitcode read/write operations contain highly repetitive code.
+Adding something to documentation, like serializing `const` for a function, required copy-pasting in several locations.
+It was structured like so:
+
+```cpp
+case BI_MEMBER_TYPE_BLOCK_ID: {
+  MemberTypeInfo TI;
+  if (auto Err = readBlock(ID, &TI))
+    return Err;
+  if (auto Err = addTypeInfo(I, std::move(TI)))
+    return Err;
+  return llvm::Error::success();
+}
+```
+
+`addTypeInfo` is specific to `MemberTypeInfo`, so every other type of `Info` would need to call its own function.
+Hence the highly repetitive code.
+I refactored that block to this:
+
+```cpp
+return handleTypeSubBlock<TypeInfo>(ID, I, CreateAddFunc(addTypeInfo<T, TypeInfo>));
+```
+
+`handleTypeSubBlock` contains the same logic as the previous block, but it calls a generic `Function`.
+All of this was achieved without compromising the performance of documentation generation.
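To make the shape of that refactor concrete, here is a self-contained sketch of the same idea with simplified stand-ins. It is not the Clang-Doc implementation: `Error` is a plain string here rather than `llvm::Error`, and `readBlock`, `handleSubBlock`, `addMember`, and the `Info` structs are hypothetical reductions written only for this example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for Clang-Doc's Info types (assumptions).
struct TypeInfo { std::string Name; };
struct MemberTypeInfo : TypeInfo {};
struct RecordInfo { std::vector<MemberTypeInfo> Members; };

// In this sketch an empty string means success.
using Error = std::string;

// Hypothetical reader that fills a block from its ID.
template <typename T> Error readBlock(unsigned ID, T *Out) {
  Out->Name = "block" + std::to_string(ID);
  return Error();
}

// One generic handler: read the block, then hand it to whatever "add"
// callable the caller supplies, instead of one hand-written case per type.
template <typename BlockT, typename ParentT, typename AddFn>
Error handleSubBlock(unsigned ID, ParentT &Parent, AddFn Add) {
  BlockT Block;
  Error Err = readBlock(ID, &Block);
  if (!Err.empty())
    return Err;
  return Add(Parent, std::move(Block));
}

// Example "add" function for one parent/block pairing.
Error addMember(RecordInfo &R, MemberTypeInfo &&M) {
  R.Members.push_back(std::move(M));
  return Error();
}
```

A call such as `handleSubBlock<MemberTypeInfo>(7, Rec, addMember)` then replaces the whole `case` body, and a new block type only needs its own small "add" function.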
+
+## Mangling Filenames
+Clang-Doc had a bug stemming from non-unique filenames.
 The YAML backend avoided this problem because its filenames were SymbolIDs, but this meant that the lit tests would have to use regex to find the file for FileCheck.
 Nasty.
-In HTML, you ended up with a buggy HTML page from duplicate tags.
-In JSON, you get a fatal error since two JSON documents can't live in the same file.
-So I used `ItaniumMangleContext` to generate mangled names we could use for the filenames.

-### Pull Requests
+In HTML and JSON, the filenames for classes were just the class name.
+If you had a template specialization, this would cause problems.
+In HTML, we'd get duplicate HTML resulting in wonky web pages.
+In JSON, we'd get a fatal error from the JSON parser since there were two sets of top-level braces.
+I used `ItaniumMangleContext` to generate mangled names we could use for the filenames.
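The collision itself is easy to model without Clang. The sketch below is an assumption-laden illustration: `Record`, `UniqueKey`, and `countCollisions` are hypothetical, and the key strings are placeholders rather than real mangler output. It only shows why keying output files on the plain name clashes for a template and its specialization while a unique key does not.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record description. UniqueKey stands in for a mangled name;
// the values used with it are placeholders, not real Itanium manglings.
struct Record {
  std::string Name;
  std::string UniqueKey;
};

// Counts records whose output file was already claimed by an earlier record.
template <typename KeyFn>
unsigned countCollisions(const std::vector<Record> &Records, KeyFn Key) {
  std::set<std::string> Seen;
  unsigned Collisions = 0;
  for (const Record &R : Records)
    if (!Seen.insert(Key(R) + ".json").second)
      ++Collisions;
  return Collisions;
}
```

Keyed on `Name`, a primary template and its specialization both map to the same file; keyed on a unique string, each gets its own.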
+
+## Pull Requests
+
+Here are the pull requests I made for refactors during the project:
+
 - [Serialize record files with mangled name](https://github.com/llvm/llvm-project/pull/148021)
@@ -219,12 +319,12 @@ I implemented a new JSON generator that will serve as the basis for Clang-Doc's
 This will vastly reduce overall lines of code and maintenance burdens.
 I added a lot of tests to increase code coverage and ensure we are serializing all the information necessary for high-quality documentation.
 I refactored our comment handling to streamline the logic that handles them and for better output in the HTML.
-I also explored options for rendering Markdown and began an implemetation for a parser that I plan on working on in the future.
+I also explored options for rendering Markdown and began an implementation for a parser that I plan on working on in the future.
 Along the way, I also did some refactoring to improve code reuse and reduced maintenance burdens by cutting boilerplate code.

 After my work this summer, Clang-Doc is nearly ready to switch to HTML generation via Mustache templates, which will be a huge milestone.
 It is backed by the JSON generator, which will allow for a much more flexible architecture that will change how we generate other documentation formats like our existing Markdown backend.
-All of this was achieved with little to no performance loss.
+All of this was achieved without compromising the performance of documentation generation.
 I also hope that future contributors have an easier time than I did learning about and working with Clang-Doc.
 The threshold for contributing was high due to a disjointed architecture.