Commit b51f5f5

evelez7 authored and asl committed
address blog feedback, add diagrams
1 parent 4cd1b22 commit b51f5f5

3 files changed

+81
-17
lines changed

content/posts/2025-gsoc-clang-doc.md

Lines changed: 81 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -8,8 +8,7 @@ title: "GSoC 2025: Improving Core Clang-Doc Functionality"
88
I was selected as a contributor for GSoC 2025 under the project "Improving Core Clang-Doc Functionality" for LLVM.
99
My mentors for the project were Paul Kirth and Petr Hosek.
1010

11-
Clang-Doc is a tool in clang-tools-extra that generates documentation from Clang's AST.
12-
Clang-Doc can output documentation in Markdown, HTML, YAML, and JSON.
11+
Clang-Doc is a tool in clang-tools-extra that generates documentation from Clang's AST and can output Markdown, HTML, YAML, and JSON.
1312
The project started in 2018 but major development eventually slowed.
1413
Recently, there have been efforts to get it back on track.
1514

@@ -26,7 +25,7 @@ The project idea proposed three main areas of focus to improve documentation qua
2625
First, not all C++ constructs were supported, like friends or concepts.
2726
Not supporting core C++ constructs in C++ documentation is not good.
2827
Second, it's important that Doxygen command support is robust and that we can support as many as possible.
29-
Third and last, having Markdown available to developers for documentation would be useful.
28+
Lastly, having Markdown available to developers for documentation would be useful.
3029
Markdown provides the power of expression in an area that is technically dense.
3130
It can be used to highlight critical information and warnings.
3231

@@ -39,49 +38,75 @@ Here's a quick overview on Clang-Doc's architecture, which follows a map-reduce
3938
3. Once all source declarations are serialized, write them into bitcode, reduce, and read the reduced Infos.
4039
4. Serialize Infos into the desired format.
4140

41+
<div style="margin:0 auto;">
42+
<img src="/img/gsoc-2025-clang-doc-architecture.png"><br/>
43+
</div>
44+
4245
It seems fairly straightforward, but the architecture had a critical flaw.
4346
If a new C++ construct needed to be supported, it would be visited and serialized, but then support would have to be added to each backend individually.
4447
If you wanted to serialize something in YAML, you'd have to implement the Markdown logic separately.
4548
This placed a very high maintenance cost for extending basic functionality, even if you just wanted to add something simple.
4649
It also easily led to generator disparity; a construct might be serialized in YAML, but not in Markdown.
47-
4850
Testing was also in an awkward spot because it was unclear what format would be used to verify if the documentation output was acceptable.
49-
YAML was the initial candidate for this, but my mentors had started to consider JSON instead.
50-
Feature parity was far apart; some backends were tested for certain attributes that others didn't have.
5151

5252
## The Good: Mustache
5353

5454
Last year's GSoC brought in great improvements that became the basis of my summer.
5555
First, last year's GSoC contributor landed a large performance improvement.
5656
I might not have been able to test Clang-Doc on Clang itself without it.
5757

58-
Another contribution that was essential to my summer is the [Mustache template implementation](https://mustache.github.io/) in LLVM.
58+
Another contribution that was essential to my summer is the [Mustache template engine](https://mustache.github.io/) implementation in LLVM.
5959
Mustache templates allow Clang-Doc to shift away from manually generating HTML tags and eliminate high maintenance burdens.
6060
Templates could also solve the feature parity problem by using JSON to feed templates.
6161

62-
6362
# Building a JSON Backend
6463

6564
While familiarizing myself with the codebase during the Community Bonding Period, I quickly determined that implementing a JSON backend would be incredibly beneficial to the project and my summer plans.
6665
A JSON backend presented two immediate benefits:
6766

68-
1. We could use it to feed HTML Mustache templates and future template usage.
67+
1. We could use it to feed our Mustache HTML templates and any templates we add in the future.
6968
2. As the main feeder format, testing can be focused on the JSON output.
7069

7170
The existing Mustache backend in Clang-Doc already contained logic to create JSON documents, but they were immediately discarded when the templates were rendered.
72-
I adapted most of the code into a separate generator to output JSON files and was able land it within two weeks.
73-
This ended up accelerating my work because I could implement support for C++ constructs and test them in JSON instead of another format that we would probably be refactoring in the near future.
71+
This backend is extremely beneficial to Clang-Doc because it would completely eliminate any need for manual HTML tag generation, thus greatly reducing lines of code.
72+
If the JSON and template rendering logic from the existing implementation were uncoupled, we could apply the same pattern to any format we'd want to support.
73+
For example, Markdown generation would be a similar case to HTML where templates would be used to automate the creation of all markup.
74+
75+
<div style="margin:0 auto;">
76+
<img src="/img/gsoc-2025-clang-doc-template-backend.png"><br/>
77+
</div>
78+
79+
This diagram models the architecture that Clang-Doc would follow given a unified JSON backend.
80+
Note the similarities to Clang, where our frontend (the visitation/serialization) gathers all the information we need and emits an intermediate representation (JSON).
81+
The JSON is then fed to the desired templates to produce our documentation, similar to how IR is used for different LLVM backends.
82+
Following this pattern would reduce the logic maintenance to only the JSON generation; all the formatting for HTML, Markdown, etc. would exist in template files that are very simple to change.
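To make the pattern concrete, here is a minimal sketch of one JSON document feeding two output formats. It is not Clang-Doc's code: it uses Python's `string.Template` as a stand-in for Mustache, and the record fields (`name`, `kind`, `brief`) are a hypothetical, simplified schema chosen for illustration.

```python
import json
from string import Template

# Hypothetical, simplified record; Clang-Doc's real JSON has many more fields.
info_json = json.dumps(
    {"name": "Widget", "kind": "class", "brief": "Draws a widget."}
)

# One template per output format. Restyling the output means editing only
# these strings, never the code that produces the JSON.
TEMPLATES = {
    "html": Template("<h1>$kind $name</h1>\n<p>$brief</p>"),
    "markdown": Template("# $kind $name\n\n$brief"),
}


def render(fmt: str, serialized: str) -> str:
    """Render the shared JSON document with the template for `fmt`."""
    return TEMPLATES[fmt].substitute(json.loads(serialized))


html = render("html", info_json)
markdown = render("markdown", info_json)
```

Both outputs come from the same serialized data, so a field that is present in the JSON is available to every format, which is the feature parity argument made above.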
83+
84+
Thus, I adapted the JSON logic from the Mustache backend and created a separate JSON backend.
85+
I also added tests to ensure the C++ constructs that Clang-Doc already supported were properly serialized in JSON.
86+
I didn't realize it at the time, but this would end up dramatically accelerating my pace of implementation.
87+
88+
## C++ Language Support and Testing
89+
90+
After landing the JSON generator in about a week, I returned to my proposed schedule by implementing support for C++ constructs like friends.
91+
The new JSON generator allowed me to quickly implement and test these features because I didn't have to worry about HTML formatting or appearance.
92+
I could work with the assumption that as long as the information was properly serialized into JSON, it would be able to be displayed well in HTML later.
93+
94+
Testing is an area that the JSON backend brought a lot of clarity to.
95+
Clang-Doc didn't have a single format in which all the information we wanted to document, such as whether a variable is `const` or `volatile`, was validated.
96+
At one time, YAML was meant to be that format, but it suffered from feature disparity since it wasn't relevant when something needed to be displayed in HTML.
97+
I ended up adding 14 different test files over the course of the summer to ensure test coverage.
7498

7599
### Pull Requests
76100
- [add tags to Mustache namespace template](https://github.com/llvm/llvm-project/pull/142045)
77-
- [add namespaces](https://github.com/llvm/llvm-project/pull/142483)
101+
- [add a JSON generator](https://github.com/llvm/llvm-project/pull/142483)
102+
- [add namespaces to JSON generator](https://github.com/llvm/llvm-project/pull/143209)
78103
- [removed default label on some switches](https://github.com/llvm/llvm-project/pull/143919)
79104
- [precommit](https://github.com/llvm/llvm-project/pull/144160) and [add support for concepts](https://github.com/llvm/llvm-project/pull/144430)
80105
- [precommit](https://github.com/llvm/llvm-project/pull/145069) and [document global variables](https://github.com/llvm/llvm-project/pull/145070)
81106
- [refactor JSONGenerator array usage](https://github.com/llvm/llvm-project/pull/145595)
82107
- [refactor BitcodeReader::readSubBlock](https://github.com/llvm/llvm-project/pull/145835)
83108
- [serialize isBuiltIn and IsTemplate](https://github.com/llvm/llvm-project/pull/146149)
84-
- [precommit](https://github.com/llvm/llvm-project/pull/146164) and [friends](https://github.com/llvm/llvm-project/pull/146165)
109+
- [precommit](https://github.com/llvm/llvm-project/pull/146164) and [serialize friends](https://github.com/llvm/llvm-project/pull/146165)
85110

86111
# Comments
87112

@@ -126,6 +151,18 @@ After the change, Clang-Doc's comments were structured like this:
126151

127152
Now, we can just iterate over every type of comment, which means iterating over every array.
128153
This left our JSON documentation with a few more fields, since one is needed for every Doxygen command, but with easier identification of what comments exist in the documentation.
154+
After this refactor landed, I implemented support for the comments we already supported as well as ones we didn't, like Doxygen code comments.
155+
156+
## Reaping the benefits of JSON
157+
158+
This was an area where a JSON backend once again accelerated my progress.
159+
Without it, I would've written the same JSON logic but would've had to write tests to check for the comments in HTML.
160+
This would've been incredibly cumbersome since I would've had to:
161+
162+
1. Add the appropriate templating language to allow the comments to render.
163+
2. Add the correct HTML tags to allow the test to pass.
164+
165+
As I mentioned, comments weren't being generated well in HTML anyway, so I could've run into more annoyances if I had to follow that workflow.
129166

130167

131168
### Pull Requests
@@ -139,6 +176,8 @@ This left our JSON documentation with a few more fields, since one is needed for
139176
- [remove nesting of text comments inside paragraphs](https://github.com/llvm/llvm-project/pull/150451)
140177
- [generate comments for functions](https://github.com/llvm/llvm-project/pull/150570)
141178
- [add param comments to comment template](https://github.com/llvm/llvm-project/pull/150571)
179+
- [add return comments to comment template](https://github.com/llvm/llvm-project/pull/150647)
180+
- [add code comments to comment template](https://github.com/llvm/llvm-project/pull/150648)
142181

143182
# Markdown
144183
Markdown was the most speculative aspect of the project.
@@ -155,15 +194,39 @@ Without an out-of-the-box solution, we were left with implementing our own parse
155194
When I considered this in my proposal, I knew an in-tree parser would want to conform to the simplest possible standard.
156195
Markdown has no official standard, so I opted for CommonMark conformance.
157196

158-
The summer ended without a complete solution since the a couple weeks were spent researching whether or not this could be integrated directly in the Clang comment parser or whether we'd need to build our own solution or not.
197+
The summer ended without a complete solution since a couple of weeks were spent researching whether this could be integrated directly into the Clang comment parser or whether we'd need to build our own solution.
159198
You can see my initial draft [here](https://github.com/llvm/llvm-project/pull/155887).
160199

200+
# Refactors, Name Mangling, and More!
201+
During my summer, I would stumble into places where I would think "This could be better" and my mentors usually agreed.
202+
Thus, there were a few patches where I dedicated time to general refactors to improve code reuse and hopefully make the lives of future contributors much easier than what I had to go through.
203+
In fact, one of my best refactors was of the JSON generator that I wrote, which my mentor noted had a lot of areas for great code reuse.
204+
Future me was extremely thankful for the easy-to-use functions I had added.
205+
I also refactored some of the bitcode reader/writer code so that less copy-pasting would be involved in the future.
206+
207+
Another significant feature that I hadn't planned was name mangling.
208+
Clang-Doc suffered from a bug where template specializations would be serialized to the same file as their described class because they had the same name.
209+
The YAML backend avoided this problem because its filenames were SymbolIDs, but this meant that the lit tests would have to use regex to find the file for FileCheck.
210+
Nasty.
211+
In HTML, you ended up with a buggy HTML page from duplicate tags.
212+
In JSON, you get a fatal error since two JSON documents can't live in the same file.
213+
So I used `ItaniumMangleContext` to generate mangled names we could use for the filenames.
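As a toy illustration of the collision (not Clang-Doc's actual code), the sketch below groups records into output files first by display name and then by a unique name. The records and the `M_...` strings are hypothetical placeholders standing in for mangled names; they are not what `ItaniumMangleContext` actually produces.

```python
from collections import defaultdict

# Hypothetical records: (display name, unique name). A class template and
# its specialization share a display name but get distinct unique names.
records = [("Foo", "M_Foo"), ("Foo", "M_Foo_int")]


def group_by_filename(records, key_index):
    """Map each output filename to the records that would be written to it."""
    files = defaultdict(list)
    for record in records:
        files[record[key_index] + ".json"].append(record)
    return dict(files)


by_display = group_by_filename(records, 0)  # one file holding two records
by_unique = group_by_filename(records, 1)   # one file per record
```

Keying on the display name puts both records in `Foo.json`, which is the situation that produced the fatal error in JSON; keying on a unique name gives each record its own file.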
214+
161215
# Overview
162-
I implemented a new JSON generator for Clang-Doc that will serve as the basis for documentation generation.
216+
I implemented a new JSON generator that will serve as the basis for Clang-Doc's documentation generation.
163217
This will vastly reduce overall lines of code and maintenance burdens.
218+
I added a lot of tests to increase code coverage and ensure we are serializing all the information necessary for high-quality documentation.
164219
I refactored our comment handling to streamline the logic that handles them and for better output in the HTML.
165220
I also explored options for rendering Markdown and began an implementation of a parser that I plan to keep working on in the future.
166-
Along the way, I also did some refactoring to improve code reuse and improve contributor burden by reducing boilerplate code.
221+
Along the way, I also did some refactoring to improve code reuse and reduce maintenance burdens by cutting boilerplate code.
222+
223+
After my work this summer, Clang-Doc is nearly ready to switch to HTML generation via Mustache templates, which will be a huge milestone.
224+
It is backed by the JSON generator which will allow for a much more flexible architecture that will change how we generate other documentation formats like our existing Markdown backend.
225+
All of this was done with little to no performance loss.
226+
227+
Another huge boon from my work this summer is that contributors should (hopefully) have a much easier time contributing to Clang-Doc.
228+
Before, the threshold for contributing was high due to a disjointed architecture.
229+
I hope that, thanks to my work, future contributors find Clang-Doc easier to navigate and easier to write helpful patches for.
167230

168231
Over the summer, I addressed these issues:
169232
- [template operator T() produces a bad name](https://github.com/llvm/llvm-project/issues/59812)
@@ -172,12 +235,13 @@ Over the summer, I addressed these issues:
172235
- [Add a JSON backend to clang-doc to better leverage mustache templates](https://github.com/llvm/llvm-project/issues/140094)
173236

174237
# Future Work
238+
These are issues that I identified over the summer that I wasn't able to address but would benefit from community discussion and contribution.
175239

176240
## Doxygen Grouping
177241

178242
Doxygen has a very useful [grouping](https://www.doxygen.nl/manual/grouping.html) feature that allows structures to be grouped under a custom heading or on separate pages.
179243
You can see it in [llvm::sys::path](https://llvm.org/doxygen/namespacellvm_1_1sys_1_1path.html).
180-
We [opened up an issue](https://github.com/llvm/llvm-project/issues/151184#issuecomment-3133596874) for Clang to track this issue, which ended up being a duplicate of [this issue](https://github.com/llvm/llvm-project/issues/123582).
244+
We [opened up an issue](https://github.com/llvm/llvm-project/issues/151184) for Clang to track this issue, which ended up being a duplicate of [this issue](https://github.com/llvm/llvm-project/issues/123582).
181245

182246
There would most likely have to be some major changes to Clang's comment parsing and Clang's own parsing.
183247
That's because a lot of the group opening tokens in Clang are free-floating, like so: