Commit 4cd1b22

evelez7 authored and asl committed
GSoC 2025 Clang-Doc blog
1 parent c4a715a commit 4cd1b22

1 file changed: 207 additions, 0 deletions

@@ -0,0 +1,207 @@
---
author: "Erick Velez (evelez7)"
date: "2025-08-28"
tags: ["GSoC", "clang-doc", "clang-tools-extra", "documentation"]
title: "GSoC 2025: Improving Core Clang-Doc Functionality"
---

I was selected as a contributor for GSoC 2025 under the project "Improving Core Clang-Doc Functionality" for LLVM.
My mentors for the project were Paul Kirth and Petr Hosek.

Clang-Doc is a tool in clang-tools-extra that generates documentation from Clang's AST.
Clang-Doc can output documentation in Markdown, HTML, YAML, and JSON.
The project started in 2018, but major development eventually slowed.
Recently, there have been efforts to get it back on track.

This year, the [GSoC project idea](https://discourse.llvm.org/t/improve-documentation-parsing-in-clang/84513) had a simple premise: improve core functionality.

## The Issues

The project idea proposed three main areas of focus to improve documentation quality.

1. C++ support
2. Doxygen comments
3. Markdown support

First, not all C++ constructs were supported, such as friends and concepts.
A C++ documentation tool that cannot document core C++ constructs produces incomplete documentation.
Second, Doxygen command support needs to be robust and cover as many commands as possible.
Third and last, having Markdown available to developers for documentation would be useful.
Markdown adds expressiveness to documentation that is technically dense.
It can be used to highlight critical information and warnings.

### The Architecture

Here's a quick overview of Clang-Doc's architecture, which follows a map-reduce pattern:

1. Visit source declarations via Clang's ASTVisitor.
2. Serialize relevant source information into an Info (Clang-Doc's main data entity).
3. Once all source declarations are serialized, write them into bitcode, reduce, and read the reduced Infos.
4. Serialize Infos into the desired format.

It seems fairly straightforward, but the architecture had a critical flaw.
If a new C++ construct needed to be supported, it would be visited and serialized, but then support would have to be added to each backend individually.
If you wanted to serialize something in YAML, you'd have to implement the Markdown logic separately.
This imposed a very high maintenance cost on extending basic functionality, even if you just wanted to add something simple.
It also easily led to generator disparity; a construct might be serialized in YAML, but not in Markdown.

Testing was also in an awkward spot because it was unclear which format should be used to verify that the documentation output was acceptable.
YAML was the initial candidate for this, but my mentors had started to consider JSON instead.
The backends were far from feature parity; some were tested for attributes that others didn't have.
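
The map-reduce flow described in the architecture overview can be sketched roughly as follows.
This is a minimal illustration and not Clang-Doc's actual code: the `Info` struct, its fields, and `reduceInfos` are hypothetical stand-ins, and the sketch assumes each declaration is identified by a unique string key.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for Clang-Doc's Info entity.
struct Info {
  std::string Key;                   // assumed unique ID for a declaration
  std::string Name;                  // declaration name, may be missing
  std::vector<std::string> Comments; // documentation fragments
};

// The map step (not shown) yields partial Infos, and the same declaration
// may appear more than once. The reduce step merges all partial Infos
// that share a key into a single Info.
std::map<std::string, Info> reduceInfos(const std::vector<Info> &Partials) {
  std::map<std::string, Info> Reduced;
  for (const Info &P : Partials) {
    Info &Merged = Reduced[P.Key];
    Merged.Key = P.Key;
    if (Merged.Name.empty())
      Merged.Name = P.Name; // keep the first non-empty name
    Merged.Comments.insert(Merged.Comments.end(), P.Comments.begin(),
                           P.Comments.end());
  }
  return Reduced;
}
```

Each backend would then only see the reduced Infos, which is the point where per-backend serialization logic had to be duplicated.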

## The Good: Mustache

Last year's GSoC brought in great improvements that became the basis of my summer.
First, last year's GSoC contributor landed a large performance improvement.
I might not have been able to test Clang-Doc on Clang itself without it.

Another contribution that was essential to my summer is the [Mustache template implementation](https://mustache.github.io/) in LLVM.
Mustache templates allow Clang-Doc to shift away from manually generating HTML tags, which eliminates a large maintenance burden.
Templates could also solve the feature parity problem if every template is fed from the same JSON.


# Building a JSON Backend

While familiarizing myself with the codebase during the Community Bonding Period, I quickly determined that implementing a JSON backend would be incredibly beneficial to the project and my summer plans.
A JSON backend presented two immediate benefits:

1. We could use it to feed the HTML Mustache templates and any future templates.
2. As the main feeder format, JSON could become the focus of testing.

The existing Mustache backend in Clang-Doc already contained logic to create JSON documents, but they were discarded as soon as the templates were rendered.
I adapted most of the code into a separate generator to output JSON files and was able to land it within two weeks.
This ended up accelerating my work because I could implement support for C++ constructs and test them in JSON instead of another format that we would probably be refactoring in the near future.

### Pull Requests
- [add tags to Mustache namespace template](https://github.com/llvm/llvm-project/pull/142045)
- [add namespaces](https://github.com/llvm/llvm-project/pull/142483)
- [removed default label on some switches](https://github.com/llvm/llvm-project/pull/143919)
- [precommit](https://github.com/llvm/llvm-project/pull/144160) and [add support for concepts](https://github.com/llvm/llvm-project/pull/144430)
- [precommit](https://github.com/llvm/llvm-project/pull/145069) and [document global variables](https://github.com/llvm/llvm-project/pull/145070)
- [refactor JSONGenerator array usage](https://github.com/llvm/llvm-project/pull/145595)
- [refactor BitcodeReader::readSubBlock](https://github.com/llvm/llvm-project/pull/145835)
- [serialize isBuiltIn and IsTemplate](https://github.com/llvm/llvm-project/pull/146149)
- [precommit](https://github.com/llvm/llvm-project/pull/146164) and [friends](https://github.com/llvm/llvm-project/pull/146165)

# Comments

## Groups and Order

Comments weren't ordered in Clang-Doc's HTML documentation.
They were just displayed in whatever order they were serialized in, which is the order that they're written in source.
This made comments difficult to read: you don't want to search for another parameter comment after reading the first one, even if they're expected to be written in order in source.

Funnily enough, Mustache made this a little more complicated.
The only logic operation that Mustache has to check whether a field exists is a section like `{{#Fields}}`, which also iterates over that field, so any header placed inside it to denote a comment section would be repeated for every element.

```html
{{#Fields}}
<h3>Field Header</h3>
{{FieldInfo}}
{{/Fields}}
```

All of the logic to order them needs to be done in the serialization to JSON itself, so I overhauled our comment organization.
Previously, Clang-Doc's comments were organized exactly as in Clang's AST, like the following:

- FullComment
  - BriefComment
    - ParagraphComment
      - TextComment
      - TextComment
  - BriefComment
    - ParagraphComment

Everything was unnecessarily nested under a FullComment, and TextComments were also unnecessarily nested.
Every non-verbatim comment's text was held in one ParagraphComment.
Since there was only one, we could reduce some boilerplate by directly mapping to the array of TextComments.

After the change, Clang-Doc's comments were structured like this:

- BriefComments
  - TextCommentArray
  - TextCommentArray
- ParagraphComments
  - TextCommentArray

Now, we can just iterate over every type of comment, which means iterating over every array.
This left our JSON documentation with a few more fields, since one is needed for every Doxygen command, but it is now easier to identify which comments exist in the documentation.
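
As a rough sketch of the idea, the grouping can happen before anything reaches a template.
The `Comment` struct and `groupByKind` below are hypothetical and much simpler than Clang-Doc's real comment types; they only illustrate collecting comment text into one array per kind while keeping source order within each kind.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical flat comment record; Clang-Doc's real types differ.
struct Comment {
  std::string Kind; // e.g. "brief", "param", "paragraph"
  std::string Text;
};

// Collect comment text into one array per kind. Comments of the same
// kind keep the order in which they appeared in the input.
std::map<std::string, std::vector<std::string>>
groupByKind(const std::vector<Comment> &Comments) {
  std::map<std::string, std::vector<std::string>> Groups;
  for (const Comment &C : Comments)
    Groups[C.Kind].push_back(C.Text);
  return Groups;
}
```

With the comments grouped this way, a template can emit a section header once per kind and then iterate over that kind's array.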


### Pull Requests
- [add namespace references to VarInfo](https://github.com/llvm/llvm-project/pull/146964)
- [Serialize record files with mangled name](https://github.com/llvm/llvm-project/pull/148021)
- [fix ASan complaints from passing RepositoryURL as reference](https://github.com/llvm/llvm-project/pull/148923)
- [refactor JSON for better Mustache compatibility](https://github.com/llvm/llvm-project/pull/149588)
- [integrate JSON as the source for Mustache templates](https://github.com/llvm/llvm-project/pull/149589)
- [separate comments into categories](https://github.com/llvm/llvm-project/pull/149590)
- [enable comments in class templates](https://github.com/llvm/llvm-project/pull/149848)
- [remove nesting of text comments inside paragraphs](https://github.com/llvm/llvm-project/pull/150451)
- [generate comments for functions](https://github.com/llvm/llvm-project/pull/150570)
- [add param comments to comment template](https://github.com/llvm/llvm-project/pull/150571)

# Markdown
Markdown was the most speculative aspect of the project.
It wasn't clear whether we'd try to integrate a solution into Clang itself or whether we'd keep it in clang-tools-extra.

## A JavaScript Solution
The first option I explored, suggested by my mentor, was a JavaScript library called [Markdown-Tag](https://github.com/MarketingPipeline/Markdown-Tag).
This would've been really convenient since all it requires is an HTML tag to enable rendering, so any comment text in a template can be easily rendered.
Unfortunately, it requires all HTML to be sanitized, which defeats the purpose of a ready-made solution for us.
We would have to parse any potential HTML in comments anyway.

## A Parser Solution
Without an out-of-the-box solution, we were left with implementing our own parser.
When I considered this in my proposal, I knew an in-tree parser would want to conform to the simplest possible standard.
Markdown has no official standard, so I opted for CommonMark conformance.

The summer ended without a complete solution since a couple of weeks were spent researching whether this could be integrated directly into the Clang comment parser or whether we'd need to build our own solution.
You can see my initial draft [here](https://github.com/llvm/llvm-project/pull/155887).

# Overview
I implemented a new JSON generator for Clang-Doc that will serve as the basis for documentation generation.
This will vastly reduce overall lines of code and maintenance burdens.
I refactored our comment handling to streamline its logic and to produce better output in the HTML.
I also explored options for rendering Markdown and began an implementation of a parser that I plan to keep working on in the future.
Along the way, I also did some refactoring to improve code reuse and reduce the burden on contributors by cutting boilerplate code.

Over the summer, I addressed these issues:
- [template operator T() produces a bad name](https://github.com/llvm/llvm-project/issues/59812)
- [Add a JSON backend to clang-doc to better leverage mustache templates](https://github.com/llvm/llvm-project/issues/140094)
- [Reconsider use of enum InfoType::IT_default](https://github.com/llvm/llvm-project/issues/142888)

# Future Work

## Doxygen Grouping

Doxygen has a very useful [grouping](https://www.doxygen.nl/manual/grouping.html) feature that allows structures to be grouped under a custom heading or on separate pages.
You can see it in [llvm::sys::path](https://llvm.org/doxygen/namespacellvm_1_1sys_1_1path.html).
We [opened an issue](https://github.com/llvm/llvm-project/issues/151184#issuecomment-3133596874) for Clang to track this, which ended up being a duplicate of [an existing issue](https://github.com/llvm/llvm-project/issues/123582).

Supporting it would most likely require major changes to Clang's comment parsing and Clang's own parsing.
That's because a lot of the group opening tokens in Clang are free-floating, like so:

```cpp
/// @{

class Foo {};
```

That `@{` doesn't attach to a Decl; only comments directly above a declaration are attached to a Decl in the AST.
My mentors wisely advised that this would be too much to even consider this summer, and could probably be its own GSoC project.

## Cross-referencing

In Doxygen, you can use the `@copydoc` command to copy the documentation from one entity to another.
Doxygen also displays where an entity is referenced, like where a function is invoked.
Clang-Doc currently has no support for this kind of behavior.

Clang-Doc would need a preprocessing step where any reference to another entity is identified and then resolved somewhere.
One of my mentors pointed out that it would be great to do during the reduction step, where every Info is being visited anyway.
This actually wasn't something I had even considered in my proposal besides identifying that `@copydoc` wasn't supported by the comment parser.
It's a common feature of modern documentation, so hopefully someday soon Clang-Doc can acquire it.
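
One way such a resolution pass might look is sketched below.
This is purely illustrative and assumes a design that does not exist in Clang-Doc today: `DocEntry`, its `CopyFrom` field, and `resolveCopyDoc` are hypothetical names, entities are keyed by plain strings, and only a single level of `@copydoc` is resolved.

```cpp
#include <map>
#include <string>

// Hypothetical documentation record for one entity.
struct DocEntry {
  std::string Doc;      // the entity's own documentation text
  std::string CopyFrom; // entity named by @copydoc, empty if none
};

// Resolve one level of @copydoc: an entry copies documentation only from
// a source that exists and does not itself use @copydoc. Unresolvable
// references are left untouched.
void resolveCopyDoc(std::map<std::string, DocEntry> &Entries) {
  for (auto &KV : Entries) {
    DocEntry &Entry = KV.second;
    if (Entry.CopyFrom.empty())
      continue;
    auto Source = Entries.find(Entry.CopyFrom);
    if (Source != Entries.end() && Source->second.CopyFrom.empty())
      Entry.Doc = Source->second.Doc;
  }
}
```

Running a pass like this once all entries are known mirrors the suggestion of doing the work during reduction, when every Info is already being visited.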

# Acknowledgements
Thank you very much to my mentors Paul Kirth and Petr Hosek for guiding me and advising me in this project.
I learned so much from review feedback and our conversations.
