Commit 4a12a2b

PeterChou1asl
authored andcommitted
address feedback for blog
1 parent 65c50b7 commit 4a12a2b

1 file changed

+26
-24
lines changed

content/posts/2024-12-04-improve-clang-doc.md

Lines changed: 26 additions & 24 deletions
@@ -1,11 +1,11 @@
11
---
22
author: "Peter Chou"
33
date: "2024-12-04"
4-
tags: ["GSoC", "clang-doc"]
4+
tags: ["GSoC", "Clang-Doc"]
55
title: "GSoC 2024: Improve Clang Doc"
66
---
77

8-
Hi, my name is Peter, and this year I was involved in Google Summer of Code 2024. I worked on [improving the clang-doc documenation generator](https://discourse.llvm.org/t/improve-clang-doc-usability/76996)
8+
Hi, my name is Peter, and this year I was involved in Google Summer of Code 2024. I worked on [improving the Clang-Doc documentation generator](https://discourse.llvm.org/t/improve-clang-doc-usability/76996).
99

1010
Mentors: Petr Hosek and Paul Kirth
1111

@@ -17,19 +17,23 @@ however it has since stalled. Currently, the tool can generate HTML, YAML, and m
1717

1818
## Work Done
1919

20-
The original scope of the project was to improve the output of clang-doc's generation. However during testing the tool was significantly slower than expected which made developing features for the tool impossible.
21-
Each compilation of the LLVM codebase was taking upwards of 10 hours on my local machine. Additionally the tool utilized a lot of memory and was prone to crashing with an out of memory error. Similar tools such as Doxygen and Hdoc ran in comparatively less time for the same codebase. This pointed to a significant bottleneck within clang-doc’s codepath when generating large scale software projects. Due to this the project scope quickly changed to improving the runtime of clang-doc so that it could run much faster. It was only during the latter half of the project did the scope change back to improving clang-doc’s generation.
20+
The original scope of the project was to improve the output of Clang-Doc's generation. However, during testing the tool proved significantly slower than expected, which made developing features for it impossible.
21+
Each compilation of the LLVM codebase was taking upwards of 10 hours on my local machine. Additionally, the tool used a lot of memory and was prone to crashing with an out-of-memory error. Similar tools such as Doxygen and Hdoc ran in comparatively less time on the same codebase. This pointed to a significant bottleneck within Clang-Doc’s code path when generating documentation for large-scale software projects. Because of this, the project scope quickly changed to improving the runtime of Clang-Doc. It was only during the latter half of the project that the scope changed back to improving Clang-Doc’s generation.
2222

2323
### Added More Test Cases to Clang-Doc test suite
2424

2525

26-
Clang-doc previously had test’s which did not test the full scope of the tool especially with regards to the HTML or Markdown output of the tool. In order to make sure the optimization experiments were not accidentally breaking the tool, it was necessary to add more end to end tests.
27-
In summary I added four comprehensive tests which covered all features that we were not testing such as testing the generation for Enums, Namespace and Records for HTML and Markdown.
26+
Clang-Doc previously had tests which did not cover the full scope of the HTML or Markdown output. I added more end-to-end tests to make sure that, in the process of optimizing documentation generation, we were not degrading the quality or functionality of the tool.
2827

28+
In summary, I added four comprehensive tests covering features that were previously untested, such as the generation of Enums, Namespaces and Records for HTML and Markdown.
2929

30-
### Improve clang-doc’s performance by 1.58 times
30+
### Improve Clang-Doc’s performance by 1.58 times
3131

32-
Internally the way clang-doc works is by leveraging libtooling's ASTVisitor class to parse the declaration in each project and serializing it into an internal format, which gets deserialized later when we output the final format. The bottleneck in clang-doc lied in the way clang-doc was doing redundant work when it was visiting each declaration.
32+
Internally, Clang-Doc works by leveraging libtooling's ASTVisitor class to parse the source-level declarations in each translation unit (TU) and serializing them into an internal format, which gets deserialized later when we output the final format.
33+
34+
Many experiments were conducted in order to identify the source of the bottleneck. Initially I leveraged a Windows profiler, however that was not fine-grained enough to identify the true source of the
35+
36+
Eventually, we were able to trace a major bottleneck in Clang-Doc's performance to redundant work done when processing each declaration. We settled on a caching/memoization strategy to minimize the redundant work.
3337

3438
For example, if we had the following project:
3539

@@ -52,39 +56,39 @@ class A : public Base {}
5256
class B : public Base {}
5357
```
5458
55-
In this case the ASTVisitor invoke by clang-doc would visit serialized the Base class three times, once when it is parsing Base.cpp, another when its visiting A.cpp then B.cpp. This means any C++ project that heavily leverages inheritance would result in a lot of redundant work.
59+
In this case, the ASTVisitor invoked by Clang-Doc would visit and serialize the Base class three times: once when parsing Base.cpp, and again when visiting A.cpp and B.cpp. This means any C++ project that heavily leverages inheritance would result in a lot of redundant work.
5660
57-
The optimization ended up being a simple memoization dictionary which kept track of a list of declaration that clang-doc had visited.
61+
The optimization ended up being a simple memoization dictionary which kept track of the declarations that Clang-Doc had already visited.
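To make the idea concrete, here is a minimal sketch of that kind of memoization. It is not Clang-Doc's actual code: `Decl`, `MemoizingSerializer` and `serializeExampleProject` are hypothetical names made up for illustration, and a plain string ID stands in for whatever key the real tool uses to identify a declaration. It also ignores thread safety, which a real multi-threaded tool would have to handle.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a declaration seen by the visitor.
// The real tool works on Clang AST nodes, not on this struct.
struct Decl {
  std::string Id; // assumed unique identifier for the declaration
};

// Remembers which declarations were already serialized and skips repeats.
class MemoizingSerializer {
public:
  // Returns true if D was serialized now, false if it was seen before.
  bool serialize(const Decl &D) {
    // insert() reports through .second whether the key was newly added.
    if (!Visited.insert(D.Id).second)
      return false;
    ++Serialized; // the expensive serialization work would happen here
    return true;
  }

  int serializedCount() const { return Serialized; }

private:
  std::unordered_set<std::string> Visited;
  int Serialized = 0;
};

// Simulates visiting Base.cpp, A.cpp and B.cpp from the example project,
// where every file sees the Base class.
inline int serializeExampleProject() {
  const std::vector<std::vector<Decl>> TUs = {
      {Decl{"Base"}},
      {Decl{"Base"}, Decl{"A"}},
      {Decl{"Base"}, Decl{"B"}},
  };
  MemoizingSerializer S;
  for (const auto &TU : TUs)
    for (const Decl &D : TU)
      S.serialize(D);
  return S.serializedCount();
}
```

In this sketch the three files contain five declaration visits in total, but only three of them (Base, A and B) trigger serialization.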
5862
5963
Here is a plot of the benchmarking numbers:
6064
6165
<div style="margin:0 auto;">
62-
<img src="/img/clang-doc-benchmark-numbers.png"><br/>
66+
<img src="/img/Clang-Doc-benchmark-numbers.png"><br/>
6367
</div>
6468
6569
6670
### Added Template Mustache HTML Backend
6771
68-
Clang-doc originally used an ad-hoc method of generating HTML, in this project I introduced a templating language as a way of reducing project complexity and reducing the ease of development. Two RFCs were made before arriving at the idea of introducing Mustache as a library. Originally the idea was to introduce a custom templating language, however upon further discussion it was decided that the complexity of designing and implementing a new templating language was too much.
69-
A LLVM community member suggested I implement a mustache as templating language.
70-
Mustache was the ideal templating language, since it was very simple to implement, and has a well defined spec that fit what was needed for clang-doc’s use case. The feedback to the RFC was generally positive however there was some a bit of pushback in regards to adding an HTML support library to LLVM.
71-
In terms of engineering wins, this library was able to cut the direct down on HTML backend significantly dropping 500 lines of code. This library was also designed for general purpose use around LLVM, since there are numerous places in LLVM where various tools generate html in its own way. Using the mustache templating library would be a nice way to standardize the codebase.
72+
Clang-Doc originally used an ad-hoc method of generating HTML. I introduced a templating language as a way of reducing project complexity and improving the ease of development. Two RFCs were made before arriving at the idea of introducing Mustache as a library. Originally the idea was to introduce a custom templating language, however upon further discussion it was decided that the complexity of designing and implementing a new templating language was too much.
73+
An LLVM community member suggested using Mustache as the templating language.
74+
Mustache was the ideal choice since it is very simple to implement and has a well-defined spec that fit what was needed for Clang-Doc’s use case. The feedback on the RFC was generally positive. While there was some resistance regarding the inclusion of an HTML support library in LLVM, this concern stemmed partly from a lack of awareness that HTML generation already occurs in several parts of LLVM. Additionally, the introduction of Mustache has the potential to simplify other HTML-related use cases.
75+
In terms of engineering wins, this library cut down the HTML backend significantly, dropping 500 lines of code compared to the original Clang-Doc HTML backend. This library was also designed for general purpose use around LLVM, since there are numerous places in LLVM where various tools generate HTML in their own way. Using the Mustache templating library would be a nice way to standardize the codebase.
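For readers unfamiliar with Mustache, the sketch below shows the core idea on a very small scale: a template contains `{{key}}` tags, and rendering replaces each tag with an HTML-escaped value from a context. This is only an illustration of the concept, not the API of the LLVM Mustache library. `escapeHtml` and `renderTemplate` are hypothetical helpers, and the sketch skips most of the spec (sections, partials, unescaped tags, whitespace inside tags).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Escapes the characters that are unsafe inside HTML text.
inline std::string escapeHtml(const std::string &In) {
  std::string Out;
  for (char C : In) {
    switch (C) {
    case '&': Out += "&amp;"; break;
    case '<': Out += "&lt;"; break;
    case '>': Out += "&gt;"; break;
    case '"': Out += "&quot;"; break;
    default: Out += C;
    }
  }
  return Out;
}

// Replaces every {{key}} tag with the escaped value found in Ctx.
// Keys missing from Ctx render as an empty string.
inline std::string
renderTemplate(const std::string &Tmpl,
               const std::map<std::string, std::string> &Ctx) {
  std::string Out;
  std::size_t Pos = 0;
  while (true) {
    std::size_t Open = Tmpl.find("{{", Pos);
    if (Open == std::string::npos) {
      Out += Tmpl.substr(Pos); // no more tags: copy the remainder
      break;
    }
    std::size_t Close = Tmpl.find("}}", Open + 2);
    if (Close == std::string::npos) {
      Out += Tmpl.substr(Pos); // unterminated tag: emit the rest verbatim
      break;
    }
    Out += Tmpl.substr(Pos, Open - Pos); // literal text before the tag
    const std::string Key = Tmpl.substr(Open + 2, Close - Open - 2);
    auto It = Ctx.find(Key);
    if (It != Ctx.end())
      Out += escapeHtml(It->second);
    Pos = Close + 2;
  }
  return Out;
}
```

The appeal for a documentation generator is that the page layout lives in the template text, while the C++ side only has to build the context of names and values.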
7276
7377
### Improve Clang-Doc HTML Output
7478
75-
The previous version of clang-doc’s output was a pretty minimal bare bones implementation. It had a sidebar that contained every single declaration within the project which created a large unnavigable UI. Typedef documentation was missing, plus method documentation was missing details such as whether or not the method was a const or virtual. There was no linking between other declarations in the project and there was no syntax highlighting on any language construct.
79+
The previous version of Clang-Doc’s output was a pretty minimal, bare-bones implementation. It had a sidebar that contained every single declaration within the project, which created a large, unnavigable UI. Typedef documentation was missing, and method documentation was missing details such as whether or not the method was const or virtual. There was no linking between declarations in the project and there was no syntax highlighting on any language construct.
7680
77-
With the new mustache changes an additional backend was added using the specifier (--format=mhtml). That addresses these issues.
81+
With the new Mustache changes, an additional backend was added, selected with the specifier (--format=mhtml), that addresses these issues.
7882
7983
Below is a comparison of the same output between the two backends:
8084
8185
8286
<div style="margin:0 auto;">
83-
<img src="/img/clang-doc-old-html-output.png"><br/>
87+
<img src="/img/Clang-Doc-old-html-output.png"><br/>
8488
</div>
8589
8690
<div style="margin:0 auto;">
87-
<img src="/img/clang-doc-new-output.png"><br/>
91+
<img src="/img/Clang-Doc-new-output.png"><br/>
8892
</div>
8993
9094
You can also visit the output project on my personal github.io page link
@@ -94,18 +98,16 @@ Note: this output is still a work in progress.
9498
9599
## Learning Insight
96100
97-
I've learned a lot in the past few months, thanks to GSOC I now have a much better idea of what it’s like to participate in a large open source project. I received a lot of feedback through PR’s, making RFC and collaborating with other GSOC members. I’d learned a lot about how to interact with the community and solicit feedback. I also learned a lot about instrumentation/profiling code having conducted many experiments in order to try to speed clang-doc up.
101+
I've learned a lot in the past few months. Thanks to GSoC, I now have a much better idea of what it’s like to participate in a large open source project. I received a lot of feedback through PRs, writing RFCs and collaborating with other GSoC members. I learned a lot about how to interact with the community and solicit feedback. I also learned a lot about instrumenting and profiling code, having conducted many experiments to try to speed Clang-Doc up.
98102
99103
## Future Work
100104
101-
I plan to work on clang-doc until an MVP product can be generated and evaluated for the LLVM project. My remaining tasks include landing the mustache support library and clang-doc’s mustache backend, as well as gathering feedback from the LLVM community regarding clang-doc’s current output. Additionally, I intend to add test cases for the mustache HTML backend to ensure its robustness and functionality.
102-
103-
105+
As my work concluded, I was named as one of the maintainers of the project. In the future, I plan to work on Clang-Doc until an MVP product can be generated and evaluated for the LLVM project. My remaining tasks include landing the Mustache support library and Clang-Doc’s Mustache backend, as well as gathering feedback from the LLVM community regarding Clang-Doc’s current output. Additionally, I intend to add test cases for the Mustache HTML backend to ensure its robustness and functionality.
104106
105107
106108
## Conclusion
107109
108-
Overall the current state of clang-doc is much healthier than it was before. It now has much better test coverage across all its output, markdown, html, yaml. Whereas previously there were no e2e test cases that were not as comprehensive. The tool is significantly faster especially for large scale projects like LLVM making documentation generation and development a much better experience.
110+
Overall, the current state of Clang-Doc is much healthier than it was before. It now has much better test coverage across all its outputs (Markdown, HTML and YAML), whereas previously the e2e test cases were not as comprehensive. The tool is significantly faster, especially for large-scale projects like LLVM, making documentation generation and development a much better experience.
109111
The tool also has a simplified HTML backend that will be much easier to work with than before, leading to faster development.
110112
111113
