Reland: [clang][test] add testing for the AST matcher reference #112168
@llvm/pr-subscribers-clang
Author: Julian Schmidt (5chmidti)
The last five commits are the cumulative changes for fixing the Buildbots. I will check in with the Buildbot owner to see if the previous issue has been solved by 4a85fa3.
Previously, the examples in the AST matcher reference, which gets
generated by the doxygen comments in `ASTMatchers.h`, were untested
and best effort.
Some of the matchers had no examples or wrong examples of how to use the matcher.
This patch introduces a simple DSL around doxygen commands to enable
testing the AST matcher documentation in a way that should be relatively
easy.
In `ASTMatchers.h`, most matchers are documented with a doxygen comment.
Most of these also have a code example that aims to show what the
matcher will match, given a matcher somewhere in the documentation text.
The documentation is tested by using doxygen's alias feature to declare
custom aliases. These aliases forward to
`<tt>text</tt>` (which is what doxygen's \c does, but for multiple words).
Using the doxygen aliases was the obvious choice, because there are
(now) four consumers:
- people reading the header/using signature help
- the doxygen generated documentation
- the generated html AST matcher reference
- (new) the generated matcher tests
This patch rewrites/extends the documentation such that all matchers
have a documented example.
The new `generate_ast_matcher_doc_tests.py` script will warn on any
undocumented matchers (but not on matchers without a doxygen comment)
and provides diagnostics and statistics about the matchers.
Below is a file-level comment from the test generation script that
describes how documenting matchers to be tested works on a slightly more
technical level. In general, the new comments can be used as a reference
for how to implement a tested documentation.
The current statistics emitted by the parser are:
```text
Statistics:
doxygen_blocks : 519
missing_tests : 10
skipped_objc : 42
code_snippets : 503
matches : 820
matchers : 580
tested_matchers : 574
none_type_matchers : 6
```
The tests are generated during building and the script will only print
something if it found an issue (a compile failure, parsing issues, or a
mismatch between the expected and actual number of failures).
DSL for generating the tests from documentation.
TLDR:
The order for a single code snippet example is:
\header{a.h}
\endheader <- zero or more header
\code
int a = 42;
\endcode
\compile_args{-std=c++,c23-or-later} <- optional, supports std ranges and
whole languages
\matcher{expr()} <- one or more matchers in succession
\match{42} <- one or more matches in succession
\matcher{varDecl()} <- new matcher resets the context, the above
\match will not count for this new
matcher(-group)
\match{int a = 42} <- only applies to the previous matcher (not the
previous case)
The above block can be repeated inside of a doxygen command for multiple
code examples.
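The grouping rule from the TLDR (a new \matcher after one or more \match annotations starts a new case) can be sketched as follows. This is a minimal illustration, not the parser from `generate_ast_matcher_doc_tests.py`; the names `ANNOTATION` and `group_cases` are hypothetical, and the regular expression assumes annotation bodies contain no nested braces.

```python
import re

# Hypothetical helper for illustration only. Assumes bodies without braces.
ANNOTATION = re.compile(r"\\(matcher|match)\{([^{}]*)\}")


def group_cases(comment):
    """Group \\matcher and \\match annotations into (matchers, matches) cases."""
    cases = []
    matchers, matches = [], []
    for kind, body in ANNOTATION.findall(comment):
        if kind == "matcher":
            # A matcher that follows recorded matches resets the context.
            if matches:
                cases.append((matchers, matches))
                matchers, matches = [], []
            matchers.append(body)
        else:
            matches.append(body)
    if matchers or matches:
        cases.append((matchers, matches))
    return cases


comment = r"""
The matcher \matcher{expr()} matches \match{42}.
The matcher \matcher{varDecl()} matches \match{int a = 42}.
"""
cases = group_cases(comment)
```

Everything outside the annotations (the surrounding sentences) is ignored, so the two cases come out as separate (matchers, matches) pairs.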
Language Grammar:
[] denotes an optional, and <> denotes user-input
compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
matcher_tag_key ::= type
match_tag_key ::= type || std || count
matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
matcher ::= \matcher{[matcher_tags$]<matcher>}
matchers ::= [matcher] matcher
match ::= \match{[match_tags$]<match>}
matches ::= [match] match
case ::= matchers matches
cases ::= [case] case
header-block ::= \header{<name>} <code> \endheader
code-block ::= \code <code> \endcode
testcase ::= code-block [compile_args] cases
The 'std' tag and '\compile_args' support specifying a specific
language version, a whole language and all of its versions, and thresholds
(implies ranges). Multiple arguments are passed with a ',' separator.
For a language and version to execute a tested matcher, it has to match
the specified '\compile_args' for the code, and the 'std' tag for the matcher.
Predicates for the 'std' compiler flag are used with disjunction between
languages (e.g. 'c || c++') and conjunction for all predicates specific
to each language (e.g. 'c++11-or-later && c++23-or-earlier').
Examples:
- c: all available versions of C
- c++11: only C++11
- c++11-or-later: C++11 or later
- c++11-or-earlier: C++11 or earlier
- c++11-or-later,c++23-or-earlier,c: all of C and C++ between 11 and
23 (inclusive)
- c++11-23,c: same as above
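How such a specifier could be expanded into concrete versions is sketched below, with conjunction inside a language and disjunction between languages. This is only an illustration under stated assumptions: the `VERSIONS` table and the names `language_of` and `expand_std` are hypothetical, and the real script may know a different set of versions.

```python
# Assumed, illustrative version lists (oldest to newest); not from the script.
VERSIONS = {
    "c": ["c99", "c11", "c17", "c23"],
    "c++": ["c++98", "c++11", "c++14", "c++17", "c++20", "c++23"],
}


def language_of(arg):
    """Return the language a single predicate refers to."""
    return "c++" if arg.startswith("c++") else "c"


def expand_std(spec):
    """Expand a ','-separated spec into the list of selected versions."""
    selected = {}
    for arg in spec.split(","):
        lang = language_of(arg)
        versions = VERSIONS[lang]
        if arg == lang:  # whole language
            allowed = set(versions)
        elif arg.endswith("-or-later"):
            start = versions.index(arg[: -len("-or-later")])
            allowed = set(versions[start:])
        elif arg.endswith("-or-earlier"):
            end = versions.index(arg[: -len("-or-earlier")])
            allowed = set(versions[: end + 1])
        else:
            low, sep, high = arg[len(lang):].partition("-")
            if sep:  # range such as c++11-23
                start = versions.index(lang + low)
                end = versions.index(lang + high)
                allowed = set(versions[start : end + 1])
            else:  # single version such as c++11
                allowed = {arg}
        # Conjunction: predicates for the same language narrow the set.
        selected[lang] = selected.get(lang, set(versions)) & allowed
    # Disjunction: the result is the union over all mentioned languages.
    result = []
    for lang, versions in VERSIONS.items():
        result += [v for v in versions if v in selected.get(lang, set())]
    return result
```

With this sketch, `expand_std("c++11-or-later,c++23-or-earlier,c")` and `expand_std("c++11-23,c")` select the same versions, matching the last two examples in the list.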
Tags:
Type:
Match types are used to select where the string that is used to check if
a node matches comes from.
Available: code, name, typestr, typeofstr.
The default is 'code'.
Matcher types are used to mark matchers as submatchers with 'sub' or as
deactivated using 'none'. Testing submatchers is not implemented.
Count:
Specifying a 'count=n' on a match will result in a test that requires that
the specified match will be matched n times. Default is 1.
Std:
A match allows specifying if it matches only in specific language versions.
This may be needed when the AST differs between language versions.
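Following the grammar above, the tags of a \match are separated by ';' and split from the match text by '$'. A minimal sketch of that split, using the documented defaults ('code' for type, 1 for count), could look like this. The function name `parse_match` is hypothetical, values are kept as strings, and a match text that itself contains '$' without any tags is not handled.

```python
def parse_match(body):
    """Split '<tags>$<match>' into a tag dict and the expected match text."""
    # Defaults stated in the tag descriptions above.
    tags = {"type": "code", "count": "1"}
    tag_part, sep, text = body.partition("$")
    if not sep:
        # No '$' present: the whole body is the expected match.
        return tags, body
    for tag in tag_part.split(";"):
        key, _, value = tag.partition("=")
        tags[key] = value
    return tags, text


plain = parse_match("42")
tagged = parse_match("type=name;count=2$foo")
```

Here `plain` keeps the defaults, while `tagged` overrides both `type` and `count` for the expected match `foo`.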
Fixes llvm#57607
Fixes llvm#63748
Fix for the buildbot failure due to lower python versions not supporting some types to be subscripted. Tested with python3.8.
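For context: before Python 3.9, built-in containers such as `list` and `dict` cannot be subscripted in annotations that are evaluated at definition time, so `list[str]` raises a `TypeError` on Python 3.8. The commit does not list the affected annotations, so the function below is a hypothetical example of the portable spelling with the `typing` generics.

```python
from typing import Dict, List


# Hypothetical function; works on Python 3.8 because it uses typing.List and
# typing.Dict instead of subscripting the built-in list and dict types.
def count_words(words: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```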
210b60c to c59ab52
@5chmidti Sorry for the delay. I have tested this and it seems to compile on Windows MSVC without any regressions.
Thank you for checking that it works 👍
LLVM Buildbot has detected a new failure on builder
Full details are available at: https://lab.llvm.org/buildbot/#/builders/161/builds/3227
Here is the relevant piece of the build log for the reference
Problem Statement
Previously, the examples in the AST matcher reference, which gets generated by the Doxygen comments in `ASTMatchers.h`, were untested and best effort.
Some of the matchers had no or wrong examples of how to use the matcher.
Solution
This patch introduces a simple DSL around Doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy to use.
In `ASTMatchers.h`, most matchers are documented with a Doxygen comment. Most of these also have a code example that aims to show what the matcher will match, given a matcher somewhere in the documentation text. The documentation is tested by using Doxygen's alias feature to declare custom aliases. These aliases forward to `<tt>text</tt>` (which is what Doxygen's `\c` does, but for multiple words). Using the Doxygen aliases is the obvious choice, because there are (now) four consumers:
- people reading the header/using signature help
- the Doxygen generated documentation
- the generated HTML AST matcher reference
- (new) the generated matcher tests
This patch rewrites/extends the documentation such that all matchers have a documented example.
The new `generate_ast_matcher_doc_tests.py` script will warn on any undocumented matchers (but not on matchers without a Doxygen comment) and provides diagnostics and statistics about the matchers.
The current statistics emitted by the parser are:
The tests are generated during building, and the script will only print something if it found an issue with the specified tests (e.g., missing tests).
Description
DSL for generating the tests from documentation.
TLDR:
The above block can be repeated inside a Doxygen command for multiple code examples for a single matcher.
The test generation script will only look for these annotations and ignore anything else, like `\c` or the sentences in which these annotations are embedded: `The matcher \matcher{expr()} matches the number \match{42}.`
Language Grammar
[] denotes an optional, and <> denotes user-input
Language Standard Versions
The 'std' tag and '\compile_args' support specifying a specific language version, a whole language and all of its versions, and thresholds (implies ranges). Multiple arguments are passed with a ',' separator. For a language and version to execute a tested matcher, it has to match the specified '\compile_args' for the code, and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are used with disjunction between languages (e.g. 'c || c++') and conjunction for all predicates specific to each language (e.g. 'c++11-or-later && c++23-or-earlier').
Examples:
- `c`: all available versions of C
- `c++11`: only C++11
- `c++11-or-later`: C++11 or later
- `c++11-or-earlier`: C++11 or earlier
- `c++11-or-later,c++23-or-earlier,c`: all of C and C++ between 11 and 23 (inclusive)
- `c++11-23,c`: same as above
Tags
`type`:
Match types are used to select where the string that is used to check if a node matches comes from.
Available: `code`, `name`, `typestr`, `typeofstr`. The default is `code`.
- `code`: Forwards to `tooling::fixit::getText(...)` and should be the preferred way to show what matches.
- `name`: Casts the match to a `NamedDecl` and returns the result of `getNameAsString`. Useful when the matched AST node is not easy to spell out (`code` type), e.g., namespaces or classes with many members.
- `typestr`: Returns the result of `QualType::getAsString` for the type derived from `Type` (otherwise, if it is derived from `Decl`, recurses with `Node->getTypeForDecl()`)
Matcher types are used to mark matchers as sub-matchers with 'sub' or as deactivated using 'none'. Testing sub-matchers is not implemented.
`count`:
Specifying a 'count=n' on a match will result in a test that requires that the specified match will be matched n times. Default is 1.
`std`:
A match allows specifying if it matches only in specific language versions. This may be needed when the AST differs between language versions.
`sub`:
The `sub` tag on a `\match` will indicate that the match is for a node of a bound sub-matcher.
E.g., `\matcher{expr(expr().bind("inner"))}` has a sub-matcher that binds to `inner`, which is the value for the `sub` tag of the expected match for the sub-matcher `\match{sub=inner$...}`. Currently, sub-matchers are not tested in any way.
What if ...?
... I want to add a matcher?
Add a Doxygen comment to the matcher with a code example, corresponding matchers and matches, that shows what the matcher is supposed to do. Specify the compile arguments/supported languages if required, and run `ninja check-clang-unit` to test the documentation.
... the example I wrote is wrong?
The test-failure output of the generated test file will provide information about
ASTMatcher.h the example is from types -target flag (also in failure summary)
... I don't adhere to the required order of the syntax?
The script will diagnose any found issues, such as `matcher is missing an example` with a `file:line:` prefix, which should provide enough information about the issue.
... the script diagnoses a false-positive issue with a Doxygen comment?
It hopefully shouldn't, but if you, e.g., added some non-matcher code and documented it with Doxygen, then the script will consider that as a matcher documentation. As a result, the script will print that it detected a mismatch between the actual and the expected number of failures. If the diagnostic truly is a false-positive, change the `expected_failure_statistics` at the top of the `generate_ast_matcher_doc_tests.py` file.
Fixes #57607
Fixes #63748