Reland: [clang][test] add testing for the AST matcher reference #112168

5chmidti · 2024-10-14T08:34:34Z

Problem Statement

Previously, the examples in the AST matcher reference, which is generated from the Doxygen comments in ASTMatchers.h, were untested and best effort.
Some of the matchers had no examples or wrong examples of how to use the matcher.

Solution

This patch introduces a simple DSL around Doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy to use.
In ASTMatchers.h, most matchers are documented with a Doxygen comment. Most of these also have a code example that aims to show what the matcher will match, given a matcher somewhere in the documentation text. The documentation is tested by using Doxygen's alias feature to declare custom aliases. These aliases forward to <tt>text</tt> (which is what Doxygen's \c does, but for multiple words). Using the Doxygen aliases is the obvious choice, because there are (now) four consumers:

people reading the header/using signature help
the Doxygen generated documentation
the generated HTML AST matcher reference
(new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example.
The new generate_ast_matcher_doc_tests.py script will warn on any undocumented matchers (but not on matchers without a Doxygen comment) and provides diagnostics and statistics about the matchers.

The current statistics emitted by the parser are:

Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6

The tests are generated during building, and the script will only print something if it found an issue with the specified tests (e.g., missing tests).

Description

DSL for generating the tests from documentation.

TLDR:

  \header{a.h}
  \endheader     <- zero or more headers

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, the std flag supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}   <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a  = 42} <- only applies to the previous matcher (not to the
                         previous case)

The above block can be repeated inside a Doxygen command for multiple code examples for a single matcher.
The test generation script will only look for these annotations and ignore anything else, such as \c or the sentences in which these annotations are embedded: The matcher \matcher{expr()} matches the number \match{42}.
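
The grouping rule described above (consecutive matchers form a group, and a new matcher after a match starts a new case) can be sketched in a few lines. This is a minimal illustration and not the code in generate_ast_matcher_doc_tests.py; the name group_cases and the regular expression are assumptions, and the pattern deliberately ignores nested braces and tags.

```python
import re

# Hypothetical helper, not taken from generate_ast_matcher_doc_tests.py.
# Only matcher and match annotations are recognised; everything else in the
# comment text (prose, other Doxygen commands) is ignored.
ANNOTATION = re.compile(r"\\(matcher|match)\{([^{}]*)\}")


def group_cases(comment):
    """Group consecutive matcher and match annotations into cases."""
    cases = []
    matchers, matches = [], []
    for kind, body in ANNOTATION.findall(comment):
        if kind == "matcher":
            # A matcher that follows a match closes the current case.
            if matches:
                cases.append((matchers, matches))
                matchers, matches = [], []
            matchers.append(body)
        else:
            matches.append(body)
    if matchers or matches:
        cases.append((matchers, matches))
    return cases


text = (
    r"The matcher \matcher{expr()} matches the number \match{42}. "
    r"The matcher \matcher{varDecl()} matches \match{int a  = 42}."
)
cases = group_cases(text)
```

Here cases holds two entries, one per matcher, each paired only with the matches that follow it.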

Language Grammar

[] denotes an optional element, and <> denotes user input

  compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
  matcher_tag_key ::= type
  match_tag_key ::= type || std || count || sub
  matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
  match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
  matcher ::= \matcher{[matcher_tags$]<matcher>}
  matchers ::= [matcher] matcher
  match ::= \match{[match_tags$]<match>}
  matches ::= [match] match
  case ::= matchers matches
  cases ::= [case] case
  header-block ::= \header{<name>} <code> \endheader
  code-block ::= \code <code> \endcode
  testcase ::= code-block [compile_args] cases
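
To make the tag syntax concrete, here is a minimal sketch of splitting the body of a \match{...} annotation into its tags and its text, following the grammar above (key=value pairs separated by ';', terminated by '$'). The function name split_tags and the example tag values are hypothetical; how the real script handles a literal '$' in the matched text is not covered here.

```python
# Tag keys as listed in the grammar for match annotations.
MATCH_TAG_KEYS = frozenset({"type", "std", "count", "sub"})


def split_tags(body, allowed_keys=MATCH_TAG_KEYS):
    """Return (tags, text) for the body of a match or matcher annotation."""
    head, sep, tail = body.partition("$")
    if not sep:
        # No '$' at all: the body is plain text without tags.
        return {}, body
    tags = {}
    for pair in head.split(";"):
        key, eq, value = pair.partition("=")
        if not eq or key not in allowed_keys:
            # The prefix does not look like tags; treat it as text.
            return {}, body
        tags[key] = value
    return tags, tail


tags, text = split_tags("count=2;std=c++11-or-later$int x")
```

For a \matcher{...} body, the same function could be called with allowed_keys={"type"}, since type is the only matcher tag key in the grammar.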

Language Standard Versions

The 'std' tag and '\compile_args' support specifying a specific language version, a whole language and all of its versions, and thresholds (implies ranges). Multiple arguments are passed with a ',' separator. For a language and version to execute a tested matcher, it has to match the specified '\compile_args' for the code, and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are used with disjunction between languages (e.g. 'c || c++') and conjunction for all predicates specific to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:

c                                    all available versions of C
c++11                                only C++11
c++11-or-later                       C++11 or later
c++11-or-earlier                     C++11 or earlier
c++11-or-later,c++23-or-earlier,c    all of C and C++ between 11 and 23 (inclusive)
c++11-23,c                           same as above
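
The semantics described above (disjunction between languages, conjunction of all predicates for one language) can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the names parse_std and runs_on are hypothetical, and the VERSIONS lists are an illustrative subset rather than the set of standards the real tests run against.

```python
import re

# Illustrative version lists in chronological order (an assumption).
VERSIONS = {
    "c": ["99", "11", "17", "23"],
    "c++": ["98", "11", "14", "17", "20", "23"],
}
ITEM = re.compile(r"(c\+\+|c)(?:(\d+)(?:-(or-later|or-earlier|\d+))?)?")


def parse_std(spec):
    """Map each language in spec to a list of (lowest, highest) index ranges."""
    ranges = {}
    for item in spec.split(","):
        m = ITEM.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"unrecognised std item: {item!r}")
        lang, version, suffix = m.groups()
        known = VERSIONS[lang]
        lo, hi = 0, len(known) - 1  # a bare language means all versions
        if version is not None:
            idx = known.index(version)
            if suffix is None:
                lo = hi = idx
            elif suffix == "or-later":
                lo = idx
            elif suffix == "or-earlier":
                hi = idx
            else:  # explicit range such as c++11-23
                hi = known.index(suffix)
                lo = idx
        ranges.setdefault(lang, []).append((lo, hi))
    return ranges


def runs_on(spec, lang, version):
    """True if lang/version satisfies every predicate given for lang."""
    ranges = parse_std(spec).get(lang)
    if ranges is None:
        return False  # language not mentioned: disjunction has no match
    idx = VERSIONS[lang].index(version)
    return all(lo <= idx <= hi for lo, hi in ranges)


spec = "c++11-or-later,c++23-or-earlier,c"
in_range = runs_on(spec, "c++", "17")
too_old = runs_on(spec, "c++", "98")
```

Ordering versions by list position rather than by number avoids comparing two-digit years such as 98 and 11 numerically.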

What if ...?

... I want to add a matcher?

Add a Doxygen comment to the matcher with a code example, corresponding matchers and matches, that shows what the matcher is supposed to do. Specify the compile arguments/supported languages if required, and run ninja check-clang-unit to test the documentation.

... the example I wrote is wrong?

The test-failure output of the generated test file will provide information about

where the generated test file is located
which line in ASTMatchers.h the example is from
which matches were: found, not-(yet)-found, expected
in case of an unexpected match: what the node looks like using the different types
the language version and if the test ran with a windows -target flag (also in failure summary)

... I don't adhere to the required order of the syntax?

The script will diagnose any issues it finds, such as "matcher is missing an example", with a file:line: prefix,
which should provide enough information to locate the issue.

... the script diagnoses a false-positive issue with a Doxygen comment?

It hopefully shouldn't, but if you, e.g., added some non-matcher code and documented it with Doxygen, then the script will treat that comment as matcher documentation. As a result, the script will print that it detected a mismatch between the actual and the expected number of failures. If the diagnostic truly is a false positive, change the expected_failure_statistics at the top of the generate_ast_matcher_doc_tests.py file.
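
The mismatch check described above can be pictured as a comparison of two sets of counters. The sketch below assumes that expected_failure_statistics is a dictionary keyed by statistic name; the actual structure in generate_ast_matcher_doc_tests.py may differ, and the function find_mismatches and the numbers used are hypothetical.

```python
def find_mismatches(actual, expected):
    """Return {name: (actual, expected)} for every counter that differs."""
    names = sorted(set(actual) | set(expected))
    return {
        name: (actual.get(name, 0), expected.get(name, 0))
        for name in names
        if actual.get(name, 0) != expected.get(name, 0)
    }


# Hypothetical values: one more missing test than the expected baseline.
expected_failure_statistics = {"missing_tests": 10}
actual_failure_statistics = {"missing_tests": 11}
mismatches = find_mismatches(actual_failure_statistics, expected_failure_statistics)
```

An empty result would mean the baseline still holds; a non-empty one is the situation in which the expected numbers need to be reviewed and, for a genuine false positive, updated.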

Fixes #57607
Fixes #63748

llvmbot · 2024-10-14T08:34:54Z

@llvm/pr-subscribers-clang

Author: Julian Schmidt (5chmidti)

Changes

Patch is 906.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112168.diff

8 Files Affected:

(modified) clang/docs/LibASTMatchersReference.html (+5679-2272)
(modified) clang/docs/ReleaseNotes.rst (+3)
(modified) clang/docs/doxygen.cfg.in (+8-1)
(modified) clang/docs/tools/dump_ast_matchers.py (+63-5)
(modified) clang/include/clang/ASTMatchers/ASTMatchers.h (+4117-1664)
(modified) clang/unittests/ASTMatchers/ASTMatchersTest.h (+427-3)
(modified) clang/unittests/ASTMatchers/CMakeLists.txt (+15)
(added) clang/utils/generate_ast_matcher_doc_tests.py (+1097)

diff --git a/clang/docs/LibASTMatchersReference.html b/clang/docs/LibASTMatchersReference.html
index a16b9c44ef0eab..baf39befd796a5 100644
--- a/clang/docs/LibASTMatchersReference.html
+++ b/clang/docs/LibASTMatchersReference.html
@@ -586,28 +586,36 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
   #pragma omp declare simd
   int min();
-attr()
-  matches "nodiscard", "nonnull", "noinline", and the whole "#pragma" line.
+
+The matcher attr()
+matches nodiscard, nonnull, noinline, and
+declare simd.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html">CXXBaseSpecifier</a>&gt;</td><td class="name" onclick="toggle('cxxBaseSpecifier0')"><a name="cxxBaseSpecifier0Anchor">cxxBaseSpecifier</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html">CXXBaseSpecifier</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxBaseSpecifier0"><pre>Matches class bases.
 
-Examples matches public virtual B.
+Given
   class B {};
   class C : public virtual B {};
+
+The matcher cxxRecordDecl(hasDirectBase(cxxBaseSpecifier()))
+matches C.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXCtorInitializer.html">CXXCtorInitializer</a>&gt;</td><td class="name" onclick="toggle('cxxCtorInitializer0')"><a name="cxxCtorInitializer0Anchor">cxxCtorInitializer</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXCtorInitializer.html">CXXCtorInitializer</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxCtorInitializer0"><pre>Matches constructor initializers.
 
-Examples matches i(42).
+Given
   class C {
     C() : i(42) {}
     int i;
   };
+
+The matcher cxxCtorInitializer()
+matches i(42).
 </pre></td></tr>
 
 
@@ -619,17 +627,22 @@ <h2 id="decl-matchers">Node Matchers</h2>
   public:
     int a;
   };
-accessSpecDecl()
-  matches 'public:'
+
+The matcher accessSpecDecl()
+matches public:.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('bindingDecl0')"><a name="bindingDecl0Anchor">bindingDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1BindingDecl.html">BindingDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="bindingDecl0"><pre>Matches binding declarations
-Example matches foo and bar
-(matcher = bindingDecl()
 
-  auto [foo, bar] = std::make_pair{42, 42};
+Given
+  struct pair { int x; int y; };
+  pair make(int, int);
+  auto [foo, bar] = make(42, 42);
+
+The matcher bindingDecl()
+matches foo and bar.
 </pre></td></tr>
 
 
@@ -642,14 +655,18 @@ <h2 id="decl-matchers">Node Matchers</h2>
   myFunc(^(int p) {
     printf("%d", p);
   })
+
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('classTemplateDecl0')"><a name="classTemplateDecl0Anchor">classTemplateDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1ClassTemplateDecl.html">ClassTemplateDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="classTemplateDecl0"><pre>Matches C++ class template declarations.
 
-Example matches Z
+Given
   template&lt;class T&gt; class Z {};
+
+The matcher classTemplateDecl()
+matches Z.
 </pre></td></tr>
 
 
@@ -660,13 +677,14 @@ <h2 id="decl-matchers">Node Matchers</h2>
   template&lt;class T1, class T2, int I&gt;
   class A {};
 
-  template&lt;class T, int I&gt;
-  class A&lt;T, T*, I&gt; {};
+  template&lt;class T, int I&gt; class A&lt;T, T*, I&gt; {};
 
   template&lt;&gt;
   class A&lt;int, int, 1&gt; {};
-classTemplatePartialSpecializationDecl()
-  matches the specialization A&lt;T,T*,I&gt; but not A&lt;int,int,1&gt;
+
+The matcher classTemplatePartialSpecializationDecl()
+matches template&lt;class T, int I&gt; class A&lt;T, T*, I&gt; {},
+but does not match A&lt;int, int, 1&gt;.
 </pre></td></tr>
 
 
@@ -677,87 +695,128 @@ <h2 id="decl-matchers">Node Matchers</h2>
   template&lt;typename T&gt; class A {};
   template&lt;&gt; class A&lt;double&gt; {};
   A&lt;int&gt; a;
-classTemplateSpecializationDecl()
-  matches the specializations A&lt;int&gt; and A&lt;double&gt;
+
+The matcher classTemplateSpecializationDecl()
+matches class A&lt;int&gt;
+and class A&lt;double&gt;.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('conceptDecl0')"><a name="conceptDecl0Anchor">conceptDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1ConceptDecl.html">ConceptDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="conceptDecl0"><pre>Matches concept declarations.
 
-Example matches integral
-  template&lt;typename T&gt;
-  concept integral = std::is_integral_v&lt;T&gt;;
+Given
+  template&lt;typename T&gt; concept my_concept = true;
+
+
+The matcher conceptDecl()
+matches template&lt;typename T&gt;
+concept my_concept = true.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxConstructorDecl0')"><a name="cxxConstructorDecl0Anchor">cxxConstructorDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructorDecl.html">CXXConstructorDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxConstructorDecl0"><pre>Matches C++ constructor declarations.
 
-Example matches Foo::Foo() and Foo::Foo(int)
+Given
   class Foo {
    public:
     Foo();
     Foo(int);
     int DoSomething();
   };
+
+  struct Bar {};
+
+
+The matcher cxxConstructorDecl()
+matches Foo() and Foo(int).
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxConversionDecl0')"><a name="cxxConversionDecl0Anchor">cxxConversionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConversionDecl.html">CXXConversionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxConversionDecl0"><pre>Matches conversion operator declarations.
 
-Example matches the operator.
+Given
   class X { operator int() const; };
+
+
+The matcher cxxConversionDecl()
+matches operator int() const.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxDeductionGuideDecl0')"><a name="cxxDeductionGuideDecl0Anchor">cxxDeductionGuideDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXDeductionGuideDecl.html">CXXDeductionGuideDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxDeductionGuideDecl0"><pre>Matches user-defined and implicitly generated deduction guide.
 
-Example matches the deduction guide.
+Given
   template&lt;typename T&gt;
-  class X { X(int) };
+  class X { X(int); };
   X(int) -&gt; X&lt;int&gt;;
+
+
+The matcher cxxDeductionGuideDecl()
+matches the written deduction guide
+auto (int) -&gt; X&lt;int&gt;,
+the implicit copy deduction guide auto (int) -&gt; X&lt;T&gt;
+and the implicitly declared deduction guide
+auto (X&lt;T&gt;) -&gt; X&lt;T&gt;.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxDestructorDecl0')"><a name="cxxDestructorDecl0Anchor">cxxDestructorDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXDestructorDecl.html">CXXDestructorDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxDestructorDecl0"><pre>Matches explicit C++ destructor declarations.
 
-Example matches Foo::~Foo()
+Given
   class Foo {
    public:
     virtual ~Foo();
   };
+
+  struct Bar {};
+
+
+The matcher cxxDestructorDecl()
+matches virtual ~Foo().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxMethodDecl0')"><a name="cxxMethodDecl0Anchor">cxxMethodDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXMethodDecl.html">CXXMethodDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxMethodDecl0"><pre>Matches method declarations.
 
-Example matches y
+Given
   class X { void y(); };
+
+
+The matcher cxxMethodDecl()
+matches void y().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxRecordDecl0')"><a name="cxxRecordDecl0Anchor">cxxRecordDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXRecordDecl.html">CXXRecordDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxRecordDecl0"><pre>Matches C++ class declarations.
 
-Example matches X, Z
+Given
   class X;
   template&lt;class T&gt; class Z {};
+
+The matcher cxxRecordDecl()
+matches X and Z.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('decl0')"><a name="decl0Anchor">decl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="decl0"><pre>Matches declarations.
 
-Examples matches X, C, and the friend declaration inside C;
+Given
   void X();
   class C {
-    friend X;
+    friend void X();
   };
+
+The matcher decl()
+matches void X(), C
+and friend void X().
 </pre></td></tr>
 
 
@@ -767,40 +826,49 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   class X { int y; };
-declaratorDecl()
-  matches int y.
+
+The matcher declaratorDecl()
+matches int y.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('decompositionDecl0')"><a name="decompositionDecl0Anchor">decompositionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1DecompositionDecl.html">DecompositionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="decompositionDecl0"><pre>Matches decomposition-declarations.
 
-Examples matches the declaration node with foo and bar, but not
-number.
-(matcher = declStmt(has(decompositionDecl())))
-
+Given
+  struct pair { int x; int y; };
+  pair make(int, int);
   int number = 42;
-  auto [foo, bar] = std::make_pair{42, 42};
+  auto [foo, bar] = make(42, 42);
+
+The matcher decompositionDecl()
+matches auto [foo, bar] = make(42, 42),
+but does not match number.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('enumConstantDecl0')"><a name="enumConstantDecl0Anchor">enumConstantDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1EnumConstantDecl.html">EnumConstantDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="enumConstantDecl0"><pre>Matches enum constants.
 
-Example matches A, B, C
+Given
   enum X {
     A, B, C
   };
+The matcher enumConstantDecl()
+matches A, B and C.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('enumDecl0')"><a name="enumDecl0Anchor">enumDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1EnumDecl.html">EnumDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="enumDecl0"><pre>Matches enum declarations.
 
-Example matches X
+Given
   enum X {
     A, B, C
   };
+
+The matcher enumDecl()
+matches the enum X.
 </pre></td></tr>
 
 
@@ -808,9 +876,14 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="fieldDecl0"><pre>Matches field declarations.
 
 Given
-  class X { int m; };
-fieldDecl()
-  matches 'm'.
+  int a;
+  struct Foo {
+    int x;
+  };
+  void bar(int val);
+
+The matcher fieldDecl()
+matches int x.
 </pre></td></tr>
 
 
@@ -819,16 +892,20 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   class X { friend void foo(); };
-friendDecl()
-  matches 'friend void foo()'.
+
+The matcher friendDecl()
+matches friend void foo().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('functionDecl0')"><a name="functionDecl0Anchor">functionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html">FunctionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="functionDecl0"><pre>Matches function declarations.
 
-Example matches f
+Given
   void f();
+
+The matcher functionDecl()
+matches void f().
 </pre></td></tr>
 
 
@@ -837,6 +914,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Example matches f
   template&lt;class T&gt; void f(T t) {}
+
+
+The matcher functionTemplateDecl()
+matches template&lt;class T&gt; void f(T t) {}.
 </pre></td></tr>
 
 
@@ -845,8 +926,8 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   struct X { struct { int a; }; };
-indirectFieldDecl()
-  matches 'a'.
+The matcher indirectFieldDecl()
+matches a.
 </pre></td></tr>
 
 
@@ -854,10 +935,13 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="labelDecl0"><pre>Matches a declaration of label.
 
 Given
-  goto FOO;
-  FOO: bar();
-labelDecl()
-  matches 'FOO:'
+  void bar();
+  void foo() {
+    goto FOO;
+    FOO: bar();
+  }
+The matcher labelDecl()
+matches FOO: bar().
 </pre></td></tr>
 
 
@@ -866,8 +950,9 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   extern "C" {}
-linkageSpecDecl()
-  matches "extern "C" {}"
+
+The matcher linkageSpecDecl()
+matches extern "C" {}.
 </pre></td></tr>
 
 
@@ -875,12 +960,18 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="namedDecl0"><pre>Matches a declaration of anything that could have a name.
 
 Example matches X, S, the anonymous union type, i, and U;
+Given
   typedef int X;
   struct S {
     union {
       int i;
     } U;
   };
+The matcher namedDecl()
+matches typedef int X, S, int i
+ and U,
+with S matching twice in C++.
+Once for the injected class name and once for the declaration itself.
 </pre></td></tr>
 
 
@@ -890,8 +981,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   namespace test {}
   namespace alias = ::test;
-namespaceAliasDecl()
-  matches "namespace alias" but not "namespace test"
+
+The matcher namespaceAliasDecl()
+matches alias,
+but does not match test.
 </pre></td></tr>
 
 
@@ -901,8 +994,9 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   namespace {}
   namespace test {}
-namespaceDecl()
-  matches "namespace {}" and "namespace test {}"
+
+The matcher namespaceDecl()
+matches namespace {} and namespace test {}.
 </pre></td></tr>
 
 
@@ -911,8 +1005,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;typename T, int N&gt; struct C {};
-nonTypeTemplateParmDecl()
-  matches 'N', but not 'T'.
+
+The matcher nonTypeTemplateParmDecl()
+matches int N,
+but does not match typename T.
 </pre></td></tr>
 
 
@@ -922,6 +1018,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo (Additions)
   @interface Foo (Additions)
   @end
+
 </pre></td></tr>
 
 
@@ -931,6 +1028,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo (Additions)
   @implementation Foo (Additions)
   @end
+
 </pre></td></tr>
 
 
@@ -940,6 +1038,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo
   @implementation Foo
   @end
+
 </pre></td></tr>
 
 
@@ -949,6 +1048,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo
   @interface Foo
   @end
+
 </pre></td></tr>
 
 
@@ -960,6 +1060,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
     BOOL _enabled;
   }
   @end
+
 </pre></td></tr>
 
 
@@ -974,6 +1075,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
   @implementation Foo
   - (void)method {}
   @end
+
 </pre></td></tr>
 
 
@@ -984,6 +1086,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
   @interface Foo
   @property BOOL enabled;
   @end
+
 </pre></td></tr>
 
 
@@ -993,6 +1096,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches FooDelegate
   @protocol FooDelegate
   @end
+
 </pre></td></tr>
 
 
@@ -1001,48 +1105,58 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   void f(int x);
-parmVarDecl()
-  matches int x.
+The matcher parmVarDecl()
+matches int x.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('recordDecl0')"><a name="recordDecl0Anchor">recordDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1RecordDecl.html">RecordDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="recordDecl0"><pre>Matches class, struct, and union declarations.
 
-Example matches X, Z, U, and S
+Given
   class X;
   template&lt;class T&gt; class Z {};
   struct S {};
   union U {};
+
+The matcher recordDecl()
+matches X, Z,
+S and U.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('staticAssertDecl0')"><a name="staticAssertDecl0Anchor">staticAssertDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1StaticAssertDecl.html">StaticAssertDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="staticAssertDecl0"><pre>Matches a C++ static_assert declaration.
 
-Example:
-  staticAssertDecl()
-matches
-  static_assert(sizeof(S) == sizeof(int))
-in
+Given
   struct S {
     int x;
   };
   static_assert(sizeof(S) == sizeof(int));
+
+
+The matcher staticAssertDecl()
+matches static_assert(sizeof(S) == sizeof(int)).
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('tagDecl0')"><a name="tagDecl0Anchor">tagDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1TagDecl.html">TagDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="tagDecl0"><pre>Matches tag declarations.
 
-Example matches X, Z, U, S, E
+Given
   class X;
   template&lt;class T&gt; class Z {};
   struct S {};
   union U {};
-  enum E {
-    A, B, C
-  };
+  enum E { A, B, C };
+
+
+The matcher tagDecl()
+matches class X, class Z {}, the injected class name
+class Z, struct S {},
+the injected class name struct S, union U {},
+the injected class name union U
+and enum E { A, B, C }.
 </pre></td></tr>
 
 
@@ -1051,8 +1165,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;template &lt;typename&gt; class Z, int N&gt; struct C {};
-templateTypeParmDecl()
-  matches 'Z', but not 'N'.
+
+The matcher templateTemplateParmDecl()
+matches template &lt;typename&gt; class Z,
+but does not match int N.
 </pre></td></tr>
 
 
@@ -1061,8 +1177,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;typename T, int N&gt; struct C {};
-templateTypeParmDecl()
-  matches 'T', but not 'N'.
+
+The matcher templateTypeParmDecl()
+matches typename T,
+but does not int N.
 </pre></td></tr>
 
 
@@ -1072,10 +1190,12 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   int X;
   namespace NS {
-  int Y;
+    int Y;
   }  // namespace NS
-decl(hasDeclContext(translationUnitDecl()))
-  matches "int X", but not "int Y".
+
+The matcher namedDecl(hasDeclContext(translationUnitDecl()))
+matches X and NS,
+but does not match Y.
 </pre></td></tr>
 
 
@@ -1085,17 +1205,22 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   typedef int X;
   using Y = int;
-typeAliasDecl()
-  matches "using Y = int", but not "typedef int X"
+
+The matcher typeAliasDecl()
+matches using Y = int,
+but does not match typedef int X.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('typeAliasTemplateDecl0')"><a name="typeAliasTemplateDecl0Anchor">typeAliasTemplateDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1TypeAliasTemplateDecl.html">TypeAliasTemplateDecl</a>&gt;...</td></tr>
 <tr><t...
[truncated]

5chmidti · 2024-10-14T08:35:34Z

The last five commits are the cumulative changes that fix the Buildbot failures. I will check with the Buildbot owner whether the previous issue has been solved by 4a85fa3.

Previously, the examples in the AST matcher reference, which is generated from the Doxygen comments in `ASTMatchers.h`, were untested and best effort. Some of the matchers had no examples or wrong examples of how to use them.

This patch introduces a simple DSL around Doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy to use. In `ASTMatchers.h`, most matchers are documented with a Doxygen comment. Most of these also have a code example that aims to show what the matcher will match, given a matcher somewhere in the documentation text. The documentation is tested by using Doxygen's alias feature to declare custom aliases. These aliases forward to `<tt>text</tt>` (which is what Doxygen's `\c` does, but for multiple words). Using the Doxygen aliases was the obvious choice, because there are (now) four consumers:

- people reading the header/using signature help
- the Doxygen generated documentation
- the generated HTML AST matcher reference
- (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example. The new `generate_ast_matcher_doc_tests.py` script will warn on any undocumented matchers (but not on matchers without a Doxygen comment) and provides diagnostics and statistics about the matchers. Below is a file-level comment from the test generation script that describes, on a slightly more technical level, how to document matchers so that they are tested. In general, the new comments can be used as a reference for how to write tested documentation.

The current statistics emitted by the parser are:

```text
Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6
```

The tests are generated during building and the script will only print something if it found an issue (compile failure, parsing issues, the expected and actual number of failures differ).

DSL for generating the tests from documentation.

TLDR:
The order for a single code snippet example is:

    \header{a.h}
    \endheader     <- zero or more header

    \code
      int a = 42;
    \endcode
    \compile_args{-std=c++,c23-or-later} <- optional, supports std ranges and whole languages

    \matcher{expr()} <- one or more matchers in succession
    \match{42}       <- one or more matches in succession

    \matcher{varDecl()} <- new matcher resets the context, the above \match will not count for this new matcher(-group)
    \match{int a = 42}  <- only applies to the previous matcher (not the previous case)

The above block can be repeated inside of a Doxygen command for multiple code examples.

Language Grammar:

    [] denotes an optional, and <> denotes user-input

    compile_args    ::= \compile_args{[<compile_arg>;]<compile_arg>}
    matcher_tag_key ::= type
    match_tag_key   ::= type || std || count
    matcher_tags    ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
    match_tags      ::= [match_tag_key=<value>;]match_tag_key=<value>
    matcher         ::= \matcher{[matcher_tags$]<matcher>}
    matchers        ::= [matcher] matcher
    match           ::= \match{[match_tags$]<match>}
    matches         ::= [match] match
    case            ::= matchers matches
    cases           ::= [case] case
    header-block    ::= \header{<name>} <code> \endheader
    code-block      ::= \code <code> \endcode
    testcase        ::= code-block [compile_args] cases

The 'std' tag and '\compile_args' support specifying a specific language version, a whole language and all of its versions, and thresholds (implies ranges). Multiple arguments are passed with a ',' separator.

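The TLDR ordering and the grammar above can be sketched as a small parser. This is a minimal illustration and not the code in `generate_ast_matcher_doc_tests.py`: the names `Tagged`, `Case`, `split_tags` and `group_cases` are hypothetical, and the regular expression assumes that matcher and match bodies contain no `}` character, which does not hold for every real example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sketch: finds \matcher{...} and \match{...} commands.
# Assumes the body contains no '}' (a simplification).
COMMAND_RE = re.compile(r"\\(matcher|match)\{([^}]*)\}")


@dataclass
class Tagged:
    text: str
    tags: Dict[str, str]


@dataclass
class Case:
    matchers: List[Tagged] = field(default_factory=list)
    matches: List[Tagged] = field(default_factory=list)


def split_tags(body: str) -> Tagged:
    """Split 'key=value;key=value$text' into its tags and its text."""
    if "$" not in body:
        return Tagged(body, {})
    raw_tags, text = body.split("$", 1)
    tags = dict(item.split("=", 1) for item in raw_tags.split(";") if item)
    return Tagged(text, tags)


def group_cases(doc: str) -> List[Case]:
    """Group commands into cases: one or more matchers, then their matches."""
    cases: List[Case] = []
    for kind, body in COMMAND_RE.findall(doc):
        item = split_tags(body)
        if kind == "matcher":
            # A matcher that follows a match resets the context (new case).
            if not cases or cases[-1].matches:
                cases.append(Case())
            cases[-1].matchers.append(item)
        else:
            if not cases:
                raise ValueError("\\match without a preceding \\matcher")
            cases[-1].matches.append(item)
    return cases
```

Applied to the TLDR snippet, `group_cases` yields two cases: `expr()` with the match `42`, and `varDecl()` with the match `int a = 42`.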
For a language and version to execute a tested matcher, it has to match the specified '\compile_args' for the code, and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are used with disjunction between languages (e.g. 'c || c++') and conjunction for all predicates specific to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:

- c                                    all available versions of C
- c++11                                only C++11
- c++11-or-later                       C++11 or later
- c++11-or-earlier                     C++11 or earlier
- c++11-or-later,c++23-or-earlier,c    all of C and C++ between 11 and 23 (inclusive)
- c++11-23,c                           same as above

Tags:

Type:
Match types are used to select where the string that is used to check if a node matches comes from. Available: code, name, typestr, typeofstr. The default is 'code'.
Matcher types are used to mark matchers as submatchers with 'sub' or as deactivated using 'none'. Testing submatchers is not implemented.

Count:
Specifying a 'count=n' on a match will result in a test that requires that the specified match will be matched n times. Default is 1.

Std:
A match allows specifying if it matches only in specific language versions. This may be needed when the AST differs between language versions.

Fixes llvm#57607
Fixes llvm#63748
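The language-version rules described above (disjunction between languages, conjunction of the predicates within one language) can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the `VERSIONS` table is an assumed ordering of standards, `parse_item` and `allowed` are hypothetical names, and the spec is passed without the `-std=` prefix that appears in the `\compile_args` example.

```python
import re
from typing import Dict, List, Tuple

# Assumed, illustrative orderings; the real script's version lists may differ.
VERSIONS: Dict[str, List[str]] = {
    "c": ["99", "11", "17", "23"],
    "c++": ["98", "11", "14", "17", "20", "23"],
}

# Accepts: 'c', 'c++11', 'c++11-or-later', 'c++11-or-earlier', 'c++11-23'.
ITEM_RE = re.compile(r"^(c\+\+|c)(?:(\d+)(?:-(or-later|or-earlier|\d+))?)?$")


def parse_item(item: str) -> Tuple[str, int, int]:
    """Return (language, lowest index, highest index) for one spec item."""
    m = ITEM_RE.match(item.strip())
    if m is None:
        raise ValueError(f"unsupported std spec: {item!r}")
    lang, version, suffix = m.groups()
    order = VERSIONS[lang]
    if version is None:  # whole language, e.g. 'c'
        return lang, 0, len(order) - 1
    low = high = order.index(version)
    if suffix == "or-later":
        high = len(order) - 1
    elif suffix == "or-earlier":
        low = 0
    elif suffix is not None:  # explicit range, e.g. 'c++11-23'
        high = order.index(suffix)
    return lang, low, high


def allowed(spec: str, lang: str, version: str) -> bool:
    """True if (lang, version) satisfies the comma-separated spec."""
    bounds = [b for b in map(parse_item, spec.split(",")) if b[0] == lang]
    if not bounds:  # language not named in the spec: not enabled
        return False
    idx = VERSIONS[lang].index(version)
    # Predicates for the same language are combined with conjunction.
    return all(low <= idx <= high for _, low, high in bounds)
```

Under this sketch, a matcher would run for a given language version only if `allowed` holds both for the `\compile_args` spec of the code and for the `std` tag of the matcher.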

omjavaid · 2024-11-13T21:20:53Z

@5chmidti Sorry for the delay. I have tested this and it seems to compile on windows msvc without any regressions.

5chmidti · 2024-11-15T09:51:06Z

> @5chmidti Sorry for the delay. I have tested this and it seems to compile on windows msvc without any regressions.

Thank you for checking that it works 👍

llvm-ci · 2024-11-16T05:25:59Z

LLVM Buildbot has detected a new failure on builder clang-arm64-windows-msvc running on linaro-armv8-windows-msvc-04 while building clang at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/161/builds/3227

Here is the relevant piece of the build log for the reference

Step 5 (ninja check 1) failure: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
...
[1516/1535] Linking CXX executable unittests\Transforms\Coroutines\CoroTests.exe
[1517/1535] Linking CXX executable unittests\Transforms\Vectorize\SandboxVectorizer\SandboxVectorizerTests.exe
[1518/1535] Linking CXX executable unittests\Transforms\Instrumentation\InstrumentationTests.exe
[1519/1535] Linking CXX executable unittests\tools\llvm-profgen\LLVMProfgenTests.exe
[1520/1535] Linking CXX executable unittests\Transforms\Utils\UtilsTests.exe
[1521/1535] Linking CXX executable unittests\tools\llvm-profdata\LLVMProfdataTests.exe
[1522/1535] Linking CXX executable unittests\Transforms\Scalar\ScalarTests.exe
[1523/1535] Linking CXX executable unittests\tools\llvm-cfi-verify\CFIVerifyTests.exe
[1524/1535] Linking CXX executable unittests\tools\llvm-mca\LLVMMCATests.exe
[1525/1535] Linking CXX executable unittests\tools\llvm-exegesis\LLVMExegesisTests.exe
command timed out: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
program finished with exit code 1
elapsedTime=2216.650482

5chmidti added the clang label (Clang issues not falling into any other category) Oct 14, 2024

5chmidti added 18 commits October 31, 2024 09:27

fix cuda tests

e364ac1

fix cuda tests

ee6fac2

fix f-string containing backslash

236909b

rm not needed typeofstr match kind

7cf190f

move some FIXME's to the main function

975a8a2

replace some type=name matches with explicit code matches to be more expressive

24c2dd0

add missing comments on some matchers on the number of matches generated

5969c52

add documentation (pr desc) to ASTMatchers.h's file comment

714369c

fix empty enum decl

c00f1c0

fix formatting

32cb6f6

fix formatting

64d7057

[clang][test][NFC] fix for python version requirements

d5b33fd

Fix for the buildbot failure due to lower python versions not supporting some types to be subscripted. Tested with python3.8.

remove testing if the examples compile in the test gen script

05419a1

update header comment

5483271

fix control flow reaching end of function without return

6012540

[NFC] rm unused arg

c2923a6

rm misplaced ARGS in add_custom_command

c59ab52
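Two of the commits above concern Python version compatibility ("fix f-string containing backslash" and the fix for older Python versions that do not support subscripting some types). The exact changes are not shown here; the following is only a sketch of the kind of code that stays portable to Python 3.8, with hypothetical function names. Builtin generics such as `list[str]` become subscriptable in Python 3.9, and a backslash inside the expression part of an f-string is accepted only from Python 3.12.

```python
from typing import Dict, List


def count_by_kind(kinds: List[str]) -> Dict[str, int]:
    # typing.List / typing.Dict are subscriptable on Python 3.8, whereas an
    # evaluated builtin 'list[str]' annotation raises TypeError before 3.9.
    counts: Dict[str, int] = {}
    for kind in kinds:
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def format_stats(counts: Dict[str, int]) -> str:
    # Bind the newline to a name first: a backslash inside the {...} part
    # of an f-string is a SyntaxError before Python 3.12.
    newline = "\n"
    lines = [f"{name:<30}: {value:>5}" for name, value in counts.items()]
    return f"Statistics:{newline}{newline.join(lines)}"
```

The column widths mirror the layout of the statistics block in the PR description, but they are chosen for illustration only.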

5chmidti force-pushed the users/5chmidti/add_testing_for_the_AST_matcher_reference branch from 210b60c to c59ab52 Compare October 31, 2024 08:32

5chmidti merged commit 53e92e4 into llvm:main Nov 15, 2024
9 checks passed

5chmidti deleted the users/5chmidti/add_testing_for_the_AST_matcher_reference branch November 15, 2024 09:51

5chmidti mentioned this pull request Nov 16, 2024

Revert "Reland: [clang][test] add testing for the AST matcher reference" #116477

Merged

5chmidti added a commit that referenced this pull request Nov 16, 2024

Revert "Reland: [clang][test] add testing for the AST matcher reference" (#116477)

ec0a27f

Reverts #112168
