@b1llow
Member

@b1llow b1llow commented Nov 12, 2025

This is the 3rd attempt to rewrite the old GPL C++ demangler code.

New features:

C++20 name mangling support

Resources :

Old PR:

@b1llow b1llow closed this Nov 12, 2025
@b1llow b1llow reopened this Nov 12, 2025

@notxvilka notxvilka left a comment

I know it would be a lot of work, but I recommend splitting the tests semantically rather than arbitrarily, for example by grouping them by feature. In the long run it will be much easier to add new tests and improve old ones whenever necessary.

@brightprogrammer
Contributor

Hi!

Sorry for not being active for a long time. The last few (4-5) months have been a roller coaster ride for me. Thanks for picking up the PR. I just couldn't bring myself to get back to it.

While writing this, I modified the grammar so that it is compatible with the parsing method used here. I see that I didn't keep the grammar file here, and I don't currently have access to the system it is on. I'll get it soon and make it available here so it's easier to follow the decisions made.

Sorry again 🙏 for keeping you in the dark. RizinOrg is a lovely and welcoming community.

@brightprogrammer
Contributor

Attachments

4-final-almost-gnf-grammar.txt

The file above contains the final grammar that I derived after applying multiple transformations to the original grammar. The zip file below contains every file I created along the way, one per transformation step. If you ever decide to use the grammar files, you'll most likely only need the file above, but if you need an answer to "why", the zip file may help.

steps-grammars.zip

How To Read The Grammar Files?

Rules are written like this

[.nested-name] <-- start of a rule section; every line that follows, until the next rule starts, is an alternative of this rule
               <-- each alternative of the rule is written on a single line
N [{{CV-qualifiers}}] [{ref-qualifier}] {prefix} {ctor-name} E
N [{{CV-qualifiers}}] [{ref-qualifier}] {prefix} {dtor-name} E
N [{{CV-qualifiers}}] [{ref-qualifier}] {prefix} {unqualified-name} E
N [{{CV-qualifiers}}] [{ref-qualifier}] {template-prefix} {ctor-name} {template-args} E
N [{{CV-qualifiers}}] [{ref-qualifier}] {template-prefix} {dtor-name} {template-args} E
N [{{CV-qualifiers}}] [{ref-qualifier}] {template-prefix} {unqualified-name} {template-args} E
  • A rule is referenced like {rule}.
  • A rule that may derive the empty string is written like {{rule}}.
  • If a rule is optional, it is enclosed in [..] square brackets.

Derivation Process

This is the general GNF derivation process.

  • I started with the original grammar and prefixed every rule that is certain to start with a terminal with a . (dot). I called this a dot-rule.
  • Then I looked for rules that start with a dot-rule. Such a rule will eventually always start with a terminal as well, so I marked it with a . (dot) prefix too, making it a dot-rule (inductively).
  • Some rules look very long and complicated because of the recursive application of this process.

This was done manually, but I checked it by hand more than a hundred times, so I believe it should be correct.
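
The inductive marking described above can be sketched as a fixpoint loop in C. This is only an illustration under stated assumptions: the `Rule` and `FirstSym` types and `mark_dot_rules()` are hypothetical, a rule is treated as a dot-rule only when every one of its alternatives starts with a terminal or with an already-marked rule, and rules that may derive the empty string are ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ALTS 8

/* First symbol of one alternative: a terminal, or the index of another rule. */
typedef struct {
    bool is_terminal;
    int rule_index; /* only meaningful when is_terminal is false */
} FirstSym;

typedef struct {
    const char *name;
    size_t n_alts;
    FirstSym first[MAX_ALTS];
    bool is_dot; /* set by mark_dot_rules() */
} Rule;

/* Mark every rule whose alternatives all start with a terminal or with a
 * rule that is already marked, and repeat until a pass changes nothing. */
void mark_dot_rules(Rule *rules, size_t n_rules) {
    assert(rules != NULL || n_rules == 0);
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t r = 0; r < n_rules; r++) {
            if (rules[r].is_dot || rules[r].n_alts == 0) {
                continue;
            }
            bool all_ok = true;
            for (size_t a = 0; a < rules[r].n_alts; a++) {
                FirstSym s = rules[r].first[a];
                if (!s.is_terminal && !rules[s.rule_index].is_dot) {
                    all_ok = false;
                    break;
                }
            }
            if (all_ok) {
                rules[r].is_dot = true;
                changed = true;
            }
        }
    }
}
```

A rule whose first symbol refers back to itself (left recursion) is never marked by this loop, which is one reason the grammar had to be transformed before every rule could become a dot-rule.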

@b1llow
Member Author

b1llow commented Nov 17, 2025

@brightprogrammer I feel that implementing a demangler like this with C macros makes source-level debugging practically impossible. What was the original reasoning behind this approach?

@brightprogrammer
Contributor

The code size was reduced by a lot, and I could easily compare the code with the grammar. Debugging does become harder; that is one of the reasons it took so much time to even get here.

@brightprogrammer
Contributor

@b1llow The trace graph really helps here, though. I found debugging with it quite helpful. Also, because most of the code is uniform thanks to a small set of macros, only the grammar rules need debugging, plus some context information such as substitutions and templates.
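
For readers unfamiliar with the style under discussion, here is a toy sketch in C of a macro-driven rule set with backtracking and an optional trace. Every macro and type name (`RULE`, `CH`, `CALL`, `OPT`, `Parser`) is hypothetical and not taken from the PR's macros.h, and the `nested_name` rule is deliberately simplified: a single digit stands in for `{prefix}` and the name that follows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *cur; /* current read position in the mangled name */
    bool trace;      /* print one line per rule attempt when set */
    int depth;       /* nesting depth, used to indent the trace */
} Parser;

/* Define rule_<name>(): evaluate expr, restore the position on failure. */
#define RULE(name, expr)                                          \
    bool rule_##name(Parser *p) {                                 \
        assert(p != NULL && p->cur != NULL);                      \
        const char *save = p->cur;                                \
        p->depth++;                                               \
        bool ok = (expr);                                         \
        p->depth--;                                               \
        if (!ok) p->cur = save; /* backtrack */                   \
        if (p->trace)                                             \
            fprintf(stderr, "%*s%s: %s\n", p->depth * 2, "",      \
                    #name, ok ? "match" : "fail");                \
        return ok;                                                \
    }

#define CH(c) (*p->cur == (c) ? (p->cur++, true) : false) /* terminal */
#define CALL(name) rule_##name(p)                         /* {rule} */
#define OPT(expr) ((expr) || true)                        /* [..] */

/* Each rule reads close to one line of the grammar. */
RULE(digit, (*p->cur >= '0' && *p->cur <= '9') ? (p->cur++, true) : false)
RULE(ref_qualifier, CH('R') || CH('O'))
RULE(cv_qualifiers, OPT(CH('r')) && OPT(CH('V')) && OPT(CH('K')))
RULE(nested_name, CH('N') && CALL(cv_qualifiers) && OPT(CALL(ref_qualifier)) &&
                  CALL(digit) && CH('E'))
```

The trade-off raised in the thread is visible even at this scale: the rule bodies line up with the grammar, but a debugger steps into a single expanded expression, so a trace such as the one enabled by the `trace` flag becomes the main debugging aid.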

@b1llow b1llow force-pushed the update-cpp-demangle branch 3 times, most recently from 22e6b2e to 60a8e63 Compare November 22, 2025 12:34
…ory management

Enhance C++ demangling support with new rule declarations and macro improvements

Add support for C++ type demangling in new test file

Refactor macros to support positional tracking and improve clarity
@b1llow b1llow force-pushed the update-cpp-demangle branch from 60a8e63 to 6497e50 Compare November 22, 2025 12:36
…types.h, and macros.h

- Updated AST handling in ast.c (+29 lines)
- Minor macro adjustments in macros.h
- Refactored meta functions in meta.c (net -15 lines)
- Modified type definitions in types.h (net -5 lines)
- Small fixes in v2.c
- Significant refactoring in v3.c (net -65 lines)

Total: 83 insertions, 91 deletions across 6 files
- Refactor Meta struct to use DemAstNode for detected_types instead of Name
  - Change detected_types from Vec(Name) to Vec(DemAstNode) for consistency
  - Update meta_copy, meta_move, and meta_deinit functions accordingly
  - Change current_prefix from DemString to DemAstNode

- Fix bugs in meta.c:
  - Remove duplicate 'is_const' assignment in meta_copy_scalars
  - Fix logic error in append_type (was checking !dem_string_empty, should be dem_string_empty)
  - Fix append_type to properly append NULL and copy node data
  - Fix indentation in append_tparam function

- Update find_type_index and meta_substitute_type to work with DemAstNode

- Simplify rule_prefix_tail in v3.c:
  - Replace complex backtracking logic with cleaner while-loop approach
  - Build fully qualified prefix incrementally
  - Properly update substitution table and current_prefix in each iteration

- Fix minor issues:
  - Add missing semicolon in types.h function declaration
  - Fix const qualifier in cpdem_get_demangled return type (v2.c)
  - Fix formatting in v3.c (spacing around 'else')
  - Update trace_graph.c to use DemAstNode instead of Name

- Add test timeout configuration in meson.build for slow tests (cxx_base)
  to handle large number of test cases with ASAN
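
A minimal sketch in C of the while-loop approach described in this commit message, assuming the prefix consists only of length-prefixed identifiers terminated by `E`. The `PrefixState` type and `parse_prefix_components()` are hypothetical, and recording every intermediate prefix as a substitution candidate is a simplification of the real rules.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_SUBS 16
#define MAX_NAME 128

typedef struct {
    char subs[MAX_SUBS][MAX_NAME]; /* substitution candidates, in order seen */
    size_t n_subs;
    char current_prefix[MAX_NAME]; /* fully qualified prefix built so far */
} PrefixState;

/* Parse <len><identifier> components until 'E'.
 * Returns the number of characters consumed, or 0 on failure. */
size_t parse_prefix_components(const char *in, PrefixState *st) {
    assert(in != NULL && st != NULL);
    const char *p = in;
    st->current_prefix[0] = '\0';

    while (isdigit((unsigned char)*p)) {
        /* Read the decimal length of the next identifier. */
        size_t len = 0;
        while (isdigit((unsigned char)*p)) {
            len = len * 10 + (size_t)(*p - '0');
            if (len >= MAX_NAME) return 0;
            p++;
        }
        if (len == 0 || strlen(p) < len) return 0;

        /* Make sure the separator and identifier fit before appending. */
        size_t used = strlen(st->current_prefix);
        size_t need = used + (used ? 2 : 0) + len + 1;
        if (need > MAX_NAME || st->n_subs == MAX_SUBS) return 0;

        if (used) strcat(st->current_prefix, "::");
        strncat(st->current_prefix, p, len);
        p += len;

        /* Record the prefix built so far in every iteration. */
        strcpy(st->subs[st->n_subs++], st->current_prefix);
    }
    return (*p == 'E' && p != in) ? (size_t)(p - in) + 1 : 0;
}
```

With the input `3foo3bar3bazE`, the loop records `foo`, `foo::bar`, and `foo::bar::baz` in that order, and no backtracking is needed because each iteration consumes exactly one component.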
…x construction in rule_type and rule_prefix_tail
- Removing complex prefix parsing rules
- Improving memory handling and node initialization
- Enhancing template and function pointer parsing capabilities
- Reducing code redundancy and improving maintainability

1. AST Node Constructor Refactoring (src/cplusplus/ast.c)
  - Added DemAstNode_ctor_inplace() for in-place initialization
  - Changed DemAstNode_ctor() signature to accept raw string parameters instead of struct pointers
  - Enhanced DemAstNode_copy() to properly copy subtag and children fields
2. Type System Enhancements (src/cplusplus/types.h)
  - Introduced subtag field for handling sub-type tags
  - Added QUALIFIED_TYPE enum value
  - Simplified type definitions
3. Macro Simplification (src/cplusplus/macros.h)
  - Added CONSUME() macro
  - Removed IS_CONST() / SET_CONST() / UNSET_CONST() macros
  - Removed FORCE_APPEND_TYPE() and APPEND_TPARAM() macros
  - Simplified append_type() calls (removed force parameter)
4. Metadata Handling Refactoring (src/cplusplus/meta.c)
  - Added NodeList_copy(), NodeList_make(), NodeList_pop_trailing() helper functions
  - Significantly simplified meta_copy_scalars() logic
5. Core Parsing Logic Overhaul (src/cplusplus/v3.c, ~500 lines reduced)
  - Removed complex prefix rule functions:
    - rule_prefix_suffix()
    - rule_prefix_start()
    - rule_prefix_tail()
    - rule_prefix()
    - rule_template_prefix()
  - Improved nested name parsing
  - Enhanced template argument handling
  - Optimized function pointer handling logic
  - Added debug trace toggle (m->trace)
…to prevent pointer overflow and ensure valid terminators
Parse std:: qualified templates correctly by detecting St<digit> pattern
and parsing the identifier after the std substitution.

Key changes:
- Detect when St substitution is followed by a digit (identifier)
- Append :: and parse the source name to form std::<identifier>
- Add qualified name to substitution table BEFORE parsing template args
  to ensure correct back-reference resolution

This fixes parsing of std::function<T>, std::char_traits<T>, and other
std-qualified templates that were being split incorrectly.

Test results: All 1003 tests in cxx.00 now pass (was 10 failures)
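
The key changes listed in this commit message can be sketched in C as follows. This is an illustration only: `SubTable` and `parse_std_qualified()` are hypothetical names, and the sketch stops right where template-argument parsing would begin, after the qualified name has already been added to the substitution table.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_SUBS 16
#define MAX_NAME 128

typedef struct {
    char entries[MAX_SUBS][MAX_NAME];
    size_t count;
} SubTable;

/* Parse "St<len><identifier>" into "std::<identifier>", write it to out and
 * record it in subs. Returns a pointer just past the consumed input (where
 * template args, if any, would start), or NULL if the pattern does not match. */
const char *parse_std_qualified(const char *in, SubTable *subs,
                                char *out, size_t out_size) {
    assert(in != NULL && subs != NULL && out != NULL);

    /* Detect St followed by a digit, i.e. an identifier after the std substitution. */
    if (in[0] != 'S' || in[1] != 't' || !isdigit((unsigned char)in[2])) {
        return NULL;
    }
    const char *p = in + 2;
    size_t len = 0;
    while (isdigit((unsigned char)*p)) {
        len = len * 10 + (size_t)(*p - '0');
        if (len >= MAX_NAME) return NULL;
        p++;
    }
    const size_t std_len = 5; /* length of "std::" */
    if (len == 0 || strlen(p) < len) return NULL;
    if (std_len + len + 1 > out_size || std_len + len + 1 > MAX_NAME) return NULL;
    if (subs->count == MAX_SUBS) return NULL;

    /* Append :: and the source name to form std::<identifier>. */
    memcpy(out, "std::", std_len);
    memcpy(out + std_len, p, len);
    out[std_len + len] = '\0';

    /* Record the qualified name BEFORE the caller parses template args. */
    strcpy(subs->entries[subs->count++], out);
    return p + len;
}
```

For the input `St8functionIFvvEE`, the function produces `std::function`, stores it as the first substitution entry, and returns a pointer to `IFvvEE`, so a back-reference inside the template arguments can already resolve to `std::function`.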

5 participants