[clang-doc] concatenate SymbolIDs to truncated mangled names #159490

evelez7 · 2025-09-18T01:45:19Z

Previously, if mangled names were too long to be used as filenames, the
object's SymbolID was used as a filename. This worked for length
restrictions, but made URLs/filenames inconsistent. This patch truncates
the mangled name and appends the SymbolID. Thus, we can keep some
context in the URL/filename while preserving uniqueness.
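
For illustration, the truncate-and-append scheme described above can be sketched without any LLVM dependencies. The helper name `truncateWithSuffix` and its `Limit` parameter are hypothetical and do not appear in clang-doc; only the 250-character limit and the substring arithmetic come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of clang-doc. Names that fit within Limit are
// returned unchanged; longer names are cut short and given a unique suffix so
// that the result is exactly Limit characters long.
std::string truncateWithSuffix(const std::string &Name,
                               const std::string &Suffix,
                               std::size_t Limit = 250) {
  if (Name.size() <= Limit)
    return Name;
  // Assumption of this sketch: the suffix alone never exceeds the limit.
  assert(Suffix.size() <= Limit && "suffix must fit within the limit");
  return Name.substr(0, Limit - Suffix.size()) + Suffix;
}
```

With a 40-character hex suffix, a 251-character name keeps its first 210 characters, so the readable prefix survives while the suffix keeps the filename unique.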

llvmbot · 2025-09-18T01:46:00Z

@llvm/pr-subscribers-clang-tools-extra

Author: Erick Velez (evelez7)

Changes

Previously, if mangled names were too long to be used as filenames, the
object's SymbolID was used as a filename. This worked for length
restrictions, but made URLs/filenames inconsistent. This patch truncates
the mangled name and appends the SymbolID. Thus, we can keep some
context in the URL/filename while preserving uniqueness.

Full diff: https://github.com/llvm/llvm-project/pull/159490.diff

2 Files Affected:

(modified) clang-tools-extra/clang-doc/Serialize.cpp (+4-6)
(modified) clang-tools-extra/test/clang-doc/long-name.cpp (+2-2)

diff --git a/clang-tools-extra/clang-doc/Serialize.cpp b/clang-tools-extra/clang-doc/Serialize.cpp
index dd7cd0b2ae736..186f634dd892a 100644
--- a/clang-tools-extra/clang-doc/Serialize.cpp
+++ b/clang-tools-extra/clang-doc/Serialize.cpp
@@ -780,12 +780,10 @@ static void populateSymbolInfo(SymbolInfo &I, const T *D, const FullComment *C,
     MangledStream << D->getNameAsString();
   // A 250 length limit was chosen since 255 is a common limit across
   // different filesystems, with a 5 character buffer for file extensions.
-  if (MangledName.size() > 250)
-    // File creation fails if the mangled name is too long, so default to the
-    // USR. We should look for a better check since filesystems differ in
-    // maximum filename length
-    I.MangledName = llvm::toStringRef(llvm::toHex(I.USR));
-  else
+  if (MangledName.size() > 250) {
+    auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
+    I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;
+  } else
     I.MangledName = MangledName;
   delete Mangler;
 }
diff --git a/clang-tools-extra/test/clang-doc/long-name.cpp b/clang-tools-extra/test/clang-doc/long-name.cpp
index b33337588da19..db96fc4aebe5a 100644
--- a/clang-tools-extra/test/clang-doc/long-name.cpp
+++ b/clang-tools-extra/test/clang-doc/long-name.cpp
@@ -9,6 +9,6 @@ struct ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLong
 struct ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd123 {};
 
 // CHECK-JSON: ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd12.json
-// CHECK-JSON: {{[0-9A-F]*}}.json
+// CHECK-JSON: _ZTV244ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheL29DE8558215A13A506661C0E01E50AA3E5C9C7FA.json 
 // CHECK-HTML: ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd12.html
-// CHECK-HTML: {{[0-9A-F]*}}.html
+// CHECK-HTML: _ZTV244ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheL29DE8558215A13A506661C0E01E50AA3E5C9C7FA.html

ilovepi · 2025-09-22T20:01:33Z

clang-tools-extra/clang-doc/Serialize.cpp

+    auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
+    I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;


It's a micro-optimization, but you can save 2 allocations by using Twine to concatenate in one go in the assignment. It does leave the code a bit less readable, but not too badly. I can't recall whether the code below needs to be wrapped in ().str() or not, either. But I'm also fine if you leave this as is.

Suggested change

-    auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
-    I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;
+    I.MangledName = Twine(MangledName.substr(0, 250 - SymbolID.size())) + llvm::toStringRef(llvm::toHex(I.USR));

I tried this but I need to know the hex version of the USR's length to subtract, not the Info's hashed SymbolID. I think the hex value is guaranteed to be at most 40. Do you know if that's an upper limit? There's a reference to that in the YAML generator. If it is a constant, this would work and be better than querying for the length since anything greater than that would break this anyways. The test has a 40 character hex.

SymbolID is std::array<uint8_t,20>, so I'd guess that's why the length is fixed at 40, since you'd get 2 chars per byte. SHA1.cpp returns the same std::array<uint8_t,20> type, so I believe you can rely on that as an upper limit, and SHA-1 has a fixed 20-byte (160-bit) size.

In any case, we can fix that later. It's not important in the bigger scheme of things, and as I said it's really just a micro-optimization. We only do this once per file, so it's not terrible, even for a large number of files.
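
As a rough check of the length argument in this thread, the sketch below hex-encodes a 20-byte array and always yields 40 characters. The alias `SymbolIDBytes` and the function `toUpperHex` are stand-ins invented for this example; clang-doc itself uses llvm::toHex, which is deliberately not used here so the snippet stays standalone.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Stand-in for the std::array<uint8_t, 20> type mentioned in the review.
using SymbolIDBytes = std::array<std::uint8_t, 20>;

// Plain uppercase hex encoder (hypothetical, used here in place of
// llvm::toHex). Each byte becomes two characters, so 20 bytes give 40.
std::string toUpperHex(const SymbolIDBytes &Bytes) {
  static const char Digits[] = "0123456789ABCDEF";
  std::string Out;
  Out.reserve(Bytes.size() * 2);
  for (std::uint8_t B : Bytes) {
    Out.push_back(Digits[B >> 4]);   // high nibble
    Out.push_back(Digits[B & 0x0F]); // low nibble
  }
  return Out;
}
```

Because the array size is part of the type, the encoded length does not depend on the input, which is what would let a constant 40 replace the `SymbolID.size()` query.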

evelez7 · 2025-09-23T17:22:16Z

Merge activity

Sep 23, 5:22 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Sep 23, 5:24 PM UTC: @evelez7 merged this pull request with Graphite.

evelez7 marked this pull request as ready for review September 18, 2025 01:45

llvmbot added the clang-tools-extra label Sep 18, 2025

evelez7 requested review from ilovepi and petrhosek September 18, 2025 01:45

ilovepi approved these changes Sep 22, 2025

evelez7 merged commit e80a207 into main Sep 23, 2025
13 checks passed

evelez7 deleted the users/evelez7/clang-doc-mangled-names-usr branch September 23, 2025 17:24
