@evelez7 evelez7 commented Sep 18, 2025

Previously, if mangled names were too long to be used as filenames, the
object's SymbolID was used as a filename. This worked for length
restrictions, but made URLs/filenames inconsistent. This patch truncates
the mangled name and appends the SymbolID. Thus, we can keep some
context in the URL/filename while preserving uniqueness.
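
A minimal sketch of the truncate-and-append scheme described above, using only the standard library. The helper name `makeFileStem` and its parameters are hypothetical and do not appear in clang-doc. The 250-character limit is taken from the comment in Serialize.cpp, and the sketch assumes the hex SymbolID is never longer than that limit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of clang-doc): builds a filename stem that
// never exceeds MaxLen characters.
// Assumption: SymbolIDHex.size() <= MaxLen.
std::string makeFileStem(const std::string &MangledName,
                         const std::string &SymbolIDHex,
                         std::size_t MaxLen = 250) {
  // Names that fit are used unchanged.
  if (MangledName.size() <= MaxLen)
    return MangledName;
  // Otherwise keep a prefix of the mangled name for context and append the
  // hex SymbolID so the result stays unique and exactly MaxLen long.
  return MangledName.substr(0, MaxLen - SymbolIDHex.size()) + SymbolIDHex;
}
```

With a 40-character SymbolID, an over-long name keeps its first 210 characters, which is the split visible in the updated long-name.cpp expectations.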

@evelez7 evelez7 marked this pull request as ready for review September 18, 2025 01:45

llvmbot commented Sep 18, 2025

@llvm/pr-subscribers-clang-tools-extra

Author: Erick Velez (evelez7)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/159490.diff

2 Files Affected:

  • (modified) clang-tools-extra/clang-doc/Serialize.cpp (+4-6)
  • (modified) clang-tools-extra/test/clang-doc/long-name.cpp (+2-2)
diff --git a/clang-tools-extra/clang-doc/Serialize.cpp b/clang-tools-extra/clang-doc/Serialize.cpp
index dd7cd0b2ae736..186f634dd892a 100644
--- a/clang-tools-extra/clang-doc/Serialize.cpp
+++ b/clang-tools-extra/clang-doc/Serialize.cpp
@@ -780,12 +780,10 @@ static void populateSymbolInfo(SymbolInfo &I, const T *D, const FullComment *C,
     MangledStream << D->getNameAsString();
   // A 250 length limit was chosen since 255 is a common limit across
   // different filesystems, with a 5 character buffer for file extensions.
-  if (MangledName.size() > 250)
-    // File creation fails if the mangled name is too long, so default to the
-    // USR. We should look for a better check since filesystems differ in
-    // maximum filename length
-    I.MangledName = llvm::toStringRef(llvm::toHex(I.USR));
-  else
+  if (MangledName.size() > 250) {
+    auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
+    I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;
+  } else
     I.MangledName = MangledName;
   delete Mangler;
 }
diff --git a/clang-tools-extra/test/clang-doc/long-name.cpp b/clang-tools-extra/test/clang-doc/long-name.cpp
index b33337588da19..db96fc4aebe5a 100644
--- a/clang-tools-extra/test/clang-doc/long-name.cpp
+++ b/clang-tools-extra/test/clang-doc/long-name.cpp
@@ -9,6 +9,6 @@ struct ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLong
 struct ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd123 {};
 
 // CHECK-JSON: ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd12.json
-// CHECK-JSON: {{[0-9A-F]*}}.json
+// CHECK-JSON: _ZTV244ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheL29DE8558215A13A506661C0E01E50AA3E5C9C7FA.json 
 // CHECK-HTML: ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd12.html
-// CHECK-HTML: {{[0-9A-F]*}}.html
+// CHECK-HTML: _ZTV244ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheL29DE8558215A13A506661C0E01E50AA3E5C9C7FA.html

Comment on lines +784 to +785
auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;
Contributor

@ilovepi ilovepi Sep 22, 2025

It's a micro-optimization, but you can save 2 allocations by using Twine to concatenate in one go in the assignment. It does leave the code a bit less readable, but not too badly. I can't recall whether the code below needs to be wrapped in ().str() either. But I'm also fine if you leave this as is.

Suggested change
-    auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
-    I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;
+    I.MangledName = Twine(MangledName.substr(0, 250 - SymbolID.size())) + llvm::toStringRef(llvm::toHex(I.USR));

Member Author

I tried this, but I need the length of the USR's hex string to subtract, not the Info's hashed SymbolID. I think the hex value is guaranteed to be at most 40 characters. Do you know if that's an upper limit? There's a reference to that in the YAML generator. If it is a constant, this would work and be better than querying for the length, since anything greater than that would break this anyway. The test has a 40-character hex.
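
To illustrate the constant-length idea in this comment, here is a rough sketch that hard-codes the hex length and builds the result in one reserved buffer. It uses `std::string::reserve` rather than `llvm::Twine`, so it is not the suggested change itself. The names `MaxStemLen`, `SymbolIDHexLen`, and `truncateWithFixedSuffix` are hypothetical, and the value 40 is the assumption being discussed in this thread.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical constants: 250 comes from the Serialize.cpp comment, 40 is the
// assumed fixed length of the hex-encoded USR.
constexpr std::size_t MaxStemLen = 250;
constexpr std::size_t SymbolIDHexLen = 40;

// Hypothetical helper: truncates Mangled and appends Hex using a single
// up-front reservation instead of separate temporaries.
// Assumption: Hex.size() == SymbolIDHexLen.
std::string truncateWithFixedSuffix(std::string_view Mangled,
                                    std::string_view Hex) {
  std::string Out;
  Out.reserve(MaxStemLen);
  // string_view::substr clamps the count, so shorter inputs are safe here.
  Out.append(Mangled.substr(0, MaxStemLen - SymbolIDHexLen));
  Out.append(Hex);
  return Out;
}
```

If the hex string could ever be longer than 40 characters, the result would exceed 250 characters, which is the failure mode the comment above points out.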

Contributor

SymbolID is std::array<uint8_t, 20>, so I'd guess that's why the length is fixed at 40, since you get 2 chars per byte. SHA1.cpp returns the same std::array<uint8_t, 20> type, so I believe you can rely on that as an upper limit; SHA-1 has a fixed 20-byte (160-bit) size.
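
The 2-chars-per-byte arithmetic can be checked with a small standalone sketch. The alias `SymbolIDBytes` and the function `toUpperHex` are hypothetical stand-ins for clang-doc's SymbolID and `llvm::toHex`. Uppercase digits are used here to match the hex seen in the test's expected filename.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical stand-in for clang-doc's SymbolID (a 20-byte SHA-1 digest).
using SymbolIDBytes = std::array<std::uint8_t, 20>;

// Hypothetical stand-in for llvm::toHex: each byte becomes two uppercase hex
// digits, so 20 bytes always produce exactly 40 characters.
std::string toUpperHex(const SymbolIDBytes &Bytes) {
  static const char Digits[] = "0123456789ABCDEF";
  std::string Out;
  Out.reserve(Bytes.size() * 2);
  for (std::uint8_t B : Bytes) {
    Out.push_back(Digits[B >> 4]);   // high nibble
    Out.push_back(Digits[B & 0x0F]); // low nibble
  }
  return Out;
}
```

Because the output length depends only on the array size, it is the same for every symbol, which is what makes a constant 40 safe under this assumption.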

Contributor

In any case, we can fix that later. It's not important in the bigger scheme of things and, as I said, is really just a micro-optimization. We only do this once per file, so it's not terrible, even for a large number of files.

Member Author

evelez7 commented Sep 23, 2025

Merge activity

  • Sep 23, 5:22 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Sep 23, 5:24 PM UTC: @evelez7 merged this pull request with Graphite.

@evelez7 evelez7 merged commit e80a207 into main Sep 23, 2025
13 checks passed
@evelez7 evelez7 deleted the users/evelez7/clang-doc-mangled-names-usr branch September 23, 2025 17:24