[clang-doc] concatenate SymbolIDs to truncated mangled names #159490
Previously, if mangled names were too long to be used as filenames, the object's SymbolID was used as a filename. This worked for length restrictions, but made URLs/filenames inconsistent. This patch truncates the mangled name and appends the SymbolID. Thus, we can keep some context in the URL/filename while preserving uniqueness.
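The truncate-and-append scheme described above can be sketched in standalone C++. This is only an illustration with plain `std::string`, not the clang-doc code: `makeFileStem` and its parameters are hypothetical names, and the 250-character limit is taken from the comment in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of clang-doc: caps `mangled` at `limit`
// characters by keeping a prefix and appending the full hex `id`.
std::string makeFileStem(const std::string &mangled, const std::string &id,
                         std::size_t limit = 250) {
  // Names within the limit are used unchanged.
  if (mangled.size() <= limit)
    return mangled;
  // The ID must itself fit, otherwise the subtraction below would wrap.
  assert(id.size() <= limit && "ID longer than the filename limit");
  // Keep as much of the mangled name as fits, then append the ID so the
  // result stays unique and is exactly `limit` characters long.
  return mangled.substr(0, limit - id.size()) + id;
}
```

With a 40-character ID, a 251-character mangled name keeps its first 210 characters and ends in the ID, which matches the shape of the updated CHECK lines in the test.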
@llvm/pr-subscribers-clang-tools-extra
Author: Erick Velez (evelez7)
Full diff: https://github.com/llvm/llvm-project/pull/159490.diff
2 Files Affected:
 diff --git a/clang-tools-extra/clang-doc/Serialize.cpp b/clang-tools-extra/clang-doc/Serialize.cpp
index dd7cd0b2ae736..186f634dd892a 100644
--- a/clang-tools-extra/clang-doc/Serialize.cpp
+++ b/clang-tools-extra/clang-doc/Serialize.cpp
@@ -780,12 +780,10 @@ static void populateSymbolInfo(SymbolInfo &I, const T *D, const FullComment *C,
     MangledStream << D->getNameAsString();
   // A 250 length limit was chosen since 255 is a common limit across
   // different filesystems, with a 5 character buffer for file extensions.
-  if (MangledName.size() > 250)
-    // File creation fails if the mangled name is too long, so default to the
-    // USR. We should look for a better check since filesystems differ in
-    // maximum filename length
-    I.MangledName = llvm::toStringRef(llvm::toHex(I.USR));
-  else
+  if (MangledName.size() > 250) {
+    auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
+    I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;
+  } else
     I.MangledName = MangledName;
   delete Mangler;
 }
diff --git a/clang-tools-extra/test/clang-doc/long-name.cpp b/clang-tools-extra/test/clang-doc/long-name.cpp
index b33337588da19..db96fc4aebe5a 100644
--- a/clang-tools-extra/test/clang-doc/long-name.cpp
+++ b/clang-tools-extra/test/clang-doc/long-name.cpp
@@ -9,6 +9,6 @@ struct ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLong
 struct ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd123 {};
 
 // CHECK-JSON: ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd12.json
-// CHECK-JSON: {{[0-9A-F]*}}.json
+// CHECK-JSON: _ZTV244ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheL29DE8558215A13A506661C0E01E50AA3E5C9C7FA.json 
 // CHECK-HTML: ThisStructHasANameThatResultsInAMangledNameThatIsExactly250CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheLengthIsTooLongThenClangDocWillCrashAnd12.html
-// CHECK-HTML: {{[0-9A-F]*}}.html
+// CHECK-HTML: _ZTV244ThisStructHasANameThatResultsInAMangledNameThatIsExactly251CharactersLongThatIsSupposedToTestTheFilenameLengthLimitsWithinClangDocInOrdertoSeeifclangdocwillcrashornotdependingonthelengthofthestructIfTheL29DE8558215A13A506661C0E01E50AA3E5C9C7FA.html
    auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
    I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;
It's a micro-optimization, but you can save 2 allocations by using Twine to concatenate in one go in the assignment. It does leave the code a bit less readable, but not too badly. I can't recall whether the code below needs to be wrapped in ().str() or not. But I'm also fine if you leave this as is.
-    auto SymbolID = llvm::toStringRef(llvm::toHex(I.USR)).str();
-    I.MangledName = MangledName.substr(0, 250 - SymbolID.size()) + SymbolID;
+    I.MangledName = Twine(MangledName.substr(0, 250 - SymbolID.size())) + llvm::toStringRef(llvm::toHex(I.USR));
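The allocation-saving idea in this suggestion can be illustrated without LLVM. The sketch below does not use Twine; it swaps in `std::string_view` and a single `reserve` (C++17) to get a similar effect, and `truncateAndAppend` is a hypothetical name rather than anything in clang-doc.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: builds prefix + id with a single reserve instead of
// creating a temporary substring and a temporary concatenation result.
std::string truncateAndAppend(std::string_view mangled, std::string_view id,
                              std::size_t limit = 250) {
  // Names within the limit are copied unchanged.
  if (mangled.size() <= limit)
    return std::string(mangled);
  // If the ID alone exceeds the limit, keep no prefix at all.
  const std::size_t keep = id.size() < limit ? limit - id.size() : 0;
  std::string out;
  out.reserve(keep + id.size());
  out.append(mangled.substr(0, keep)); // string_view::substr does not allocate
  out.append(id);
  return out;
}
```

Whether this is worth the loss in readability is the same trade-off the comment raises for the Twine version.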
I tried this, but I need the length of the USR's hex string to subtract, not the Info's hashed SymbolID. I think the hex value is guaranteed to be at most 40 characters. Do you know if that's an upper limit? There's a reference to that in the YAML generator. If it is a constant, this would work and be better than querying for the length, since anything greater than that would break this anyway. The test has a 40-character hex string.
SymbolID is std::array<uint8_t, 20>, so I'd guess that's why the length is fixed at 40, since you get 2 chars per byte. SHA1.cpp returns the same std::array<uint8_t, 20> type, so I believe you can rely on that as an upper limit, and SHA-1 has a fixed size of 20 bytes (160 bits).
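The arithmetic above (20 bytes, 2 hex digits per byte, 40 characters) can be checked at compile time. In this sketch the `SymbolID` alias is a stand-in based on the type named in the comment, and `HexIDLength` and `toUpperHex` are hypothetical names; the encoder only mimics the uppercase output seen in the test and is not `llvm::toHex` itself.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for the SymbolID type as described in the comment (assumption).
using SymbolID = std::array<std::uint8_t, 20>;

// Two hex digits per byte, so the encoded length is known at compile time.
constexpr std::size_t HexIDLength = 2 * std::tuple_size<SymbolID>::value;
static_assert(HexIDLength == 40, "20 bytes should encode to 40 hex digits");

// Hypothetical uppercase hex encoder used only for this illustration.
std::string toUpperHex(const SymbolID &id) {
  static const char digits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(HexIDLength);
  for (std::uint8_t byte : id) {
    out.push_back(digits[byte >> 4]);   // high nibble
    out.push_back(digits[byte & 0x0F]); // low nibble
  }
  return out;
}
```

If the length really is a constant, the truncation could subtract `HexIDLength` directly instead of querying the size of the hex string at run time.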
In any case, we can fix that later. It's not important in the bigger scheme of things, and as I said, it's really just a micro-optimization. We only do this once per file, so it's not terrible, even for a large number of files.
