@kenballus
Contributor

Starting in version 15, GCC emits a .base64 directive instead of .string or .ascii for char arrays of length >= 3.

See this godbolt link for an example.

This patch adds support for the .base64 directive to AsmParser.cpp, so tools like llvm-mc can process the output of GCC more effectively.

This addresses #165499.
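
To illustrate what the directive means, here is a minimal standalone C++ sketch. It is not the patch's code and not LLVM's `llvm::decodeBase64`; the names `sextet` and `decodeBase64Sketch` are hypothetical. It assumes the standard padded base64 alphabet and does not check that unused trailing bits are zero. Under those assumptions, the operand of `.base64 "aGVsbG8="` decodes to the same bytes that `.ascii "hello"` would emit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not an LLVM API): maps one base64 character to its
// 6-bit value, or returns -1 if the character is outside the alphabet.
static int sextet(char C) {
  if (C >= 'A' && C <= 'Z') return C - 'A';
  if (C >= 'a' && C <= 'z') return C - 'a' + 26;
  if (C >= '0' && C <= '9') return C - '0' + 52;
  if (C == '+') return 62;
  if (C == '/') return 63;
  return -1;
}

// Hypothetical sketch: decodes standard, padded base64.
// Returns std::nullopt on malformed input.
std::optional<std::string> decodeBase64Sketch(std::string_view In) {
  if (In.size() % 4 != 0)
    return std::nullopt;
  std::string Out;
  for (std::size_t I = 0; I < In.size(); I += 4) {
    // '=' padding is only accepted at the end of the final 4-character group.
    bool Last = (I + 4 == In.size());
    int Pad = 0;
    if (Last && In[I + 3] == '=')
      Pad = (In[I + 2] == '=') ? 2 : 1;
    unsigned Bits = 0;
    for (int J = 0; J < 4; ++J) {
      int V = (J >= 4 - Pad) ? 0 : sextet(In[I + J]);
      if (V < 0)
        return std::nullopt; // character outside the alphabet
      Bits = (Bits << 6) | static_cast<unsigned>(V);
    }
    // Each group carries 24 bits: three bytes, minus one per padding char.
    Out.push_back(static_cast<char>((Bits >> 16) & 0xFF));
    if (Pad < 2) Out.push_back(static_cast<char>((Bits >> 8) & 0xFF));
    if (Pad < 1) Out.push_back(static_cast<char>(Bits & 0xFF));
  }
  // Invariant: four input characters never yield more than three bytes.
  assert(Out.size() <= In.size() / 4 * 3);
  return Out;
}
```

For instance, `decodeBase64Sketch("aGVsbG8=")` yields the five bytes of `hello`, while an input whose length is not a multiple of four is rejected.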

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In that case, you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR with a comment that says "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the llvm:mc Machine (object) code label Oct 29, 2025
@llvmbot
Member

llvmbot commented Oct 29, 2025

@llvm/pr-subscribers-llvm-mc

Author: Ben Kallus (kenballus)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/165549.diff

1 Files Affected:

  • (modified) llvm/lib/MC/MCParser/AsmParser.cpp (+21)
diff --git a/llvm/lib/MC/MCParser/AsmParser.cpp b/llvm/lib/MC/MCParser/AsmParser.cpp
index dd1bc2be5feb4..54bb1451a5a73 100644
--- a/llvm/lib/MC/MCParser/AsmParser.cpp
+++ b/llvm/lib/MC/MCParser/AsmParser.cpp
@@ -46,6 +46,7 @@
 #include "llvm/MC/MCSymbolMachO.h"
 #include "llvm/MC/MCTargetOptions.h"
 #include "llvm/MC/MCValue.h"
+#include "llvm/Support/Base64.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -530,6 +531,7 @@ class AsmParser : public MCAsmParser {
     DK_LTO_SET_CONDITIONAL,
     DK_CFI_MTE_TAGGED_FRAME,
     DK_MEMTAG,
+    DK_BASE64,
     DK_END
   };
 
@@ -552,6 +554,7 @@ class AsmParser : public MCAsmParser {
 
   // ".ascii", ".asciz", ".string"
   bool parseDirectiveAscii(StringRef IDVal, bool ZeroTerminated);
+  bool parseDirectiveBase64();                  // ".base64"
   bool parseDirectiveReloc(SMLoc DirectiveLoc); // ".reloc"
   bool parseDirectiveValue(StringRef IDVal,
                            unsigned Size);       // ".byte", ".long", ...
@@ -1953,6 +1956,8 @@ bool AsmParser::parseStatement(ParseStatementInfo &Info,
     case DK_ASCIZ:
     case DK_STRING:
       return parseDirectiveAscii(IDVal, true);
+    case DK_BASE64:
+      return parseDirectiveBase64();
     case DK_BYTE:
     case DK_DC_B:
       return parseDirectiveValue(IDVal, 1);
@@ -3076,6 +3081,21 @@ bool AsmParser::parseDirectiveAscii(StringRef IDVal, bool ZeroTerminated) {
   return parseMany(parseOp);
 }
 
+/// parseDirectiveBase64:
+//    ::= .base64 "string"
+bool AsmParser::parseDirectiveBase64() {
+  std::vector<char> Decoded;
+
+  std::string str;
+
+  if (parseEscapedString(str) || str.empty() || decodeBase64(str, Decoded)) {
+    return true;
+  }
+
+  getStreamer().emitBytes(std::string(Decoded.begin(), Decoded.end()));
+  return false;
+}
+
 /// parseDirectiveReloc
 ///  ::= .reloc expression , identifier [ , expression ]
 bool AsmParser::parseDirectiveReloc(SMLoc DirectiveLoc) {
@@ -5345,6 +5365,7 @@ void AsmParser::initializeDirectiveKindMap() {
   DirectiveKindMap[".asciz"] = DK_ASCIZ;
   DirectiveKindMap[".string"] = DK_STRING;
   DirectiveKindMap[".byte"] = DK_BYTE;
+  DirectiveKindMap[".base64"] = DK_BASE64;
   DirectiveKindMap[".short"] = DK_SHORT;
   DirectiveKindMap[".value"] = DK_VALUE;
   DirectiveKindMap[".2byte"] = DK_2BYTE;

@lenary
Member

lenary commented Oct 29, 2025

Please add tests.

@dtcxzyw dtcxzyw requested a review from MaskRay October 29, 2025 17:12
@dtcxzyw dtcxzyw linked an issue Oct 29, 2025 that may be closed by this pull request
@kenballus
Contributor Author

Tests added. If you'd like anything else tested, please let me know.

@github-actions

github-actions bot commented Oct 31, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kenballus kenballus force-pushed the add_base64_directive branch from 0626420 to 4ca5466 Compare October 31, 2025 13:06
Member

@lenary lenary left a comment

LGTM. Please wait for any more comments from Maskray before landing.

@kenballus
Contributor Author

One more thing to consider:

The .base64 directive from GNU as accepts a comma-separated list of strings, and decodes them all. However, GCC doesn't emit a comma-separated list for this directive, and the point of this patch is to be more compatible with GCC, not as.

If we want llvm-mc to be perfectly compatible with as, then I should probably add that functionality. If we instead only want to be compatible with GCC without complicating things too much, then this is likely fine as-is. I'll leave the decision to you all.
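
As a rough picture of the comma-separated form, the following standalone C++ sketch splits operand text such as `"aGVs", "bG8="` into its quoted strings, each of which would then be base64-decoded in turn. The function name `splitQuotedOperands` is hypothetical, and the sketch assumes no escape sequences inside the quotes; it does not reflect how AsmParser.cpp or GNU as tokenizes operands.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not an LLVM or GNU as API): splits the operand text
// of a directive such as
//   .base64 "aGVs", "bG8="
// into its quoted strings. Escape sequences are not handled.
std::optional<std::vector<std::string>>
splitQuotedOperands(std::string_view Text) {
  std::vector<std::string> Operands;
  std::size_t Pos = 0;
  auto SkipSpace = [&] {
    while (Pos < Text.size() &&
           std::isspace(static_cast<unsigned char>(Text[Pos])))
      ++Pos;
  };
  while (true) {
    SkipSpace();
    // Every operand must begin with an opening quote.
    if (Pos >= Text.size() || Text[Pos] != '"')
      return std::nullopt;
    std::size_t Close = Text.find('"', Pos + 1);
    if (Close == std::string_view::npos)
      return std::nullopt; // unterminated string
    Operands.emplace_back(Text.substr(Pos + 1, Close - Pos - 1));
    Pos = Close + 1;
    SkipSpace();
    if (Pos == Text.size())
      break; // end of the statement
    if (Text[Pos] != ',')
      return std::nullopt; // unexpected text between operands
    ++Pos; // consume the comma; another operand must follow
  }
  // Invariant: the loop only exits normally after reading an operand.
  assert(!Operands.empty());
  return Operands;
}
```

With this sketch, a single-string operand (the only form GCC emits, per the comment above) and a multi-string operand go through the same loop; a trailing comma or a missing comma between strings is rejected.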

@lenary
Member

lenary commented Nov 3, 2025

Oh. Please can you add support for multiple string arguments (to align with gnu as), as we would like portability between as-compatible code and Clang/LLVM's integrated assembler.

@kenballus
Contributor Author

> Oh. Please can you add support for multiple string arguments (to align with gnu as), as we would like portability between as-compatible code and Clang/LLVM's integrated assembler.

Will do!

@kenballus
Contributor Author

> Oh. Please can you add support for multiple string arguments (to align with gnu as), as we would like portability between as-compatible code and Clang/LLVM's integrated assembler.

Done.

@kenballus kenballus requested a review from lenary November 3, 2025 23:28
@lenary
Member

lenary commented Nov 4, 2025

You probably need to merge main into your branch to get the tests to pass, I'm not sure though. The failures are unrelated as far as I can tell.

Can you also check your code formatting again? The comment above has been re-used to show the diff for where things need to be updated.

@kenballus kenballus force-pushed the add_base64_directive branch from 6e28554 to 0df7acd Compare November 4, 2025 14:33
@kenballus
Contributor Author

> You probably need to merge main into your branch to get the tests to pass, I'm not sure though. The failures are unrelated as far as I can tell.
>
> Can you also check your code formatting again? The comment above has been re-used to show the diff for where things need to be updated.

Got it. Now merged and reformatted.

Member

@lenary lenary left a comment

LGTM. Thanks for the updates.

@lenary
Member

lenary commented Nov 17, 2025

@kenballus please can you follow these instructions: https://llvm.org/docs/DeveloperPolicy.html#github-email-address (and then I will merge this PR)

@kenballus
Contributor Author

> @kenballus please can you follow these instructions: https://llvm.org/docs/DeveloperPolicy.html#github-email-address (and then I will merge this PR)

Done.

@lenary lenary merged commit 6245a4f into llvm:main Nov 17, 2025
10 checks passed
@github-actions

@kenballus Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@kenballus kenballus deleted the add_base64_directive branch November 19, 2025 20:34
Development

Successfully merging this pull request may close these issues.

MCParser/AsmParser.cpp doesn't support the .base64 directive

4 participants