Skip to content

Conversation

@jurahul
Copy link
Contributor

@jurahul jurahul commented Oct 8, 2024

Add support for C style comments in LLVM assembly.

@jurahul jurahul force-pushed the ll_c_style_comments branch 2 times, most recently from 60f0ab0 to 629e20e Compare October 24, 2024 21:25
@jurahul jurahul marked this pull request as ready for review October 25, 2024 03:42
@jurahul jurahul requested review from arsenm and nikic October 25, 2024 03:43
@arsenm arsenm added the llvm:ir label Oct 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 26, 2024

@llvm/pr-subscribers-llvm-ir

Author: Rahul Joshi (jurahul)

Changes

Add support for C style comments in LLVM assembly.


Full diff: https://github.com/llvm/llvm-project/pull/111554.diff

5 Files Affected:

  • (modified) llvm/include/llvm/AsmParser/LLLexer.h (+2)
  • (modified) llvm/lib/AsmParser/LLLexer.cpp (+47-3)
  • (added) llvm/test/Assembler/c-style-comment.ll (+30)
  • (added) llvm/test/Assembler/invalid-c-style-comment0.ll (+6)
  • (added) llvm/test/Assembler/invalid-c-style-comment1.ll (+11)
diff --git a/llvm/include/llvm/AsmParser/LLLexer.h b/llvm/include/llvm/AsmParser/LLLexer.h
index a9f51fb925f5d5..8e0c5638eef37d 100644
--- a/llvm/include/llvm/AsmParser/LLLexer.h
+++ b/llvm/include/llvm/AsmParser/LLLexer.h
@@ -94,7 +94,9 @@ namespace llvm {
     lltok::Kind LexToken();
 
     int getNextChar();
+    int peekNextChar() const;
     void SkipLineComment();
+    bool SkipCComment();
     lltok::Kind ReadString(lltok::Kind kind);
     bool ReadVarName();
 
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index 759db6db60774c..dbccfcde74caa8 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -175,17 +175,25 @@ LLLexer::LLLexer(StringRef StartBuf, SourceMgr &SM, SMDiagnostic &Err,
 }
 
 int LLLexer::getNextChar() {
-  char CurChar = *CurPtr++;
+  int NextChar = peekNextChar();
+  // Keeping CurPtr unchanged at EOF, so that another call to `getNextChar`
+  // returns EOF again.
+  if (NextChar != EOF)
+    ++CurPtr;
+  return NextChar;
+}
+
+int LLLexer::peekNextChar() const {
+  char CurChar = *CurPtr;
   switch (CurChar) {
   default: return (unsigned char)CurChar;
   case 0:
     // A nul character in the stream is either the end of the current buffer or
     // a random nul in the file.  Disambiguate that here.
-    if (CurPtr-1 != CurBuf.end())
+    if (CurPtr != CurBuf.end())
       return 0;  // Just whitespace.
 
     // Otherwise, return end of file.
-    --CurPtr;  // Another call to lex will return EOF again.
     return EOF;
   }
 }
@@ -251,6 +259,10 @@ lltok::Kind LLLexer::LexToken() {
     case ',': return lltok::comma;
     case '*': return lltok::star;
     case '|': return lltok::bar;
+    case '/':
+      if (peekNextChar() == '*' && SkipCComment())
+        return lltok::Error;
+      continue;
     }
   }
 }
@@ -262,6 +274,38 @@ void LLLexer::SkipLineComment() {
   }
 }
 
+/// SkipCComment - This skips C-style /**/ comments.  The only difference from C
+/// is that we allow nesting.
+bool LLLexer::SkipCComment() {
+  getNextChar(); // skip the star.
+  unsigned CommentDepth = 1;
+
+  while (true) {
+    int CurChar = getNextChar();
+    switch (CurChar) {
+    case EOF:
+      LexError("Unterminated comment!");
+      return true;
+    case '*':
+      // End of the comment?
+      if (peekNextChar() != '/')
+        break;
+
+      getNextChar(); // End the '/'.
+      if (--CommentDepth == 0)
+        return false;
+      break;
+    case '/':
+      // Start of a nested comment?
+      if (peekNextChar() != '*')
+        break;
+      getNextChar(); // Eat the '*'.
+      ++CommentDepth;
+      break;
+    }
+  }
+}
+
 /// Lex all tokens that start with an @ character.
 ///   GlobalVar   @\"[^\"]*\"
 ///   GlobalVar   @[-a-zA-Z$._][-a-zA-Z$._0-9]*
diff --git a/llvm/test/Assembler/c-style-comment.ll b/llvm/test/Assembler/c-style-comment.ll
new file mode 100644
index 00000000000000..b24a3e560e0e90
--- /dev/null
+++ b/llvm/test/Assembler/c-style-comment.ll
@@ -0,0 +1,30 @@
+; RUN: llvm-as < %s | llvm-dis | FileCheck %s 
+
+/* Simple C style comment */
+
+; CHECK: @B = external global i32
+@B = external global i32
+
+/* multiline C ctyle comment at "top-level"
+ * This is the second line
+ * and this is third
+ */
+
+
+; CHECK: @foo
+define <4 x i1> @foo(<4 x float> %a, <4 x float> %b) nounwind {
+entry: /* inline comment */
+  %cmp = fcmp olt <4 x float> %a, /* to be ignored */ %b
+  ret <4 x i1> %cmp /* ignore */
+ /* C style nested comment
+    /* Nest
+       /*
+        * ; ignored
+	*/
+    */
+ */
+
+}
+
+/* End of the assembly file */
+
diff --git a/llvm/test/Assembler/invalid-c-style-comment0.ll b/llvm/test/Assembler/invalid-c-style-comment0.ll
new file mode 100644
index 00000000000000..e3d1c6f4ef732d
--- /dev/null
+++ b/llvm/test/Assembler/invalid-c-style-comment0.ll
@@ -0,0 +1,6 @@
+; RUN: not llvm-as --disable-output %s 2>&1 | FileCheck %s -DFILE=%s
+
+@B = external global i32
+
+; CHECK: [[FILE]]:[[@LINE+1]]:1: error: Unterminated comment!
+/* End of the assembly file
diff --git a/llvm/test/Assembler/invalid-c-style-comment1.ll b/llvm/test/Assembler/invalid-c-style-comment1.ll
new file mode 100644
index 00000000000000..9ca97af4b4875c
--- /dev/null
+++ b/llvm/test/Assembler/invalid-c-style-comment1.ll
@@ -0,0 +1,11 @@
+; RUN: not llvm-as --disable-output %s 2>&1 | FileCheck %s -DFILE=%s
+
+@B = external global i32
+
+; CHECK: [[FILE]]:[[@LINE+1]]:1: error: Unterminated comment!
+/* End of the assembly file
+   /* Unterminated comment with multiple nesting depths */
+   /* /* ignored */ */
+   /* /* /* ignored */ */ */
+* /   
+

@jurahul jurahul force-pushed the ll_c_style_comments branch from 629e20e to 7c9b538 Compare October 26, 2024 01:01
@jurahul jurahul requested a review from arsenm October 26, 2024 01:01
@jurahul
Copy link
Contributor Author

jurahul commented Oct 30, 2024

@arsenm any further comments? Are we at this point waiting for some consensus to be reached around the RFC?

@arsenm
Copy link
Contributor

arsenm commented Oct 31, 2024

This is pretty independent of the RFC

@jurahul
Copy link
Contributor Author

jurahul commented Oct 31, 2024 via email

@jurahul
Copy link
Contributor Author

jurahul commented Oct 31, 2024 via email

@jurahul jurahul force-pushed the ll_c_style_comments branch from 7c9b538 to bbc9b7d Compare October 31, 2024 21:51
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be documented in LangRef.

@jurahul jurahul force-pushed the ll_c_style_comments branch from bbc9b7d to d6e23a6 Compare November 1, 2024 13:31
@jurahul
Copy link
Contributor Author

jurahul commented Nov 1, 2024

This needs to be documented in LangRef.

Done.

@jurahul jurahul requested review from jrtc27, nikic and topperc November 1, 2024 13:32
@jurahul jurahul requested a review from nikic November 1, 2024 14:05
@jurahul jurahul force-pushed the ll_c_style_comments branch from d6e23a6 to 43e3a01 Compare November 1, 2024 16:08
@jurahul jurahul requested a review from nikic November 1, 2024 16:38
Add support for C style comments in LLVM assembly.
@jurahul jurahul force-pushed the ll_c_style_comments branch from 43e3a01 to c7d916d Compare November 1, 2024 17:12
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please wait for a second approval.

@jurahul
Copy link
Contributor Author

jurahul commented Nov 1, 2024

LGTM, but please wait for a second approval.

Thanks, will do.

@jurahul
Copy link
Contributor Author

jurahul commented Nov 4, 2024

@arsenm @jrtc27 can one of you give an additional approval?

@jurahul
Copy link
Contributor Author

jurahul commented Nov 5, 2024

Thanks @arsenm

@jurahul jurahul merged commit b8ac87f into llvm:main Nov 5, 2024
9 checks passed
@jurahul jurahul deleted the ll_c_style_comments branch November 5, 2024 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants