Skip to content

Conversation

@yuxuanchen1997
Copy link
Member

As of today, one can make clang serialize diagnostics to a file with the flag -serialize-diagnostics. For example:

// file errors_and_warning.c
#include <stdio.h>

int forgot_return() {}

int main() {
  printf("Hello world!")
}

There are obvious issues with this program. And as expected, clang emitted two diagnostics when compiling. We can save it into a file.

$ clang errors_and_warning.c -serialize-diagnostics errors_and_warning.c.diag
errors_and_warning.c:3:22: warning: non-void function does not return a value [-Wreturn-type]
    3 | int forgot_return() {}
      |                      ^
errors_and_warning.c:6:25: error: expected ';' after expression
    6 |   printf("Hello world!")
      |                         ^
      |                         ;
1 warning and 1 error generated.

However, based on my research, we don't yet have a clang tool to read the serialized binary. Most likely this flag is here for IDE integration.

This patch is adding a tool to read serialized diagnostics. Currently, it's fairly minimal as it just prints everything it knows to stdout. In the future, we can extend it to export diagnostics to different formats.

$ clang-read-diagnostics errors_and_warning.c.diag                                                                                                                                                                                                                                         
errors_and_warning.c:3:22: non-void function does not return a value [category='Semantic Issue', flag=return-type]
errors_and_warning.c:6:25: expected ';' after expression [category='Parse Issue']

@llvmbot
Copy link
Member

llvmbot commented Dec 3, 2024

@llvm/pr-subscribers-clang-tools-extra

Author: Yuxuan Chen (yuxuanchen1997)

Changes

As of today, one can make clang serialize diagnostics to a file with the flag -serialize-diagnostics. For example:

// file errors_and_warning.c
#include &lt;stdio.h&gt;

int forgot_return() {}

int main() {
  printf("Hello world!")
}

There are obvious issues with this program. And as expected, clang emitted two diagnostics when compiling. We can save it into a file.

$ clang errors_and_warning.c -serialize-diagnostics errors_and_warning.c.diag
errors_and_warning.c:3:22: warning: non-void function does not return a value [-Wreturn-type]
    3 | int forgot_return() {}
      |                      ^
errors_and_warning.c:6:25: error: expected ';' after expression
    6 |   printf("Hello world!")
      |                         ^
      |                         ;
1 warning and 1 error generated.

However, based on my research, we don't yet have a clang tool to read the serialized binary. Most likely this flag is here for IDE integration.

This patch is adding a tool to read serialized diagnostics. Currently, it's fairly minimal as it just prints everything it knows to stdout. In the future, we can extend it to export diagnostics to different formats.

$ clang-read-diagnostics errors_and_warning.c.diag                                                                                                                                                                                                                                         
errors_and_warning.c:3:22: non-void function does not return a value [category='Semantic Issue', flag=return-type]
errors_and_warning.c:6:25: expected ';' after expression [category='Parse Issue']

---
Full diff: https://github.com/llvm/llvm-project/pull/118522.diff


3 Files Affected:

- (modified) clang-tools-extra/CMakeLists.txt (+1) 
- (added) clang-tools-extra/clang-read-diagnostics/CMakeLists.txt (+10) 
- (added) clang-tools-extra/clang-read-diagnostics/ClangReadDiagnostics.cpp (+126) 


``````````diff
diff --git a/clang-tools-extra/CMakeLists.txt b/clang-tools-extra/CMakeLists.txt
index 6b6f2b1ca22765..7ff17fa4896939 100644
--- a/clang-tools-extra/CMakeLists.txt
+++ b/clang-tools-extra/CMakeLists.txt
@@ -25,6 +25,7 @@ add_subdirectory(clang-doc)
 add_subdirectory(clang-include-fixer)
 add_subdirectory(clang-move)
 add_subdirectory(clang-query)
+add_subdirectory(clang-read-diagnostics)
 add_subdirectory(include-cleaner)
 add_subdirectory(pp-trace)
 add_subdirectory(tool-template)
diff --git a/clang-tools-extra/clang-read-diagnostics/CMakeLists.txt b/clang-tools-extra/clang-read-diagnostics/CMakeLists.txt
new file mode 100644
index 00000000000000..cb615caca4d2f3
--- /dev/null
+++ b/clang-tools-extra/clang-read-diagnostics/CMakeLists.txt
@@ -0,0 +1,10 @@
+include_directories(${CMAKE_CURRENT_SOURCE_DIR}/..)
+
+add_clang_tool(clang-read-diagnostics
+  ClangReadDiagnostics.cpp
+  )
+
+clang_target_link_libraries(clang-read-diagnostics
+  PRIVATE
+  clangFrontend
+  )
diff --git a/clang-tools-extra/clang-read-diagnostics/ClangReadDiagnostics.cpp b/clang-tools-extra/clang-read-diagnostics/ClangReadDiagnostics.cpp
new file mode 100644
index 00000000000000..e7ced459b2f06d
--- /dev/null
+++ b/clang-tools-extra/clang-read-diagnostics/ClangReadDiagnostics.cpp
@@ -0,0 +1,126 @@
+//===---- ClangReadDiagnostics.cpp - clang-read-diagnostics tool -----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This tool is for reading clang diagnostics files from -serialize-diagnostics.
+//
+// Example usage:
+//
+// $ clang -serialize-diagnostics foo.c.diag foo.c
+// $ clang-read-diagnostics foo.c.diag
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Frontend/SerializedDiagnosticReader.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Signals.h"
+
+#include <string>
+
+using namespace clang;
+using namespace clang::serialized_diags;
+using namespace llvm;
+
+static cl::list<std::string> InputFiles(cl::Sink, cl::desc("<input files...>"), cl::Required);
+
+class BasicSerializedDiagnosticReader : public SerializedDiagnosticReader {
+public:
+  struct DecodedDiagnostics {
+    unsigned Severity;
+    StringRef Filename;
+    unsigned Line;
+    unsigned Col;
+    StringRef Category;
+    StringRef Flag;
+    StringRef Message;
+  };
+
+  SmallVector<DecodedDiagnostics> getDiagnostics() {
+    SmallVector<DecodedDiagnostics> Ret;
+    for (const auto &Diag : Diagnostics_) {
+      auto Filename = FilenameIdx_.at(Diag.Location.FileID);
+      auto Category = CategoryIdx_.at(Diag.Category);
+      auto Flag = Diag.Flag == 0 ? "" : FlagIdx_.at(Diag.Flag);
+      Ret.emplace_back(DecodedDiagnostics{Diag.Severity, Filename,
+                                          Diag.Location.Line, Diag.Location.Col,
+                                          Category, Flag, Diag.Message});
+    }
+    return Ret;
+  }
+
+  void dump() {
+    for (const auto &Diag : getDiagnostics()) {
+      llvm::dbgs() << Diag.Filename << ":" << Diag.Line << ":" << Diag.Col
+                   << ": " << Diag.Message << " [category=\'" << Diag.Category
+                   << "'";
+
+      if (!Diag.Flag.empty())
+        llvm::dbgs() << ", flag=" << Diag.Flag;
+
+      llvm::dbgs() << "]" << "\n";
+    }
+  }
+
+protected:
+  virtual std::error_code visitCategoryRecord(unsigned ID,
+                                              StringRef Name) override {
+    const auto &[_, Inserted] = CategoryIdx_.try_emplace(ID, Name);
+    assert(Inserted && "duplicate IDs");
+    return {};
+  }
+
+  virtual std::error_code visitDiagFlagRecord(unsigned ID,
+                                              StringRef Name) override {
+    const auto &[_, Inserted] = FlagIdx_.try_emplace(ID, Name);
+    assert(Inserted && "duplicate IDs");
+    return {};
+  }
+
+  virtual std::error_code visitFilenameRecord(unsigned ID, unsigned Size,
+                                              unsigned Timestamp,
+                                              StringRef Name) override {
+    const auto &[_, Inserted] = FilenameIdx_.try_emplace(ID, Name);
+    assert(Inserted && "duplicate IDs");
+    return {};
+  }
+
+  virtual std::error_code
+  visitDiagnosticRecord(unsigned Severity, const Location &Location,
+                        unsigned Category, unsigned Flag, StringRef Message) override {
+    Diagnostics_.emplace_back(
+        RawDiagnostic{Severity, Location, Category, Flag, Message});
+    return {};
+  }
+
+private:
+  struct RawDiagnostic {
+    unsigned Severity;
+    Location Location;
+    unsigned Category;
+    unsigned Flag;
+    StringRef Message;
+  };
+
+  DenseMap<unsigned, StringRef> CategoryIdx_;
+  DenseMap<unsigned, StringRef> FlagIdx_;
+  DenseMap<unsigned, StringRef> FilenameIdx_;
+  std::vector<RawDiagnostic> Diagnostics_;
+};
+
+int main(int argc, const char **argv) {
+  cl::ParseCommandLineOptions(argc, argv);
+  llvm::sys::PrintStackTraceOnErrorSignal(argv[0]);
+
+  for (const auto &File : InputFiles) {
+    BasicSerializedDiagnosticReader BSDR{};
+    BSDR.readDiagnostics(File);
+    BSDR.dump();
+  }
+
+  return 0;
+}

@github-actions
Copy link

github-actions bot commented Dec 3, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@yuxuanchen1997
Copy link
Member Author

Looking at the test failure, it appears that when only building check-clang-tools, no clang binary was built. So the test driver cannot find clang. Maybe this diagnostics binary file should be checked in for the test? That will not catch future errors if the diagnostics binary format changes.

@jroelofs
Copy link
Contributor

jroelofs commented Dec 4, 2024

Looking at the test failure, it appears that when only building check-clang-tools, no clang binary was built. So the test driver cannot find clang. Maybe this diagnostics binary file should be checked in for the test? That will not catch future errors if the diagnostics binary format changes.

We should stay away from committed binaries, per the xz fiasco. A better way to solve this, I think, would be to add a REQUIRES: clause for it. See the available_features.add() calls in llvm/utils/lit/lit/llvm/config.py.

@yuxuanchen1997
Copy link
Member Author

Looking at the test failure, it appears that when only building check-clang-tools, no clang binary was built. So the test driver cannot find clang. Maybe this diagnostics binary file should be checked in for the test? That will not catch future errors if the diagnostics binary format changes.

We should stay away from committed binaries, per the xz fiasco. A better way to solve this, I think, would be to add a REQUIRES: clause for it. See the available_features.add() calls in llvm/utils/lit/lit/llvm/config.py.

Great. This should work now.

@@ -0,0 +1,15 @@
// REQUIRES: clang
// RUN: not %clang %s -serialize-diagnostics %s.diag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommend putting the new file under %t instead of %s, so this doesn't pollute the source dir.

@jroelofs
Copy link
Contributor

jroelofs commented Dec 4, 2024

Looks okay to me, but let me tag some clang-tools-extra folks for a second look.

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to understand the motivation for the tool a bit better. IMO, the goal should be to deprecate serialized diagnostics because we are now able to emit diagnostics to SARIF instead, and the whole point to SARIF is to be a machine-readable interchange format for diagnostic information to be shared between tools. So if we want to promote SARIF for sharing diagnostics between tools, it seems like we don't really need a tool to deserialized diagnostics we've serialized.

CC @cjdb for awareness, as he's been working on SARIF support.

@yuxuanchen1997
Copy link
Member Author

yuxuanchen1997 commented Dec 6, 2024

I'd like to understand the motivation for the tool a bit better. IMO, the goal should be to deprecate serialized diagnostics because we are now able to emit diagnostics to SARIF instead, and the whole point to SARIF is to be a machine-readable interchange format for diagnostic information to be shared between tools. So if we want to promote SARIF for sharing diagnostics between tools, it seems like we don't really need a tool to deserialized diagnostics we've serialized.

CC @cjdb for awareness, as he's been working on SARIF support.

The goal of these tools is to eventually be able to obtain clang diagnostics as structural data. Say, I want the compiler to emit warnings and learn about the number of warnings I will get. For us, this is something that's worth putting into a database so that we can nudge relevant teams to make changes.

I didn't know that we plan to deprecate serialized diagnostics. I would have done differently if I had known about SARIF.

@AaronBallman
Copy link
Collaborator

I'd like to understand the motivation for the tool a bit better. IMO, the goal should be to deprecate serialized diagnostics because we are now able to emit diagnostics to SARIF instead, and the whole point to SARIF is to be a machine-readable interchange format for diagnostic information to be shared between tools. So if we want to promote SARIF for sharing diagnostics between tools, it seems like we don't really need a tool to deserialized diagnostics we've serialized.
CC @cjdb for awareness, as he's been working on SARIF support.

The goal of these tools is to eventually be able to obtain clang diagnostics as structural data. Say, I want the compiler to emit warnings and learn about the number of warnings I will get. For us, this is something that's worth putting into a database so that we can nudge relevant teams to make changes.

Okay, that makes sense to me. Yeah, I would recommend using SARIF for that.

I didn't know that we plan to deprecate serialized diagnostics. I would have done differently if I had known about SARIF.

I didn't mean to oversell it, I don't think we have firm plans to deprecate serialized diagnostics, just that it seems like something we're going to want to do at some point so we don't need to carry around multiple implementations of machine-readable diagnostic information.

@yuxuanchen1997
Copy link
Member Author

so we don't need to carry around multiple implementations of machine-readable diagnostic information.

I just realized that we also have -diagnostic-log-file as a cc1 flag. This logs into an XML. Is this preferred to the binary format or no?

@AaronBallman
Copy link
Collaborator

so we don't need to carry around multiple implementations of machine-readable diagnostic information.

I just realized that we also have -diagnostic-log-file as a cc1 flag. This logs into an XML. Is this preferred to the binary format or no?

I don't think either is preferred over the other, but I was unaware that we had something which logs to XML, so it's possible there's bitrot there (it seems we may not have any direct test coverage of that feature, at least from my cursory look).

@yuxuanchen1997
Copy link
Member Author

so we don't need to carry around multiple implementations of machine-readable diagnostic information.

I just realized that we also have -diagnostic-log-file as a cc1 flag. This logs into an XML. Is this preferred to the binary format or no?

I don't think either is preferred over the other, but I was unaware that we had something which logs to XML, so it's possible there's bitrot there (it seems we may not have any direct test coverage of that feature, at least from my cursory look).

I'd be more than happy to drop this patch if the alternative is better. I believe that we shouldn't need a tool like this, as the majority of third-party tooling won't want to implement reading from a proprietary format. It adds moving parts while getting little in return (maybe encoding/decoding speed or file size).

Things could also be in a friendlier format than XML. We currently use YAML for middle end opt remarks but that's different from diagnostics engine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants