-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[TableGen] correctly escape dependency filenames #160834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Currently, *-tblgen do not escape special characters in dependency
filenames. This can lead to unnecessary rebuilds when the filenames
contain characters that require escaping, as the build system may not
treat them correctly.
For instance, when building LLVM in a directory that contains spaces
("/home/user/repos/llvm project" in the example below), the build system
always rebuilds a large portion of the project despite nothing having
changed, due to an unescaped space in the dependency filename:
$ ninja -C build -d explain
ninja: Entering directory `build'
ninja explain: /home/user/repos/llvm is dirty
...
$ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...
Fix this by escaping special characters in dependency filenames using
backslashes. This is consistent with how GCC, Clang [1] and lld [2]
handle this.
After this change (notice the escaped space):
$ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm\ project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...
[1]: https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/clang/lib/Frontend/DependencyFile.cpp#L267-L344
[2]: https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/lld/ELF/Driver.cpp#L2482-L2503
Signed-off-by: Ruoyu Zhong <[email protected]>
Signed-off-by: Ruoyu Zhong <[email protected]>
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-tablegen Author: Ruoyu Zhong (ZhongRuoyu) ChangesCurrently, For instance, when building LLVM in a directory that contains spaces ( $ ninja -C build -d explain
ninja: Entering directory `build'
ninja explain: /home/user/repos/llvm is dirty
...
$ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...Fix this by escaping special characters in dependency filenames using backslashes. This is consistent with how GCC, Clang 1 and lld 2 handle this. After this change (notice the escaped space): $ cat build/include/llvm/IR/IntrinsicsRISCV.h.d
IntrinsicsRISCV.h: /home/user/repos/llvm\ project/llvm/include/llvm/CodeGen/SDNodeProperties.td ...N.B. Full diff: https://github.com/llvm/llvm-project/pull/160834.diff 1 Files Affected:
diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index f545706d6fe30..76b161d087f60 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -17,12 +17,14 @@
#include "llvm/TableGen/Main.h"
#include "TGLexer.h"
#include "TGParser.h"
+#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
#include "llvm/Support/SMLoc.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/ToolOutputFile.h"
@@ -37,26 +39,24 @@
#include <utility>
using namespace llvm;
-static cl::opt<std::string>
-OutputFilename("o", cl::desc("Output filename"), cl::value_desc("filename"),
- cl::init("-"));
+static cl::opt<std::string> OutputFilename("o", cl::desc("Output filename"),
+ cl::value_desc("filename"),
+ cl::init("-"));
-static cl::opt<std::string>
-DependFilename("d",
- cl::desc("Dependency filename"),
- cl::value_desc("filename"),
- cl::init(""));
+static cl::opt<std::string> DependFilename("d", cl::desc("Dependency filename"),
+ cl::value_desc("filename"),
+ cl::init(""));
static cl::opt<std::string>
-InputFilename(cl::Positional, cl::desc("<input file>"), cl::init("-"));
+ InputFilename(cl::Positional, cl::desc("<input file>"), cl::init("-"));
static cl::list<std::string>
IncludeDirs("I", cl::desc("Directory of include files"),
cl::value_desc("directory"), cl::Prefix);
static cl::list<std::string>
-MacroNames("D", cl::desc("Name of the macro to be defined"),
- cl::value_desc("macro name"), cl::Prefix);
+ MacroNames("D", cl::desc("Name of the macro to be defined"),
+ cl::value_desc("macro name"), cl::Prefix);
static cl::opt<bool>
WriteIfChanged("write-if-changed", cl::desc("Only write output if it changed"));
@@ -83,6 +83,35 @@ static int reportError(const char *ProgName, Twine Msg) {
return 1;
}
+/// Escape a filename in the dependency file so that it is correctly
+/// interpreted by `make`. This is consistent with Clang, GCC, and lld.
+static TGLexer::DependenciesSetTy::value_type escapeDependencyFilename(
+ const TGLexer::DependenciesSetTy::value_type &Filename) {
+ std::string Res;
+ raw_string_ostream OS(Res);
+
+ SmallString<256> NativePath;
+ sys::path::native(Filename, NativePath);
+
+ for (unsigned I = 0, E = NativePath.size(); I != E; ++I) {
+ if (NativePath[I] == '#') {
+ OS << '\\';
+ } else if (NativePath[I] == ' ') {
+ OS << '\\';
+ unsigned J = I;
+ while (J > 0 && NativePath[--J] == '\\') {
+ OS << '\\';
+ }
+ } else if (NativePath[I] == '$') {
+ OS << '$';
+ }
+ OS << NativePath[I];
+ }
+
+ OS.flush();
+ return Res;
+}
+
/// Create a dependency file for `-d` option.
///
/// This functionality is really only for the benefit of the build system.
@@ -96,9 +125,9 @@ static int createDependencyFile(const TGParser &Parser, const char *argv0) {
if (EC)
return reportError(argv0, "error opening " + DependFilename + ":" +
EC.message() + "\n");
- DepOut.os() << OutputFilename << ":";
+ DepOut.os() << escapeDependencyFilename(OutputFilename) << ":";
for (const auto &Dep : Parser.getDependencies()) {
- DepOut.os() << ' ' << Dep;
+ DepOut.os() << ' ' << escapeDependencyFilename(Dep);
}
DepOut.os() << "\n";
DepOut.keep();
Footnotes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to write a lit test for this, especially for the funky logic handling backslashes before a space?
b11bf0f to
67488de
Compare
| // CHECK-DAG: file\ with\ spaces.td | ||
| // CHECK-DAG: file\#with\#hash.td | ||
| // CHECK-DAG: file$$with$$dollar.td | ||
| // CHECK-DAG: file\ with\ escape\\\ before\ spaces.td |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test (file with escape\ before spaces) actually failed on UNIX before 7718c55, as the filename was escaped as file\ with\ space/\ before\ spaces.td suggesting that the backslash in the filename was mistakenly (?) transformed to the path separator / due to this:
llvm-project/llvm/lib/Support/Path.cpp
Line 564 in 2cacf71
| llvm::replace(Path, '\\', '/'); |
So if the test case actually makes sense, I believe it's more correct to only transform the path to its native form using sys::path::native on Windows. Its "On Unix, it converts all '' to '/'" behavior doesn't seem to be needed here (*-tblgen don't have the -fms-compatibility flag where backslashes are treated as path separators):
llvm-project/llvm/include/llvm/Support/Path.h
Lines 258 to 265 in 2cacf71
| /// Convert path to the native form. This is used to give paths to users and | |
| /// operating system calls in the platform's normal way. For example, on Windows | |
| /// all '/' are converted to '\'. On Unix, it converts all '\' to '/'. | |
| /// | |
| /// @param path A path that is transformed to native format. | |
| /// @param result Holds the result of the transformation. | |
| LLVM_ABI void native(const Twine &path, SmallVectorImpl<char> &result, | |
| Style style = Style::native); |
As for Clang and lld (as mentioned in the PR description), this also seems to be an issue that's worth checking.
$ echo '#include <a\ b>' | gcc-15 -M -MG -xc -
-: a\\\ b
$ echo '#include <a\ b>' | clang-21 -M -MG -xc -
-.o: a/\ bThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed the same change in Clang and lld, respectively:
- [Clang] only convert dependency filename to native form when MS-compatible #160903
- [lld][ELF] Only convert dependency filename to native form on Windows #160927
If these changes are accepted we can probably DRY this up into a new llvm/support helper class.
f400432 to
7718c55
Compare
|
Gentle ping -- would be nice to have this landed. |
Currently,
*-tblgendo not escape special characters in dependency filenames. This can lead to unnecessary rebuilds when the filenames contain characters that require escaping, as the build system may not treat them correctly.For instance, when building LLVM in a directory that contains spaces (
/home/user/repos/llvm projectin the example below), the build system always rebuilds a large portion of the project despite nothing having changed, due to an unescaped space in the dependency filename:Fix this by escaping special characters in dependency filenames using backslashes. This is consistent with how GCC, Clang 1 and lld 2 handle this.
After this change (notice the escaped space):
N.B.
git clang-formatwanted me to format some other parts of the source so there is a second commit a8d9a90 that does that.See also:
If these PRs are accepted then I'll try to extract the logic into a new
llvm/supporthelper class. These changes are presented as separate PRs because each of them change the behavior slightly differently.Footnotes
https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/clang/lib/Frontend/DependencyFile.cpp#L267-L344 ↩
https://github.com/llvm/llvm-project/blob/2cacf7117ba0fb7c134413a1a51302f8d6649052/lld/ELF/Driver.cpp#L2482-L2503 ↩