Skip to content

feat: zero deps format#2655

Open
magistau wants to merge 1 commit intoninja-build:masterfrom
magistau:zero-deps
Open

feat: zero deps format#2655
magistau wants to merge 1 commit intoninja-build:masterfrom
magistau:zero-deps

Conversation

@magistau
Copy link

@magistau magistau commented Aug 19, 2025

This patch adds support for NUL-delimited dependency files, which is quite a simple format avoiding all the issues possible with gcc/msvc formats. It requires almost no parsing: just read NUL-terminated strings until the file is empty. Because of this, it doesn't depend on file paths being "nice" in any way and doesn't require any escaping for special characters like $ or \. It is quite common in shell tooling, like find -print0, xargs -0 and others.
GCC format, which should more accurately be described as "GCC version of Makefile subset", is quite problematic: #2599, #2590, #2568, #2357, #1856, #1825, #1774, #1542, #1227 and probably more. Compared to this, the only possible issue with the proposed format is lack of a NUL after the last entry, which I handled by adding a NUL if the file doesn't end in one.
The parser itself is implemented in 9 lines of code, compared to hundreds of lines MSVC and GCC need. Because of this simplicity, any project can easily implement a reader or a writer without any shortcomings. GCC format is impossible to get right, as even GCC itself was unable to do so (quoted from libcpp/mkdeps.cc from GCC 14.3.0):

/* Apply Make quoting to STR, TRAIL.  Note that it's not possible to
   quote all such characters - e.g. \n, %, *, ?, [, \ (in some
   contexts), and ~ are not properly handled.  It isn't possible to
   get this right in any current version of Make.  (??? Still true?
   Old comment referred to 3.76.1.)  */

Typst contains a similar remark: https://github.com/typst/typst/blob/880e69b0c840cd0d2f0c51e72966ab72c008e713/crates/typst-cli/src/compile.rs#L524-L525
That led me to creating typst/typst#6648 typst/typst#7022, adding NUL-delimited format, and that in turn led me here.

Note that the proposed format has no "official" standard, RFC or otherwise. Due to its simplicity, though, there are only two flavors: either you terminate your fail with a NUL (e.g. foo␀bar␀baz␀) or you don't (e.g. foo␀bar␀baz). The former is usually easier to implement due to being more consistent, and this situation is similar to how some old tools require that a text file ends with an LF and some don't. To be honest, I haven't seen any tools that omit the trailing NUL, but they might still exist. Anyway, the implementation here supports both "flavors" without any compatibility cost.

src/build.cc Outdated
Comment on lines 1038 to 1050
if (content.back() != '\0') {
content.push_back('\0');
}
const char* ptr = content.c_str();
const char* const end = &content.back();
while (ptr < end) {
StringPiece piece = ptr;
ptr += piece.size() + 1;
uint64_t slash_bits;
CanonicalizePath(const_cast<char*>(piece.str_), &piece.len_,
&slash_bits);
deps_nodes->push_back(state_->GetNode(piece, slash_bits));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted to you could include the null character at the end of the std::string to avoid modification, but then you need to skip empty paths in your loop (which may be a good idea anyway?)

Suggested change
if (content.back() != '\0') {
content.push_back('\0');
}
const char* ptr = content.c_str();
const char* const end = &content.back();
while (ptr < end) {
StringPiece piece = ptr;
ptr += piece.size() + 1;
uint64_t slash_bits;
CanonicalizePath(const_cast<char*>(piece.str_), &piece.len_,
&slash_bits);
deps_nodes->push_back(state_->GetNode(piece, slash_bits));
}
const char* ptr = content.c_str();
// Include the terminating '\0' in the range
const char* const end = content.c_str() + content.size() + 1;
while (ptr < end) {
StringPiece piece = ptr;
ptr += piece.size() + 1;
if (piece.empty())
continue;
uint64_t slash_bits;
CanonicalizePath(const_cast<char*>(piece.str_), &piece.len_,
&slash_bits);
deps_nodes->push_back(state_->GetNode(piece, slash_bits));
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what one has to do with the other. It is true that c_str() returns a pointer to a NUL-terminated string, although I wasn't sure about that initially; but I'm not sure if there's a need to handle empty file names in a special way. It's an easy check, sure, but how is it different from any other invalid path?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are independent things, but if the user's data already contains a null character at the end, then by parsing the extra one added by std::string we will end up with an empty StringPiece and I don't think we should be adding that as a dependency. You can handle that by skipping all empty StringPieces, if you want to be more flexible and ignore empty paths from the user, or you could conditionally allow the null character and then not skip empty StringPieces:

      const char* ptr = content.c_str();
      const bool needNullTerminator = content.empty() || content.back() != '\0';
      const char* const end = ptr + content.size() + needNullTerminator;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't. The ptr += line advances the pointer past the current piece's NUL. For non-terminal pieces, this is the first character of the next piece, and for the terminal piece it's the string's NUL, which is also pointed to by end. Now, the ptr < end condition fails, so there's no attempt to parse any sort of an empty piece. Same logic applies to cases where user's file doesn't end with NUL — then ptr ends up on end+1 after reading the past piece, and the condition likewise fails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for any confusion, but when I wrote,

If you wanted to you could include the null character at the end of the std::string to avoid modification, but then you need to skip empty paths in your loop (which may be a good idea anyway?)

it was in reference to the code snippet that I had suggested. I don't think your original code adds in any empty StringPiece code that wasn't supplied by the user.

But instead of conditionally adding \0,

      if (content.back() != '\0') {
        content.push_back('\0');
      }

we should instead conditionally use the null-terminator that is already there in std::string.

      const char* ptr = content.c_str();
      const bool needNullTerminator = content.back() != '\0';
      const char* const end = ptr + content.size() + needNullTerminator;

(I realised that content is never empty at this point in the code so we I didn't need to put the check in).

Copy link
Author

@magistau magistau Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the latest version of the patch — I applied your suggestion without any checks (neither for "ends with NUL" nor for "is empty").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants