Conversation
src/build.cc
Outdated
| if (content.back() != '\0') { | ||
| content.push_back('\0'); | ||
| } | ||
| const char* ptr = content.c_str(); | ||
| const char* const end = &content.back(); | ||
| while (ptr < end) { | ||
| StringPiece piece = ptr; | ||
| ptr += piece.size() + 1; | ||
| uint64_t slash_bits; | ||
| CanonicalizePath(const_cast<char*>(piece.str_), &piece.len_, | ||
| &slash_bits); | ||
| deps_nodes->push_back(state_->GetNode(piece, slash_bits)); | ||
| } |
There was a problem hiding this comment.
If you wanted to you could include the null character at the end of the std::string to avoid modification, but then you need to skip empty paths in your loop (which may be a good idea anyway?)
| if (content.back() != '\0') { | |
| content.push_back('\0'); | |
| } | |
| const char* ptr = content.c_str(); | |
| const char* const end = &content.back(); | |
| while (ptr < end) { | |
| StringPiece piece = ptr; | |
| ptr += piece.size() + 1; | |
| uint64_t slash_bits; | |
| CanonicalizePath(const_cast<char*>(piece.str_), &piece.len_, | |
| &slash_bits); | |
| deps_nodes->push_back(state_->GetNode(piece, slash_bits)); | |
| } | |
| const char* ptr = content.c_str(); | |
| // Include the terminating '\0' in the range | |
| const char* const end = content.c_str() + content.size() + 1; | |
| while (ptr < end) { | |
| StringPiece piece = ptr; | |
| ptr += piece.size() + 1; | |
| if (piece.empty()) | |
| continue; | |
| uint64_t slash_bits; | |
| CanonicalizePath(const_cast<char*>(piece.str_), &piece.len_, | |
| &slash_bits); | |
| deps_nodes->push_back(state_->GetNode(piece, slash_bits)); | |
| } |
There was a problem hiding this comment.
I'm not sure what one has to do with the other. It is true that c_str() returns a pointer to a NUL-terminated string, although I wasn't sure about that initially; but I'm not sure if there's a need to handle empty file names in a special way. It's an easy check, sure, but how is it different from any other invalid path?
There was a problem hiding this comment.
Yes they are independent things, but if the user's data already contains a null character at the end, then by parsing the extra one added by std::string we will end up with an empty StringPiece and I don't think we should be adding that as a dependency. You can handle that by skipping all empty StringPieces, if you want to be more flexible and ignore empty paths from the user, or you could conditionally allow the null character and then not skip empty StringPieces:
const char* ptr = content.c_str();
const bool needNullTerminator = content.empty() || content.back() != '\0';
const char* const end = ptr + content.size() + needNullTerminator;There was a problem hiding this comment.
It won't. The ptr += line advances the pointer past the current piece's NUL. For non-terminal pieces, this is the first character of the next piece, and for the terminal piece it's the string's NUL, which is also pointed to by end. Now, the ptr < end condition fails, so there's no attempt to parse any sort of an empty piece. Same logic applies to cases where user's file doesn't end with NUL — then ptr ends up on end+1 after reading the past piece, and the condition likewise fails.
There was a problem hiding this comment.
Sorry for any confusion, but when I wrote,
If you wanted to you could include the null character at the end of the std::string to avoid modification, but then you need to skip empty paths in your loop (which may be a good idea anyway?)
it was in reference to the code snippet that I had suggested. I don't think your original code adds in any empty StringPiece code that wasn't supplied by the user.
But instead of conditionally adding \0,
if (content.back() != '\0') {
content.push_back('\0');
}we should instead conditionally use the null-terminator that is already there in std::string.
const char* ptr = content.c_str();
const bool needNullTerminator = content.back() != '\0';
const char* const end = ptr + content.size() + needNullTerminator;(I realised that content is never empty at this point in the code so we I didn't need to put the check in).
There was a problem hiding this comment.
Please see the latest version of the patch — I applied your suggestion without any checks (neither for "ends with NUL" nor for "is empty").
This patch adds support for NUL-delimited dependency files, which is quite a simple format avoiding all the issues possible with
gcc/msvcformats. It requires almost no parsing: just read NUL-terminated strings until the file is empty. Because of this, it doesn't depend on file paths being "nice" in any way and doesn't require any escaping for special characters like$or\. It is quite common in shell tooling, likefind -print0,xargs -0and others.GCC format, which should more accurately be described as "GCC version of Makefile subset", is quite problematic: #2599, #2590, #2568, #2357, #1856, #1825, #1774, #1542, #1227 and probably more. Compared to this, the only possible issue with the proposed format is lack of a NUL after the last entry, which I handled by adding a NUL if the file doesn't end in one.
The parser itself is implemented in 9 lines of code, compared to hundreds of lines MSVC and GCC need. Because of this simplicity, any project can easily implement a reader or a writer without any shortcomings. GCC format is impossible to get right, as even GCC itself was unable to do so (quoted from
libcpp/mkdeps.ccfrom GCC 14.3.0):Typst contains a similar remark: https://github.com/typst/typst/blob/880e69b0c840cd0d2f0c51e72966ab72c008e713/crates/typst-cli/src/compile.rs#L524-L525
That led me to creating
typst/typst#6648typst/typst#7022, adding NUL-delimited format, and that in turn led me here.Note that the proposed format has no "official" standard, RFC or otherwise. Due to its simplicity, though, there are only two flavors: either you terminate your fail with a NUL (e.g.
foo␀bar␀baz␀) or you don't (e.g.foo␀bar␀baz). The former is usually easier to implement due to being more consistent, and this situation is similar to how some old tools require that a text file ends with an LF and some don't. To be honest, I haven't seen any tools that omit the trailing NUL, but they might still exist. Anyway, the implementation here supports both "flavors" without any compatibility cost.