Avoid string copies on lookup methods#2716
Avoid string copies on lookup methods#2716elliotgoodrich wants to merge 1 commit intoninja-build:masterfrom
Conversation
src/string_piece_util.h
Outdated
| struct StringPieceLess { | ||
| using is_transparent = void; | ||
|
|
||
| bool operator()(const StringPiece& lhs, const StringPiece& rhs) const; |
There was a problem hiding this comment.
Should this take StringPiece value arguments instead?
https://quuxplusone.github.io/blog/2021/11/09/pass-string-view-by-value/
There was a problem hiding this comment.
Yes, missed this one.
src/string_piece_util.cc
Outdated
| return std::lexicographical_compare(lhs.begin(), | ||
| lhs.end(), | ||
| rhs.begin(), | ||
| rhs.end()); |
There was a problem hiding this comment.
nit: There is no guarantee that the std::lexicographical_compare() implementation will be efficient compared to the use of std::min() and ::memcmp(), which can use optimized SIMD instructions. See https://godbolt.org/z/4xcn46rGM
There was a problem hiding this comment.
In my experience the standard libraries do a great job of falling back to the fast C methods and I would have argued this. However, you're right about GCC! Even though GCC can fallback to memcmp - they fail to do so for char because it's not unsigned https://godbolt.org/z/4s8jhWE7T :(
This has got to be a bug as their std::char_traits<char>::compare does use it and the implementation of std::lexicographical_compare for MSVC and Clang use memcmp.
I'll file a bug and swap this to memcmp in the meantime.
There was a problem hiding this comment.
This isn't a bug. Compilers can only use memcmp in this case when char is signed on the platform (swap your godbolt to GCC ARM and it'll use memcmp) as they differ when the value of char is negative.
Learn something new every day!
There was a problem hiding this comment.
Oh right. Nice catch. Thanks for the reminder!
In the context of Ninja, our strings are UTF-8 so std::memcmp() probably makes more sense then (it always compares bytes as unsigned char values). I'd suggest a comment in the definition of the struct to clarify that it does utf8 code unit (i.e. unsigned char) comparisons.
Also means that when switching to std::string_view, we might want to keep this custom comparator instead of relying on std::less<>.
There was a problem hiding this comment.
@digit-google Sure thing. I've added an attempt at documenting this and pushed it in the commit. Let me know if you want any revisions.
/// A comparator for StringPiece that can be used in
/// `std::map<std::string, V, StringPieceLess>` that will allow using a
/// StringPiece as a key. Note that this compares elements of the
/// strings as `unsigned char`, which means that implementations that
/// have a signed `char` may give an unexpected (yet valid) ordering
/// when iterating through a `std::map` containing negative `char`s.|
Not sure if we should put in this work and not directly go to |
|
The benefit of this great PR is to improve performance, possibly quite significantly, by removing temporary string buffer heap allocations. I.e. it does change the generated code significantly. Switching to Also have we switched to C++17 yet (a requirement for std::string_view)? I thought C++14 was still the required minimum at least for the base Ninja code (tests are already on C++17 due to GTest). Please correct me if I'm wrong. |
39ce344 to
6e94a40
Compare
|
Thanks Elliot, latest version seems great (feel free to add the comment referenced above or not), @jhasse would you consider approving this? |
1dbd42a to
af99b3e
Compare
Create a transparent comparator `StringPieceLess` that can allow us to lookup up values in a `std::map<std::string, V>` using a `StringPiece` without needing to create a temporary `std::string`. Similarly, there are other methods on classes in `graph.h` that take a `std::string` that can work just fine with a `StringPIece` and avoid more temporaries.
af99b3e to
7e612cc
Compare
Create a transparent comparator
StringPieceLessthat can allow us to lookup up values in astd::map<std::string, V>using aStringPiecewithout needing to create a temporarystd::string.Similarly, there are other methods on classes in
graph.hthat take astd::stringthat can work just fine with aStringPIeceand avoid more temporaries.