Skip to content

Avoid string copies on lookup methods#2716

Open
elliotgoodrich wants to merge 1 commit intoninja-build:masterfrom
elliotgoodrich:take-stringpiece-on-graph-methods
Open

Avoid string copies on lookup methods#2716
elliotgoodrich wants to merge 1 commit intoninja-build:masterfrom
elliotgoodrich:take-stringpiece-on-graph-methods

Conversation

@elliotgoodrich
Copy link
Contributor

Create a transparent comparator StringPieceLess that can allow us to lookup up values in a std::map<std::string, V> using a StringPiece without needing to create a temporary std::string.

Similarly, there are other methods on classes in graph.h that take a std::string that can work just fine with a StringPIece and avoid more temporaries.

struct StringPieceLess {
using is_transparent = void;

bool operator()(const StringPiece& lhs, const StringPiece& rhs) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this take StringPiece value arguments instead?

https://quuxplusone.github.io/blog/2021/11/09/pass-string-view-by-value/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, missed this one.

return std::lexicographical_compare(lhs.begin(),
lhs.end(),
rhs.begin(),
rhs.end());
Copy link
Contributor

@digit-google digit-google Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: There is no guarantee that the std::lexicographical_compare() implementation will be efficient compared to the use of std::min() and ::memcmp(), which can use optimized SIMD instructions. See https://godbolt.org/z/4xcn46rGM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience the standard libraries do a great job of falling back to the fast C methods and I would have argued this. However, you're right about GCC! Even though GCC can fallback to memcmp - they fail to do so for char because it's not unsigned https://godbolt.org/z/4s8jhWE7T :(

This has got to be a bug as their std::char_traits<char>::compare does use it and the implementation of std::lexicographical_compare for MSVC and Clang use memcmp.

I'll file a bug and swap this to memcmp in the meantime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a bug. Compilers can only use memcmp in this case when char is signed on the platform (swap your godbolt to GCC ARM and it'll use memcmp) as they differ when the value of char is negative.

Learn something new every day!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. Nice catch. Thanks for the reminder!

In the context of Ninja, our strings are UTF-8 so std::memcmp() probably makes more sense then (it always compares bytes as unsigned char values). I'd suggest a comment in the definition of the struct to clarify that it does utf8 code unit (i.e. unsigned char) comparisons.

Also means that when switching to std::string_view, we might want to keep this custom comparator instead of relying on std::less<>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@digit-google Sure thing. I've added an attempt at documenting this and pushed it in the commit. Let me know if you want any revisions.

/// A comparator for StringPiece that can be used in
/// `std::map<std::string, V, StringPieceLess>` that will allow using a
/// StringPiece as a key.  Note that this compares elements of the
/// strings as `unsigned char`, which means that implementations that
/// have a signed `char` may give an unexpected (yet valid) ordering
/// when iterating through a `std::map` containing negative `char`s.

@jhasse
Copy link
Collaborator

jhasse commented Jan 21, 2026

Not sure if we should put in this work and not directly go to std::string_view.

@digit-google
Copy link
Contributor

The benefit of this great PR is to improve performance, possibly quite significantly, by removing temporary string buffer heap allocations. I.e. it does change the generated code significantly.

Switching to std::string_view may require a massive change. It's a good idea but I recommend to do that in a different PR that does not change behavior, as this will be much easier to review for correctness.

Also have we switched to C++17 yet (a requirement for std::string_view)? I thought C++14 was still the required minimum at least for the base Ninja code (tests are already on C++17 due to GTest). Please correct me if I'm wrong.

@elliotgoodrich elliotgoodrich force-pushed the take-stringpiece-on-graph-methods branch 3 times, most recently from 39ce344 to 6e94a40 Compare January 22, 2026 07:48
@digit-google
Copy link
Contributor

Thanks Elliot, latest version seems great (feel free to add the comment referenced above or not), @jhasse would you consider approving this?

@elliotgoodrich elliotgoodrich force-pushed the take-stringpiece-on-graph-methods branch 2 times, most recently from 1dbd42a to af99b3e Compare January 22, 2026 18:54
Create a transparent comparator `StringPieceLess` that can allow
us to lookup up values in a `std::map<std::string, V>` using a
`StringPiece` without needing to create a temporary `std::string`.

Similarly, there are other methods on classes in `graph.h` that
take a `std::string` that can work just fine with a `StringPIece` and
avoid more temporaries.
@elliotgoodrich elliotgoodrich force-pushed the take-stringpiece-on-graph-methods branch from af99b3e to 7e612cc Compare January 22, 2026 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants