-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[llvm][mustache] Specialize delimiter search #160165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm][mustache] Specialize delimiter search #160165
Conversation
Delimiters in mustache are generally 2-4 character sequences. While good for general search, we can beat find() for these short sequences by just using memchr() to find the first match, and then checking the next few characters directly.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
@llvm/pr-subscribers-llvm-support Author: Paul Kirth (ilovepi) ChangesDelimiters in mustache are generally 2-4 character sequences. While good Full diff: https://github.com/llvm/llvm-project/pull/160165.diff 1 Files Affected:
diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index 6c2ed6c84c6cf..c7cebe6b64fae 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -17,6 +17,50 @@ namespace {
using Accessor = SmallVector<std::string>;
+// A more generic specialized find for needles of length 1-3.
+[[maybe_unused]]
+static size_t findDelimiters(StringRef Haystack, StringRef Needle,
+ size_t Offset = 0) {
+ const size_t N = Needle.size();
+ if (N == 0)
+ return Offset;
+ if (N > 3) {
+ // Fallback for longer needles where more advanced algorithms are better.
+ return Haystack.find(Needle, Offset);
+ }
+
+ const char *HaystackStart = Haystack.data();
+ const size_t HaystackSize = Haystack.size();
+ if (HaystackSize < N + Offset)
+ return StringRef::npos;
+
+ const char *NeedleStart = Needle.data();
+ const char *Current = HaystackStart + Offset;
+ const char *End = HaystackStart + HaystackSize;
+
+ while (Current + N <= End) {
+ // Stage 1: Find the first character of the needle.
+ Current = (const char *)::memchr(Current, NeedleStart[0], End - Current);
+ if (!Current || Current + N > End)
+ return StringRef::npos;
+
+ // Stage 2: Validate the rest of the sequence.
+ if (N == 1)
+ return Current - HaystackStart;
+ if (N == 2 && Current[1] == NeedleStart[1])
+ return Current - HaystackStart;
+ if (N == 3 && Current[1] == NeedleStart[1] && Current[2] == NeedleStart[2])
+ return Current - HaystackStart;
+
+ // Mismatch, advance and continue the search.
+ ++Current;
+ }
+
+ return StringRef::npos;
+}
+
+
+
static bool isFalsey(const json::Value &V) {
return V.getAsNull() || (V.getAsBoolean() && !V.getAsBoolean().value()) ||
(V.getAsArray() && V.getAsArray()->empty());
@@ -306,7 +350,9 @@ SmallVector<Token> tokenize(StringRef Template) {
StringLiteral Open("{{");
StringLiteral Close("}}");
size_t Start = 0;
- size_t DelimiterStart = Template.find(Open);
+ // size_t DelimiterStart = Template.find(Open);
+ size_t DelimiterStart = findDelimiters(Template, Open);
+
if (DelimiterStart == StringRef::npos) {
Tokens.emplace_back(Template.str());
return Tokens;
@@ -314,7 +360,8 @@ SmallVector<Token> tokenize(StringRef Template) {
while (DelimiterStart != StringRef::npos) {
if (DelimiterStart != Start)
Tokens.emplace_back(Template.substr(Start, DelimiterStart - Start).str());
- size_t DelimiterEnd = Template.find(Close, DelimiterStart);
+ // size_t DelimiterEnd = Template.find(Close, DelimiterStart);
+ size_t DelimiterEnd = findDelimiters(Template, Close, DelimiterStart);
if (DelimiterEnd == StringRef::npos)
break;
@@ -326,7 +373,8 @@ SmallVector<Token> tokenize(StringRef Template) {
std::string RawBody = Open.str() + Interpolated + Close.str();
Tokens.emplace_back(RawBody, Interpolated, Interpolated[0]);
Start = DelimiterEnd + Close.size();
- DelimiterStart = Template.find(Open, Start);
+ // DelimiterStart = Template.find(Open, Start);
+ DelimiterStart = findDelimiters(Template, Open, Start);
}
if (Start < Template.size())
@@ -572,6 +620,8 @@ void ASTNode::render(const json::Value &CurrentCtx, raw_ostream &OS) {
ParentContext = &CurrentCtx;
const json::Value *ContextPtr = Ty == Root ? ParentContext : findContext();
+ if (AccessorValue.empty() && (Ty != Root && Ty != Text))
+ return;
switch (Ty) {
case Root:
renderChild(CurrentCtx, OS);
|
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Support/Mustache.cpp
View the diff from clang-format here.diff --git a/llvm/lib/Support/Mustache.cpp b/llvm/lib/Support/Mustache.cpp
index c7cebe6b6..4acd7b816 100644
--- a/llvm/lib/Support/Mustache.cpp
+++ b/llvm/lib/Support/Mustache.cpp
@@ -59,8 +59,6 @@ static size_t findDelimiters(StringRef Haystack, StringRef Needle,
return StringRef::npos;
}
-
-
static bool isFalsey(const json::Value &V) {
return V.getAsNull() || (V.getAsBoolean() && !V.getAsBoolean().value()) ||
(V.getAsArray() && V.getAsArray()->empty());
|
nikic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much does this improve performance?
| using Accessor = SmallVector<std::string>; | ||
|
|
||
| // A more generic specialized find for needles of length 1-3. | ||
| [[maybe_unused]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not maybe_unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops left from early debugging. I'll remove.
| StringLiteral Close("}}"); | ||
| size_t Start = 0; | ||
| size_t DelimiterStart = Template.find(Open); | ||
| // size_t DelimiterStart = Template.find(Open); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover commented code
| const json::Value *ContextPtr = Ty == Root ? ParentContext : findContext(); | ||
|
|
||
| if (AccessorValue.empty() && (Ty != Root && Ty != Text)) | ||
| return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change?
|
sigh. I was consistently getting 10% on the template parsing benchmarks, w/ the rest under 1% variance run to run. But now I'm getting more consistent regressions with only 4% improvements. That was across 20 iterations of the benchmark via Given that I'm not seeing consistent wins on this patch, lets close this for now, and if we find a bottleneck splitting up the template, we can revisit. |

Delimiters in mustache are generally 2-4 character sequences. While good
for general search, we can beat find() for these short sequences by just
using memchr() to find the first match, and then checking the next few
characters directly.