Skip to content

Use memchr in parse_file#5437

Open
bscuron wants to merge 1 commit intomawww:masterfrom
bscuron:memchr_parse_file
Open

Use memchr in parse_file#5437
bscuron wants to merge 1 commit intomawww:masterfrom
bscuron:memchr_parse_file

Conversation

@bscuron
Copy link
Copy Markdown
Contributor

@bscuron bscuron commented Jan 15, 2026

perf record shows improvement from ~15-18% to 1-3% using kak -n

If this is equivalent to the current implementation, a similar update can be made to parse_lines as well.

@mawww
Copy link
Copy Markdown
Owner

mawww commented Feb 27, 2026

Experimented with this on large file loading, it did not seem to make a significant difference. I am not sure what was going from 15-18% to 1-3% in perf but it does not seem to impact the actual startup time on a big file, which is dominated by allocating the lines.

I was measuring using this change, which migrates the other std::find to memchr as well:

diff --git a/src/buffer_utils.cc b/src/buffer_utils.cc
index cfac4881..7ed5430b 100644
--- a/src/buffer_utils.cc
+++ b/src/buffer_utils.cc
@@ -97,11 +97,13 @@ static BufferLines parse_lines(const char* pos, const char* end, EolFormat eolfo
         if (lines.size() >= std::numeric_limits<int>::max())
             throw runtime_error("too many lines");
 
-        const char* eol = std::find(pos, end, '\n');
-        if ((eol - pos) >= std::numeric_limits<int>::max())
+        const char* eol = reinterpret_cast<const char*>(memchr(pos, '\n', end - pos));
+        if (((eol ? eol : end) - pos) >= std::numeric_limits<int>::max())
             throw runtime_error("line is too long");
 
-        lines.emplace_back(StringData::create(StringView{pos, eol - (eolformat == EolFormat::Crlf and eol != end ? 1 : 0)}, "\n"));
+        lines.emplace_back(StringData::create(StringView{pos, eol ? eol - (eolformat == EolFormat::Crlf ? 1 : 0) : end}, "\n"));
+        if (not eol)
+            break;
         pos = eol + 1;
     }
 
@@ -137,8 +139,8 @@ decltype(auto) parse_file(StringView filename, Func&& func)
 
     size_t line_count = 1;
     bool has_crlf = false, has_lf = false;
-    for (auto it = std::find(pos, end, '\n'); it != end; it = std::find(it+1, end, '\n'), ++line_count)
-        ((it != pos and *(it-1) == '\r') ? has_crlf : has_lf) = true;
+    for (const char* eol = pos; (eol = reinterpret_cast<const char*>(memchr(eol, '\n', end - eol))); ++eol, ++line_count)
+        ((eol != pos and *(eol-1) == '\r') ? has_crlf : has_lf) = true;
     auto eolformat = (has_crlf and not has_lf) ? EolFormat::Crlf : EolFormat::Lf;
 
     FsStatus fs_status{file.st.st_mtim, file.st.st_size, murmur3(file.data, file.st.st_size)};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants