Skip to content

Conversation

aokblast
Copy link
Contributor

@aokblast aokblast commented Oct 12, 2025

To prevent large memory footprint, we use small buffer and a seperated string to print. This can largely reduce the memory consumption when processing.

In the following example, the test.file is 1G regular file, which is possible in the daily workload.

cat ./test.file | llvm-strings --radix=o > out &&
ps -o command,vsz,rss | grep strings

Reading whole file:
llvm-strings 5129824 (5G) 3999920 (3G)

Small buffer implementation:
llvm-strings 17604 (16M) 5476 (5M)

Also, it does not affect too much runtime:

time llvm-strings --radix=o > out < ./test.file

Reading whole file:
Executed in 5.67 secs fish external
usr time 4.38 secs 2.40 millis 4.38 secs
sys time 1.28 secs 0.00 millis 1.28 secs

Small buffer implementation:
Executed in 5.87 secs fish external
usr time 5.65 secs 2.59 millis 5.65 secs
sys time 0.21 secs 0.00 millis 0.21 secs

@llvmbot
Copy link
Member

llvmbot commented Oct 12, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: None (aokblast)

Changes

To prevent large memory footprint, we use small buffer and a seperated string to print. This can largely reduce the memory consumption when processing.

In the following example, the test.file is 1G regular file, which is possible in the daily workload.

cat ./test.file | llvm-strings --radix=o > out &&
ps -o command,vsz,rss | grep strings

Reading whole file:
llvm-strings 5129824 3999920

Small buffer implementation:
llvm-strings 17604 5476

Also, it does not affect too much runtime:

time llvm-strings --radix=o > out < ./test.file

Reading whole file:
Executed in 5.67 secs fish external
usr time 4.38 secs 2.40 millis 4.38 secs
sys time 1.28 secs 0.00 millis 1.28 secs

Small buffer implementation:
Executed in 5.87 secs fish external
usr time 5.65 secs 2.59 millis 5.65 secs
sys time 0.21 secs 0.00 millis 0.21 secs


Full diff: https://github.com/llvm/llvm-project/pull/163073.diff

1 Files Affected:

  • (modified) llvm/tools/llvm-strings/llvm-strings.cpp (+52-19)
diff --git a/llvm/tools/llvm-strings/llvm-strings.cpp b/llvm/tools/llvm-strings/llvm-strings.cpp
index 9979b93de8427..2be070d43f294 100644
--- a/llvm/tools/llvm-strings/llvm-strings.cpp
+++ b/llvm/tools/llvm-strings/llvm-strings.cpp
@@ -18,7 +18,9 @@
 #include "llvm/Option/ArgList.h"
 #include "llvm/Option/Option.h"
 #include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Errno.h"
 #include "llvm/Support/Error.h"
+#include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Format.h"
 #include "llvm/Support/InitLLVM.h"
 #include "llvm/Support/MemoryBuffer.h"
@@ -88,7 +90,9 @@ static void parseIntArg(const opt::InputArgList &Args, int ID, T &Value) {
   }
 }
 
-static void strings(raw_ostream &OS, StringRef FileName, StringRef Contents) {
+static void strings(raw_ostream &OS, StringRef FileName,
+                    sys::fs::file_t FileHandle) {
+  SmallString<sys::fs::DefaultReadChunkSize> Buffer;
   auto print = [&OS, FileName](unsigned Offset, StringRef L) {
     if (L.size() < static_cast<size_t>(MinLength))
       return;
@@ -110,19 +114,39 @@ static void strings(raw_ostream &OS, StringRef FileName, StringRef Contents) {
     OS << L << '\n';
   };
 
-  const char *B = Contents.begin();
-  const char *P = nullptr, *E = nullptr, *S = nullptr;
-  for (P = Contents.begin(), E = Contents.end(); P < E; ++P) {
-    if (isPrint(*P) || *P == '\t') {
-      if (S == nullptr)
-        S = P;
-    } else if (S) {
-      print(S - B, StringRef(S, P - S));
-      S = nullptr;
+  unsigned Offset = 0, LocalOffset = 0, CurSize = 0;
+  Buffer.resize(sys::fs::DefaultReadChunkSize);
+  auto FillBuffer = [&Buffer, FileHandle, &FileName]() -> unsigned {
+    Expected<size_t> ReadBytesOrErr = sys::fs::readNativeFile(
+        FileHandle,
+        MutableArrayRef(Buffer.begin(), sys::fs::DefaultReadChunkSize));
+    if (!ReadBytesOrErr) {
+      errs() << FileName << ": "
+             << errorToErrorCode(ReadBytesOrErr.takeError()).message() << '\n';
+      return 0;
+    }
+    return *ReadBytesOrErr;
+  };
+  std::string StringBuffer;
+  while (true) {
+    if (LocalOffset == CurSize) {
+      CurSize = FillBuffer();
+      if (CurSize == 0)
+        break;
+      LocalOffset = 0;
+    }
+    char C = Buffer[LocalOffset++];
+    if (isPrint(C) || C == '\t') {
+      StringBuffer.push_back(C);
+    } else if (StringBuffer.size()) {
+      print(Offset, StringRef(StringBuffer.c_str(), StringBuffer.size()));
+      Offset += StringBuffer.size();
+      StringBuffer.clear();
+      ++Offset;
+    } else {
+      ++Offset;
     }
   }
-  if (S)
-    print(S - B, StringRef(S, E - S));
 }
 
 int main(int argc, char **argv) {
@@ -174,13 +198,22 @@ int main(int argc, char **argv) {
     InputFileNames.push_back("-");
 
   for (const auto &File : InputFileNames) {
-    ErrorOr<std::unique_ptr<MemoryBuffer>> Buffer =
-        MemoryBuffer::getFileOrSTDIN(File, /*IsText=*/true);
-    if (std::error_code EC = Buffer.getError())
-      errs() << File << ": " << EC.message() << '\n';
-    else
-      strings(llvm::outs(), File == "-" ? "{standard input}" : File,
-              Buffer.get()->getMemBufferRef().getBuffer());
+    sys::fs::file_t FD;
+    if (File == "-") {
+      FD = sys::fs::getStdinHandle();
+    } else {
+      Expected<sys::fs::file_t> FDOrErr =
+          sys::fs::openNativeFileForRead(File, sys::fs::OF_None);
+      ;
+      if (!FDOrErr) {
+        errs() << File << ": "
+               << errorToErrorCode(FDOrErr.takeError()).message() << '\n';
+        continue;
+      }
+      FD = *FDOrErr;
+    }
+
+    strings(llvm::outs(), File == "-" ? "{standard input}" : File, FD);
   }
 
   return EXIT_SUCCESS;

To prevent large memory footprint, we use small buffer and a seperated
string to print. This can largely reduce the memory consumption when
processing.

In the following example, the test.file is 1G regular file, which is possible in the
daily workload.

cat ./test.file | llvm-strings --radix=o > out &&
ps -o command,vsz,rss | grep strings

Reading whole file:
llvm-strings 5129824 3999920

Small buffer implementation:
llvm-strings 17604   5476

Also, it does not affect too much runtime:

time llvm-strings --radix=o > out < ./test.file

Reading whole file:
________________________________________________________
Executed in    5.67 secs    fish           external
   usr time    4.38 secs    2.40 millis    4.38 secs
   sys time    1.28 secs    0.00 millis    1.28 secs

Small buffer implementation:
________________________________________________________
Executed in    5.87 secs    fish           external
   usr time    5.65 secs    2.59 millis    5.65 secs
   sys time    0.21 secs    0.00 millis    0.21 secs
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought is that this is overcomplicating things. Could you have strings just take a std::istream object and use std::istream::read or something similar? You'd pass in either std::cin or std::ifstream, if using a file.

@aokblast
Copy link
Contributor Author

My initial thought is that this is overcomplicating things. Could you have strings just take a std::istream object and use std::istream::read or something similar? You'd pass in either std::cin or std::ifstream, if using a file.

That is really strange. In my experiment, it takes 10sec to process a 1GB file. Even if I use basic_streambuf::putsetbuf to set the buffer size to 4k * 4 like my original implementation.

@aokblast
Copy link
Contributor Author

aokblast commented Oct 15, 2025

Here are the flamegraphs for two different implementations with same file.

Notice that the first implementation calls strings() only once so it can be inlined.

  1. Small llvm::buffer
image
  1. std::istream
image

It seems that istream::get has a very huge delay on output. I am not sure where is the bug or is an intended behavior. I am using FreeBSD-Current so that I use the libcxx instead of libstdcxx.

@jh7370
Copy link
Collaborator

jh7370 commented Oct 16, 2025

Notice that the first implementation calls strings() only once so it can be inlined.

Without seeing your code, I can't see what's going on, but I wouldn't expect there to be a need to call strings more than once... Anyway, the inlining is unlikely to be the culprit for the performance difference in this case.

It seems that istream::get has a very huge delay on output. I am not sure where is the bug or is an intended behavior. I am using FreeBSD-Current so that I use the libcxx instead of libstdcxx.

Are you sure you're blaming the right function (keeping in mind that inlining is potentially going on)? I note that the flame graph has cut off the names of most of the functions, so I can't really see the detail. More useful would be looking at a line-by-line breakdown of the get method, i.e. something that shows where the time is spent within the function. You can find the current get source code at

typename basic_istream<_CharT, _Traits>::int_type basic_istream<_CharT, _Traits>::get() {
, though it may not exactly match the version you did. You might also want to reach out to the libc++ developers for help figuring out why there's a performance difference.

@aokblast
Copy link
Contributor Author

aokblast commented Oct 16, 2025

Notice that the first implementation calls strings() only once so it can be inlined.

Without seeing your code, I can't see what's going on, but I wouldn't expect there to be a need to call strings more than once... Anyway, the inlining is unlikely to be the culprit for the performance difference in this case.

Here is my code.
Since, I call strings two times in different branch, LLVM inline pass refused to inline this function since the host is too high. I mention this since you can see one flamegraph is shown without the strings() stack as it is inlined.

It seems that istream::get has a very huge delay on output. I am not sure where is the bug or is an intended behavior. I am using FreeBSD-Current so that I use the libcxx instead of libstdcxx.

Are you sure you're blaming the right function (keeping in mind that inlining is potentially going on)? I note that the flame graph has cut off the names of most of the functions, so I can't really see the detail. More useful would be looking at a line-by-line breakdown of the get method, i.e. something that shows where the time is spent within the function. You can find the current get source code at

typename basic_istream<_CharT, _Traits>::int_type basic_istream<_CharT, _Traits>::get() {

, though it may not exactly match the version you did. You might also want to reach out to the libc++ developers for help figuring out why there's a performance difference.

Thanks! Yes, that is what I am doing now. Since most of the function is inlined, I have to build llvm-strings in debug build to figure it out.

@jh7370
Copy link
Collaborator

jh7370 commented Oct 16, 2025

Hopefully this goes without saying, but debug builds are not going to provide the same performance characteristics as release builds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants