Skip to content

Conversation

aokblast
Copy link
Contributor

@aokblast aokblast commented Oct 5, 2025

When stdin is executed like ./prog < file, we are able to mmap the buffer so that we can reduce the total memory usage if we try to open a large file.

@llvmbot
Copy link
Member

llvmbot commented Oct 5, 2025

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-llvm-support

Author: None (aokblast)

Changes

When stdin is executed like ./prog < file, we are able to mmap the buffer so that we can reduce the total memory usage if we try to open a large file.


Full diff: https://github.com/llvm/llvm-project/pull/162013.diff

1 Files Affected:

  • (modified) llvm/lib/Support/MemoryBuffer.cpp (+21)
diff --git a/llvm/lib/Support/MemoryBuffer.cpp b/llvm/lib/Support/MemoryBuffer.cpp
index 1c4645ad83641..97ea5f205b0ee 100644
--- a/llvm/lib/Support/MemoryBuffer.cpp
+++ b/llvm/lib/Support/MemoryBuffer.cpp
@@ -573,6 +573,27 @@ ErrorOr<std::unique_ptr<MemoryBuffer>> MemoryBuffer::getSTDIN() {
   // FIXME: That isn't necessarily true, we should try to mmap stdin and
   // fallback if it fails.
   sys::ChangeStdinMode(sys::fs::OF_Text);
+  std::error_code EC;
+  sys::fs::file_type Type;
+  sys::fs::file_status Status;
+  EC = sys::fs::status(sys::fs::getStdinHandle(), Status);
+  if (EC)
+    return EC;
+
+  Type = Status.type();
+
+  /*
+   * If the FD is regular file or block file,
+   * we try to create a mmap buffer first.
+   * If failed, rollback to read and copy.
+   */
+  if (Type == sys::fs::file_type::regular_file ||
+      Type == sys::fs::file_type::block_file) {
+    std::unique_ptr<MemoryBuffer> Result(new MemoryBufferMMapFile<MemoryBuffer>(
+        false, sys::fs::getStdinHandle(), Status.getSize(), 0, EC));
+    if (!EC)
+      return std::move(Result);
+  }
 
   return getMemoryBufferForStream(sys::fs::getStdinHandle(), "<stdin>");
 }

@aokblast
Copy link
Contributor Author

aokblast commented Oct 5, 2025

One thing that I am not sure is if we need to overwrite operator new like new (NamedBufferAlloc(BufferName)). If so, what is the proper BufferName? stdin?

@aokblast aokblast force-pushed the support/add_mmap_backend_when_possible branch 4 times, most recently from ccaa772 to a9d7f6c Compare October 6, 2025 08:32
@aokblast
Copy link
Contributor Author

aokblast commented Oct 6, 2025

Currently, some programs rely *c = '\0' to find the end of a buffer instead of the provided size.
In my experiment, therefore, it is impossible for us to mmap in all cases since some file may not have '\0' in the end of the file and requires copy buffer and append '\0' manually.
However, it would be a huge performace improvement if we use mmap when processing a huge file.
Can I rely on the test-suite's results to make sure which program rely on *c = '\0' and therefore provide mmap acceleration for other programs?

@jh7370
Copy link
Collaborator

jh7370 commented Oct 6, 2025

Currently, some programs rely *c = '\0' to find the end of a buffer instead of the provided size. In my experiment, therefore, it is impossible for us to mmap in all cases since some file may not have '\0' in the end of the file and requires copy buffer and append '\0' manually. However, it would be a huge performace improvement if we use mmap when processing a huge file. Can I rely on the test-suite's results to make sure which program rely on *c = '\0' and therefore provide mmap acceleration for other programs?

Without knowing the context, it sounds to me like a tool that ignores the provided size and looks for a trailing null terminator has a bug in it (what if the input contains null bytes other than at the end of the data?), so the tool should be updated. What tools rely on this behaviour?

@aokblast
Copy link
Contributor Author

aokblast commented Oct 6, 2025

Currently, some programs rely *c = '\0' to find the end of a buffer instead of the provided size. In my experiment, therefore, it is impossible for us to mmap in all cases since some file may not have '\0' in the end of the file and requires copy buffer and append '\0' manually. However, it would be a huge performace improvement if we use mmap when processing a huge file. Can I rely on the test-suite's results to make sure which program rely on *c = '\0' and therefore provide mmap acceleration for other programs?

Without knowing the context, it sounds to me like a tool that ignores the provided size and looks for a trailing null terminator has a bug in it (what if the input contains null bytes other than at the end of the data?), so the tool should be updated. What tools rely on this behaviour?

I am checking that. When I add the trailing zero test to judge if we can mmap, it passes the test-suite. When I remove the test, CI failed. I think these programs just have some behavior like to find the end of a string in string sections.

@aokblast
Copy link
Contributor Author

aokblast commented Oct 6, 2025

This is the CI result for removing trailing zero check.
https://github.com/llvm/llvm-project/actions/runs/18273318718/job/52019747511
It seems that Linux failed in a DAP test.
And windows failed in llc, however, the test results in Windows shows no symbol name and I don't have a Windows machine for debugging.

@aokblast
Copy link
Contributor Author

aokblast commented Oct 6, 2025

I plan to run the CI that turns off RequiresNullTermination by defualt again after sucess this time.

When stdin is executed like ./prog < file, we are possible to mmap the
buffer so that we can reduce the total memory usage if we try to open a
large file.

For example, programs like llvm-strings iterates the buffer until the
end of given size.
@aokblast aokblast force-pushed the support/add_mmap_backend_when_possible branch from f15f5b4 to c56c44d Compare October 6, 2025 09:52
@aokblast
Copy link
Contributor Author

aokblast commented Oct 6, 2025

Oops, I am so sorry for the noise... I forget to rebase main before push to this branch.

@aokblast
Copy link
Contributor Author

aokblast commented Oct 6, 2025

Emm, the problem only happens on Windows this time. I think l need to get a windows machine since it is definately a WIndows related bug unrelated with my patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants