Skip to content

Conversation

@nickdesaulniers
Copy link
Member

@llvmbot
Copy link
Member

llvmbot commented Jun 25, 2024

@llvm/pr-subscribers-libc

Author: Nick Desaulniers (paternity leave) (nickdesaulniers)

Changes

Port Bionic's mmap64 implementation.

Link: https://android.googlesource.com/platform/bionic/+/main/libc/bionic/legacy_32_bit_support.cpp#117


Full diff: https://github.com/llvm/llvm-project/pull/96700.diff

11 Files Affected:

  • (modified) libc/hdr/CMakeLists.txt (+16)
  • (added) libc/hdr/sys_auxv_macros.h (+22)
  • (added) libc/hdr/sys_mman_macros.h (+22)
  • (modified) libc/hdr/types/CMakeLists.txt (+12)
  • (added) libc/hdr/types/off64_t.h (+22)
  • (added) libc/hdr/types/off_t.h (+22)
  • (modified) libc/src/sys/auxv/getauxval.h (-2)
  • (modified) libc/src/sys/mman/linux/CMakeLists.txt (+5)
  • (modified) libc/src/sys/mman/linux/mmap.cpp (+63-28)
  • (modified) libc/src/sys/mman/mmap.h (+3-1)
  • (modified) libc/test/src/sys/mman/linux/mmap_test.cpp (+14)
diff --git a/libc/hdr/CMakeLists.txt b/libc/hdr/CMakeLists.txt
index 66b82c84dac49..b7c887e9b3da2 100644
--- a/libc/hdr/CMakeLists.txt
+++ b/libc/hdr/CMakeLists.txt
@@ -97,4 +97,20 @@ add_proxy_header_library(
     libc.include.float
 )
 
+add_proxy_header_library(
+  sys_mman_macros
+  HDRS
+    sys_mman_macros.h
+  FULL_BUILD_DEPENDS
+    libc.include.llvm-libc-macros.sys_mman_macros
+)
+
+add_proxy_header_library(
+  sys_auxv_macros
+  HDRS
+    sys_auxv_macros.h
+  FULL_BUILD_DEPENDS
+    libc.include.llvm-libc-macros.sys_auxv_macros
+)
+
 add_subdirectory(types)
diff --git a/libc/hdr/sys_auxv_macros.h b/libc/hdr/sys_auxv_macros.h
new file mode 100644
index 0000000000000..7fae034937a81
--- /dev/null
+++ b/libc/hdr/sys_auxv_macros.h
@@ -0,0 +1,22 @@
+//===-- Definition of macros from sys/auxv.h ------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_HDR_SYS_AUXV_MACROS_H
+#define LLVM_LIBC_HDR_SYS_AUXV_MACROS_H
+
+#ifdef LIBC_FULL_BUILD
+
+#include "include/llvm-libc-macros/sys-auxv-macros.h"
+
+#else // Overlay mode
+
+#include <sys/auxv.h>
+
+#endif // LLVM_LIBC_FULL_BUILD
+
+#endif // LLVM_LIBC_HDR_SYS/AUXV_MACROS_H
diff --git a/libc/hdr/sys_mman_macros.h b/libc/hdr/sys_mman_macros.h
new file mode 100644
index 0000000000000..a656db48e96c3
--- /dev/null
+++ b/libc/hdr/sys_mman_macros.h
@@ -0,0 +1,22 @@
+//===-- Definition of macros from sys/mman.h ------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_HDR_SYS_MMAN_MACROS_H
+#define LLVM_LIBC_HDR_SYS_MMAN_MACROS_H
+
+#ifdef LIBC_FULL_BUILD
+
+#include "include/llvm-libc-macros/sys-mman-macros.h"
+
+#else // Overlay mode
+
+#include <sys/mman.h>
+
+#endif // LLVM_LIBC_FULL_BUILD
+
+#endif // LLVM_LIBC_HDR_SYS/MMAN_MACROS_H
diff --git a/libc/hdr/types/CMakeLists.txt b/libc/hdr/types/CMakeLists.txt
index 9b3373a0ca390..b4da1da4b1bda 100644
--- a/libc/hdr/types/CMakeLists.txt
+++ b/libc/hdr/types/CMakeLists.txt
@@ -126,3 +126,15 @@ add_proxy_header_library(
     libc.include.llvm-libc-types.atexithandler_t
     libc.include.stdlib
 )
+
+add_proxy_header_library(
+  off_t
+  HDRS
+    off_t.h
+)
+
+add_proxy_header_library(
+  off64_t
+  HDRS
+    off64_t.h
+)
diff --git a/libc/hdr/types/off64_t.h b/libc/hdr/types/off64_t.h
new file mode 100644
index 0000000000000..183d58e6141a1
--- /dev/null
+++ b/libc/hdr/types/off64_t.h
@@ -0,0 +1,22 @@
+//===-- Proxy header for the off64_t type ---------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_HDR_OFF64_T_H
+#define LLVM_LIBC_HDR_OFF64_T_H
+
+#ifdef LIBC_FULL_BUILD
+
+#include "include/llvm-libc-types/off64_t.h"
+
+#else // overlay mode
+
+#include <sys/types.h>
+
+#endif // LLVM_LIBC_FULL_BUILD
+
+#endif // LLVM_LIBC_HDR_OFF64_T_H
diff --git a/libc/hdr/types/off_t.h b/libc/hdr/types/off_t.h
new file mode 100644
index 0000000000000..cbeee1c7eec97
--- /dev/null
+++ b/libc/hdr/types/off_t.h
@@ -0,0 +1,22 @@
+//===-- Proxy header for the off_t type -----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_HDR_OFF_T_H
+#define LLVM_LIBC_HDR_OFF_T_H
+
+#ifdef LIBC_FULL_BUILD
+
+#include "include/llvm-libc-types/off_t.h"
+
+#else // overlay mode
+
+#include <sys/types.h>
+
+#endif // LLVM_LIBC_FULL_BUILD
+
+#endif // LLVM_LIBC_HDR_OFF_T_H
diff --git a/libc/src/sys/auxv/getauxval.h b/libc/src/sys/auxv/getauxval.h
index 7c9fb846e9198..40b7a934083c3 100644
--- a/libc/src/sys/auxv/getauxval.h
+++ b/libc/src/sys/auxv/getauxval.h
@@ -9,8 +9,6 @@
 #ifndef LLVM_LIBC_SRC_SYS_AUXV_GETAUXVAL_H
 #define LLVM_LIBC_SRC_SYS_AUXV_GETAUXVAL_H
 
-#include <sys/auxv.h>
-
 namespace LIBC_NAMESPACE {
 
 unsigned long getauxval(unsigned long id);
diff --git a/libc/src/sys/mman/linux/CMakeLists.txt b/libc/src/sys/mman/linux/CMakeLists.txt
index 00f4f0e64ec06..7bc5f16702c8e 100644
--- a/libc/src/sys/mman/linux/CMakeLists.txt
+++ b/libc/src/sys/mman/linux/CMakeLists.txt
@@ -18,9 +18,14 @@ add_entrypoint_object(
   HDRS
     ../mmap.h
   DEPENDS
+    libc.hdr.sys_auxv_macros
+    libc.hdr.sys_mman_macros
+    libc.hdr.types.off64_t
+    libc.hdr.types.off_t
     libc.include.sys_mman
     libc.include.sys_syscall
     libc.src.__support.OSUtil.osutil
+    libc.src.__support.block
     libc.src.errno.errno
 )
 
diff --git a/libc/src/sys/mman/linux/mmap.cpp b/libc/src/sys/mman/linux/mmap.cpp
index 16111c66859f5..f5b065411b994 100644
--- a/libc/src/sys/mman/linux/mmap.cpp
+++ b/libc/src/sys/mman/linux/mmap.cpp
@@ -8,56 +8,91 @@
 
 #include "src/sys/mman/mmap.h"
 
+#include "config/linux/app.h"    // app
+#include "hdr/sys_auxv_macros.h" // AT_PAGESZ
+#include "hdr/sys_mman_macros.h" // MAP_FAILED
+#include "hdr/types/off64_t.h"
+#include "hdr/types/off_t.h"
 #include "src/__support/OSUtil/syscall.h" // For internal syscall function.
+#include "src/__support/block.h"          // align_up
 #include "src/__support/common.h"
-
 #include "src/errno/libc_errno.h"
-#include <linux/param.h> // For EXEC_PAGESIZE.
+#include "src/sys/auxv/getauxval.h"
+
+#include <stddef.h>      // size_t
+#include <stdint.h>      // PTRDIFF_MAX
 #include <sys/syscall.h> // For syscall numbers.
 
 namespace LIBC_NAMESPACE {
 
+// Older 32b systems generally have a SYS_mmap2 that accepts a 32b value which
+// was a 64b value shifted down by 12; this magic constant isn't exposed via
+// UAPI headers, but its in kernel sources for mmap2 implementations.
+#ifdef SYS_mmap2
+
+// TODO: move these helpers to OSUtil?
+#ifdef LIBC_FULL_BUILD
+unsigned long get_page_size() { return app.page_size; }
+#else
+unsigned long get_page_size() {
+  // TODO: is it ok for mmap to call getauxval in overlay mode? Or is there a
+  // risk of infinite recursion?
+  return ::getauxval(AT_PAGESZ);
+}
+#endif // LIBC_FULL_BUILD
+
+void *mmap64(void *addr, size_t size, int prot, int flags, int fd,
+             off64_t offset) {
+  constexpr size_t MMAP2_SHIFT = 12;
+
+  if (offset < 0 || offset % (1UL << MMAP2_SHIFT)) {
+    libc_errno = EINVAL;
+    return MAP_FAILED;
+  }
+
+  // Prevent allocations large enough for `end - start` to overflow,
+  // to avoid security bugs.
+  size_t rounded = align_up(size, get_page_size());
+  if (rounded < size || rounded > PTRDIFF_MAX) {
+    libc_errno = ENOMEM;
+    return MAP_FAILED;
+  }
+
+  long ret = syscall_impl(SYS_mmap2, reinterpret_cast<long>(addr), size, prot,
+                          flags, fd, static_cast<long>(offset >> MMAP2_SHIFT));
+
+  if (ret < 0) {
+    libc_errno = static_cast<int>(-ret);
+    return MAP_FAILED;
+  }
+
+  return reinterpret_cast<void *>(ret);
+}
+#endif // SYS_mmap2
+
 // This function is currently linux only. It has to be refactored suitably if
 // mmap is to be supported on non-linux operating systems also.
 LLVM_LIBC_FUNCTION(void *, mmap,
                    (void *addr, size_t size, int prot, int flags, int fd,
                     off_t offset)) {
+#ifdef SYS_mmap2
+  return mmap64(addr, size, prot, flags, fd, static_cast<off64_t>(offset));
+#else
+
   // A lot of POSIX standard prescribed validation of the parameters is not
   // done in this function as modern linux versions do it in the syscall.
   // TODO: Perform argument validation not done by the linux syscall.
 
-  // EXEC_PAGESIZE is used for the page size. While this is OK for x86_64, it
-  // might not be correct in general.
-  // TODO: Use pagesize read from the ELF aux vector instead of EXEC_PAGESIZE.
+  long ret = LIBC_NAMESPACE::syscall_impl(
+      SYS_mmap, reinterpret_cast<long>(addr), size, prot, flags, fd, offset);
 
-#ifdef SYS_mmap2
-  offset /= EXEC_PAGESIZE;
-  long syscall_number = SYS_mmap2;
-#elif defined(SYS_mmap)
-  long syscall_number = SYS_mmap;
-#else
-#error "mmap or mmap2 syscalls not available."
-#endif
-
-  long ret =
-      LIBC_NAMESPACE::syscall_impl(syscall_number, reinterpret_cast<long>(addr),
-                                   size, prot, flags, fd, offset);
-
-  // The mmap/mmap2 syscalls return negative values on error. These negative
-  // values are actually the negative values of the error codes. So, fix them
-  // up in case an error code is detected.
-  //
-  // A point to keep in mind for the fix up is that a negative return value
-  // from the syscall can also be an error-free value returned by the syscall.
-  // However, since a valid return address cannot be within the last page, a
-  // return value corresponding to a location in the last page is an error
-  // value.
-  if (ret < 0 && ret > -EXEC_PAGESIZE) {
+  if (ret < 0) {
     libc_errno = static_cast<int>(-ret);
     return MAP_FAILED;
   }
 
   return reinterpret_cast<void *>(ret);
+#endif // SYS_mmap2
 }
 
 } // namespace LIBC_NAMESPACE
diff --git a/libc/src/sys/mman/mmap.h b/libc/src/sys/mman/mmap.h
index 4425019c4ee1a..cab8f863a66f0 100644
--- a/libc/src/sys/mman/mmap.h
+++ b/libc/src/sys/mman/mmap.h
@@ -9,7 +9,9 @@
 #ifndef LLVM_LIBC_SRC_SYS_MMAN_MMAP_H
 #define LLVM_LIBC_SRC_SYS_MMAN_MMAP_H
 
-#include <sys/mman.h> // For size_t and off_t
+#include "hdr/types/off_t.h"
+
+#include <stddef.h> // size_t
 
 namespace LIBC_NAMESPACE {
 
diff --git a/libc/test/src/sys/mman/linux/mmap_test.cpp b/libc/test/src/sys/mman/linux/mmap_test.cpp
index dcbc75808f13c..2d4fd4468f81e 100644
--- a/libc/test/src/sys/mman/linux/mmap_test.cpp
+++ b/libc/test/src/sys/mman/linux/mmap_test.cpp
@@ -41,3 +41,17 @@ TEST(LlvmLibcMMapTest, Error_InvalidSize) {
 
   EXPECT_THAT(LIBC_NAMESPACE::munmap(0, 0), Fails(EINVAL));
 }
+
+TEST(LlvmLibcMMapTest, Error_NegativeOffset) {
+  LIBC_NAMESPACE::libc_errno = 0;
+  void *addr = LIBC_NAMESPACE::mmap(nullptr, 128, PROT_READ,
+                                    MAP_ANONYMOUS | MAP_PRIVATE, -1, -42);
+  EXPECT_THAT(addr, Fails(EINVAL, MAP_FAILED));
+}
+
+TEST(LlvmLibcMMapTest, Error_NonPageSizeMultipleOffset) {
+  LIBC_NAMESPACE::libc_errno = 0;
+  void *addr = LIBC_NAMESPACE::mmap(nullptr, 128, PROT_READ,
+                                    MAP_ANONYMOUS | MAP_PRIVATE, -1, 7);
+  EXPECT_THAT(addr, Fails(EINVAL, MAP_FAILED));
+}

Comment on lines +48 to +59
if (offset < 0 || offset % (1UL << MMAP2_SHIFT)) {
libc_errno = EINVAL;
return MAP_FAILED;
}

// Prevent allocations large enough for `end - start` to overflow,
// to avoid security bugs.
size_t rounded = align_up(size, get_page_size());
if (rounded < size || rounded > PTRDIFF_MAX) {
libc_errno = ENOMEM;
return MAP_FAILED;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enh-google any sense how much validation needs to be done here vs is already handled by the kernel?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for LP64, you don't need any of this. (that's why you couldn't find bionic's LP64 code --- it's a trivial generated assembler stub.)

for ILP32 ... these are the checks you'll have to do yourself, because these are the checks that tell you whether your translation to mmap2(2) is valid or not.

given that you've got everything in the same file, #if __LP64__ might be clearer than the SYS_mmap2 check, which to the inexperienced eye might look like mmap2 is the new version of mmap, like faccessat2 or whatever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like I should add some 32b only tests then.

Comment on lines +38 to +40
// TODO: is it ok for mmap to call getauxval in overlay mode? Or is there a
// risk of infinite recursion?
return ::getauxval(AT_PAGESZ);
Copy link
Member Author

@nickdesaulniers nickdesaulniers Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably also isn't correct since we don't express in cmake a dependency on getauxval for mmap (which leads to inf recursion in our cmake). Hence the below question about parameter validation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably don't want to have to search auxv for the page size either. you could stash the page size at libc startup, but bionic currently just has this as

int getpagesize() {
  static const size_t page_size = getauxval(AT_PAGESZ);
  return page_size;
}

note also that getpagesize() is a public function that you'll end up implementing eventually anyway. (which is one reason why i went this route in bionic rather than the startup special case that iirc musl does.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could stash the page size at libc startup

We do, which is what's done for the LIBC_FULL_BUILD check. The issue becomes that llvm-libc supports building with literally just mmap as the only symbol in the resulting libc.a, so we also additionally need to support "getting the page size" without the ability to reach into the system libc's private book keeping.

@SchrodingerZhu
Copy link
Contributor

getauxval was introduced before we have a separate build scheme for LIBC_FULL_BUILD. The buffer is dynamically allocated using mmap to avoid extra static space for full build. Since LIBC_FULL_BUILD macro is introduced, I should rework getauxval:

  • remove dynamic allocation
  • make sure the async-signal-safety of getauxval

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants