Skip to content

Conversation

ColinKinloch
Copy link

@ColinKinloch ColinKinloch commented Oct 2, 2025

This MR implements in clang some of the overflow and over-read checks implemented in bionics FORTIFY headers.

In order to get the checks working with the existing overflow checker this MR defines the following as builtin:

  • read
  • pread/pread64
  • write
  • pwrite/pwrite64
  • getcwd
  • readlink
  • readlinkat

The MR also enables ssize_t in clangs AST and adds the off_t, off64_t and ssize_t types for use in builtin signatures. It also defines a new diagnostic warn_fortify_destination_size_mismatch to warn when the called function may read beyond the bounds of the given source buffer.

It also moves the comparison between source and destination sizes into each case of the switch block, this means that a given diagnostic can be triggered by buffer overflows or over-reads.

Tackles a few some items from #142230

Questions

  1. Do chk variants of these functions e.g. __builtin___getcwd_chk etc. makes sense to implement for fortify-source (LSB)?

  2. Is defining off_t as unsigned long robust enough?

  3. Are the names of warn_fortify_destination_size_mismatch and warn_fortify_source_size_mismatch easy enough to distinguish?

  4. Should the pread64/pwrite64 builtin be marked as GNULibBuiltin as it is linux specific?

Further Work

I plan to port more of the fortify checks from the bionic headers. I'd be interested to hear opinions on whether these belong alongside clangs fortify-source diagnostics or would be better suited elsewhere in Sema. Specifically I plan to work on:

Define as builtin and check for overflows and over-reads in:
* `read`
* `pread`/`pread64`
* `write`
* `pwrite`/`pwrite64`
* `getcwd`
* `readlink`
* `readlinkat`

It also enables `off_t`, `off64_t` and `ssize_t` for use in builtin
signatures.

Signed-off-by: Colin Kinloch <[email protected]>
Copy link

github-actions bot commented Oct 2, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2025

@llvm/pr-subscribers-clang

Author: Colin Kinloch (ColinKinloch)

Changes

This MR implements in clang some of the overflow and over-read checks implemented in bionics FORTIFY headers.

In order to get the checks working with the existing overflow checker this MR defines the following as builtin:

  • read
  • pread/pread64
  • write
  • pwrite/pwrite64
  • getcwd
  • readlink
  • readlinkat

The MR also enables ssize_t in clangs AST and adds the off_t, off64_t and ssize_t types for use in builtin signatures. It also defines a new diagnostic warn_fortify_destination_size_mismatch to warn when the called function may read beyond the bounds of the given source buffer.

It also moves the comparison between source and destination sizes into each case of the switch block, this means that a given diagnostic can be triggered by buffer overflows or over-reads.

Questions

Do chk variants of these functions e.g. __builtin___getcwd_chk etc. makes sense to implement for fortify-source (LSB)?

Is defining off_t as unsigned long robust enough?

Are the names of warn_fortify_destination_size_mismatch and warn_fortify_source_size_mismatch easy enough to distinguish?

Further Work

I plan to port more of the fortify checks from the bionic headers. I'd be interested to hear opinions on whether these belong alongside clangs fortify-source diagnostics or would be better suited elsewhere in Sema. Specifically I plan to work on:


Full diff: https://github.com/llvm/llvm-project/pull/161737.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/Builtins.td (+65)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+4)
  • (modified) clang/lib/AST/ASTContext.cpp (+6-3)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+68-2)
  • (modified) clang/test/Sema/warn-fortify-source.c (+55)
  • (modified) clang/utils/TableGen/ClangBuiltinsEmitter.cpp (+3)
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 468121f7d20ab..3c3a5a82a28a6 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -3490,6 +3490,8 @@ def StrnCaseCmp : GNULibBuiltin<"strings.h"> {
   let RequiresUndef = 1;
 }
 
+// POSIX unistd.h
+
 def GNU_Exit : GNULibBuiltin<"unistd.h"> {
   let Spellings = ["_exit"];
   let Attributes = [NoReturn];
@@ -3502,6 +3504,69 @@ def VFork : LibBuiltin<"unistd.h"> {
   let Prototype = "pid_t()";
 }
 
+def Read : LibBuiltin<"unistd.h"> {
+  let Spellings = ["read"];
+  let Attributes = [NoThrow];
+  let Prototype = "ssize_t(int, void*, size_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
+def PRead : LibBuiltin<"unistd.h"> {
+  let Spellings = ["pread"];
+  let Attributes = [NoThrow];
+  let Prototype = "ssize_t(int, void*, size_t, off_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
+def PRead64 : LibBuiltin<"unistd.h"> {
+  let Spellings = ["pread64"];
+  let Attributes = [NoThrow];
+  let Prototype = "ssize_t(int, void*, size_t, off64_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
+def Write : LibBuiltin<"unistd.h"> {
+  let Spellings = ["write"];
+  let Attributes = [NoThrow];
+  let Prototype = "ssize_t(int, void const*, size_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
+def PWrite : LibBuiltin<"unistd.h"> {
+  let Spellings = ["pwrite"];
+  let Attributes = [NoThrow];
+  let Prototype = "ssize_t(int, void const*, size_t, off_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
+def PWrite64 : LibBuiltin<"unistd.h"> {
+  let Spellings = ["pwrite64"];
+  let Attributes = [NoThrow];
+  let Prototype = "ssize_t(int, void const*, size_t, off64_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
+def GetCWD : LibBuiltin<"unistd.h"> {
+  let Spellings = ["getcwd"];
+  let Attributes = [NoThrow];
+  let Prototype = "char*(char*, size_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
+def ReadLink : LibBuiltin<"unistd.h"> {
+  let Spellings = ["readlink"];
+  let Attributes = [NoThrow];
+  let Prototype = "ssize_t(char const* restrict, char* restrict, size_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
+def ReadLinkAt : LibBuiltin<"unistd.h"> {
+  let Spellings = ["readlinkat"];
+  let Attributes = [NoThrow];
+  let Prototype = "ssize_t(int, char const* restrict, char* restrict, size_t)";
+  let AddBuiltinPrefixedAlias = 1;
+}
+
 // POSIX pthread.h
 
 def PthreadCreate : GNULibBuiltin<"pthread.h"> {
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index b157cbb0b8069..f2206ad08414e 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -953,6 +953,10 @@ def warn_fortify_source_overflow
 def warn_fortify_source_size_mismatch : Warning<
   "'%0' size argument is too large; destination buffer has size %1,"
   " but size argument is %2">, InGroup<FortifySource>;
+def warn_fortify_destination_size_mismatch
+    : Warning<"'%0' size argument is too large; source buffer has size %2,"
+              " but size argument is %1">,
+      InGroup<FortifySource>;
 
 def warn_fortify_strlen_overflow: Warning<
   "'%0' will always overflow; destination buffer has size %1,"
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 056bfe36b2a0a..d0432de14dbb3 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -12528,9 +12528,12 @@ static QualType DecodeTypeFromStr(const char *&Str, const ASTContext &Context,
     assert(HowLong == 0 && !Signed && !Unsigned && "Bad modifiers for 'b'!");
     Type = Context.BoolTy;
     break;
-  case 'z':  // size_t.
-    assert(HowLong == 0 && !Signed && !Unsigned && "Bad modifiers for 'z'!");
-    Type = Context.getSizeType();
+  case 'z': // size_t and ssize_t.
+    assert(HowLong == 0 && "Bad modifiers for 'z'!");
+    if (Signed)
+      Type = Context.getSignedSizeType();
+    else
+      Type = Context.getSizeType();
     break;
   case 'w':  // wchar_t.
     assert(HowLong == 0 && !Signed && !Unsigned && "Bad modifiers for 'w'!");
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 7b37e0b8d5430..c7d6d0c142eee 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -1243,6 +1243,14 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
   std::optional<llvm::APSInt> DestinationSize;
   unsigned DiagID = 0;
   bool IsChkVariant = false;
+  bool IsTriggered = false;
+
+  auto CompareSizeSourceToDest = [&]() {
+    return SourceSize && DestinationSize
+               ? std::optional<int>{llvm::APSInt::compareValues(
+                     *SourceSize, *DestinationSize)}
+               : std::nullopt;
+  };
 
   auto GetFunctionName = [&]() {
     std::string FunctionNameStr =
@@ -1270,6 +1278,7 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
     DiagID = diag::warn_fortify_strlen_overflow;
     SourceSize = ComputeStrLenArgument(1);
     DestinationSize = ComputeSizeArgument(0);
+    IsTriggered = CompareSizeSourceToDest() > 0;
     break;
   }
 
@@ -1279,6 +1288,7 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
     SourceSize = ComputeStrLenArgument(1);
     DestinationSize = ComputeExplicitObjectSizeArgument(2);
     IsChkVariant = true;
+    IsTriggered = CompareSizeSourceToDest() > 0;
     break;
   }
 
@@ -1349,11 +1359,13 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
         } else {
           DestinationSize = ComputeSizeArgument(0);
         }
+        IsTriggered = CompareSizeSourceToDest() > 0;
         break;
       }
     }
     return;
   }
+
   case Builtin::BI__builtin___memcpy_chk:
   case Builtin::BI__builtin___memmove_chk:
   case Builtin::BI__builtin___memset_chk:
@@ -1369,6 +1381,7 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
     DestinationSize =
         ComputeExplicitObjectSizeArgument(TheCall->getNumArgs() - 1);
     IsChkVariant = true;
+    IsTriggered = CompareSizeSourceToDest() > 0;
     break;
   }
 
@@ -1378,6 +1391,7 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
     SourceSize = ComputeExplicitObjectSizeArgument(1);
     DestinationSize = ComputeExplicitObjectSizeArgument(3);
     IsChkVariant = true;
+    IsTriggered = CompareSizeSourceToDest() > 0;
     break;
   }
 
@@ -1395,6 +1409,7 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
     DiagID = diag::warn_fortify_source_size_mismatch;
     SourceSize = ComputeExplicitObjectSizeArgument(TheCall->getNumArgs() - 1);
     DestinationSize = ComputeSizeArgument(0);
+    IsTriggered = CompareSizeSourceToDest() > 0;
     break;
   }
 
@@ -1409,8 +1424,58 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
     DiagID = diag::warn_fortify_source_overflow;
     SourceSize = ComputeExplicitObjectSizeArgument(TheCall->getNumArgs() - 1);
     DestinationSize = ComputeSizeArgument(0);
+    IsTriggered = CompareSizeSourceToDest() > 0;
+    break;
+  }
+
+  case Builtin::BIread:
+  case Builtin::BI__builtin_read:
+  case Builtin::BIreadlink:
+  case Builtin::BI__builtin_readlink:
+  case Builtin::BIreadlinkat:
+  case Builtin::BI__builtin_readlinkat:
+  case Builtin::BIgetcwd:
+  case Builtin::BI__builtin_getcwd: {
+    DiagID = diag::warn_fortify_source_size_mismatch;
+    SourceSize = ComputeExplicitObjectSizeArgument(TheCall->getNumArgs() - 1);
+    DestinationSize = ComputeSizeArgument(TheCall->getNumArgs() - 2);
+    IsTriggered = CompareSizeSourceToDest() > 0;
+    break;
+  }
+
+  case Builtin::BIpread:
+  case Builtin::BI__builtin_pread:
+  case Builtin::BIpread64:
+  case Builtin::BI__builtin_pread64: {
+    DiagID = diag::warn_fortify_source_size_mismatch;
+    SourceSize = ComputeExplicitObjectSizeArgument(TheCall->getNumArgs() - 2);
+    DestinationSize = ComputeSizeArgument(TheCall->getNumArgs() - 3);
+    IsTriggered = CompareSizeSourceToDest() > 0;
+    break;
+  }
+
+  case Builtin::BIwrite:
+  case Builtin::BI__builtin_write: {
+    DiagID = diag::warn_fortify_destination_size_mismatch;
+    SourceSize = ComputeSizeArgument(TheCall->getNumArgs() - 2);
+    DestinationSize =
+        ComputeExplicitObjectSizeArgument(TheCall->getNumArgs() - 1);
+    IsTriggered = CompareSizeSourceToDest() < 0;
     break;
   }
+
+  case Builtin::BIpwrite:
+  case Builtin::BI__builtin_pwrite:
+  case Builtin::BIpwrite64:
+  case Builtin::BI__builtin_pwrite64: {
+    DiagID = diag::warn_fortify_destination_size_mismatch;
+    SourceSize = ComputeSizeArgument(TheCall->getNumArgs() - 3);
+    DestinationSize =
+        ComputeExplicitObjectSizeArgument(TheCall->getNumArgs() - 2);
+    IsTriggered = CompareSizeSourceToDest() < 0;
+    break;
+  }
+
   case Builtin::BIsnprintf:
   case Builtin::BI__builtin_snprintf:
   case Builtin::BIvsnprintf:
@@ -1446,11 +1511,12 @@ void Sema::checkFortifiedBuiltinMemoryFunction(FunctionDecl *FD,
       }
     }
     DestinationSize = ComputeSizeArgument(0);
+    IsTriggered = CompareSizeSourceToDest() > 0;
+    break;
   }
   }
 
-  if (!SourceSize || !DestinationSize ||
-      llvm::APSInt::compareValues(*SourceSize, *DestinationSize) <= 0)
+  if (!IsTriggered)
     return;
 
   std::string FunctionName = GetFunctionName();
diff --git a/clang/test/Sema/warn-fortify-source.c b/clang/test/Sema/warn-fortify-source.c
index 216878c0836d8..5d4a8d81b0b3e 100644
--- a/clang/test/Sema/warn-fortify-source.c
+++ b/clang/test/Sema/warn-fortify-source.c
@@ -96,6 +96,61 @@ void call_memset(void) {
   __builtin_memset(buf, 0xff, 11); // expected-warning {{'memset' will always overflow; destination buffer has size 10, but size argument is 11}}
 }
 
+void call_read(void) {
+  char buf[10];
+  __builtin_read(0, buf, 10);
+  __builtin_read(0, buf, 20); // expected-warning {{'read' size argument is too large; destination buffer has size 10, but size argument is 20}}
+}
+
+void call_pread(void) {
+  char buf[10];
+  __builtin_pread(0, buf, 10, 0);
+  __builtin_pread(0, buf, 20, 0); // expected-warning {{'pread' size argument is too large; destination buffer has size 10, but size argument is 20}}
+}
+
+void call_pread64(void) {
+  char buf[10];
+  __builtin_pread64(0, buf, 10, 0);
+  __builtin_pread64(0, buf, 20, 0); // expected-warning {{'pread64' size argument is too large; destination buffer has size 10, but size argument is 20}}
+}
+
+void call_write(void) {
+  char buf[10];
+  __builtin_write(0, buf, 10);
+  __builtin_write(0, buf, 20); // expected-warning {{'write' size argument is too large; source buffer has size 10, but size argument is 20}}
+}
+
+void call_pwrite(void) {
+  char buf[10];
+  __builtin_pwrite(0, buf, 10, 0);
+  __builtin_pwrite(0, buf, 20, 0); // expected-warning {{'pwrite' size argument is too large; source buffer has size 10, but size argument is 20}}
+}
+
+void call_pwrite64(void) {
+  char buf[10];
+  __builtin_pwrite64(0, buf, 10, 0);
+  __builtin_pwrite64(0, buf, 20, 0); // expected-warning {{'pwrite64' size argument is too large; source buffer has size 10, but size argument is 20}}
+}
+
+void call_getcwd(void) {
+  char buf[10];
+  __builtin_getcwd(buf, 10);
+  __builtin_getcwd(buf, 20); // expected-warning {{'getcwd' size argument is too large; destination buffer has size 10, but size argument is 20}}
+}
+
+void call_readlink(void) {
+  char buf[10];
+  __builtin_readlink("path", buf, 10);
+  __builtin_readlink("path", buf, 20); // expected-warning {{'readlink' size argument is too large; destination buffer has size 10, but size argument is 20}}
+}
+
+void call_readlinkat(void) {
+  char buf[10];
+  __builtin_readlinkat(0, "path", buf, 10);
+  __builtin_readlinkat(0, "path", buf, 20); // expected-warning {{'readlinkat' size argument is too large; destination buffer has size 10, but size argument is 20}}
+}
+
+
 void call_snprintf(double d, int n) {
   char buf[10];
   __builtin_snprintf(buf, 10, "merp");
diff --git a/clang/utils/TableGen/ClangBuiltinsEmitter.cpp b/clang/utils/TableGen/ClangBuiltinsEmitter.cpp
index 352765acf5338..30c7afd95a89a 100644
--- a/clang/utils/TableGen/ClangBuiltinsEmitter.cpp
+++ b/clang/utils/TableGen/ClangBuiltinsEmitter.cpp
@@ -347,12 +347,15 @@ class PrototypeParser {
                                .Case("msint32_t", "Ni")
                                .Case("msuint32_t", "UNi")
                                .Case("objc_super", "M")
+                               .Case("off_t", "Li")
+                               .Case("off64_t", "Wi")
                                .Case("pid_t", "p")
                                .Case("ptrdiff_t", "Y")
                                .Case("SEL", "H")
                                .Case("short", "s")
                                .Case("sigjmp_buf", "SJ")
                                .Case("size_t", "z")
+                               .Case("ssize_t", "Sz")
                                .Case("ucontext_t", "K")
                                .Case("uint32_t", "UZi")
                                .Case("uint64_t", "UWi")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants