Skip to content

Conversation

bozicrHT
Copy link
Contributor

Annotate printf/scanf and related builtins with the nonnull attribute on their format string parameters. This enables diagnostics when NULL is passed, matching GCC behavior. Updated existing Sema tests and added new one for coverage.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 15, 2025

@llvm/pr-subscribers-clang

Author: Radovan Božić (bozicrHT)

Changes

Annotate printf/scanf and related builtins with the nonnull attribute on their format string parameters. This enables diagnostics when NULL is passed, matching GCC behavior. Updated existing Sema tests and added new one for coverage.


Full diff: https://github.com/llvm/llvm-project/pull/158626.diff

10 Files Affected:

  • (modified) clang/include/clang/Basic/Builtins.def (+1)
  • (modified) clang/include/clang/Basic/Builtins.h (+4)
  • (modified) clang/include/clang/Basic/Builtins.td (+16-15)
  • (modified) clang/include/clang/Basic/BuiltinsBase.td (+1-1)
  • (modified) clang/lib/Basic/Builtins.cpp (+28)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+9)
  • (added) clang/test/Sema/format-strings-nonnull.c (+74)
  • (modified) clang/test/Sema/format-strings.c (+2-4)
  • (modified) clang/test/SemaCXX/format-strings-0x.cpp (+2)
  • (modified) clang/test/SemaObjC/format-strings-objc.m (+2-1)
diff --git a/clang/include/clang/Basic/Builtins.def b/clang/include/clang/Basic/Builtins.def
index 48437c9397570..dfd54c974c322 100644
--- a/clang/include/clang/Basic/Builtins.def
+++ b/clang/include/clang/Basic/Builtins.def
@@ -67,6 +67,7 @@
 // The third value provided to the macro specifies information about attributes
 // of the function.  These must be kept in sync with the predicates in the
 // Builtin::Context class.  Currently we have:
+//  N -> nonnull
 //  n -> nothrow
 //  r -> noreturn
 //  U -> pure
diff --git a/clang/include/clang/Basic/Builtins.h b/clang/include/clang/Basic/Builtins.h
index 3a5e31de2bc50..68d7043cac1bf 100644
--- a/clang/include/clang/Basic/Builtins.h
+++ b/clang/include/clang/Basic/Builtins.h
@@ -392,6 +392,10 @@ class Context {
   bool performsCallback(unsigned ID,
                         llvm::SmallVectorImpl<int> &Encoding) const;
 
+  /// Return true if this builtin has parameters at fixed positions
+  /// that must be non-null.
+  bool IsNonNull(unsigned ID, llvm::SmallVectorImpl<int> &Indxs) const;
+
   /// Return true if this function has no side effects and doesn't
   /// read memory, except for possibly errno or raising FP exceptions.
   ///
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 27639f06529cb..373715b07f732 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -3083,104 +3083,105 @@ def StrLen : LibBuiltin<"string.h"> {
 // FIXME: This list is incomplete.
 def Printf : LibBuiltin<"stdio.h"> {
   let Spellings = ["printf"];
-  let Attributes = [PrintfFormat<0>];
+  let Attributes = [PrintfFormat<0>, NonNull<[0]>];
   let Prototype = "int(char const*, ...)";
 }
 
 // FIXME: The builtin and library function should have the same signature.
 def BuiltinPrintf : Builtin {
   let Spellings = ["__builtin_printf"];
-  let Attributes = [NoThrow, PrintfFormat<0>, FunctionWithBuiltinPrefix];
+  let Attributes = [NoThrow, PrintfFormat<0>, FunctionWithBuiltinPrefix,
+                    NonNull<[0]>];
   let Prototype = "int(char const* restrict, ...)";
 }
 
 def FPrintf : LibBuiltin<"stdio.h"> {
   let Spellings = ["fprintf"];
-  let Attributes = [NoThrow, PrintfFormat<1>];
+  let Attributes = [NoThrow, PrintfFormat<1>, NonNull<[1]>];
   let Prototype = "int(FILE* restrict, char const* restrict, ...)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def SnPrintf : LibBuiltin<"stdio.h"> {
   let Spellings = ["snprintf"];
-  let Attributes = [NoThrow, PrintfFormat<2>];
+  let Attributes = [NoThrow, PrintfFormat<2>, NonNull<[2]>];
   let Prototype = "int(char* restrict, size_t, char const* restrict, ...)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def SPrintf : LibBuiltin<"stdio.h"> {
   let Spellings = ["sprintf"];
-  let Attributes = [NoThrow, PrintfFormat<1>];
+  let Attributes = [NoThrow, PrintfFormat<1>, NonNull<[1]>];
   let Prototype = "int(char* restrict, char const* restrict, ...)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def VPrintf : LibBuiltin<"stdio.h"> {
   let Spellings = ["vprintf"];
-  let Attributes = [NoThrow, VPrintfFormat<0>];
+  let Attributes = [NoThrow, VPrintfFormat<0>, NonNull<[0]>];
   let Prototype = "int(char const* restrict, __builtin_va_list)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def VfPrintf : LibBuiltin<"stdio.h"> {
   let Spellings = ["vfprintf"];
-  let Attributes = [NoThrow, VPrintfFormat<1>];
+  let Attributes = [NoThrow, VPrintfFormat<1>, NonNull<[1]>];
   let Prototype = "int(FILE* restrict, char const* restrict, __builtin_va_list)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def VsnPrintf : LibBuiltin<"stdio.h"> {
   let Spellings = ["vsnprintf"];
-  let Attributes = [NoThrow, VPrintfFormat<2>];
+  let Attributes = [NoThrow, VPrintfFormat<2>, NonNull<[2]>];
   let Prototype = "int(char* restrict, size_t, char const* restrict, __builtin_va_list)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def VsPrintf : LibBuiltin<"stdio.h"> {
   let Spellings = ["vsprintf"];
-  let Attributes = [NoThrow, VPrintfFormat<1>];
+  let Attributes = [NoThrow, VPrintfFormat<1>, NonNull<[1]>];
   let Prototype = "int(char* restrict, char const* restrict, __builtin_va_list)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def Scanf : LibBuiltin<"stdio.h"> {
   let Spellings = ["scanf"];
-  let Attributes = [ScanfFormat<0>];
+  let Attributes = [ScanfFormat<0>, NonNull<[0]>];
   let Prototype = "int(char const* restrict, ...)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def FScanf : LibBuiltin<"stdio.h"> {
   let Spellings = ["fscanf"];
-  let Attributes = [ScanfFormat<1>];
+  let Attributes = [ScanfFormat<1>, NonNull<[1]>];
   let Prototype = "int(FILE* restrict, char const* restrict, ...)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def SScanf : LibBuiltin<"stdio.h"> {
   let Spellings = ["sscanf"];
-  let Attributes = [ScanfFormat<1>];
+  let Attributes = [ScanfFormat<1>, NonNull<[1]>];
   let Prototype = "int(char const* restrict, char const* restrict, ...)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def VScanf : LibBuiltin<"stdio.h"> {
   let Spellings = ["vscanf"];
-  let Attributes = [VScanfFormat<0>];
+  let Attributes = [VScanfFormat<0>, NonNull<[0]>];
   let Prototype = "int(char const* restrict, __builtin_va_list)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def VFScanf : LibBuiltin<"stdio.h"> {
   let Spellings = ["vfscanf"];
-  let Attributes = [VScanfFormat<1>];
+  let Attributes = [VScanfFormat<1>, NonNull<[1]>];
   let Prototype = "int(FILE* restrict, char const* restrict, __builtin_va_list)";
   let AddBuiltinPrefixedAlias = 1;
 }
 
 def VSScanf : LibBuiltin<"stdio.h"> {
   let Spellings = ["vsscanf"];
-  let Attributes = [VScanfFormat<1>];
+  let Attributes = [VScanfFormat<1>, NonNull<[1]>];
   let Prototype = "int(char const* restrict, char const* restrict, __builtin_va_list)";
   let AddBuiltinPrefixedAlias = 1;
 }
diff --git a/clang/include/clang/Basic/BuiltinsBase.td b/clang/include/clang/Basic/BuiltinsBase.td
index 09bc9f89059fe..73918ab167b8d 100644
--- a/clang/include/clang/Basic/BuiltinsBase.td
+++ b/clang/include/clang/Basic/BuiltinsBase.td
@@ -32,7 +32,6 @@ def Const : Attribute<"c">;
 def NoThrow : Attribute<"n">;
 def Pure : Attribute<"U">;
 def ReturnsTwice : Attribute<"j">;
-//  FIXME: gcc has nonnull
 
 // builtin-specific attributes
 // ---------------------------
@@ -85,6 +84,7 @@ def Consteval : Attribute<"EG">;
 // Callback behavior: the first index argument is called with the arguments
 // indicated by the remaining indices.
 class Callback<list<int> ArgIndices> : MultiIndexAttribute<"C", ArgIndices>;
+class NonNull<list<int> ArgIndices> : MultiIndexAttribute<"N", ArgIndices>;
 
 // Prefixes
 // ========
diff --git a/clang/lib/Basic/Builtins.cpp b/clang/lib/Basic/Builtins.cpp
index acd98fe84adf5..ae94a5b740540 100644
--- a/clang/lib/Basic/Builtins.cpp
+++ b/clang/lib/Basic/Builtins.cpp
@@ -293,6 +293,34 @@ bool Builtin::Context::isScanfLike(unsigned ID, unsigned &FormatIdx,
   return isLike(ID, FormatIdx, HasVAListArg, "sS");
 }
 
+bool Builtin::Context::IsNonNull(unsigned ID,
+                                 llvm::SmallVectorImpl<int> &Indxs) const {
+
+  const char *CalleePos = ::strchr(getAttributesString(ID), 'N');
+  if (!CalleePos)
+    return false;
+
+  ++CalleePos;
+  assert(*CalleePos == '<' &&
+         "Callback callee specifier must be followed by a '<'");
+  ++CalleePos;
+
+  char *EndPos;
+  int CalleeIdx = ::strtol(CalleePos, &EndPos, 10);
+  assert(CalleeIdx >= 0 && "Callee index is supposed to be positive!");
+  Indxs.push_back(CalleeIdx);
+
+  while (*EndPos == ',') {
+    const char *PayloadPos = EndPos + 1;
+
+    int PayloadIdx = ::strtol(PayloadPos, &EndPos, 10);
+    Indxs.push_back(PayloadIdx);
+  }
+
+  assert(*EndPos == '>' && "Callback callee specifier must end with a '>'");
+  return true;
+}
+
 bool Builtin::Context::performsCallback(unsigned ID,
                                         SmallVectorImpl<int> &Encoding) const {
   const char *CalleePos = ::strchr(getAttributesString(ID), 'C');
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 365ebb63b1559..7c40f665ea7aa 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -17097,6 +17097,15 @@ void Sema::AddKnownFunctionAttributes(FunctionDecl *FD) {
       }
     }
 
+    SmallVector<int, 4> Indxs;
+    if (Context.BuiltinInfo.IsNonNull(BuiltinID, Indxs) &&
+        !FD->hasAttr<NonNullAttr>()) {
+      llvm::SmallVector<ParamIdx, 4> ParamIndxs;
+      for (int I : Indxs)
+        ParamIndxs.push_back(ParamIdx(I + 1, FD));
+      FD->addAttr(NonNullAttr::CreateImplicit(Context, ParamIndxs.data(),
+                                              ParamIndxs.size()));
+    }
     if (Context.BuiltinInfo.isReturnsTwice(BuiltinID) &&
         !FD->hasAttr<ReturnsTwiceAttr>())
       FD->addAttr(ReturnsTwiceAttr::CreateImplicit(Context,
diff --git a/clang/test/Sema/format-strings-nonnull.c b/clang/test/Sema/format-strings-nonnull.c
new file mode 100644
index 0000000000000..6e00dcb55cb3d
--- /dev/null
+++ b/clang/test/Sema/format-strings-nonnull.c
@@ -0,0 +1,74 @@
+// RUN: %clang_cc1 -fsyntax-only -verify -Wnonnull -Wno-format-security %s
+
+#include <stdarg.h>
+#include <stddef.h>
+
+typedef struct _FILE FILE;
+
+int printf(char const* restrict, ...);
+int __builtin_printf(char const* restrict, ...);
+int fprintf(FILE* restrict, char const* restrict, ...);
+int snprintf(char* restrict, size_t, char const* restrict, ...);
+int sprintf(char* restrict, char const* restrict, ...);
+int vprintf(char const* restrict, __builtin_va_list);
+int vfprintf(FILE* restrict, char const* restrict, __builtin_va_list);
+int vsnprintf(char* restrict, size_t, char const* restrict, __builtin_va_list);
+int vsprintf(char* restrict, char const* restrict, __builtin_va_list);
+
+int scanf(char const* restrict, ...);
+int fscanf(FILE* restrict, char const* restrict, ...);
+int sscanf(char const* restrict, char const* restrict, ...);
+int vscanf(char const* restrict, __builtin_va_list);
+int vfscanf(FILE* restrict, char const* restrict, __builtin_va_list);
+int vsscanf(char const* restrict, char const* restrict, __builtin_va_list);
+
+
+void check_format_string(FILE *fp, va_list ap) {
+    char buf[256];
+    char* const fmt = NULL;
+
+    printf(fmt);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    __builtin_printf(NULL, "xxd");
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    fprintf(fp, NULL, 25);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    sprintf(buf, NULL, 42);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    snprintf(buf, 10, 0, 42);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    vprintf(fmt, ap);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    vfprintf(fp, 0, ap);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    vsprintf(buf, NULL, ap);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    vsnprintf(buf, 10, fmt, ap);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    scanf(NULL);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    fscanf(fp, NULL);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    sscanf(buf, fmt);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    vscanf(NULL, ap);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    vfscanf(fp, fmt, ap);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+
+    vsscanf(buf, NULL, ap);
+    // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
+}
\ No newline at end of file
diff --git a/clang/test/Sema/format-strings.c b/clang/test/Sema/format-strings.c
index af30ad5d15fe2..431f12d50a0f4 100644
--- a/clang/test/Sema/format-strings.c
+++ b/clang/test/Sema/format-strings.c
@@ -480,11 +480,9 @@ void pr7981(wint_t c, wchar_t c2) {
 #endif
 }
 
-// -Wformat-security says NULL is not a string literal
 void rdar8269537(void) {
-  // This is likely to crash in most cases, but -Wformat-nonliteral technically
-  // doesn't warn in this case.
-  printf(0); // no-warning
+  printf(0);
+  // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
 }
 
 // Handle functions with multiple format attributes.
diff --git a/clang/test/SemaCXX/format-strings-0x.cpp b/clang/test/SemaCXX/format-strings-0x.cpp
index 7d37f8276f29f..e0ca7a270c993 100644
--- a/clang/test/SemaCXX/format-strings-0x.cpp
+++ b/clang/test/SemaCXX/format-strings-0x.cpp
@@ -14,6 +14,7 @@ void f(char **sp, float *fp) {
   printf("%a", 1.0);
   scanf("%afoobar", fp);
   printf(nullptr);
+  // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
   printf(*sp); // expected-warning {{not a string literal}}
   // expected-note@-1{{treat the string as an argument to avoid this}}
 
@@ -32,4 +33,5 @@ void f(char **sp, float *fp) {
   printf("init list: %d", { 0 }); // expected-error {{cannot pass initializer list to variadic function; expected type from format string was 'int'}}
   printf("void: %d", f(sp, fp)); // expected-error {{cannot pass expression of type 'void' to variadic function; expected type from format string was 'int'}}
   printf(0, { 0 }); // expected-error {{cannot pass initializer list to variadic function}}
+  // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
 }
diff --git a/clang/test/SemaObjC/format-strings-objc.m b/clang/test/SemaObjC/format-strings-objc.m
index 40c1d31b1fd4c..babbb40394267 100644
--- a/clang/test/SemaObjC/format-strings-objc.m
+++ b/clang/test/SemaObjC/format-strings-objc.m
@@ -130,7 +130,7 @@ void rdar10743758(id x) {
   printf(s2); // expected-warning {{more '%' conversions than data arguments}}
 
   const char * const s3 = (const char *)0;
-  printf(s3); // no-warning (NULL is a valid format string)
+  printf(s3); // expected-warning {{null passed to a callee that requires a non-null argument}}
 
   NSString * const ns1 = @"constant string %s"; // expected-note {{format string is defined here}}
   NSLog(ns1); // expected-warning {{more '%' conversions than data arguments}}
@@ -259,6 +259,7 @@ void testByValueObjectInFormat(Foo *obj) {
   printf("%d %d %d", 1L, *obj, 1L); // expected-error {{cannot pass object with interface type 'Foo' by value to variadic function; expected type from format string was 'int'}} expected-warning 2 {{format specifies type 'int' but the argument has type 'long'}}
   printf("%!", *obj); // expected-error {{cannot pass object with interface type 'Foo' by value through variadic function}} expected-warning {{invalid conversion specifier}}
   printf(0, *obj); // expected-error {{cannot pass object with interface type 'Foo' by value through variadic function}}
+  // expected-warning@-1{{null passed to a callee that requires a non-null argument}}
 
   [Bar log2:@"%d", *obj]; // expected-error {{cannot pass object with interface type 'Foo' by value to variadic method; expected type from format string was 'int'}}
 }

Copy link
Member

@Sirraide Sirraide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure it’s defined anywhere whether nullptr is a valid argument to printf, but if GCC diagnoses this then it should be fine for us to do so as well.

// The third value provided to the macro specifies information about attributes
// of the function. These must be kept in sync with the predicates in the
// Builtin::Context class. Currently we have:
// N -> nonnull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should document that this option takes arguments, similarly to C below.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same 'N' as 86? if not, we should be documenting them separately/make it clear that they're not (ie, that the 'N' below is a placeholder). Additionally, this probably needs to be together if so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No in this case it’s a literal N, and yeah, that’s a bit unfortunate; it’d probably be clearer if we quoted everything that’s supposed to be a literal character as opposed to a variable/placeholder, but maybe that should be a separate NFC patch?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK if it is a separate NFC patch, but I think we need to do it BEFORE using N to mean something, else said NFC patch gets pretty annoying (and confusing for a while if that doesn't land). I'm open to whatever, but I'd like to see that patch merged before this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that’s a good point; I don’t feel too strongly about this, so I suppose, @bozicrHT, feel free to make a separate patch for that or to integrate that into this one, whichever is easier for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and created a separate NFC PR #160080 for quoting the literal attribute letters.

Comment on lines 395 to 397
/// Return true if this builtin has parameters at fixed positions
/// that must be non-null.
bool IsNonNull(unsigned ID, llvm::SmallVectorImpl<int> &Indxs) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Return true if this builtin has parameters at fixed positions
/// that must be non-null.
bool IsNonNull(unsigned ID, llvm::SmallVectorImpl<int> &Indxs) const;
/// Return true if this builtin has parameters that must be non-null.
/// The parameter indices are appended into 'Indxs'.
bool isNonNull(unsigned ID, SmallVectorImpl<int> &Indxs) const;

The other functions here seem to use camelCase, so this one should too for consistency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should the SmallVector take unsigneds instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should!

}

assert(*EndPos == '>' && "Callback callee specifier must end with a '>'");
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy-pasted from performsCallback(), which is 1. not great, we should factor out some parts of it (specifically, there should be a separate function that parses a comma-separated list of integers enclosed in <> and writes them to the SmallVector and which is called by both this and performsCallback()), and 2. the naming of some of these local variables doesn’t make sense for this function (e.g. a formatting function has no ‘callee’, so CalleePos isn’t really a good name for this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable — I’ll refactor the parsing logic into a small helper function (so both places can share it) and update the local variable names to better reflect their meaning in this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the parsing into a static helper parseCommaSeparatedIndices. Since performsCallback uses SmallVector<int> and isNonNull should use SmallVector<unsigned>, there are two options:

  1. Change isNonNull to use int.
  2. Make the helper templated on the element type to handle both int and unsigned.

We keep them separate because CallbackAttr::CreateImplicit requires int*, and changing that would need a separate PR. What are you thoughts, should I leave it as int or template it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose if preexisting function already use int then I’d just stick w/ int for now.

Comment on lines 17100 to 17107
SmallVector<int, 4> Indxs;
if (Context.BuiltinInfo.IsNonNull(BuiltinID, Indxs) &&
!FD->hasAttr<NonNullAttr>()) {
llvm::SmallVector<ParamIdx, 4> ParamIndxs;
for (int I : Indxs)
ParamIndxs.push_back(ParamIdx(I + 1, FD));
FD->addAttr(NonNullAttr::CreateImplicit(Context, ParamIndxs.data(),
ParamIndxs.size()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’d make more sense to create the ParamIdx’s in isNonNull() directly just to avoid any potential confusion about whether they’re supposed to be zero or one-based—also, I wonder if NonNull in TableGen shouldn’t just be one-based so we don’t have to remember to add 1 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParamIdx requires a Decl pointer as its second argument, which is only available in Sema, not in Builtins (which is included in the preamble). Because of that, constructing ParamIdx directly inside isNonNull() isn’t possible. That’s why I opted to create them later in Sema. We could consider a small refactor (e.g. a helper in Sema) if we want to make the one-based logic cleaner, but I think the current approach is the most practical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot it requires a decl, in that case that’s fine.

Comment on lines 3 to 4
#include <stdarg.h>
#include <stddef.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <stdarg.h>
#include <stddef.h>
typedef __SIZE_TYPE__ size_t;

Clang tests should generally never include stdlib headers. If you need definitions from those headers, define them yourself (for size_t in particular, you can do typedef __SIZE_TYPE__ size_t).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads-up!


void check_format_string(FILE *fp, va_list ap) {
char buf[256];
char* const fmt = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should just work, but can you test C23 nullptr and GNU __null too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since __null is a GCC C++ extension, should I add a small .cpp test file just for that, or include the printf/scanf tests in it as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just adding some C++ tests for __null would be enough

@Sirraide Sirraide requested a review from erichkeane September 15, 2025 17:51
@erichkeane
Copy link
Collaborator

I’m not sure it’s defined anywhere whether nullptr is a valid argument to printf, but if GCC diagnoses this then it should be fine for us to do so as well.

I agree with all you've said in the review.

Quite a few of the functions listed state that it is UB if they get passed 'nullptr' for the format string, but printf/fprintf don't do so explicitly (I only spot-checked). However, the entire standard section for fprintf (printf says "does what fprintf does, to stdout") seems to assume a valid format string, so I think this is reasonable.

I'm happy when you are.

@Sirraide
Copy link
Member

Oh also, before I forget, this should probably get a release note (in clang/docs/ReleaseNotes.rst, probably under the section ‘Improvements to Clang’s Diagnostics’).

Copy link
Contributor

@philnik777 philnik777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a good idea? This doesn't just add diagnostics, but also makes it UB to pass a nullptr to these functions. What do actual libc implementations do when they are passed a nullptr? Do any of them define their behaviour?

@Sirraide
Copy link
Member

What do actual libc implementations do when they are passed a nullptr? Do any of them define their behaviour?

Glibc at least checks for null and returns -1; I haven’t been able to check other libc implementations. I also tried looking at the C standard, but from what I can tell it doesn’t talk about this situation at all, though maybe I missed something

boomanaiden154 and others added 20 commits September 26, 2025 15:12
Before the environment variable would have a LD_LIBRARY_PATH= prefix
which means we would be setting the variable incorrectly. Setting it
incorrectly does not seem to break anything but might impact test
fidelity.
In some cases, safe-divisor selects can be hoisted out of the vector
loop. Catching all cases in the legacy cost model isn't possible, in
particular checking if all conditions guarding a division are loop
invariant.

Instead, check in planContainsAdditionalSimplifications if there are any
hoisted safe-divisor selects. If so, don't compare to the more
inaccurate legacy cost model.

Fixes llvm#160354.
Fixes llvm#160356.
Enable InitList code to handle zero sized structs; this includes structs
with no fields and those with only unnamed bitfields.
Closes llvm#157920
This more closely matches what we have done for uimm20, and should allow
us to in future differentiate between places that accept %*lo(expr) and
those where that is not allowed.

I have not introduced a `simm12` node for the moment, so that downstream
users notice the change.
Replace early exit from builder methods with assert - they should be not called on a completed record.
Also removes commented-out code.
`FAKE_USE`s are essentially no-ops, so they have to be removed before
running ExplicitLocals so that `drop`s will be correctly inserted to
drop those values used by the `FAKE_USE`s.

Fixes emscripten-core/emscripten#25301.
This can happen when `xor cond, -1` is not combined.
The bots ml-opt-dev-x86-64 and ml-opt-devrel-x86-64 are testing a
configuration which isn't enabled by default, and revealed a dependence
on the register score changed in the prior commit.  As I can't find
anything in the tests which indicate the dependence on non-trivial
remat was *intentional*, I'm just updating the tests to match the
new behavior.
…vm#159887)

Currently, we always pass the "selected" execution context to the
statusline. When handling a process or thread event, we can be more
precise, and build an execution context from the event data. This PR
also adopts the new `StoppedExecutionContext` that was recently
introduced.
…nstructions (llvm#160155)

Vector to scalar movement instructions, as well as mask instructions
like vcpop and vfirst, should have a higher latency & occupancy on
SiFive7.

---------

Co-authored-by: Michael Maitland <[email protected]>
…0409)

Previously we only tested it as taking a pair of `module` and `name`,
e.g `--import-memory=mymodule,mymemory`.

However, rather confusingly, if you just specified `--import-memory=foo`
it would set the module to `foo` and the name to empty string.

This changes the interpretation of `--import-memory=foo` to mean import
`foo` from the default module (which is currently `env`, but one day we
make it configurable).
Avoiding any new inttoptr is unnecessarily restrictive for "plain"
non-integral pointers, but it is important for unstable pointers and
pointers with external state. Fixes another test codegen regression
from llvm#105735.

Reviewed By: nikic

Pull Request: llvm#159959
philnik777 and others added 10 commits September 26, 2025 15:15
…vm#158469)

These functions have never been used outside the dylib, so there is no
point in exporting them.
)

This patch is a first in a series of patches that add codegen support
for fcvt instructions that keep the result in 32-bit or 64-bit SIMD&FP
registers. For a long time, LLVM primarily generated fpcvt instructions,
which store the result in GPRs, resulting in extra moves when the value
was used by NEON instructions that operate on SIMD&FP registers.
Although patterns existed for generating the SIMD variants, they relied
on single-element vector types (such as v1i32 or v1i64) to decide
whether the SIMD variant should be selected. This was not useful,
because many NEON intrinsics and other LLVM IR operations use scalar
types (i32/i64) even though they expect the result to be stored in
SIMD&FP registers.

This patch is part of a series that addresses this and also adds support
for generating these instructions in GlobalISel. To fix this in
SelectionDAG, bitcasts of the result to a floating-point type serve as a
hint that the SIMD variant of the conversion should be used, rather than
relying on single-element vector types. These bitcasts are not currently
generated by LLVM, but the goal is to add explicit bitcasts to the
inputs and outputs of NEON intrinsics operating on integers in follow-up
patches.

For GlobalISel, the register bank selection algorithm is used to
determine which variant to generate
The build script copies lldb-defines.h into the staging area but it gets
overwritten by version-header-fix.py. This flow assumes that the
lldb-defines.h from the source was writable originally (thus the copy
maintains that permission). This is problematic for systems that
integrate LLVM source as read only.

This change skips the initial copy of lldb-defines.h, which prevents
lldb build failures when the source is not writable.
- This implementation is adapted from **SDAG
X86TargetLowering::LowerSET_ROUNDING**.
…pe. (llvm#158426)

cPTR is a wildcard CHERI capability value type, used analogously to iPTR. This allows TableGen patterns to abstract over CHERI capability widths.

Co-authored-by: Jessica Clarke <[email protected]>
…lvm#160697)

Fix a bug where ExecuteRegionOp bufferization dropped the "no_inline"
attribute.

Co-authored-by: Dor Arad <[email protected]>
…e same as the comparison (llvm#160358)

We match uadd's behavior here.

Codegen comparison: https://godbolt.org/z/x8j4EhGno
…docs

Clarify the documentation in Builtins.def by quoting literal attribute
markers (e.g. 'n', 'r', 'U', 'N') to distinguish them from placeholders
such as N in C<N,...>. This avoids confusion and makes the attribute
docs clearer.
…essary" (llvm#160640)

Reapplies llvm#159686

This reverts commit 4f33d7b.

The original landing of this patch had an issue where it would try and
hoist allocas into the entry block that were in the entry block. This
would end up actually moving them lower in the block potentially after
users, resulting in invalid IR.

This update fixes this by ensuring that we are only hoisting static
allocas that have been sunk into a split basic block. A regression test
has been added.

Integration tested using a three stage build of clang with IRPGO
enabled.
Refactor `isNonNull` parsing and add C++ test for GNU `__null`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.