Merged
6 changes: 6 additions & 0 deletions clang/docs/ReleaseNotes.rst
@@ -356,6 +356,12 @@ Attribute Changes in Clang
attribute, but `malloc_span` applies not to functions returning pointers, but to functions returning
span-like structures (i.e. those that contain a pointer field and a size integer field or two pointers).

- Added new attribute ``modular_format`` to allow dynamically selecting at link
time which aspects of a statically linked libc's printf (et al)
implementation are required. This can reduce code size without requiring e.g.
multilibs for printf features. Requires cooperation with the libc
implementation.

Improvements to Clang's diagnostics
-----------------------------------
- Diagnostics messages now refer to ``structured binding`` instead of ``decomposition``,
8 changes: 8 additions & 0 deletions clang/include/clang/Basic/Attr.td
@@ -5323,3 +5323,11 @@ def NonString : InheritableAttr {
let Subjects = SubjectList<[Var, Field]>;
let Documentation = [NonStringDocs];
}

def ModularFormat : InheritableAttr {
let Spellings = [Clang<"modular_format">];
let Args = [IdentifierArgument<"ModularImplFn">, StringArgument<"ImplName">,
VariadicStringArgument<"Aspects">];
let Subjects = SubjectList<[Function]>;
let Documentation = [ModularFormatDocs];
}
36 changes: 36 additions & 0 deletions clang/include/clang/Basic/AttrDocs.td
@@ -9630,3 +9630,39 @@ silence diagnostics with code like:
__attribute__((nonstring)) char NotAStr[3] = "foo"; // Not diagnosed
}];
}

def ModularFormatDocs : Documentation {
let Category = DocCatFunction;
let Content = [{
The ``modular_format`` attribute can be applied to a function that bears the
``format`` attribute (or standard library functions) to indicate that the
implementation is "modular", that is, that the implementation is logically
divided into a number of named aspects. When the compiler can determine that
not all aspects of the implementation are needed for a given call, the compiler
may redirect the call to the identifier given as the first argument to the
attribute (the modular implementation function).

The second argument is an implementation name, and the remaining arguments are
aspects of the format string for the compiler to report. The implementation
name is an unevaluated identifier in the C namespace.

The compiler reports that a call requires an aspect by issuing a relocation for
the symbol ``<impl_name>_<aspect>`` at the point of the call. This arranges for
code and data needed to support the aspect of the implementation to be brought
into the link to satisfy weak references in the modular implementation function.
If the compiler does not understand an aspect, it must summarily consider any
call to require that aspect.

For example, say ``printf`` is annotated with
``modular_format(__modular_printf, "__printf", "float")``. Then, a call to
``printf(var, 42)`` would be untouched. A call to ``printf("%d", 42)`` would
become a call to ``__modular_printf`` with the same arguments, as would
Comment on lines +9658 to +9659
Collaborator

So will any call to printf with a constant format specifier string be rewritten to call __modular_printf?

Also, who is responsible for writing these attributes? Are they only in the libc implementation, or can a user write one of these themselves on their own declarations? I'm asking because I wonder about compatibility; e.g., the call dispatches to __modular_printf but that doesn't know about some particular extension being used in the format specifier and so the code appears to misbehave.

Contributor Author

> So will any call to printf with a constant format specifier string be rewritten to call __modular_printf?

That's correct.

> Also, who is responsible for writing these attributes? Are they only in the libc implementation, or can a user write one of these themselves on their own declarations? I'm asking because I wonder about compatibility; e.g., the call dispatches to __modular_printf but that doesn't know about some particular extension being used in the format specifier and so the code appears to misbehave.

Users could use these for their own implementations, in particular to allow functions that e.g. wrap vsnprintf to do logging etc. As for compatibility, if the compiler understands aspect names that the implementation doesn't, there's no issue, as the compiler will not spontaneously emit them if not requested. If an implementation requests a verdict on an implementation aspect unknown to the compiler, the compiler will conservatively report that the aspect is required. The modular_format attribute provided by the code and the aspect references emitted by the compiler thus form a sort of two-phase handshake between the code and compiler.
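
To spell out the handshake, here is a small standalone model of the rule as I understand it from the docs in this PR. It is not Clang's or LLVM's code: `aspectsToEmit` and the `"wide"` aspect are hypothetical, and only `float` is a real aspect today. An aspect the compiler does not understand is always reported as required; a known aspect is reported only when the call needs it.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the handshake; not the actual compiler logic.
// requested: aspects listed in the modular_format attribute.
// known:     aspects this compiler knows how to analyze.
// needed:    aspects the analysis found necessary for this particular call.
std::vector<std::string> aspectsToEmit(const std::vector<std::string> &requested,
                                       const std::set<std::string> &known,
                                       const std::set<std::string> &needed) {
  std::vector<std::string> out;
  for (const std::string &aspect : requested) {
    bool understood = known.count(aspect) != 0;
    // Unknown aspects are conservatively treated as required.
    if (!understood || needed.count(aspect) != 0)
      out.push_back(aspect);
  }
  return out;
}
```

Aspects the compiler knows but the implementation never requested do not appear in `requested`, so they are never emitted.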

Collaborator

> > So will any call to printf with a constant format specifier string be rewritten to call __modular_printf?
>
> That's correct.

Good to know, thanks!

> > Also, who is responsible for writing these attributes? Are they only in the libc implementation, or can a user write one of these themselves on their own declarations? I'm asking because I wonder about compatibility; e.g., the call dispatches to __modular_printf but that doesn't know about some particular extension being used in the format specifier and so the code appears to misbehave.
>
> Users could use these for their own implementations, in particular to allow functions that e.g. wrap vsnprintf to do logging etc. As for compatibility, if the compiler understands aspect names that the implementation doesn't, there's no issue, as the compiler will not spontaneously emit them if not requested. If an implementation requests a verdict on an implementation aspect unknown to the compiler, the compiler will conservatively report that the aspect is required. The modular_format attribute provided by the code and the aspect references emitted by the compiler thus form a sort of two-phase handshake between the code and compiler.

My concern is more about dispatching in ways the user may not anticipate and getting observably different behavior. e.g., the user calls printf("%I64d", 0LL) and they were getting the MSVC CRT printf call which supported that modifier but now calls __modular_printf which doesn't know about the modifier. What happens in that kind of situation?

Contributor Author (@mysterymath, Jul 22, 2025)

> My concern is more about dispatching in ways the user may not anticipate and getting observably different behavior. e.g., the user calls printf("%I64d", 0LL) and they were getting the MSVC CRT printf call which supported that modifier but now calls __modular_printf which doesn't know about the modifier. What happens in that kind of situation?

Ah, if I understand what you're getting at, that can't happen: it's explicitly out of scope for the feature.

The modular_format attribute exists to advertise, to a compiler that is compiling calls to a function, that the implementation can be split by redirecting calls and emitting relocs to various symbols. A header file is the only plausible mechanism to tell the compiler this, and that means that the header would need to be provided by and intrinsically tied to a specific version of the implementation. Otherwise, it would be impossible to determine what aspects the implementation requires to be emitted to function correctly.

Accordingly, this feature would primarily be useful for cases where libc is statically linked in and paired with its own headers. (llvm-libc, various embedded libcs, etc.) I suppose it's technically possible to break out printf implementation parts into a family of individual dynamic libraries, but even then, any libc header set that required that the libc implementation be dynamically replaceable would not be able to include modular_format.

So, for implementations that use this feature, printf and __modular_printf would always be designed together. To avoid ever introducing two full printf implementations into the link, printf would be a thin wrapper around __modular_printf that also requests every possible aspect of the implementation. This would mean that the two could never diverge.
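
As an illustration of that structure, here is a toy model. It is an assumption-laden sketch, not llvm-libc code: a function pointer stands in for the weak reference to float support, and `modularVFormat`, `modularFormat`, `fullFormat`, and `formatFloat` are hypothetical names. The full entry point is just the modular core with every aspect supplied, so there is only one formatting engine and the two cannot diverge.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Stand-in for a weak reference to float support; null means "not linked".
using FloatFormatter = std::string (*)(double);

static std::string formatFloat(double value) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%f", value);
  return buf;
}

// Modular core: handles %d and %% itself, and %f only when the aspect exists.
std::string modularVFormat(FloatFormatter floatAspect, const char *fmt,
                           va_list ap) {
  std::string out;
  for (const char *p = fmt; *p; ++p) {
    if (*p != '%') { out += *p; continue; }
    ++p;
    if (*p == '\0') break;
    switch (*p) {
    case 'd': out += std::to_string(va_arg(ap, int)); break;
    case 'f': {
      double value = va_arg(ap, double);
      if (floatAspect) out += floatAspect(value);
      else out += "<float support not linked>";
      break;
    }
    case '%': out += '%'; break;
    default: out += '%'; out += *p; break;
    }
  }
  return out;
}

// Models __modular_printf: the caller decides which aspects are present.
std::string modularFormat(FloatFormatter floatAspect, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  std::string result = modularVFormat(floatAspect, fmt, ap);
  va_end(ap);
  return result;
}

// Models printf: a thin wrapper that requests every aspect.
std::string fullFormat(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  std::string result = modularVFormat(&formatFloat, fmt, ap);
  va_end(ap);
  return result;
}
```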

As an aside, this is my first time landing an RFC across so many components of LLVM. I wasn't sure how much detail to include in each change; my intuition was to try to provide links to the RFC instead. I don't want the above reasoning to get buried, and it gives me pause that it wasn't readily accessible. But I'm also not entirely sure where it should live going forward. Advice would be appreciated.

``printf("%f", 42.0)``. The latter would be accompanied with a strong
relocation against the symbol ``__printf_float``, which would bring floating
point support for ``printf`` into the link.

The following aspects are currently supported:

- ``float``: The call has a floating point argument
}];
}
3 changes: 3 additions & 0 deletions clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -13063,6 +13063,9 @@ def err_get_vtable_pointer_requires_complete_type
: Error<"__builtin_get_vtable_pointer requires an argument with a complete "
"type, but %0 is incomplete">;

def err_modular_format_attribute_no_format
: Error<"'modular_format' attribute requires 'format' attribute">;

// SYCL-specific diagnostics
def warn_sycl_kernel_num_of_template_params : Warning<
"'sycl_kernel' attribute only applies to a function template with at least"
13 changes: 13 additions & 0 deletions clang/lib/CodeGen/CGCall.cpp
@@ -2559,6 +2559,19 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,

if (TargetDecl->hasAttr<ArmLocallyStreamingAttr>())
FuncAttrs.addAttribute("aarch64_pstate_sm_body");

if (auto *ModularFormat = TargetDecl->getAttr<ModularFormatAttr>()) {
Collaborator

Best I can tell, this is still only getting from ONE attribute. You probably have to do TargetDecl->specific_attrs<ModularFormatAttr> if you want to get aspects from ALL of them.

Contributor Author

I've made it so that attributes are merged together (trivially, allowing only duplicates), both across multiples per declaration and redeclarations, with the same semantics. That should allow getAttr, right?

Collaborator

getAttr will work if there is a single attribute only in the AST. It looks like that is the case, so this is fine.
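
For reference, with a single attribute on the declaration the value attached here is one comma-joined string. A standalone model using only the standard library (not the LLVM helpers in the hunk; `encodeModularFormat` is a hypothetical name) might look like this, with the field order taken from the hunk and the expected strings from the CodeGen test in this PR:

```cpp
#include <string>
#include <vector>

// Hypothetical model of the "modular-format" function attribute value:
// format type, format index, first argument index, modular implementation
// function, implementation name, then the aspects, all comma-separated.
std::string encodeModularFormat(const std::string &formatType, int formatIdx,
                                int firstArg, const std::string &implFn,
                                const std::string &implName,
                                const std::vector<std::string> &aspects) {
  std::string out = formatType + "," + std::to_string(formatIdx) + "," +
                    std::to_string(firstArg) + "," + implFn + "," + implName;
  for (const std::string &aspect : aspects) {
    out += ",";
    out += aspect;
  }
  return out;
}
```

Calling it with `("printf", 1, 2, "__modular_printf", "__printf", {"float"})` reproduces the first `CHECK` line of the test.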

FormatAttr *Format = TargetDecl->getAttr<FormatAttr>();
StringRef Type = Format->getType()->getName();
std::string FormatIdx = std::to_string(Format->getFormatIdx());
std::string FirstArg = std::to_string(Format->getFirstArg());
SmallVector<StringRef> Args = {
Type, FormatIdx, FirstArg,
ModularFormat->getModularImplFn()->getName(),
ModularFormat->getImplName()};
llvm::append_range(Args, ModularFormat->aspects());
FuncAttrs.addAttribute("modular-format", llvm::join(Args, ","));
}
}

// Attach "no-builtins" attributes to:
6 changes: 6 additions & 0 deletions clang/lib/Sema/SemaDecl.cpp
@@ -7217,6 +7217,11 @@ static void checkLifetimeBoundAttr(Sema &S, NamedDecl &ND) {
}
}

static void checkModularFormatAttr(Sema &S, NamedDecl &ND) {
if (ND.hasAttr<ModularFormatAttr>() && !ND.hasAttr<FormatAttr>())
S.Diag(ND.getLocation(), diag::err_modular_format_attribute_no_format);
}

static void checkAttributesAfterMerging(Sema &S, NamedDecl &ND) {
// Ensure that an auto decl is deduced otherwise the checks below might cache
// the wrong linkage.
@@ -7229,6 +7234,7 @@ static void checkAttributesAfterMerging(Sema &S, NamedDecl &ND) {
checkHybridPatchableAttr(S, ND);
checkInheritableAttr(S, ND);
checkLifetimeBoundAttr(S, ND);
checkModularFormatAttr(S, ND);
}

static void checkDLLAttributeRedeclaration(Sema &S, NamedDecl *OldDecl,
25 changes: 25 additions & 0 deletions clang/lib/Sema/SemaDeclAttr.cpp
@@ -6973,6 +6973,27 @@ static void handleVTablePointerAuthentication(Sema &S, Decl *D,
CustomDiscriminationValue));
}

static void handleModularFormat(Sema &S, Decl *D, const ParsedAttr &AL) {
StringRef ImplName;
if (!S.checkStringLiteralArgumentAttr(AL, 1, ImplName))
return;
SmallVector<StringRef> Aspects;
for (unsigned I = 2, E = AL.getNumArgs(); I != E; ++I) {
StringRef Aspect;
if (!S.checkStringLiteralArgumentAttr(AL, I, Aspect))
return;
Aspects.push_back(Aspect);
}

// Store aspects sorted and without duplicates.
llvm::sort(Aspects);
Aspects.erase(llvm::unique(Aspects), Aspects.end());
Collaborator

Is there a good reason not to diagnose dupes?

Contributor Author

Ah, this was to allow a degree of tolerance for e.g. concatenation as set-union. One can define a preprocessor macro for a set of aspects, then union the aspects together by simple concatenation. Choosing such an interpretation of a corner case could only be desirable if there is exactly one plausible interpretation, and I think that's the case here. The only other non-failure behavior I could think of is having duplicated attributes remove themselves from the list, which is absurd.
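
A small model of what I mean, using the standard library rather than the LLVM helpers in the hunk. The macro names and the `"wide"` aspect are hypothetical; the point is only that concatenating two aspect lists and then sorting and de-duplicating gives their set union.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical aspect sets defined as macros; "wide" is not a real aspect.
#define BASE_ASPECTS "float"
#define EXTRA_ASPECTS "wide", "float"

// Same canonicalization as the hunk: sort, then drop adjacent duplicates.
std::vector<std::string> canonicalizeAspects(std::vector<std::string> aspects) {
  std::sort(aspects.begin(), aspects.end());
  aspects.erase(std::unique(aspects.begin(), aspects.end()), aspects.end());
  return aspects;
}

// Concatenation of the two macro lists acts as a set union.
std::vector<std::string> unionOfMacroAspects() {
  return canonicalizeAspects({BASE_ASPECTS, EXTRA_ASPECTS});
}
```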

Collaborator

What behavior should we expect if 2 modular_formats are added to the same declaration, does LLVM tolerate duplicated aspects? Should CG do some level of canonicalization?

As far as dropping duplicated attributes, I don't see a reason to keep one where the attribute is a duplicate of a previous one. Is there some behavior that should be expected that differs between:

[[clang::modular_format(<whatever>, <whatever>, "float", "float")]]

and
[[clang::modular_format(<whatever>, <whatever>, "float", "float")]][[clang::modular_format(<whatever>, <whatever>, "float", "float")]]

As-is, it looks like LLVM would get 2 'floats' in the second case, but only 1 in the first.

Contributor Author (@mysterymath, Dec 1, 2025)

That's a good point; I hadn't actually considered the two attribute case. My gut feeling is that we should disallow it with an error due to insufficient utility, and since we can always relax the error into some more subtle semantics later. And honestly that changes my mind about the duplicate-aspect case; I have no real evidence that concat-as-union is useful in practice, and it's more complex than just emitting an error. If we observe it to be useful, we can relax the error into that behavior if needed. So, I'll make both it and the two-attribute case errors.

Contributor Author (@mysterymath, Dec 1, 2025)

UGH, but! When I went looking through the codebase, it does appear that there's a surprising amount of convention for merging or even overwriting earlier attributes with later ones. I suppose this is just a part of the C tradition at this point: const const const int and what not. Advice would definitely be appreciated; this is my first time really designing an attribute. Otherwise, I'll just keep stewing on it.

Collaborator

Yes, depending on the attribute, we have a couple of strategies we use, but it is kind of up to the individual attribute which makes the most sense.

1- In some cases we just choose 'newest wins'; in others, 'oldest wins'. (Based on your code, you're doing one of these.)

2- In other cases we attempt to merge to let them co-exist, but that sometimes requires some sort of diagnostics if the 'merge' can't happen nicely.

3- Some attributes we just reject the 2nd one.

IN THIS case, newest/oldest wins is the WRONG answer. Above, we were looking at 2, merging them. This CAN be a bit of a PITA (and is actually pretty reasonable to do in this same function, since you can pull the previous ones off the decl in this function, no need to actually merge them).

The other case is 3: ONLY allow 1 of these per declaration. This makes 'macro on a handful of these' not really work, but is the simplest. THIS is the strictest however, and could be relaxed later if necessary.

Contributor Author (@mysterymath, Dec 2, 2025)

The more I think about it, the less utility there seems to be in allowing these to be constructed piecemeal. Allowing only one per declaration seems reasonable, and it's pretty easy to rig up. I'll do that.

EDIT: It also seems okay to allow duplicates on the same decl with a warning; I'll do that too. But we definitely shouldn't allow conflicting ones, it's just too semantically nebulous for how this is expected to be used.
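
Roughly, the policy I have in mind can be modelled like this. It is a sketch only, with hypothetical types and names, and it does not use Clang's attribute machinery: the first attribute is recorded, an identical repeat is tolerated (where Sema would warn), and anything else is a conflict (where Sema would error).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the contents of one modular_format attribute.
struct ModularFormatSpec {
  std::string implFn;
  std::string implName;
  std::vector<std::string> aspects; // assumed already sorted and unique
  bool operator==(const ModularFormatSpec &other) const {
    return implFn == other.implFn && implName == other.implName &&
           aspects == other.aspects;
  }
};

// Hypothetical stand-in for a declaration carrying at most one attribute.
struct DeclModel {
  bool hasSpec = false;
  ModularFormatSpec spec;
};

enum class MergeResult { Added, Duplicate, Conflict };

MergeResult mergeSpec(DeclModel &decl, const ModularFormatSpec &incoming) {
  if (!decl.hasSpec) {
    decl.spec = incoming;
    decl.hasSpec = true;
    return MergeResult::Added;
  }
  if (decl.spec == incoming)
    return MergeResult::Duplicate; // identical repeat: keep one, warn
  return MergeResult::Conflict;    // differing repeat: reject, keep the first
}
```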

Also, thanks for spending the time on this! I'm learning a lot about the finer points of clang attributes; long overdue.


D->addAttr(::new (S.Context) ModularFormatAttr(
S.Context, AL, AL.getArgAsIdent(0)->getIdentifierInfo(), ImplName,
Aspects.data(), Aspects.size()));
}

//===----------------------------------------------------------------------===//
// Top Level Sema Entry Points
//===----------------------------------------------------------------------===//
@@ -7910,6 +7931,10 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
case ParsedAttr::AT_VTablePointerAuthentication:
handleVTablePointerAuthentication(S, D, AL);
break;

case ParsedAttr::AT_ModularFormat:
handleModularFormat(S, D, AL);
break;
}
}

28 changes: 28 additions & 0 deletions clang/test/CodeGen/attr-modular-format.c
@@ -0,0 +1,28 @@
// RUN: %clang_cc1 -triple x86_64-unknown-unknown -emit-llvm %s -o - | FileCheck %s

int printf(const char *fmt, ...) __attribute__((modular_format(__modular_printf, "__printf", "float")));
int myprintf(const char *fmt, ...) __attribute__((modular_format(__modular_printf, "__printf", "float"), format(printf, 1, 2)));

// CHECK-LABEL: define dso_local void @test_inferred_format(
// CHECK: {{.*}} = call i32 (ptr, ...) @printf(ptr noundef @.str) #[[ATTR:[0-9]+]]
void test_inferred_format(void) {
printf("hello");
}

// CHECK-LABEL: define dso_local void @test_explicit_format(
// CHECK: {{.*}} = call i32 (ptr, ...) @myprintf(ptr noundef @.str) #[[ATTR:[0-9]+]]
void test_explicit_format(void) {
myprintf("hello");
}

int redecl(const char *fmt, ...) __attribute__((modular_format(__first_impl, "__first", "one"), format(printf, 1, 2)));
int redecl(const char *fmt, ...) __attribute__((modular_format(__second_impl, "__second", "two", "three")));

// CHECK-LABEL: define dso_local void @test_redecl(
// CHECK: {{.*}} = call i32 (ptr, ...) @redecl(ptr noundef @.str) #[[ATTR_REDECL:[0-9]+]]
void test_redecl(void) {
redecl("hello");
}

// CHECK: attributes #[[ATTR]] = { "modular-format"="printf,1,2,__modular_printf,__printf,float" }
// CHECK: attributes #[[ATTR_REDECL]] = { "modular-format"="printf,1,2,__second_impl,__second,three,two" }
@@ -110,6 +110,7 @@
// CHECK-NEXT: Mips16 (SubjectMatchRule_function)
// CHECK-NEXT: MipsLongCall (SubjectMatchRule_function)
// CHECK-NEXT: MipsShortCall (SubjectMatchRule_function)
// CHECK-NEXT: ModularFormat (SubjectMatchRule_function)
// CHECK-NEXT: NSConsumed (SubjectMatchRule_variable_is_parameter)
// CHECK-NEXT: NSConsumesSelf (SubjectMatchRule_objc_method)
// CHECK-NEXT: NSErrorDomain (SubjectMatchRule_enum)
5 changes: 5 additions & 0 deletions clang/test/Sema/attr-modular-format.c
@@ -0,0 +1,5 @@
//RUN: %clang_cc1 -fsyntax-only -verify %s

int printf(const char *fmt, ...) __attribute__((modular_format(__modular_printf, "__printf", "float"))); // no-error
int myprintf(const char *fmt, ...) __attribute__((modular_format(__modular_printf, "__printf", "float"))); // expected-error {{'modular_format' attribute requires 'format' attribute}}