-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[clang/LLVM] Add flatten_deep attribute for depth-limited inlining (1/2) #165777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-clang Author: Grigory Pastukhov (grigorypas) ChangesIntroduces the The attribute takes a single unsigned integer argument representing the This is part 1 of 3. Future PRs will:
Full diff: https://github.com/llvm/llvm-project/pull/165777.diff 5 Files Affected:
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 749f531ec9ab1..1ccd659e49e63 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1984,6 +1984,13 @@ def Flatten : InheritableAttr {
let SimpleHandler = 1;
}
+def FlattenDeep : InheritableAttr {
+ let Spellings = [Clang<"flatten_deep">];
+ let Subjects = SubjectList<[Function], ErrorDiag>;
+ let Args = [UnsignedArgument<"MaxDepth">];
+ let Documentation = [FlattenDeepDocs];
+}
+
def Format : InheritableAttr {
let Spellings = [GCC<"format">];
let Args = [IdentifierArgument<"Type">, IntArgument<"FormatIdx">,
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 2fdd041c1b46e..f4280531019f5 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -4032,6 +4032,29 @@ callee is unavailable or if the callee has the ``noinline`` attribute.
}];
}
+def FlattenDeepDocs : Documentation {
+ let Category = DocCatFunction;
+ let Content = [{
+The ``flatten_deep`` attribute causes calls within the attributed function and
+their transitive callees to be inlined up to a specified depth, unless it is
+impossible to do so (for example if the body of the callee is unavailable or if
+the callee has the ``noinline`` attribute).
+
+This attribute takes a single unsigned integer argument representing the maximum
+depth of the call tree to inline. For example, ``__attribute__((flatten_deep(3)))``
+will inline all calls within the function, then inline all calls within those
+inlined functions (depth 2), and then inline all calls within those functions
+(depth 3).
+
+.. code-block:: c++
+
+ __attribute__((flatten_deep(3)))
+ void process_data() {
+ // All calls up to 3 levels deep in the call tree will be inlined
+ }
+ }];
+}
+
def FormatDocs : Documentation {
let Category = DocCatFunction;
let Content = [{
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 964a2a791e18f..1a78dfce6e1f3 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3695,6 +3695,24 @@ static void handleInitPriorityAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
D->addAttr(::new (S.Context) InitPriorityAttr(S.Context, AL, prioritynum));
}
+static void handleFlattenDeepAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
+ Expr *E = AL.getArgAsExpr(0);
+ uint32_t maxDepth;
+ if (!S.checkUInt32Argument(AL, E, maxDepth)) {
+ AL.setInvalid();
+ return;
+ }
+
+ if (maxDepth == 0) {
+ S.Diag(AL.getLoc(), diag::err_attribute_argument_is_zero)
+ << AL << E->getSourceRange();
+ AL.setInvalid();
+ return;
+ }
+
+ D->addAttr(::new (S.Context) FlattenDeepAttr(S.Context, AL, maxDepth));
+}
+
ErrorAttr *Sema::mergeErrorAttr(Decl *D, const AttributeCommonInfo &CI,
StringRef NewUserDiagnostic) {
if (const auto *EA = D->getAttr<ErrorAttr>()) {
@@ -7236,6 +7254,9 @@ ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D, const ParsedAttr &AL,
case ParsedAttr::AT_Format:
handleFormatAttr(S, D, AL);
break;
+ case ParsedAttr::AT_FlattenDeep:
+ handleFlattenDeepAttr(S, D, AL);
+ break;
case ParsedAttr::AT_FormatMatches:
handleFormatMatchesAttr(S, D, AL);
break;
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index ab4153a64f028..da6152dbff3a5 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -86,6 +86,7 @@
// CHECK-NEXT: ExternalSourceSymbol ((SubjectMatchRule_record, SubjectMatchRule_enum, SubjectMatchRule_enum_constant, SubjectMatchRule_field, SubjectMatchRule_function, SubjectMatchRule_namespace, SubjectMatchRule_objc_category, SubjectMatchRule_objc_implementation, SubjectMatchRule_objc_interface, SubjectMatchRule_objc_method, SubjectMatchRule_objc_property, SubjectMatchRule_objc_protocol, SubjectMatchRule_record, SubjectMatchRule_type_alias, SubjectMatchRule_variable))
// CHECK-NEXT: FlagEnum (SubjectMatchRule_enum)
// CHECK-NEXT: Flatten (SubjectMatchRule_function)
+// CHECK-NEXT: FlattenDeep (SubjectMatchRule_function)
// CHECK-NEXT: FunctionReturnThunks (SubjectMatchRule_function)
// CHECK-NEXT: GNUInline (SubjectMatchRule_function)
// CHECK-NEXT: HIPManaged (SubjectMatchRule_variable)
diff --git a/clang/test/Sema/attr-flatten-deep.c b/clang/test/Sema/attr-flatten-deep.c
new file mode 100644
index 0000000000000..92bc792424332
--- /dev/null
+++ b/clang/test/Sema/attr-flatten-deep.c
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -fsyntax-only -verify %s
+
+// Test basic usage - valid
+__attribute__((flatten_deep(3)))
+void test_valid() {
+}
+
+// Test attribute on non-function - should error
+__attribute__((flatten_deep(3))) int x; // expected-error {{'flatten_deep' attribute only applies to functions}}
+
+// Test depth = 0 - should error (depth must be >= 1)
+__attribute__((flatten_deep(0))) // expected-error {{'flatten_deep' attribute must be greater than 0}}
+void test_depth_zero() {
+}
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
boomanaiden154
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly is the motivation for this?
This also kind of changes the semantics of alwaysinline, which states the functions should be inlined always unless illegal. I guess this can be viewed as making inlining illegal in such circumstances, but a LangRef update would probably be good either way.
Can you please elaborate what do you mean by it "changes the semantics of alwaysinline"? I am introducing a new attribute flatten_deep both on clang side and LLVM side. |
|
I'm not sure this design is really practical for users. Trying to make the user count out the depth seems like it'll be difficult to use effectively: it's hard to count, and the user has no direct control over which calls are affected. And code refactoring is likely to break the intent. Could you describe a bit more what led you to this particular solution? |
Thank you for your feedback and for raising these concerns. |
You said patch 2 will update the alwaysinliner pass.
So you want to completely flatten functions but not completely flatten functions? What exactly is the use case of flattening these functions? |
Thank you for the feedback! Let me clarify the design:
|
Around
Whatever you want to call it, it doesn't impact correctness and is thus a form of cost heuristic. Whether it is a heuristic around runtime cost, code size cost, or compile time cost, it's still a cost heuristic.
This sounds like a nightmare to maintain. You probably want to remove it a new profile can be collected so the inliner can actually make proper decisions around cache pressure/call overhead removal/code folding due to inlining as this could very easily blow up icache pressure. Finding a good value that balances compile times and performance would also require tuning making maintenance even more difficult. |
Fair point on the max depth parameter - you're right that it's a cost heuristic in the sense that it's a compile-time cost consideration rather than a correctness issue. The key distinction I wanted to make is that it's not encoding runtime performance heuristics into the IR (like "inline if hot" or "inline if small"), but rather a compile-time safety mechanism. But I understand your characterization. Regarding the PGO and stale profile concerns - I appreciate the point about maintenance complexity. Let me provide some additional context on why this can be useful: In sampling-based PGO (AutoFDO) workflows, profiles are often collected continuously from production, which means they're inherently somewhat stale since they're based on previous builds. This is actually beneficial for reducing release latency and broader FDO adoption, but it does create a challenge when new functions are added to hot paths - we consistently see regressions in these cases. Without this feature, the only option is waiting for the next release cycle to get proper profile coverage, which can result in noticeable performance loss. For these scenarios, |
I'm familiar with how AutoFDO pipelines work and assumed that was your use case. I still don't think this is a very good solution. I feel like you're going to be much better off just rolling the profile. You can also selectively add I'm not going to block this, but I would like more agreement that this is a good idea and effective solution to the problem at hand, ideally through an RFC. |
This seems like a natural extension of existing flatten attribute.
I can see this being useful -- for some perf critical path, sometimes it's desirable to inline everything and avoid calls. Without such extension, this currently cannot be achieved reliably without side effects outside of a particular call tree. That said, I agree that using this to handle PGO stale profile may not be sustainable -- i won't do that. But disregard the PGO use case mentioned above, I think having the extension just for perf critical path is enough of a justification. FWIW, I've personally used existing flatten attempting to flatten the entire call tree for critical path, but only to realize it only inlines one level. I'd argue that the literal meaning of flatten is reduce to one level, not reduce by level. I think this extension at least would fill the gap for such use case to reduce to one level. |
This is going to be significantly worse at optimizing a hot function than the default inliner is going to be when using profile data. You're also inlining all of the cold call sites here, significantly increasing icache pressure, which will decrease performance. |
You are making assumption of how users use them. There are cases where adding alwaysinline would be worse than default inliner decision, if it's not used carefully. But alwaysinline is still a valuable tool provided by compiler to certain users. Same goes for this extension, the effectiveness depends on how and where users apply it. Not using it correctly can cause worse performance is not a reason to not give that tool to users, otherwise alwaysinline would not exist either. One example, using this for memory allocator fast path when PGO isn't available (e.g. pre-builds) is one of those cases can benefit from it. |
yuxuanchen1997
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally in favor of this change. Given how "normal" flatten is implemented (41af7c2fdc8cc), I think this patch provides a good foundation to move this feature into LLVM instead of attributing the calls in clang.
| def FlattenDeepDocs : Documentation { | ||
| let Category = DocCatFunction; | ||
| let Content = [{ | ||
| The ``flatten_deep`` attribute causes calls within the attributed function and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammatically, I am not a big fan of this name. It probably makes better sense if it's called "flatten_at_depth(3)".
| def FlattenDeep : InheritableAttr { | ||
| let Spellings = [Clang<"flatten_deep">]; | ||
| let Subjects = SubjectList<[Function], ErrorDiag>; | ||
| let Args = [UnsignedArgument<"MaxDepth">]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually wondering why we don't merge this with flatten. Is it because of compatibility with the GCC attribute? It seems possible to have an optional argument.
| if (const FlattenDeepAttr *FDA = FD->getAttr<FlattenDeepAttr>()) { | ||
| // Add the flatten_deep attribute with the max depth value as a typed int | ||
| // attribute | ||
| B.addRawIntAttr(llvm::Attribute::FlattenDeep, FDA->getMaxDepth()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addRawIntAttr doesn't seem to be widely used outside of llvm/IR/Attributes.cpp. Can you not use addFlattenDeepAttr?
I am not making an assumption about how users would use this. Not being able to take into account the hotness of a callsite when making inlining decisions will reduce performance compared to when you can take it into account.
These two statements seem to be in conflict to me. It's also not clear to me why adding |
|
Hi @boomanaiden154, I missed your last comment but please allow me to address some of your concerns.
This patch is not taking away the ability of the compiler to determining inline appropriateness based on the hotness of a callsite. This patch in my opinion is focused on one thing, that is to extend an existing compiler functionality - [[gnu::flatten]] already designed to do. Attributes are powerful tools where performance and capacity engineers use as part of their day job to experiment with different setups, independent from AutoFDO. Highly specialized code paths like one in malloc and C++ standard library have a decent amount of such attributes to hint compiler to do different things based on empirical evidence.
The fact that
I wouldn't say so. It would be an understatement to say it can be fulfilled by adding |
I never understood this patch as doing that.
I think those attributes are usually not very maintainable when used in standard libraries and we should rely on the inliner more. But I guess that's a separate issue.
Not using it to handle stale PGO profiles but using it to handle builds before PGO data is available is consistent to you? The solution to both is roll/collect a profile.
Why is this an understatement? Not sure what the exact motivating use case is, but looking at something actually hot, like I guess the implementation is not too big/complicated, so it's not that big of a deal if this goes in. I would like someone from outside of Meta to review this patch and agree that this is worth adding. |
|
alwaysinline can't always meet the needs -- it causes a particular function to be inlined into all callers, which is not selective based on calling context. What this does is make inlining happen under a particular call tree, and fully flatten a call tree. +@mtrofin hope you can see the merit of the extension -- we have real use case for it in our internal codebase where we don't want any calls for some paths and this is a simple, natural extension for existing flatten attribute. |
Introduces the
flatten_deepattribute as an extension to the existingflattenattribute. Whileflattenonly inlines immediate callsites,flatten_deepenables inlining of function calls and their transitivecallees up to a specified depth, effectively providing full flattening
of the call tree.
The attribute takes a single unsigned integer argument representing the
maximum call tree depth to inline. For example, flatten_deep(3) will
inline all calls within a function, then inline all calls within those
inlined functions (depth 2), and then inline all calls within those
functions (depth 3).
This is part 1 of 2. The next PR will implement inlining logic in the AlwaysInliner pass