@phoebewang
Contributor

Background: The X86 APX feature adds 16 general-purpose registers within the existing 64-bit mode. PR #164638 extends FASTCC to use these registers. However, a blocking issue is that a calling convention cannot change depending on whether a feature is enabled.

The solution is to disable FASTCC in 64-bit mode when APX is not available. This is an NFC change to the final code generation, because X86 doesn't define an alternative ABI for FASTCC in 64-bit mode. With this patch, we can resolve the potential compatibility issue in #164638.

@llvmbot llvmbot added backend:X86 llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Oct 23, 2025
@llvmbot
Member

llvmbot commented Oct 23, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/164768.diff

9 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+4)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+2)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.h (+4)
  • (modified) llvm/lib/Transforms/IPO/GlobalOpt.cpp (+9-6)
  • (modified) llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll (+1-1)
  • (modified) llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll (+1-1)
  • (modified) llvm/test/tools/gold/X86/merge-functions.ll (+2-2)
  • (modified) llvm/test/tools/gold/X86/unified-lto.ll (+2-2)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 5d3b233ed6b6a..f52fb448fc584 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -943,6 +943,10 @@ class TargetTransformInfo {
   ///  should use coldcc calling convention.
   LLVM_ABI bool useColdCCForColdCall(Function &F) const;
 
+  /// Return true if the input function is internal, should use fastcc calling
+  /// convention.
+  LLVM_ABI bool useFastCCForInternalCall(Function &F) const;
+
   LLVM_ABI bool isTargetIntrinsicTriviallyScalarizable(Intrinsic::ID ID) const;
 
   /// Identifies if the vector form of the intrinsic has a scalar operand.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 4cd607c0d0c8d..064e28c504af4 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -431,6 +431,8 @@ class TargetTransformInfoImplBase {
 
   virtual bool useColdCCForColdCall(Function &F) const { return false; }
 
+  virtual bool useFastCCForInternalCall(Function &F) const { return true; }
+
   virtual bool isTargetIntrinsicTriviallyScalarizable(Intrinsic::ID ID) const {
     return false;
   }
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index bf62623099a97..dd65d8375828c 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -609,6 +609,10 @@ bool TargetTransformInfo::useColdCCForColdCall(Function &F) const {
   return TTIImpl->useColdCCForColdCall(F);
 }
 
+bool TargetTransformInfo::useFastCCForInternalCall(Function &F) const {
+  return TTIImpl->useFastCCForInternalCall(F);
+}
+
 bool TargetTransformInfo::isTargetIntrinsicTriviallyScalarizable(
     Intrinsic::ID ID) const {
   return TTIImpl->isTargetIntrinsicTriviallyScalarizable(ID);
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.h b/llvm/lib/Target/X86/X86TargetTransformInfo.h
index 133b3668a46c8..609861a53a0a0 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.h
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.h
@@ -319,6 +319,10 @@ class X86TTIImpl final : public BasicTTIImplBase<X86TTIImpl> {
   unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
                              Type *ScalarValTy) const override;
 
+  bool useFastCCForInternalCall(Function &F) const override {
+    return !ST->is64Bit() || ST->hasEGPR();
+  }
+
 private:
   bool supportsGather() const;
   InstructionCost getGSVectorCost(unsigned Opcode, TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
index 99c4982c58b47..1516a5bb7a6c2 100644
--- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -2018,12 +2018,15 @@ OptimizeFunctions(Module &M,
 
     if (hasChangeableCC(&F, ChangeableCCCache)) {
       // If this function has a calling convention worth changing, is not a
-      // varargs function, and is only called directly, promote it to use the
-      // Fast calling convention.
-      F.setCallingConv(CallingConv::Fast);
-      ChangeCalleesToFastCall(&F);
-      ++NumFastCallFns;
-      Changed = true;
+      // varargs function, is only called directly, and is supported by the
+      // target, promote it to use the Fast calling convention.
+      TargetTransformInfo &TTI = GetTTI(F);
+      if (TTI.useFastCCForInternalCall(F)) {
+        F.setCallingConv(CallingConv::Fast);
+        ChangeCalleesToFastCall(&F);
+        ++NumFastCallFns;
+        Changed = true;
+      }
     }
 
     if (F.getAttributes().hasAttrSomewhere(Attribute::Nest) &&
diff --git a/llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll b/llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll
index 70923c547940c..4a0c93f09c7df 100644
--- a/llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll
+++ b/llvm/test/Transforms/GlobalOpt/null-check-is-use-pr35760.ll
@@ -12,7 +12,7 @@ define dso_local i32 @main() {
 ; CHECK-LABEL: define {{[^@]+}}@main() local_unnamed_addr {
 ; CHECK-NEXT:  bb:
 ; CHECK-NEXT:    store ptr null, ptr @_ZL3g_i, align 8
-; CHECK-NEXT:    call fastcc void @_ZL13PutsSomethingv()
+; CHECK-NEXT:    call void @_ZL13PutsSomethingv()
 ; CHECK-NEXT:    ret i32 0
 ;
 bb:
diff --git a/llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll b/llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll
index a499fe1e4ad92..2b92d856d1848 100644
--- a/llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll
+++ b/llvm/test/Transforms/GlobalOpt/null-check-not-use-pr35760.ll
@@ -15,7 +15,7 @@ define dso_local i32 @main() {
 ; CHECK-LABEL: define {{[^@]+}}@main() local_unnamed_addr {
 ; CHECK-NEXT:  bb:
 ; CHECK-NEXT:    store ptr null, ptr @_ZL3g_i, align 8
-; CHECK-NEXT:    call fastcc void @_ZL13PutsSomethingv()
+; CHECK-NEXT:    call void @_ZL13PutsSomethingv()
 ; CHECK-NEXT:    ret i32 0
 ;
 bb:
diff --git a/llvm/test/tools/gold/X86/merge-functions.ll b/llvm/test/tools/gold/X86/merge-functions.ll
index d4a49b1c40b47..296e7aa3f76f7 100644
--- a/llvm/test/tools/gold/X86/merge-functions.ll
+++ b/llvm/test/tools/gold/X86/merge-functions.ll
@@ -11,8 +11,8 @@
 
 ; Check that we've merged foo and bar
 ; CHECK:      define dso_local noundef i32 @main()
-; CHECK-NEXT:   tail call fastcc void @bar()
-; CHECK-NEXT:   tail call fastcc void @bar()
+; CHECK-NEXT:   tail call void @bar()
+; CHECK-NEXT:   tail call void @bar()
 
 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-pc-linux-gnu"
diff --git a/llvm/test/tools/gold/X86/unified-lto.ll b/llvm/test/tools/gold/X86/unified-lto.ll
index e5030e863a64a..24eb94a08de39 100644
--- a/llvm/test/tools/gold/X86/unified-lto.ll
+++ b/llvm/test/tools/gold/X86/unified-lto.ll
@@ -25,10 +25,10 @@
 ; Constant propagation is not supported by thin LTO.
 ; With full LTO we fold argument into constant 43
 ; CHECK:       define dso_local noundef i32 @main()
-; CHECK-NEXT:    tail call fastcc void @foo()
+; CHECK-NEXT:    tail call void @foo()
 ; CHECK-NEXT:    ret i32 43
 
-; CHECK:       define internal fastcc void @foo()
+; CHECK:       define internal void @foo()
 ; CHECK-NEXT:    store i32 43, ptr @_g, align 4
 
 ; ThinLTO doesn't import foo, because the latter has noinline attribute
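To make the gating in the diff concrete, here is a minimal, self-contained sketch that models it outside LLVM. MockSubtarget, TTIBase, X86TTI and shouldPromoteToFastCC are hypothetical stand-ins for X86Subtarget, TargetTransformInfoImplBase, X86TTIImpl and the new check in OptimizeFunctions; they are not LLVM APIs, and the Function &F parameter is dropped because the X86 override does not use it.

```cpp
#include <cassert>

// Hypothetical stand-in for X86Subtarget: only the two queries used by the
// X86 override in this patch.
struct MockSubtarget {
  bool Is64Bit;
  bool HasEGPR;
  bool is64Bit() const { return Is64Bit; }
  bool hasEGPR() const { return HasEGPR; }
};

// Mirrors TargetTransformInfoImplBase: by default a target keeps the old
// behavior and allows fastcc promotion.
struct TTIBase {
  virtual ~TTIBase() = default;
  virtual bool useFastCCForInternalCall() const { return true; }
};

// Mirrors the X86TTIImpl override: 32-bit always allows fastcc; 64-bit only
// allows it when APX extended GPRs (EGPR) are available.
struct X86TTI : TTIBase {
  const MockSubtarget *ST;
  explicit X86TTI(const MockSubtarget *Subtarget) : ST(Subtarget) {}
  bool useFastCCForInternalCall() const override {
    return !ST->is64Bit() || ST->hasEGPR();
  }
};

// Models the new condition in OptimizeFunctions: the calling convention must
// be changeable AND the target must agree before fastcc is applied.
bool shouldPromoteToFastCC(bool HasChangeableCC, const TTIBase &TTI) {
  return HasChangeableCC && TTI.useFastCCForInternalCall();
}
```

Under these assumptions, a 64-bit subtarget without EGPR keeps the default calling convention, which is consistent with the updated CHECK lines above that expect plain call/tail call instead of fastcc.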

@nikic nikic requested a review from efriedma-quic October 23, 2025 08:04
@@ -319,6 +319,10 @@ class X86TTIImpl final : public BasicTTIImplBase<X86TTIImpl> {
   unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
                              Type *ScalarValTy) const override;
 
+  bool useFastCCForInternalCall(Function &F) const override {
+    return !ST->is64Bit() || ST->hasEGPR();
+  }
Contributor

Don't you need to check that both caller and callee have EGPR?

Contributor Author

I think checking a single direction is enough. We can call a function without EGPR from an EGPR-enabled function, but not the other way around.

Contributor

Hm, I don't really follow.

If we have no-EGPR -> EGPR, then the EGPR function may expect arguments to be passed in EGPR registers, while the no-EGPR function will push them to the stack.

If we have EGPR -> no-EGPR, then the EGPR function may pass arguments in EGPR registers, while the no-EGPR function will expect them to be on the stack.

Am I missing something here?

Contributor Author

Oh, I see the point. I was confusing this with the inlining logic.
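The concern resolved here is that a mismatch is unsafe in both directions, so the check has to be symmetric. A minimal sketch under that assumption, using a toy call-graph model: FuncInfo and callersAgreeOnEGPR are hypothetical names, and this is not the caller check from the later revision of the patch (which lives in target code and is not shown on this page).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of a function in a module.
struct FuncInfo {
  std::string Name;
  bool HasEGPR;                          // target feature on this function
  std::vector<const FuncInfo *> Callers; // direct callers only
};

// An EGPR-aware fastcc may place arguments in R16-R31. If the callee and a
// caller disagree on EGPR, one side may use those registers while the other
// expects the arguments elsewhere, and this holds in either direction.
// So every caller must match the callee exactly.
bool callersAgreeOnEGPR(const FuncInfo &Callee) {
  for (const FuncInfo *Caller : Callee.Callers)
    if (Caller->HasEGPR != Callee.HasEGPR)
      return false;
  return true;
}
```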

@nikic
Contributor

nikic commented Oct 23, 2025

Hm, I'm not sure I like this. Would it make sense to potentially have multiple fastcc calling conventions (e.g. fastcc_egpr) and have the hook select the best one? For this specific case (where fastcc doesn't do anything on the baseline target) this wouldn't make a difference, but this would also accommodate other cases in the future.

@phoebewang
Contributor Author

> Hm, I'm not sure I like this. Would it make sense to potentially have multiple fastcc calling conventions (e.g. fastcc_egpr) and have the hook select the best one? For this specific case (where fastcc doesn't do anything on the baseline target) this wouldn't make a difference, but this would also accommodate other cases in the future.

I don't think multiple fastcc calling conventions are useful, given that we cannot tie a calling convention to a single feature. Even if we have multiple such features in the future, the solution is still to disable fastcc rather than allow multiple calling conventions to coexist.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Oct 23, 2025
@efriedma-quic
Collaborator

"fastcc" was originally invented in the context of targets with a single calling convention, but where that calling convention had significant limitations. Like, on 32-bit x86, the default "C" calling convention passes everything on the stack. So it's basically supposed to be a variation on the C calling convention: assuming the same underlying machine constraints, but ignoring bad decisions made when the ABI documents were originally written.

The x86-64 ABI developers learned those lessons, so fastcc is just the C calling convention; we didn't need to modify it.

If we have new architectural features, and we want a different calling convention for machines that have the feature, versus machines which don't have the feature, I think we need more specific names for those calling conventions. Having the C calling convention vary based on per-function target features is a constant source of problems for interprocedural optimizations. We don't want to extend those same problems to fastcc.

I think, if you want to pass arguments using APX registers, you should introduce a new "APX" calling convention. Probably from there, instead of a boolean useFastCCForInternalCall, you want a function CallingConv getOptimalCCForInternalCall(CallInst &CI), or something like that, to pick whatever calling convention the target prefers based on the available caller/callee target features, and maybe other heuristics.
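The suggested interface might look roughly like this sketch. It assumes a hypothetical dedicated APX convention; the CC enum, the Features struct, and the choice to pass caller and callee features instead of a CallInst & are illustrative assumptions, not existing LLVM APIs.

```cpp
#include <cassert>

// Hypothetical calling-convention choices. LLVM's real CallingConv has C and
// Fast, but no APX convention.
enum class CC { C, Fast, APX };

// Hypothetical summary of the target features relevant to argument passing.
struct Features {
  bool Is64Bit;
  bool HasEGPR;
};

// Pick the preferred convention for an internal, directly called function
// based on both caller and callee features.
CC getOptimalCCForInternalCall(const Features &Caller, const Features &Callee) {
  if (!Callee.Is64Bit)
    return CC::Fast; // 32-bit: fastcc improves on stack-based passing
  if (Caller.HasEGPR && Callee.HasEGPR)
    return CC::APX;  // both sides can use the extended GPRs
  return CC::C;      // x86-64 fastcc is effectively the C convention anyway
}
```

Returning a convention instead of a boolean would let a target add further conventions later without touching GlobalOpt; the x86-64 fallback to the C convention follows the point above that fastcc on x86-64 is just the C calling convention.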

@phoebewang
Contributor Author

I don't fully agree with these points. If fastcc is just (and always) the C calling convention, its existence is meaningless. I agree with its original intention, and a well-designed ABI doesn't need it at all. But that is exactly why we are retargeting it for APX. We are solving (almost) the same problem: a single calling convention has design limitations. The only difference is that for 32-bit the design was bad from the beginning, whereas for APX the existing design cannot accommodate the new feature. So, IMO, using fastcc for APX is the best choice.

The getOptimalCCForInternalCall approach looks like over-engineering to me. I can't speak for other targets, but I believe we won't need to introduce a second fastcc on x86 in the next 5 years. Note that we actually have x86_fastcallcc, which is used for C's fastcall. Over all these years, we have neither used it for fastcc as you suggested nor invented a second fastcc.

@efriedma-quic
Collaborator

I sort of understand the logic of "we need a calling convention, fastcc isn't useful on x86-64, let's repurpose it". But there's a significant benefit to giving a calling convention with unusual semantics (like, it only works on targets that have APX enabled) its own name; it's less confusing, and it cleanly allows future extensions for all targets. And introducing a new calling convention really isn't that much more work.

It isn't likely the set of integer registers on x86 will change again, but floating-point/vector registers have been getting constant updates.

@phoebewang
Contributor Author

> I sort of understand the logic of "we need a calling convention, fastcc isn't useful on x86-64, let's repurpose it". But there's a significant benefit to giving a calling convention with unusual semantics (like, it only works on targets that have APX enabled) its own name; it's less confusing, and it cleanly allows future extensions for all targets. And introducing a new calling convention really isn't that much more work.

I'm afraid I disagree. Note the minor differences between X86_FastCall and Fast in X86CallingConv.td. Out of caution, I think mixing up an internal-only fastcc with an externally named calling convention is not a good idea. An internal-only calling convention has more freedom, since it's limited to a single CU. It doesn't have the backward-compatibility and cross-compilation constraints that traditional calling conventions have.

> It isn't likely the set of integer registers on x86 will change again, but floating-point/vector registers have been getting constant updates.

The current floating-point/vector calling convention uses 8 registers for argument passing, and FP/vector arguments are less common than GPR arguments (since GPRs are also used to pass pointers and large types like i128). So I don't think it will be a problem in the short term. AVX512 extended the total vector register count from 16 to 32 more than a decade ago, and I haven't seen a request to change fastcc for it :)

Nevertheless, I have moved the caller check logic into target code and kept 64-bit mode generating fastcc as before. This also keeps the design forward-compatible with newer calling conventions and keeps fastcc target-independent in the middle end. Please take another look. Thanks!

@zuban32
Contributor

zuban32 commented Oct 29, 2025

> I sort of understand the logic of "we need a calling convention, fastcc isn't useful on x86-64, let's repurpose it". But there's a significant benefit to giving a calling convention with unusual semantics (like, it only works on targets that have APX enabled) its own name; it's less confusing, and it cleanly allows future extensions for all targets. And introducing a new calling convention really isn't that much more work.

> It isn't likely the set of integer registers on x86 will change again, but floating-point/vector registers have been getting constant updates.

@efriedma-quic I'm a little confused here: if fastcc is an internal LLVM convention that can only appear in the code when GlobalOpt considers it safe, why are we discussing it as if it were an established ABI that's been around for years?

Isn't it being confused with MS fastcall, which is x86_fastcall, an entirely different beast?

From the LangRef:

> “fastcc” - The fast calling convention
> This calling convention attempts to make calls as fast as possible (e.g. by passing things in registers). This calling convention allows the target to use whatever tricks it wants to produce fast code for the target, without having to conform to an externally specified ABI (Application Binary Interface).

@efriedma-quic
Collaborator

> Nevertheless, I have moved the caller check logic to target code, and kept 64-bit generating fastcc as is. Besides, it keeps the ability to be forward compatible to newer calling conventions and a target independent fastcc in the middle end. Please take another look. Thanks!

So the rule is essentially that you can only use fastcc if the caller and the callee have the same target features? I guess that works, but if we're going that route, I'd like to document it and enforce it more consistently.

> @efriedma-quic I'm confused here a little bit, if fastcc is an internal LLVM convention which can only appear in the code when GlobalOpt considers it safe, why are we discussing it here as a sort of an established ABI that's been here for years?

The problem here isn't that the ABI has to stay the same across LLVM versions; we explicitly don't guarantee that. But I do care about other forms of consistency:

  • Consistency across targets. If a calling convention is available on all targets, it should have a similar meaning on all those targets.
  • Consistency across subtargets. If a calling convention shows up in multiple places in a module, it should have the same meaning in all of those places.
  • Autoupgrade. If you read bitcode generated by an old version of LLVM, does it still have the same meaning?

@phoebewang
Contributor Author

> So the rule is essentially, you can only use fastcc if the caller and the callee have the same target features?

Not necessarily the same features. The target can decide for itself. For example, so far we only distinguish between APX and non-APX. If there are more features in the future, we can classify them into a legacy cluster, an APX cluster, other new clusters, etc.
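A minimal sketch of such a cluster-based rule, assuming only the two clusters mentioned so far and treating LLVM-style feature strings such as "egpr" purely as labels. FeatureCluster, classify and canUseFastCC are hypothetical names; a real target would derive the cluster from its subtarget.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical clusters: functions in the same cluster share one fastcc
// definition. Only Legacy and APX exist in this sketch.
enum class FeatureCluster { Legacy, APX };

// Map a function's feature set to the cluster that determines its fastcc.
// Only the presence of "egpr" matters here.
FeatureCluster classify(const std::set<std::string> &Features) {
  return Features.count("egpr") ? FeatureCluster::APX : FeatureCluster::Legacy;
}

// fastcc is allowed only when every caller falls into the callee's cluster.
bool canUseFastCC(const std::set<std::string> &Callee,
                  const std::vector<std::set<std::string>> &Callers) {
  const FeatureCluster Want = classify(Callee);
  for (const auto &Caller : Callers)
    if (classify(Caller) != Want)
      return false;
  return true;
}
```

Under this rule, caller and callee may differ in features that don't affect argument passing (for example "avx2") and still use fastcc, which is what distinguishes it from requiring identical features.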

> if we're going that route, I'd like to document it, and enforce it in a more consistently.

Document updated.

> But I do care about other forms of consistency:

I think 1 and 3 are not real problems. fastcc is an internal-only calling convention within a module, so we don't have the consistency problem.

2 only happens when users use __attribute__((target("xxxx"))), but users cannot use xxxx if it doesn't exist at all. As long as we define a new fastcc implementation together with the new feature, there won't be a problem.

@phoebewang
Contributor Author

Ping. @nikic @efriedma-quic I think I have addressed or explained your concerns. Does this look good to you? Do you have any other concerns?

@topperc
Collaborator

topperc commented Nov 6, 2025

> 2 only happens when users use __attribute__((target("xxxx"))), but user cannot use a xxxx if it doesn't exist at all. As long as we define a new fastcc implementation together when defining the new feature, there won't be a problem.

It can also happen in LTO if different modules being linked together used different cpu or features on their command lines.

@phoebewang
Contributor Author

> > 2 only happens when users use __attribute__((target("xxxx"))), but user cannot use a xxxx if it doesn't exist at all. As long as we define a new fastcc implementation together when defining the new feature, there won't be a problem.

> It can also happen in LTO if different modules being linked together used different cpu or features on their command lines.

Oh, yes. But fastcc for functions across modules is computed during LTO, so there's no consistency issue regardless of whether some or all of the modules were generated by an old version of LLVM.

@zuban32
Contributor

zuban32 commented Nov 6, 2025

> Autoupgrade. If you read bitcode generated by an old version of LLVM, does it still have the same meaning?

> I think 1&3 are not real problems. fastcc are internal only calling convention within a module.

I think technically it's still possible to have a problem if we later change the definition of the CC we're substituting fastcc with, and try to link with the older object file. That means the implementation of fastcc for that particular feature we're setting now should be set once and for all, never to be touched later.

And what if in the future we get another GPR extension on top of EGPR that affects calling conventions? It's highly unlikely that anyone would want to link against the old bitcode, though.

@phoebewang
Contributor Author

> > Autoupgrade. If you read bitcode generated by an old version of LLVM, does it still have the same meaning?

> > I think 1&3 are not real problems. fastcc are internal only calling convention within a module.

> I think technically it's still possible to have a problem if we later change the definition of the CC we're substituting fastcc with, and try to link with the older object file. That means the implementation of fastcc for that particular feature we're setting now should be set once and for all, never to be touched later.

> And what if in the future we'll have another GPR extension on top of EGPR affecting calling conventions?. It's highly unlikely that one'd want to link against the old bitcode though.

No, there's no problem AFAICT. Internal means the functions' call graph is limited to a single object file. Different CCs can coexist because there's no interconnection between them. Bitcode is not a problem either, because it will be compiled with the new CC. Nevertheless, we should not change it arbitrarily.

@zuban32
Contributor

zuban32 commented Nov 6, 2025

A function could be external before LTO stage, and become internal only after LTO symbol resolution. I.e. we're linking against some older object file which has the function compiled with some older definition of the target feature, and the calls to that function in a new module would still become fastcc.

That's still quite an artificial case IMO.

@phoebewang
Contributor Author

phoebewang commented Nov 7, 2025

> A function could be external before LTO stage, and become internal only after LTO symbol resolution. I.e. we're linking against some older object file which has the function compiled with some older definition of the target feature, and the calls to that function in a new module would still become fastcc.

> That's still quite an artificial case IMO.

LTO works on LLVM bitcode files, which must be (re-)compiled during linking. The LTO stage changes the default CC to fastcc on both sides, and the backend fastcc lowering happens after that, so both sides will use the new fastcc. There's no chance of linking against the old protocol.

@zuban32
Contributor

zuban32 commented Nov 7, 2025

> > A function could be external before LTO stage, and become internal only after LTO symbol resolution. I.e. we're linking against some older object file which has the function compiled with some older definition of the target feature, and the calls to that function in a new module would still become fastcc.
> > That's still quite an artificial case IMO.

> LTO works on LLVM bitcode files, which must be (re-)compiled during linking. The LTO stage changes default CC to fastcc on both sides, and the backend fastcc lowering happens after it, so both will use the new fastcc. There's not a chance to link to the old protocol.

OK, let me describe what I'm talking about in more detail; I'm not yet convinced it's impossible.

Assume we have two modules: mod1 containing foo and mod2 containing bar, where foo calls bar. Then

  1. We compile mod2 with -c -flto and F target feature enabled, and store it somewhere as some sort of a library (not a shared library, just the bitcode), e.g. some offload compilation builtin library as a part of a compiler package.
  2. After 2 months (assume in that time we have slightly changed the definition of the feature F) we compile mod1 with a new compiler and feature F still enabled, and try to link it with LTO enabled against that old mod2 module we compiled at step 1. Correct me if I'm wrong, but isn't the foo->bar call still going to be set to fastcc?

@phoebewang
Contributor Author

> > > A function could be external before LTO stage, and become internal only after LTO symbol resolution. I.e. we're linking against some older object file which has the function compiled with some older definition of the target feature, and the calls to that function in a new module would still become fastcc.
> > > That's still quite an artificial case IMO.

> > LTO works on LLVM bitcode files, which must be (re-)compiled during linking. The LTO stage changes default CC to fastcc on both sides, and the backend fastcc lowering happens after it, so both will use the new fastcc. There's not a chance to link to the old protocol.

> Ok, just let me describe what I'm talking about in more details, I'm not convinced that's impossible yet.

> Assume we have two modules: mod1 containing foo and mod2 containing bar, where foo calls bar. Then

>   1. We compile mod2 with -c -flto and F target feature enabled, and store it somewhere as some sort of a library (not a shared library, just the bitcode), e.g. some offload compilation builtin library as a part of a compiler package.

The key here is that the definition (and hence the lowering) of fastcc in the old compiler doesn't affect bar in the bitcode, because that stage has not run yet; it only happens in the LTO stage with the new compiler. Since bar is in mod2 while its caller foo is in mod1, bar kept the default calling convention when mod2 was compiled. It's not affected by the old fastcc conversion logic either.

>   2. After 2 months (assume in that time we have slightly changed the definition of the feature F) we compile mod1 with a new compiler and feature F still enabled, and try to link it with LTO enabled against that old mod2 module we compiled at step 1. Correct me if I'm wrong, but isn't the foo->bar call still going to be set to fastcc?

I'm not sure what the slight change would be; I assume you mean the definition of its fastcc. As I explained above, lowering only happens during LTO with the new compiler, so we always use the same definition for foo and bar.

Any change to feature F won't affect the fastcc decision, given that both foo and bar have the same F.

If you are worried about changes to the target's useFastCCForInternalCall, that is also safe, because its new implementation must match the new fastcc constraints in the new compiler.

In short: bitcode files only carry feature F on foo and bar. The specific lowering, and whether or not to use fastcc, are all decided by the new compiler. Differences in the old compiler have no effect on them.
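The argument in this exchange can be modeled as a two-stage pipeline. A minimal sketch under simplifying assumptions: the compile step (old or new compiler) only records features in bitcode and leaves externally called functions on the default convention, every symbol becomes internal after LTO symbol resolution, and the fastcc decision is made only by the compiler that runs LTO. BCFunc, FastCCPolicy and runLTO are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

enum class CC { C, Fast };

// Toy bitcode function: only the name, features and linkage are recorded.
struct BCFunc {
  std::string Name;
  bool HasEGPR;
  bool Internal;   // false before LTO for functions called across modules
  CC Conv = CC::C; // compile step leaves external functions on the default CC
};

using FastCCPolicy = std::function<bool(const BCFunc &)>;

// LTO with the new compiler: internalize after symbol resolution, then apply
// the new compiler's policy. Nothing from the compiler that produced the
// bitcode is consulted here; only NewPolicy decides.
void runLTO(std::vector<BCFunc> &Funcs, const FastCCPolicy &NewPolicy) {
  for (BCFunc &F : Funcs) {
    F.Internal = true; // assume every symbol is resolved within this link
    if (NewPolicy(F))
      F.Conv = CC::Fast;
  }
}
```

In this model, foo and bar end up with the same convention because a single policy, the new compiler's, decides for both.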

Labels

backend:X86 clang Clang issues not falling into any other category llvm:analysis Includes value tracking, cost tables and constant folding llvm:ir llvm:transforms

6 participants