Skip to content

Conversation

@labrinea
Copy link
Collaborator

@labrinea labrinea commented Jul 23, 2025

Implements ARM-software/acle#404

This allows the user to specify "featA+featB;priority=[1-255]" where priority=255 means highest priority. If the explicit priority string is omitted then the priority of "featA+featB" is implied, which is lower than priority=1.

Internally this gets expanded using special FMV features P0 ... P7 which can encode up to 256-1 priority levels (excluding all zeros). Those do not have corresponding detection bit at pos FEAT_#enum so I made this field optional in FMVInfo. Also they don't affect the codegen or name mangling of versioned functions.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:ARM backend:AArch64 backend:RISC-V clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. tablegen llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jul 23, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 23, 2025

@llvm/pr-subscribers-tablegen
@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-backend-arm

Author: Alexandros Lamprineas (labrinea)

Changes

Implements ARM-software/acle#404

This allows the user to specify "priority=[1-32];featA+featB" where priority=31 means highest priority. If the explicit priority string is omitted then the priority of "featA+featB" is implied, which is lower than priority=1.

Internally this gets expanded using special FMV features P0 ... P5 which can encode up to 31 priority levels (excluding all zeros). Those do not have corresponding detection bit at pos FEAT_#enum so I made this field optional in FMVInfo. Also they don't affect the codegen or name mangling of versioned functions.


Patch is 39.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150267.diff

23 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+6)
  • (modified) clang/include/clang/Sema/SemaARM.h (+2-1)
  • (modified) clang/include/clang/Sema/SemaRISCV.h (+2-1)
  • (modified) clang/lib/CodeGen/Targets/AArch64.cpp (+4-3)
  • (modified) clang/lib/Sema/SemaARM.cpp (+54-8)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+4-3)
  • (modified) clang/lib/Sema/SemaRISCV.cpp (+3-1)
  • (modified) clang/test/AST/attr-target-version.c (+20-4)
  • (modified) clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c (+16)
  • (added) clang/test/CodeGen/AArch64/fmv-explicit-priority.c (+193)
  • (modified) clang/test/Sema/attr-target-clones-aarch64.c (+10)
  • (modified) clang/test/Sema/attr-target-version.c (+9)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+5-1)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+4)
  • (modified) llvm/include/llvm/TargetParser/AArch64FeatPriorities.inc (+7-1)
  • (modified) llvm/include/llvm/TargetParser/AArch64TargetParser.h (+7-4)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
  • (modified) llvm/lib/Target/AArch64/AArch64FMV.td (+8)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+14-3)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+1)
  • (modified) llvm/lib/TargetParser/AArch64TargetParser.cpp (+20-11)
  • (modified) llvm/lib/Transforms/IPO/GlobalOpt.cpp (+15-7)
  • (modified) llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp (+6-3)
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index b2ea65ae111be..05fecee73270f 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12747,6 +12747,12 @@ def warn_target_clone_duplicate_options
 def warn_target_clone_no_impact_options
     : Warning<"version list contains entries that don't impact code generation">,
       InGroup<FunctionMultiVersioning>;
+def warn_version_priority_out_of_range
+    : Warning<"version priority '%0' is outside the allowed range [1-31]; ignoring priority">,
+      InGroup<FunctionMultiVersioning>;
+def warn_invalid_default_version_priority
+    : Warning<"priority of default version cannot be overridden; ignoring priority">,
+      InGroup<FunctionMultiVersioning>;
 
 // three-way comparison operator diagnostics
 def err_implied_comparison_category_type_not_found : Error<
diff --git a/clang/include/clang/Sema/SemaARM.h b/clang/include/clang/Sema/SemaARM.h
index e77d65f9362d8..1a2775d7b050e 100644
--- a/clang/include/clang/Sema/SemaARM.h
+++ b/clang/include/clang/Sema/SemaARM.h
@@ -92,7 +92,8 @@ class SemaARM : public SemaBase {
   /// false otherwise.
   bool areLaxCompatibleSveTypes(QualType FirstType, QualType SecondType);
 
-  bool checkTargetVersionAttr(const StringRef Str, const SourceLocation Loc);
+  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc,
+                              SmallString<64> &NewParam);
   bool checkTargetClonesAttr(SmallVectorImpl<StringRef> &Params,
                              SmallVectorImpl<SourceLocation> &Locs,
                              SmallVectorImpl<SmallString<64>> &NewParams);
diff --git a/clang/include/clang/Sema/SemaRISCV.h b/clang/include/clang/Sema/SemaRISCV.h
index 844cc3ce4a440..863b8a143f48a 100644
--- a/clang/include/clang/Sema/SemaRISCV.h
+++ b/clang/include/clang/Sema/SemaRISCV.h
@@ -56,7 +56,8 @@ class SemaRISCV : public SemaBase {
 
   std::unique_ptr<sema::RISCVIntrinsicManager> IntrinsicManager;
 
-  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc);
+  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc,
+                              SmallString<64> &NewParam);
   bool checkTargetClonesAttr(SmallVectorImpl<StringRef> &Params,
                              SmallVectorImpl<SourceLocation> &Locs,
                              SmallVectorImpl<SmallString<64>> &NewParams);
diff --git a/clang/lib/CodeGen/Targets/AArch64.cpp b/clang/lib/CodeGen/Targets/AArch64.cpp
index b82c46966cf0b..f451562b86ec3 100644
--- a/clang/lib/CodeGen/Targets/AArch64.cpp
+++ b/clang/lib/CodeGen/Targets/AArch64.cpp
@@ -1337,9 +1337,10 @@ void AArch64ABIInfo::appendAttributeMangling(StringRef AttrStr,
 
   llvm::SmallDenseSet<StringRef, 8> UniqueFeats;
   for (auto &Feat : Features)
-    if (auto Ext = llvm::AArch64::parseFMVExtension(Feat))
-      if (UniqueFeats.insert(Ext->Name).second)
-        Out << 'M' << Ext->Name;
+    if (getTarget().doesFeatureAffectCodeGen(Feat))
+      if (auto Ext = llvm::AArch64::parseFMVExtension(Feat))
+        if (UniqueFeats.insert(Ext->Name).second)
+          Out << 'M' << Ext->Name;
 }
 
 std::unique_ptr<TargetCodeGenInfo>
diff --git a/clang/lib/Sema/SemaARM.cpp b/clang/lib/Sema/SemaARM.cpp
index 8e27fabccd583..d3110dd18e927 100644
--- a/clang/lib/Sema/SemaARM.cpp
+++ b/clang/lib/Sema/SemaARM.cpp
@@ -1535,19 +1535,52 @@ bool SemaARM::areLaxCompatibleSveTypes(QualType FirstType,
          IsLaxCompatible(SecondType, FirstType);
 }
 
+static void appendFeature(StringRef Feat, SmallString<64> &Buffer) {
+  if (!Buffer.empty())
+    Buffer.append("+");
+  Buffer.append(Feat);
+}
+
+static void convertPriorityString(unsigned Priority,
+                                  SmallString<64> &NewParam) {
+  StringRef PriorityString[5] = {"P0", "P1", "P2", "P3", "P4"};
+
+  assert(Priority > 0 && Priority < 32 && "priority out of range");
+  // Convert priority=[1-31] -> P0 + ... + P4
+  for (unsigned BitPos = 0; BitPos < 5; ++BitPos)
+    if (Priority & (1U << BitPos))
+      appendFeature(PriorityString[BitPos], NewParam);
+}
+
 bool SemaARM::checkTargetVersionAttr(const StringRef Param,
-                                     const SourceLocation Loc) {
+                                     const SourceLocation Loc,
+                                     SmallString<64> &NewParam) {
   using namespace DiagAttrParams;
 
+  auto [LHS, RHS] = Param.split(';');
+  bool IsDefault = false;
   llvm::SmallVector<StringRef, 8> Features;
-  Param.split(Features, '+');
+  LHS.split(Features, '+');
   for (StringRef Feat : Features) {
     Feat = Feat.trim();
     if (Feat == "default")
-      continue;
-    if (!getASTContext().getTargetInfo().validateCpuSupports(Feat))
+      IsDefault = true;
+    else if (!getASTContext().getTargetInfo().validateCpuSupports(Feat))
       return Diag(Loc, diag::warn_unsupported_target_attribute)
              << Unsupported << None << Feat << TargetVersion;
+    appendFeature(Feat, NewParam);
+  }
+
+  if (!RHS.empty() && RHS.consume_front("priority=")) {
+    if (IsDefault)
+      Diag(Loc, diag::warn_invalid_default_version_priority);
+    else {
+      unsigned Digit;
+      if (RHS.getAsInteger(0, Digit) || Digit < 1 || Digit > 31)
+        Diag(Loc, diag::warn_version_priority_out_of_range) << RHS;
+      else
+        convertPriorityString(Digit, NewParam);
+    }
   }
   return false;
 }
@@ -1569,15 +1602,20 @@ bool SemaARM::checkTargetClonesAttr(
     const StringRef Param = Params[I].trim();
     const SourceLocation &Loc = Locs[I];
 
-    if (Param.empty())
+    auto [LHS, RHS] = Param.split(';');
+    bool HasPriority = !RHS.empty() && RHS.consume_front("priority=");
+
+    if (LHS.empty())
       return Diag(Loc, diag::warn_unsupported_target_attribute)
              << Unsupported << None << "" << TargetClones;
 
-    if (Param == "default") {
+    if (LHS == "default") {
       if (HasDefault)
         Diag(Loc, diag::warn_target_clone_duplicate_options);
       else {
-        NewParams.push_back(Param);
+        if (HasPriority)
+          Diag(Loc, diag::warn_invalid_default_version_priority);
+        NewParams.push_back(LHS);
         HasDefault = true;
       }
       continue;
@@ -1586,7 +1624,7 @@ bool SemaARM::checkTargetClonesAttr(
     bool HasCodeGenImpact = false;
     llvm::SmallVector<StringRef, 8> Features;
     llvm::SmallVector<StringRef, 8> ValidFeatures;
-    Param.split(Features, '+');
+    LHS.split(Features, '+');
     for (StringRef Feat : Features) {
       Feat = Feat.trim();
       if (!getASTContext().getTargetInfo().validateCpuSupports(Feat)) {
@@ -1616,6 +1654,14 @@ bool SemaARM::checkTargetClonesAttr(
       continue;
     }
 
+    if (HasPriority) {
+      unsigned Digit;
+      if (RHS.getAsInteger(0, Digit) || Digit < 1 || Digit > 31)
+        Diag(Loc, diag::warn_version_priority_out_of_range) << RHS;
+      else
+        convertPriorityString(Digit, NewParam);
+    }
+
     // Valid non-default argument.
     NewParams.push_back(NewParam);
     HasNonDefault = true;
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 9a2950cf1648e..46f4bab9edc71 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3333,19 +3333,20 @@ bool Sema::checkTargetAttr(SourceLocation LiteralLoc, StringRef AttrStr) {
 static void handleTargetVersionAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   StringRef Param;
   SourceLocation Loc;
+  SmallString<64> NewParam;
   if (!S.checkStringLiteralArgumentAttr(AL, 0, Param, &Loc))
     return;
 
   if (S.Context.getTargetInfo().getTriple().isAArch64()) {
-    if (S.ARM().checkTargetVersionAttr(Param, Loc))
+    if (S.ARM().checkTargetVersionAttr(Param, Loc, NewParam))
       return;
   } else if (S.Context.getTargetInfo().getTriple().isRISCV()) {
-    if (S.RISCV().checkTargetVersionAttr(Param, Loc))
+    if (S.RISCV().checkTargetVersionAttr(Param, Loc, NewParam))
       return;
   }
 
   TargetVersionAttr *NewAttr =
-      ::new (S.Context) TargetVersionAttr(S.Context, AL, Param);
+      ::new (S.Context) TargetVersionAttr(S.Context, AL, NewParam);
   D->addAttr(NewAttr);
 }
 
diff --git a/clang/lib/Sema/SemaRISCV.cpp b/clang/lib/Sema/SemaRISCV.cpp
index 994cd07c1e263..bb91bd7aefbb4 100644
--- a/clang/lib/Sema/SemaRISCV.cpp
+++ b/clang/lib/Sema/SemaRISCV.cpp
@@ -1636,7 +1636,8 @@ bool SemaRISCV::isValidFMVExtension(StringRef Ext) {
 }
 
 bool SemaRISCV::checkTargetVersionAttr(const StringRef Param,
-                                       const SourceLocation Loc) {
+                                       const SourceLocation Loc,
+                                       SmallString<64> &NewParam) {
   using namespace DiagAttrParams;
 
   llvm::SmallVector<StringRef, 8> AttrStrs;
@@ -1682,6 +1683,7 @@ bool SemaRISCV::checkTargetVersionAttr(const StringRef Param,
     return Diag(Loc, diag::warn_unsupported_target_attribute)
            << Unsupported << None << Param << TargetVersion;
 
+  NewParam = Param;
   return false;
 }
 
diff --git a/clang/test/AST/attr-target-version.c b/clang/test/AST/attr-target-version.c
index b537f5e685a31..c7b83bef1b91b 100644
--- a/clang/test/AST/attr-target-version.c
+++ b/clang/test/AST/attr-target-version.c
@@ -2,7 +2,23 @@
 
 int __attribute__((target_version("sve2-bitperm + sha2"))) foov(void) { return 1; }
 int __attribute__((target_clones(" lse + fp + sha3 ", "default"))) fooc(void) { return 2; }
-// CHECK: TargetVersionAttr
-// CHECK: sve2-bitperm + sha2
-// CHECK: TargetClonesAttr
-// CHECK: fp+lse+sha3 default
+
+int __attribute__((target_version("aes;priority=1"))) explicit_priority(void) { return 1; }
+int __attribute__((target_version("bf16;priority=2"))) explicit_priority(void) { return 2; }
+int __attribute__((target_version("crc;priority=4"))) explicit_priority(void) { return 4; }
+int __attribute__((target_version("dpb2;priority=8"))) explicit_priority(void) { return 8; }
+int __attribute__((target_version("fp16fml;priority=16"))) explicit_priority(void) { return 16; }
+
+int __attribute__((target_clones("simd;priority=31", "default"))) explicit_priority(void) {
+  return 0;
+}
+
+// CHECK: TargetVersionAttr {{.*}} "sve2-bitperm+sha2"
+// CHECK: TargetClonesAttr {{.*}} fp+lse+sha3 default
+
+// CHECK: TargetVersionAttr {{.*}} "aes+P0"
+// CHECK: TargetVersionAttr {{.*}} "bf16+P1"
+// CHECK: TargetVersionAttr {{.*}} "crc+P2"
+// CHECK: TargetVersionAttr {{.*}} "dpb2+P3"
+// CHECK: TargetVersionAttr {{.*}} "fp16fml+P4"
+// CHECK: TargetClonesAttr {{.*}} simd+P0+P1+P2+P3+P4 default
diff --git a/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c b/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
index e7e611e09542e..ebe5b75cf7946 100644
--- a/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
+++ b/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
@@ -1,5 +1,7 @@
 // RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_IMPLICIT_DEFAULT
 // RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_DEFAULT
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_VERSION_PRIORITY
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_CLONES_PRIORITY
 
 #if defined(CHECK_IMPLICIT_DEFAULT)
 
@@ -21,4 +23,18 @@ __attribute__((target_version("default"))) int explicit_default_bad(void) { retu
 // expected-note@-2 {{previous definition is here}}
 __attribute__((target_clones("aes", "lse", "default"))) int explicit_default_bad(void) { return 1; }
 
+#elif defined(CHECK_EXPLICIT_VERSION_PRIORITY)
+
+__attribute__((target_version("aes"))) int explicit_version_priority(void) { return 0; }
+// expected-error@+2 {{definition with same mangled name 'explicit_version_priority._Maes' as another definition}}
+// expected-note@-2 {{previous definition is here}}
+__attribute__((target_version("aes;priority=10"))) int explicit_version_priority(void) { return 1; }
+
+#elif defined(CHECK_EXPLICIT_CLONES_PRIORITY)
+
+__attribute__((target_version("aes;priority=20"))) int explicit_clones_priority(void) { return 0; }
+// expected-error@+2 {{definition with same mangled name 'explicit_clones_priority._Maes' as another definition}}
+// expected-note@-2 {{previous definition is here}}
+__attribute__((target_clones("aes;priority=5", "lse"))) int explicit_clones_priority(void) { return 1; }
+
 #endif
diff --git a/clang/test/CodeGen/AArch64/fmv-explicit-priority.c b/clang/test/CodeGen/AArch64/fmv-explicit-priority.c
new file mode 100644
index 0000000000000..437221c95542b
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/fmv-explicit-priority.c
@@ -0,0 +1,193 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-attributes --check-globals --include-generated-funcs
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -O3 -fno-inline -emit-llvm -o - %s | FileCheck %s
+
+__attribute__((target_version("lse;priority=30"))) int foo(void) { return 1; }
+__attribute__((target_version("sve2;priority=20"))) int foo(void) { return 2; }
+__attribute__((target_version("sve;priority=10"))) int foo(void) { return 3; }
+__attribute__((target_version( "default"))) int foo(void) { return 0; }
+
+__attribute__((target_clones("lse+sve2;priority=3", "lse;priority=2", "sve;priority=1", "default")))
+int fmv_caller(void) { return foo(); }
+
+
+__attribute__((target_version("aes"))) int bar(void) { return 1; }
+__attribute__((target_version("sm4;priority=5"))) int bar(void) { return 2; }
+__attribute__((target_version("default"))) int bar(void) { return 0; }
+
+__attribute__((target("aes"))) int regular_caller_aes() { return bar(); }
+__attribute__((target("sm4"))) int regular_caller_sm4() { return bar(); }
+//.
+// CHECK: @__aarch64_cpu_features = external dso_local local_unnamed_addr global { i64 }
+// CHECK: @foo = weak_odr ifunc i32 (), ptr @foo.resolver
+// CHECK: @fmv_caller = weak_odr ifunc i32 (), ptr @fmv_caller.resolver
+// CHECK: @bar = weak_odr ifunc i32 (), ptr @bar.resolver
+//.
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@foo._Mlse
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 1
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@foo._Msve2
+// CHECK-SAME: () #[[ATTR1:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@foo._Msve
+// CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 3
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@foo.default
+// CHECK-SAME: () #[[ATTR3:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 0
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._MlseMsve2
+// CHECK-SAME: () #[[ATTR4:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo._Mlse()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._Mlse
+// CHECK-SAME: () #[[ATTR5:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo._Mlse()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: noinline nounwind vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._Msve
+// CHECK-SAME: () #[[ATTR6:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo() #[[ATTR12:[0-9]+]]
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller.default
+// CHECK-SAME: () #[[ATTR7:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo.default()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar._Maes
+// CHECK-SAME: () #[[ATTR8:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 1
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar._Msm4
+// CHECK-SAME: () #[[ATTR9:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar.default
+// CHECK-SAME: () #[[ATTR3]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 0
+//
+//
+// CHECK: Function Attrs: noinline nounwind
+// CHECK-LABEL: define {{[^@]+}}@regular_caller_aes
+// CHECK-SAME: () local_unnamed_addr #[[ATTR10:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @bar() #[[ATTR12]]
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@regular_caller_sm4
+// CHECK-SAME: () local_unnamed_addr #[[ATTR11:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @bar._Msm4()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@foo.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    tail call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 128
+// CHECK-NEXT:    [[DOTNOT:%.*]] = icmp eq i64 [[TMP1]], 0
+// CHECK-NEXT:    br i1 [[DOTNOT]], label [[RESOLVER_ELSE:%.*]], label [[COMMON_RET:%.*]]
+// CHECK:       common.ret:
+// CHECK-NEXT:    [[COMMON_RET_OP:%.*]] = phi ptr [ @foo._Mlse, [[RESOLVER_ENTRY:%.*]] ], [ @foo._Msve2, [[RESOLVER_ELSE]] ], [ [[FOO__MSVE_FOO_DEFAULT:%.*]], [[RESOLVER_ELSE2:%.*]] ]
+// CHECK-NEXT:    ret ptr [[COMMON_RET_OP]]
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP2:%.*]] = and i64 [[TMP0]], 69793284352
+// CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[TMP2]], 69793284352
+// CHECK-NEXT:    br i1 [[TMP3]], label [[COMMON_RET]], label [[RESOLVER_ELSE2]]
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    [[TMP4:%.*]] = and i64 [[TMP0]], 1073807616
+// CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[TMP4]], 1073807616
+// CHECK-NEXT:    [[FOO__MSVE_FOO_DEFAULT]] = select i1 [[TMP5]], ptr @foo._Msve, ptr @foo.default
+// CHECK-NEXT:    br label [[COMMON_RET]]
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    tail call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 69793284480
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 69793284480
+// CHECK-NEXT:    br i1 [[TMP2]], label [[COMMON_RET:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       common.ret:
+// CHECK-NEXT:    [[COMMON_RET_OP:%.*]] = phi ptr [ @fmv_caller._MlseMsve2, [[RESOLVER_ENTRY:%.*]] ], [ @fmv_caller._Mlse, [[RESOLVER_ELSE]] ], [ [[FMV_CALLER__MSVE_FMV_CALLER_DEFAULT:%.*]], [[RESOLVER_ELSE2:%.*]] ]
+// CHECK-NEXT:    ret ptr [[COMMON_RET_OP]]
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP3:%.*]] = and i64 [[TMP0]], 128
+// CHECK-NEXT:    [[DOTNOT:%.*]] = icmp eq i64 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[DOTNOT]], label [[RESOLVER_ELSE2]], label [[COMMON_RET]]
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    [[TMP4:%.*]] = and i64 [[TMP0]], 1073807616
+// CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[TMP4]], 1073807616
+// CHECK-NEXT:    [[FMV_CALLER__MSVE_FMV_CALLER_DEFAULT]] = select i1 [[TMP5]], ptr @fmv_call...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 23, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Alexandros Lamprineas (labrinea)

Changes

Implements ARM-software/acle#404

This allows the user to specify "priority=[1-32];featA+featB" where priority=31 means highest priority. If the explicit priority string is omitted then the priority of "featA+featB" is implied, which is lower than priority=1.

Internally this gets expanded using special FMV features P0 ... P5 which can encode up to 31 priority levels (excluding all zeros). Those do not have corresponding detection bit at pos FEAT_#enum so I made this field optional in FMVInfo. Also they don't affect the codegen or name mangling of versioned functions.


Patch is 39.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150267.diff

23 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+6)
  • (modified) clang/include/clang/Sema/SemaARM.h (+2-1)
  • (modified) clang/include/clang/Sema/SemaRISCV.h (+2-1)
  • (modified) clang/lib/CodeGen/Targets/AArch64.cpp (+4-3)
  • (modified) clang/lib/Sema/SemaARM.cpp (+54-8)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+4-3)
  • (modified) clang/lib/Sema/SemaRISCV.cpp (+3-1)
  • (modified) clang/test/AST/attr-target-version.c (+20-4)
  • (modified) clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c (+16)
  • (added) clang/test/CodeGen/AArch64/fmv-explicit-priority.c (+193)
  • (modified) clang/test/Sema/attr-target-clones-aarch64.c (+10)
  • (modified) clang/test/Sema/attr-target-version.c (+9)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+5-1)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+4)
  • (modified) llvm/include/llvm/TargetParser/AArch64FeatPriorities.inc (+7-1)
  • (modified) llvm/include/llvm/TargetParser/AArch64TargetParser.h (+7-4)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
  • (modified) llvm/lib/Target/AArch64/AArch64FMV.td (+8)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+14-3)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+1)
  • (modified) llvm/lib/TargetParser/AArch64TargetParser.cpp (+20-11)
  • (modified) llvm/lib/Transforms/IPO/GlobalOpt.cpp (+15-7)
  • (modified) llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp (+6-3)
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index b2ea65ae111be..05fecee73270f 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12747,6 +12747,12 @@ def warn_target_clone_duplicate_options
 def warn_target_clone_no_impact_options
     : Warning<"version list contains entries that don't impact code generation">,
       InGroup<FunctionMultiVersioning>;
+def warn_version_priority_out_of_range
+    : Warning<"version priority '%0' is outside the allowed range [1-31]; ignoring priority">,
+      InGroup<FunctionMultiVersioning>;
+def warn_invalid_default_version_priority
+    : Warning<"priority of default version cannot be overridden; ignoring priority">,
+      InGroup<FunctionMultiVersioning>;
 
 // three-way comparison operator diagnostics
 def err_implied_comparison_category_type_not_found : Error<
diff --git a/clang/include/clang/Sema/SemaARM.h b/clang/include/clang/Sema/SemaARM.h
index e77d65f9362d8..1a2775d7b050e 100644
--- a/clang/include/clang/Sema/SemaARM.h
+++ b/clang/include/clang/Sema/SemaARM.h
@@ -92,7 +92,8 @@ class SemaARM : public SemaBase {
   /// false otherwise.
   bool areLaxCompatibleSveTypes(QualType FirstType, QualType SecondType);
 
-  bool checkTargetVersionAttr(const StringRef Str, const SourceLocation Loc);
+  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc,
+                              SmallString<64> &NewParam);
   bool checkTargetClonesAttr(SmallVectorImpl<StringRef> &Params,
                              SmallVectorImpl<SourceLocation> &Locs,
                              SmallVectorImpl<SmallString<64>> &NewParams);
diff --git a/clang/include/clang/Sema/SemaRISCV.h b/clang/include/clang/Sema/SemaRISCV.h
index 844cc3ce4a440..863b8a143f48a 100644
--- a/clang/include/clang/Sema/SemaRISCV.h
+++ b/clang/include/clang/Sema/SemaRISCV.h
@@ -56,7 +56,8 @@ class SemaRISCV : public SemaBase {
 
   std::unique_ptr<sema::RISCVIntrinsicManager> IntrinsicManager;
 
-  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc);
+  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc,
+                              SmallString<64> &NewParam);
   bool checkTargetClonesAttr(SmallVectorImpl<StringRef> &Params,
                              SmallVectorImpl<SourceLocation> &Locs,
                              SmallVectorImpl<SmallString<64>> &NewParams);
diff --git a/clang/lib/CodeGen/Targets/AArch64.cpp b/clang/lib/CodeGen/Targets/AArch64.cpp
index b82c46966cf0b..f451562b86ec3 100644
--- a/clang/lib/CodeGen/Targets/AArch64.cpp
+++ b/clang/lib/CodeGen/Targets/AArch64.cpp
@@ -1337,9 +1337,10 @@ void AArch64ABIInfo::appendAttributeMangling(StringRef AttrStr,
 
   llvm::SmallDenseSet<StringRef, 8> UniqueFeats;
   for (auto &Feat : Features)
-    if (auto Ext = llvm::AArch64::parseFMVExtension(Feat))
-      if (UniqueFeats.insert(Ext->Name).second)
-        Out << 'M' << Ext->Name;
+    if (getTarget().doesFeatureAffectCodeGen(Feat))
+      if (auto Ext = llvm::AArch64::parseFMVExtension(Feat))
+        if (UniqueFeats.insert(Ext->Name).second)
+          Out << 'M' << Ext->Name;
 }
 
 std::unique_ptr<TargetCodeGenInfo>
diff --git a/clang/lib/Sema/SemaARM.cpp b/clang/lib/Sema/SemaARM.cpp
index 8e27fabccd583..d3110dd18e927 100644
--- a/clang/lib/Sema/SemaARM.cpp
+++ b/clang/lib/Sema/SemaARM.cpp
@@ -1535,19 +1535,52 @@ bool SemaARM::areLaxCompatibleSveTypes(QualType FirstType,
          IsLaxCompatible(SecondType, FirstType);
 }
 
+static void appendFeature(StringRef Feat, SmallString<64> &Buffer) {
+  if (!Buffer.empty())
+    Buffer.append("+");
+  Buffer.append(Feat);
+}
+
+static void convertPriorityString(unsigned Priority,
+                                  SmallString<64> &NewParam) {
+  StringRef PriorityString[5] = {"P0", "P1", "P2", "P3", "P4"};
+
+  assert(Priority > 0 && Priority < 32 && "priority out of range");
+  // Convert priority=[1-31] -> P0 + ... + P4
+  for (unsigned BitPos = 0; BitPos < 5; ++BitPos)
+    if (Priority & (1U << BitPos))
+      appendFeature(PriorityString[BitPos], NewParam);
+}
+
 bool SemaARM::checkTargetVersionAttr(const StringRef Param,
-                                     const SourceLocation Loc) {
+                                     const SourceLocation Loc,
+                                     SmallString<64> &NewParam) {
   using namespace DiagAttrParams;
 
+  auto [LHS, RHS] = Param.split(';');
+  bool IsDefault = false;
   llvm::SmallVector<StringRef, 8> Features;
-  Param.split(Features, '+');
+  LHS.split(Features, '+');
   for (StringRef Feat : Features) {
     Feat = Feat.trim();
     if (Feat == "default")
-      continue;
-    if (!getASTContext().getTargetInfo().validateCpuSupports(Feat))
+      IsDefault = true;
+    else if (!getASTContext().getTargetInfo().validateCpuSupports(Feat))
       return Diag(Loc, diag::warn_unsupported_target_attribute)
              << Unsupported << None << Feat << TargetVersion;
+    appendFeature(Feat, NewParam);
+  }
+
+  if (!RHS.empty() && RHS.consume_front("priority=")) {
+    if (IsDefault)
+      Diag(Loc, diag::warn_invalid_default_version_priority);
+    else {
+      unsigned Digit;
+      if (RHS.getAsInteger(0, Digit) || Digit < 1 || Digit > 31)
+        Diag(Loc, diag::warn_version_priority_out_of_range) << RHS;
+      else
+        convertPriorityString(Digit, NewParam);
+    }
   }
   return false;
 }
@@ -1569,15 +1602,20 @@ bool SemaARM::checkTargetClonesAttr(
     const StringRef Param = Params[I].trim();
     const SourceLocation &Loc = Locs[I];
 
-    if (Param.empty())
+    auto [LHS, RHS] = Param.split(';');
+    bool HasPriority = !RHS.empty() && RHS.consume_front("priority=");
+
+    if (LHS.empty())
       return Diag(Loc, diag::warn_unsupported_target_attribute)
              << Unsupported << None << "" << TargetClones;
 
-    if (Param == "default") {
+    if (LHS == "default") {
       if (HasDefault)
         Diag(Loc, diag::warn_target_clone_duplicate_options);
       else {
-        NewParams.push_back(Param);
+        if (HasPriority)
+          Diag(Loc, diag::warn_invalid_default_version_priority);
+        NewParams.push_back(LHS);
         HasDefault = true;
       }
       continue;
@@ -1586,7 +1624,7 @@ bool SemaARM::checkTargetClonesAttr(
     bool HasCodeGenImpact = false;
     llvm::SmallVector<StringRef, 8> Features;
     llvm::SmallVector<StringRef, 8> ValidFeatures;
-    Param.split(Features, '+');
+    LHS.split(Features, '+');
     for (StringRef Feat : Features) {
       Feat = Feat.trim();
       if (!getASTContext().getTargetInfo().validateCpuSupports(Feat)) {
@@ -1616,6 +1654,14 @@ bool SemaARM::checkTargetClonesAttr(
       continue;
     }
 
+    if (HasPriority) {
+      unsigned Digit;
+      if (RHS.getAsInteger(0, Digit) || Digit < 1 || Digit > 31)
+        Diag(Loc, diag::warn_version_priority_out_of_range) << RHS;
+      else
+        convertPriorityString(Digit, NewParam);
+    }
+
     // Valid non-default argument.
     NewParams.push_back(NewParam);
     HasNonDefault = true;
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 9a2950cf1648e..46f4bab9edc71 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3333,19 +3333,20 @@ bool Sema::checkTargetAttr(SourceLocation LiteralLoc, StringRef AttrStr) {
 static void handleTargetVersionAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   StringRef Param;
   SourceLocation Loc;
+  SmallString<64> NewParam;
   if (!S.checkStringLiteralArgumentAttr(AL, 0, Param, &Loc))
     return;
 
   if (S.Context.getTargetInfo().getTriple().isAArch64()) {
-    if (S.ARM().checkTargetVersionAttr(Param, Loc))
+    if (S.ARM().checkTargetVersionAttr(Param, Loc, NewParam))
       return;
   } else if (S.Context.getTargetInfo().getTriple().isRISCV()) {
-    if (S.RISCV().checkTargetVersionAttr(Param, Loc))
+    if (S.RISCV().checkTargetVersionAttr(Param, Loc, NewParam))
       return;
   }
 
   TargetVersionAttr *NewAttr =
-      ::new (S.Context) TargetVersionAttr(S.Context, AL, Param);
+      ::new (S.Context) TargetVersionAttr(S.Context, AL, NewParam);
   D->addAttr(NewAttr);
 }
 
diff --git a/clang/lib/Sema/SemaRISCV.cpp b/clang/lib/Sema/SemaRISCV.cpp
index 994cd07c1e263..bb91bd7aefbb4 100644
--- a/clang/lib/Sema/SemaRISCV.cpp
+++ b/clang/lib/Sema/SemaRISCV.cpp
@@ -1636,7 +1636,8 @@ bool SemaRISCV::isValidFMVExtension(StringRef Ext) {
 }
 
 bool SemaRISCV::checkTargetVersionAttr(const StringRef Param,
-                                       const SourceLocation Loc) {
+                                       const SourceLocation Loc,
+                                       SmallString<64> &NewParam) {
   using namespace DiagAttrParams;
 
   llvm::SmallVector<StringRef, 8> AttrStrs;
@@ -1682,6 +1683,7 @@ bool SemaRISCV::checkTargetVersionAttr(const StringRef Param,
     return Diag(Loc, diag::warn_unsupported_target_attribute)
            << Unsupported << None << Param << TargetVersion;
 
+  NewParam = Param;
   return false;
 }
 
diff --git a/clang/test/AST/attr-target-version.c b/clang/test/AST/attr-target-version.c
index b537f5e685a31..c7b83bef1b91b 100644
--- a/clang/test/AST/attr-target-version.c
+++ b/clang/test/AST/attr-target-version.c
@@ -2,7 +2,23 @@
 
 int __attribute__((target_version("sve2-bitperm + sha2"))) foov(void) { return 1; }
 int __attribute__((target_clones(" lse + fp + sha3 ", "default"))) fooc(void) { return 2; }
-// CHECK: TargetVersionAttr
-// CHECK: sve2-bitperm + sha2
-// CHECK: TargetClonesAttr
-// CHECK: fp+lse+sha3 default
+
+int __attribute__((target_version("aes;priority=1"))) explicit_priority(void) { return 1; }
+int __attribute__((target_version("bf16;priority=2"))) explicit_priority(void) { return 2; }
+int __attribute__((target_version("crc;priority=4"))) explicit_priority(void) { return 4; }
+int __attribute__((target_version("dpb2;priority=8"))) explicit_priority(void) { return 8; }
+int __attribute__((target_version("fp16fml;priority=16"))) explicit_priority(void) { return 16; }
+
+int __attribute__((target_clones("simd;priority=31", "default"))) explicit_priority(void) {
+  return 0;
+}
+
+// CHECK: TargetVersionAttr {{.*}} "sve2-bitperm+sha2"
+// CHECK: TargetClonesAttr {{.*}} fp+lse+sha3 default
+
+// CHECK: TargetVersionAttr {{.*}} "aes+P0"
+// CHECK: TargetVersionAttr {{.*}} "bf16+P1"
+// CHECK: TargetVersionAttr {{.*}} "crc+P2"
+// CHECK: TargetVersionAttr {{.*}} "dpb2+P3"
+// CHECK: TargetVersionAttr {{.*}} "fp16fml+P4"
+// CHECK: TargetClonesAttr {{.*}} simd+P0+P1+P2+P3+P4 default
diff --git a/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c b/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
index e7e611e09542e..ebe5b75cf7946 100644
--- a/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
+++ b/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
@@ -1,5 +1,7 @@
 // RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_IMPLICIT_DEFAULT
 // RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_DEFAULT
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_VERSION_PRIORITY
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_CLONES_PRIORITY
 
 #if defined(CHECK_IMPLICIT_DEFAULT)
 
@@ -21,4 +23,18 @@ __attribute__((target_version("default"))) int explicit_default_bad(void) { retu
 // expected-note@-2 {{previous definition is here}}
 __attribute__((target_clones("aes", "lse", "default"))) int explicit_default_bad(void) { return 1; }
 
+#elif defined(CHECK_EXPLICIT_VERSION_PRIORITY)
+
+__attribute__((target_version("aes"))) int explicit_version_priority(void) { return 0; }
+// expected-error@+2 {{definition with same mangled name 'explicit_version_priority._Maes' as another definition}}
+// expected-note@-2 {{previous definition is here}}
+__attribute__((target_version("aes;priority=10"))) int explicit_version_priority(void) { return 1; }
+
+#elif defined(CHECK_EXPLICIT_CLONES_PRIORITY)
+
+__attribute__((target_version("aes;priority=20"))) int explicit_clones_priority(void) { return 0; }
+// expected-error@+2 {{definition with same mangled name 'explicit_clones_priority._Maes' as another definition}}
+// expected-note@-2 {{previous definition is here}}
+__attribute__((target_clones("aes;priority=5", "lse"))) int explicit_clones_priority(void) { return 1; }
+
 #endif
diff --git a/clang/test/CodeGen/AArch64/fmv-explicit-priority.c b/clang/test/CodeGen/AArch64/fmv-explicit-priority.c
new file mode 100644
index 0000000000000..437221c95542b
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/fmv-explicit-priority.c
@@ -0,0 +1,193 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-attributes --check-globals --include-generated-funcs
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -O3 -fno-inline -emit-llvm -o - %s | FileCheck %s
+
+__attribute__((target_version("lse;priority=30"))) int foo(void) { return 1; }
+__attribute__((target_version("sve2;priority=20"))) int foo(void) { return 2; }
+__attribute__((target_version("sve;priority=10"))) int foo(void) { return 3; }
+__attribute__((target_version( "default"))) int foo(void) { return 0; }
+
+__attribute__((target_clones("lse+sve2;priority=3", "lse;priority=2", "sve;priority=1", "default")))
+int fmv_caller(void) { return foo(); }
+
+
+__attribute__((target_version("aes"))) int bar(void) { return 1; }
+__attribute__((target_version("sm4;priority=5"))) int bar(void) { return 2; }
+__attribute__((target_version("default"))) int bar(void) { return 0; }
+
+__attribute__((target("aes"))) int regular_caller_aes() { return bar(); }
+__attribute__((target("sm4"))) int regular_caller_sm4() { return bar(); }
+//.
+// CHECK: @__aarch64_cpu_features = external dso_local local_unnamed_addr global { i64 }
+// CHECK: @foo = weak_odr ifunc i32 (), ptr @foo.resolver
+// CHECK: @fmv_caller = weak_odr ifunc i32 (), ptr @fmv_caller.resolver
+// CHECK: @bar = weak_odr ifunc i32 (), ptr @bar.resolver
+//.
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@foo._Mlse
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 1
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@foo._Msve2
+// CHECK-SAME: () #[[ATTR1:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@foo._Msve
+// CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 3
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@foo.default
+// CHECK-SAME: () #[[ATTR3:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 0
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._MlseMsve2
+// CHECK-SAME: () #[[ATTR4:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo._Mlse()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._Mlse
+// CHECK-SAME: () #[[ATTR5:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo._Mlse()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: noinline nounwind vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._Msve
+// CHECK-SAME: () #[[ATTR6:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo() #[[ATTR12:[0-9]+]]
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller.default
+// CHECK-SAME: () #[[ATTR7:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo.default()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar._Maes
+// CHECK-SAME: () #[[ATTR8:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 1
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar._Msm4
+// CHECK-SAME: () #[[ATTR9:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar.default
+// CHECK-SAME: () #[[ATTR3]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 0
+//
+//
+// CHECK: Function Attrs: noinline nounwind
+// CHECK-LABEL: define {{[^@]+}}@regular_caller_aes
+// CHECK-SAME: () local_unnamed_addr #[[ATTR10:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @bar() #[[ATTR12]]
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@regular_caller_sm4
+// CHECK-SAME: () local_unnamed_addr #[[ATTR11:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @bar._Msm4()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@foo.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    tail call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 128
+// CHECK-NEXT:    [[DOTNOT:%.*]] = icmp eq i64 [[TMP1]], 0
+// CHECK-NEXT:    br i1 [[DOTNOT]], label [[RESOLVER_ELSE:%.*]], label [[COMMON_RET:%.*]]
+// CHECK:       common.ret:
+// CHECK-NEXT:    [[COMMON_RET_OP:%.*]] = phi ptr [ @foo._Mlse, [[RESOLVER_ENTRY:%.*]] ], [ @foo._Msve2, [[RESOLVER_ELSE]] ], [ [[FOO__MSVE_FOO_DEFAULT:%.*]], [[RESOLVER_ELSE2:%.*]] ]
+// CHECK-NEXT:    ret ptr [[COMMON_RET_OP]]
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP2:%.*]] = and i64 [[TMP0]], 69793284352
+// CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[TMP2]], 69793284352
+// CHECK-NEXT:    br i1 [[TMP3]], label [[COMMON_RET]], label [[RESOLVER_ELSE2]]
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    [[TMP4:%.*]] = and i64 [[TMP0]], 1073807616
+// CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[TMP4]], 1073807616
+// CHECK-NEXT:    [[FOO__MSVE_FOO_DEFAULT]] = select i1 [[TMP5]], ptr @foo._Msve, ptr @foo.default
+// CHECK-NEXT:    br label [[COMMON_RET]]
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    tail call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 69793284480
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 69793284480
+// CHECK-NEXT:    br i1 [[TMP2]], label [[COMMON_RET:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       common.ret:
+// CHECK-NEXT:    [[COMMON_RET_OP:%.*]] = phi ptr [ @fmv_caller._MlseMsve2, [[RESOLVER_ENTRY:%.*]] ], [ @fmv_caller._Mlse, [[RESOLVER_ELSE]] ], [ [[FMV_CALLER__MSVE_FMV_CALLER_DEFAULT:%.*]], [[RESOLVER_ELSE2:%.*]] ]
+// CHECK-NEXT:    ret ptr [[COMMON_RET_OP]]
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP3:%.*]] = and i64 [[TMP0]], 128
+// CHECK-NEXT:    [[DOTNOT:%.*]] = icmp eq i64 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[DOTNOT]], label [[RESOLVER_ELSE2]], label [[COMMON_RET]]
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    [[TMP4:%.*]] = and i64 [[TMP0]], 1073807616
+// CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[TMP4]], 1073807616
+// CHECK-NEXT:    [[FMV_CALLER__MSVE_FMV_CALLER_DEFAULT]] = select i1 [[TMP5]], ptr @fmv_call...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 23, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Alexandros Lamprineas (labrinea)

Changes

Implements ARM-software/acle#404

This allows the user to specify "priority=[1-32];featA+featB" where priority=31 means highest priority. If the explicit priority string is omitted then the priority of "featA+featB" is implied, which is lower than priority=1.

Internally this gets expanded using special FMV features P0 ... P5 which can encode up to 31 priority levels (excluding all zeros). Those do not have corresponding detection bit at pos FEAT_#enum so I made this field optional in FMVInfo. Also they don't affect the codegen or name mangling of versioned functions.


Patch is 39.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150267.diff

23 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+6)
  • (modified) clang/include/clang/Sema/SemaARM.h (+2-1)
  • (modified) clang/include/clang/Sema/SemaRISCV.h (+2-1)
  • (modified) clang/lib/CodeGen/Targets/AArch64.cpp (+4-3)
  • (modified) clang/lib/Sema/SemaARM.cpp (+54-8)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+4-3)
  • (modified) clang/lib/Sema/SemaRISCV.cpp (+3-1)
  • (modified) clang/test/AST/attr-target-version.c (+20-4)
  • (modified) clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c (+16)
  • (added) clang/test/CodeGen/AArch64/fmv-explicit-priority.c (+193)
  • (modified) clang/test/Sema/attr-target-clones-aarch64.c (+10)
  • (modified) clang/test/Sema/attr-target-version.c (+9)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+5-1)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+4)
  • (modified) llvm/include/llvm/TargetParser/AArch64FeatPriorities.inc (+7-1)
  • (modified) llvm/include/llvm/TargetParser/AArch64TargetParser.h (+7-4)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
  • (modified) llvm/lib/Target/AArch64/AArch64FMV.td (+8)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+14-3)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+1)
  • (modified) llvm/lib/TargetParser/AArch64TargetParser.cpp (+20-11)
  • (modified) llvm/lib/Transforms/IPO/GlobalOpt.cpp (+15-7)
  • (modified) llvm/utils/TableGen/Basic/ARMTargetDefEmitter.cpp (+6-3)
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index b2ea65ae111be..05fecee73270f 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12747,6 +12747,12 @@ def warn_target_clone_duplicate_options
 def warn_target_clone_no_impact_options
     : Warning<"version list contains entries that don't impact code generation">,
       InGroup<FunctionMultiVersioning>;
+def warn_version_priority_out_of_range
+    : Warning<"version priority '%0' is outside the allowed range [1-31]; ignoring priority">,
+      InGroup<FunctionMultiVersioning>;
+def warn_invalid_default_version_priority
+    : Warning<"priority of default version cannot be overridden; ignoring priority">,
+      InGroup<FunctionMultiVersioning>;
 
 // three-way comparison operator diagnostics
 def err_implied_comparison_category_type_not_found : Error<
diff --git a/clang/include/clang/Sema/SemaARM.h b/clang/include/clang/Sema/SemaARM.h
index e77d65f9362d8..1a2775d7b050e 100644
--- a/clang/include/clang/Sema/SemaARM.h
+++ b/clang/include/clang/Sema/SemaARM.h
@@ -92,7 +92,8 @@ class SemaARM : public SemaBase {
   /// false otherwise.
   bool areLaxCompatibleSveTypes(QualType FirstType, QualType SecondType);
 
-  bool checkTargetVersionAttr(const StringRef Str, const SourceLocation Loc);
+  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc,
+                              SmallString<64> &NewParam);
   bool checkTargetClonesAttr(SmallVectorImpl<StringRef> &Params,
                              SmallVectorImpl<SourceLocation> &Locs,
                              SmallVectorImpl<SmallString<64>> &NewParams);
diff --git a/clang/include/clang/Sema/SemaRISCV.h b/clang/include/clang/Sema/SemaRISCV.h
index 844cc3ce4a440..863b8a143f48a 100644
--- a/clang/include/clang/Sema/SemaRISCV.h
+++ b/clang/include/clang/Sema/SemaRISCV.h
@@ -56,7 +56,8 @@ class SemaRISCV : public SemaBase {
 
   std::unique_ptr<sema::RISCVIntrinsicManager> IntrinsicManager;
 
-  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc);
+  bool checkTargetVersionAttr(const StringRef Param, const SourceLocation Loc,
+                              SmallString<64> &NewParam);
   bool checkTargetClonesAttr(SmallVectorImpl<StringRef> &Params,
                              SmallVectorImpl<SourceLocation> &Locs,
                              SmallVectorImpl<SmallString<64>> &NewParams);
diff --git a/clang/lib/CodeGen/Targets/AArch64.cpp b/clang/lib/CodeGen/Targets/AArch64.cpp
index b82c46966cf0b..f451562b86ec3 100644
--- a/clang/lib/CodeGen/Targets/AArch64.cpp
+++ b/clang/lib/CodeGen/Targets/AArch64.cpp
@@ -1337,9 +1337,10 @@ void AArch64ABIInfo::appendAttributeMangling(StringRef AttrStr,
 
   llvm::SmallDenseSet<StringRef, 8> UniqueFeats;
   for (auto &Feat : Features)
-    if (auto Ext = llvm::AArch64::parseFMVExtension(Feat))
-      if (UniqueFeats.insert(Ext->Name).second)
-        Out << 'M' << Ext->Name;
+    if (getTarget().doesFeatureAffectCodeGen(Feat))
+      if (auto Ext = llvm::AArch64::parseFMVExtension(Feat))
+        if (UniqueFeats.insert(Ext->Name).second)
+          Out << 'M' << Ext->Name;
 }
 
 std::unique_ptr<TargetCodeGenInfo>
diff --git a/clang/lib/Sema/SemaARM.cpp b/clang/lib/Sema/SemaARM.cpp
index 8e27fabccd583..d3110dd18e927 100644
--- a/clang/lib/Sema/SemaARM.cpp
+++ b/clang/lib/Sema/SemaARM.cpp
@@ -1535,19 +1535,52 @@ bool SemaARM::areLaxCompatibleSveTypes(QualType FirstType,
          IsLaxCompatible(SecondType, FirstType);
 }
 
+static void appendFeature(StringRef Feat, SmallString<64> &Buffer) {
+  if (!Buffer.empty())
+    Buffer.append("+");
+  Buffer.append(Feat);
+}
+
+static void convertPriorityString(unsigned Priority,
+                                  SmallString<64> &NewParam) {
+  StringRef PriorityString[5] = {"P0", "P1", "P2", "P3", "P4"};
+
+  assert(Priority > 0 && Priority < 32 && "priority out of range");
+  // Convert priority=[1-31] -> P0 + ... + P4
+  for (unsigned BitPos = 0; BitPos < 5; ++BitPos)
+    if (Priority & (1U << BitPos))
+      appendFeature(PriorityString[BitPos], NewParam);
+}
+
 bool SemaARM::checkTargetVersionAttr(const StringRef Param,
-                                     const SourceLocation Loc) {
+                                     const SourceLocation Loc,
+                                     SmallString<64> &NewParam) {
   using namespace DiagAttrParams;
 
+  auto [LHS, RHS] = Param.split(';');
+  bool IsDefault = false;
   llvm::SmallVector<StringRef, 8> Features;
-  Param.split(Features, '+');
+  LHS.split(Features, '+');
   for (StringRef Feat : Features) {
     Feat = Feat.trim();
     if (Feat == "default")
-      continue;
-    if (!getASTContext().getTargetInfo().validateCpuSupports(Feat))
+      IsDefault = true;
+    else if (!getASTContext().getTargetInfo().validateCpuSupports(Feat))
       return Diag(Loc, diag::warn_unsupported_target_attribute)
              << Unsupported << None << Feat << TargetVersion;
+    appendFeature(Feat, NewParam);
+  }
+
+  if (!RHS.empty() && RHS.consume_front("priority=")) {
+    if (IsDefault)
+      Diag(Loc, diag::warn_invalid_default_version_priority);
+    else {
+      unsigned Digit;
+      if (RHS.getAsInteger(0, Digit) || Digit < 1 || Digit > 31)
+        Diag(Loc, diag::warn_version_priority_out_of_range) << RHS;
+      else
+        convertPriorityString(Digit, NewParam);
+    }
   }
   return false;
 }
@@ -1569,15 +1602,20 @@ bool SemaARM::checkTargetClonesAttr(
     const StringRef Param = Params[I].trim();
     const SourceLocation &Loc = Locs[I];
 
-    if (Param.empty())
+    auto [LHS, RHS] = Param.split(';');
+    bool HasPriority = !RHS.empty() && RHS.consume_front("priority=");
+
+    if (LHS.empty())
       return Diag(Loc, diag::warn_unsupported_target_attribute)
              << Unsupported << None << "" << TargetClones;
 
-    if (Param == "default") {
+    if (LHS == "default") {
       if (HasDefault)
         Diag(Loc, diag::warn_target_clone_duplicate_options);
       else {
-        NewParams.push_back(Param);
+        if (HasPriority)
+          Diag(Loc, diag::warn_invalid_default_version_priority);
+        NewParams.push_back(LHS);
         HasDefault = true;
       }
       continue;
@@ -1586,7 +1624,7 @@ bool SemaARM::checkTargetClonesAttr(
     bool HasCodeGenImpact = false;
     llvm::SmallVector<StringRef, 8> Features;
     llvm::SmallVector<StringRef, 8> ValidFeatures;
-    Param.split(Features, '+');
+    LHS.split(Features, '+');
     for (StringRef Feat : Features) {
       Feat = Feat.trim();
       if (!getASTContext().getTargetInfo().validateCpuSupports(Feat)) {
@@ -1616,6 +1654,14 @@ bool SemaARM::checkTargetClonesAttr(
       continue;
     }
 
+    if (HasPriority) {
+      unsigned Digit;
+      if (RHS.getAsInteger(0, Digit) || Digit < 1 || Digit > 31)
+        Diag(Loc, diag::warn_version_priority_out_of_range) << RHS;
+      else
+        convertPriorityString(Digit, NewParam);
+    }
+
     // Valid non-default argument.
     NewParams.push_back(NewParam);
     HasNonDefault = true;
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 9a2950cf1648e..46f4bab9edc71 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3333,19 +3333,20 @@ bool Sema::checkTargetAttr(SourceLocation LiteralLoc, StringRef AttrStr) {
 static void handleTargetVersionAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   StringRef Param;
   SourceLocation Loc;
+  SmallString<64> NewParam;
   if (!S.checkStringLiteralArgumentAttr(AL, 0, Param, &Loc))
     return;
 
   if (S.Context.getTargetInfo().getTriple().isAArch64()) {
-    if (S.ARM().checkTargetVersionAttr(Param, Loc))
+    if (S.ARM().checkTargetVersionAttr(Param, Loc, NewParam))
       return;
   } else if (S.Context.getTargetInfo().getTriple().isRISCV()) {
-    if (S.RISCV().checkTargetVersionAttr(Param, Loc))
+    if (S.RISCV().checkTargetVersionAttr(Param, Loc, NewParam))
       return;
   }
 
   TargetVersionAttr *NewAttr =
-      ::new (S.Context) TargetVersionAttr(S.Context, AL, Param);
+      ::new (S.Context) TargetVersionAttr(S.Context, AL, NewParam);
   D->addAttr(NewAttr);
 }
 
diff --git a/clang/lib/Sema/SemaRISCV.cpp b/clang/lib/Sema/SemaRISCV.cpp
index 994cd07c1e263..bb91bd7aefbb4 100644
--- a/clang/lib/Sema/SemaRISCV.cpp
+++ b/clang/lib/Sema/SemaRISCV.cpp
@@ -1636,7 +1636,8 @@ bool SemaRISCV::isValidFMVExtension(StringRef Ext) {
 }
 
 bool SemaRISCV::checkTargetVersionAttr(const StringRef Param,
-                                       const SourceLocation Loc) {
+                                       const SourceLocation Loc,
+                                       SmallString<64> &NewParam) {
   using namespace DiagAttrParams;
 
   llvm::SmallVector<StringRef, 8> AttrStrs;
@@ -1682,6 +1683,7 @@ bool SemaRISCV::checkTargetVersionAttr(const StringRef Param,
     return Diag(Loc, diag::warn_unsupported_target_attribute)
            << Unsupported << None << Param << TargetVersion;
 
+  NewParam = Param;
   return false;
 }
 
diff --git a/clang/test/AST/attr-target-version.c b/clang/test/AST/attr-target-version.c
index b537f5e685a31..c7b83bef1b91b 100644
--- a/clang/test/AST/attr-target-version.c
+++ b/clang/test/AST/attr-target-version.c
@@ -2,7 +2,23 @@
 
 int __attribute__((target_version("sve2-bitperm + sha2"))) foov(void) { return 1; }
 int __attribute__((target_clones(" lse + fp + sha3 ", "default"))) fooc(void) { return 2; }
-// CHECK: TargetVersionAttr
-// CHECK: sve2-bitperm + sha2
-// CHECK: TargetClonesAttr
-// CHECK: fp+lse+sha3 default
+
+int __attribute__((target_version("aes;priority=1"))) explicit_priority(void) { return 1; }
+int __attribute__((target_version("bf16;priority=2"))) explicit_priority(void) { return 2; }
+int __attribute__((target_version("crc;priority=4"))) explicit_priority(void) { return 4; }
+int __attribute__((target_version("dpb2;priority=8"))) explicit_priority(void) { return 8; }
+int __attribute__((target_version("fp16fml;priority=16"))) explicit_priority(void) { return 16; }
+
+int __attribute__((target_clones("simd;priority=31", "default"))) explicit_priority(void) {
+  return 0;
+}
+
+// CHECK: TargetVersionAttr {{.*}} "sve2-bitperm+sha2"
+// CHECK: TargetClonesAttr {{.*}} fp+lse+sha3 default
+
+// CHECK: TargetVersionAttr {{.*}} "aes+P0"
+// CHECK: TargetVersionAttr {{.*}} "bf16+P1"
+// CHECK: TargetVersionAttr {{.*}} "crc+P2"
+// CHECK: TargetVersionAttr {{.*}} "dpb2+P3"
+// CHECK: TargetVersionAttr {{.*}} "fp16fml+P4"
+// CHECK: TargetClonesAttr {{.*}} simd+P0+P1+P2+P3+P4 default
diff --git a/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c b/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
index e7e611e09542e..ebe5b75cf7946 100644
--- a/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
+++ b/clang/test/CodeGen/AArch64/fmv-duplicate-mangled-name.c
@@ -1,5 +1,7 @@
 // RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_IMPLICIT_DEFAULT
 // RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_DEFAULT
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_VERSION_PRIORITY
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -verify -emit-llvm-only %s -DCHECK_EXPLICIT_CLONES_PRIORITY
 
 #if defined(CHECK_IMPLICIT_DEFAULT)
 
@@ -21,4 +23,18 @@ __attribute__((target_version("default"))) int explicit_default_bad(void) { retu
 // expected-note@-2 {{previous definition is here}}
 __attribute__((target_clones("aes", "lse", "default"))) int explicit_default_bad(void) { return 1; }
 
+#elif defined(CHECK_EXPLICIT_VERSION_PRIORITY)
+
+__attribute__((target_version("aes"))) int explicit_version_priority(void) { return 0; }
+// expected-error@+2 {{definition with same mangled name 'explicit_version_priority._Maes' as another definition}}
+// expected-note@-2 {{previous definition is here}}
+__attribute__((target_version("aes;priority=10"))) int explicit_version_priority(void) { return 1; }
+
+#elif defined(CHECK_EXPLICIT_CLONES_PRIORITY)
+
+__attribute__((target_version("aes;priority=20"))) int explicit_clones_priority(void) { return 0; }
+// expected-error@+2 {{definition with same mangled name 'explicit_clones_priority._Maes' as another definition}}
+// expected-note@-2 {{previous definition is here}}
+__attribute__((target_clones("aes;priority=5", "lse"))) int explicit_clones_priority(void) { return 1; }
+
 #endif
diff --git a/clang/test/CodeGen/AArch64/fmv-explicit-priority.c b/clang/test/CodeGen/AArch64/fmv-explicit-priority.c
new file mode 100644
index 0000000000000..437221c95542b
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/fmv-explicit-priority.c
@@ -0,0 +1,193 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-attributes --check-globals --include-generated-funcs
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -O3 -fno-inline -emit-llvm -o - %s | FileCheck %s
+
+__attribute__((target_version("lse;priority=30"))) int foo(void) { return 1; }
+__attribute__((target_version("sve2;priority=20"))) int foo(void) { return 2; }
+__attribute__((target_version("sve;priority=10"))) int foo(void) { return 3; }
+__attribute__((target_version( "default"))) int foo(void) { return 0; }
+
+__attribute__((target_clones("lse+sve2;priority=3", "lse;priority=2", "sve;priority=1", "default")))
+int fmv_caller(void) { return foo(); }
+
+
+__attribute__((target_version("aes"))) int bar(void) { return 1; }
+__attribute__((target_version("sm4;priority=5"))) int bar(void) { return 2; }
+__attribute__((target_version("default"))) int bar(void) { return 0; }
+
+__attribute__((target("aes"))) int regular_caller_aes() { return bar(); }
+__attribute__((target("sm4"))) int regular_caller_sm4() { return bar(); }
+//.
+// CHECK: @__aarch64_cpu_features = external dso_local local_unnamed_addr global { i64 }
+// CHECK: @foo = weak_odr ifunc i32 (), ptr @foo.resolver
+// CHECK: @fmv_caller = weak_odr ifunc i32 (), ptr @fmv_caller.resolver
+// CHECK: @bar = weak_odr ifunc i32 (), ptr @bar.resolver
+//.
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@foo._Mlse
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 1
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@foo._Msve2
+// CHECK-SAME: () #[[ATTR1:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@foo._Msve
+// CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 3
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@foo.default
+// CHECK-SAME: () #[[ATTR3:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 0
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._MlseMsve2
+// CHECK-SAME: () #[[ATTR4:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo._Mlse()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._Mlse
+// CHECK-SAME: () #[[ATTR5:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo._Mlse()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: noinline nounwind vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller._Msve
+// CHECK-SAME: () #[[ATTR6:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo() #[[ATTR12:[0-9]+]]
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none) vscale_range(1,16)
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller.default
+// CHECK-SAME: () #[[ATTR7:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @foo.default()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar._Maes
+// CHECK-SAME: () #[[ATTR8:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 1
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar._Msm4
+// CHECK-SAME: () #[[ATTR9:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@bar.default
+// CHECK-SAME: () #[[ATTR3]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 0
+//
+//
+// CHECK: Function Attrs: noinline nounwind
+// CHECK-LABEL: define {{[^@]+}}@regular_caller_aes
+// CHECK-SAME: () local_unnamed_addr #[[ATTR10:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @bar() #[[ATTR12]]
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(none)
+// CHECK-LABEL: define {{[^@]+}}@regular_caller_sm4
+// CHECK-SAME: () local_unnamed_addr #[[ATTR11:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[CALL:%.*]] = tail call i32 @bar._Msm4()
+// CHECK-NEXT:    ret i32 [[CALL]]
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@foo.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    tail call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 128
+// CHECK-NEXT:    [[DOTNOT:%.*]] = icmp eq i64 [[TMP1]], 0
+// CHECK-NEXT:    br i1 [[DOTNOT]], label [[RESOLVER_ELSE:%.*]], label [[COMMON_RET:%.*]]
+// CHECK:       common.ret:
+// CHECK-NEXT:    [[COMMON_RET_OP:%.*]] = phi ptr [ @foo._Mlse, [[RESOLVER_ENTRY:%.*]] ], [ @foo._Msve2, [[RESOLVER_ELSE]] ], [ [[FOO__MSVE_FOO_DEFAULT:%.*]], [[RESOLVER_ELSE2:%.*]] ]
+// CHECK-NEXT:    ret ptr [[COMMON_RET_OP]]
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP2:%.*]] = and i64 [[TMP0]], 69793284352
+// CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[TMP2]], 69793284352
+// CHECK-NEXT:    br i1 [[TMP3]], label [[COMMON_RET]], label [[RESOLVER_ELSE2]]
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    [[TMP4:%.*]] = and i64 [[TMP0]], 1073807616
+// CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[TMP4]], 1073807616
+// CHECK-NEXT:    [[FOO__MSVE_FOO_DEFAULT]] = select i1 [[TMP5]], ptr @foo._Msve, ptr @foo.default
+// CHECK-NEXT:    br label [[COMMON_RET]]
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@fmv_caller.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    tail call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 69793284480
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 69793284480
+// CHECK-NEXT:    br i1 [[TMP2]], label [[COMMON_RET:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       common.ret:
+// CHECK-NEXT:    [[COMMON_RET_OP:%.*]] = phi ptr [ @fmv_caller._MlseMsve2, [[RESOLVER_ENTRY:%.*]] ], [ @fmv_caller._Mlse, [[RESOLVER_ELSE]] ], [ [[FMV_CALLER__MSVE_FMV_CALLER_DEFAULT:%.*]], [[RESOLVER_ELSE2:%.*]] ]
+// CHECK-NEXT:    ret ptr [[COMMON_RET_OP]]
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP3:%.*]] = and i64 [[TMP0]], 128
+// CHECK-NEXT:    [[DOTNOT:%.*]] = icmp eq i64 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[DOTNOT]], label [[RESOLVER_ELSE2]], label [[COMMON_RET]]
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    [[TMP4:%.*]] = and i64 [[TMP0]], 1073807616
+// CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[TMP4]], 1073807616
+// CHECK-NEXT:    [[FMV_CALLER__MSVE_FMV_CALLER_DEFAULT]] = select i1 [[TMP5]], ptr @fmv_call...
[truncated]

@github-actions
Copy link

github-actions bot commented Jul 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Implements ARM-software/acle#404

This allows the user to specify "priority=[1-255];featA+featB"
where priority=255 means highest priority. If the explicit
priority string is omitted then the priority of "featA+featB"
is implied, which is lower than priority=1.

Internally this gets expanded using special FMV features P0 ... P7
which can encode up to 256-1 priority levels (excluding all zeros).
Those do not have corresponding detection bit at pos FEAT_#enum
so I made this field optional in FMVInfo. Also they don't affect
the codegen or name mangling of versioned functions.
@labrinea labrinea force-pushed the fmv-explicit-priority branch from 6101d58 to 0575957 Compare August 13, 2025 08:59
@labrinea labrinea requested a review from jroelofs August 13, 2025 09:41
update comment
@labrinea
Copy link
Collaborator Author

labrinea commented Sep 4, 2025

ping

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by comment: I'd like to see some optimizer tests for the optimizer changes.

I can try to find time to do a more detailed review on this at some point, but not sure when that will be.

; CHECK: [[CALL:%.*]] = tail call i32 @test_explicit_priority._Msve()
;
entry:
%call = tail call i32 @test_explicit_priority()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what's happening here.

The semantics of a call shouldn't depend on whether the caller is multiversioned; the caller should have a target-features attribute, and the optimization should check those target features. In other words, the following two examples should have the same semantics:

void caller8() __attribute((target_version("sve,priority=100"))) {
  test_explicit_priority();
}
static inline void wrapper() __attribute((target("sve"))) {
  test_explicit_priority();
}
void caller8() __attribute((target_version("sve,priority=100"))) {
  wrapper();
}

Given that, we should never statically select the "sve" version, when better versions exist.

Copy link
Collaborator Author

@labrinea labrinea Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is not to alter the semantics of a call when optimizing it with GlobalOpt compared to what the resolver function would have selected at runtime. See https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/GlobalOpt.cpp#L2628, if the caller is not multiversioned (like the wrapper in your example) then the highest priority callee will be picked statically if their attributes match, otherwise the call will not be optimized. In my example test_explicit_priority the highest priority version (or best version) is the sve(see attribute 20). There is no better version that could have been selected at runtime.

the caller should have a target-features attribute, and the optimization should check those target features

That's exactly what we do for non-FMV callers like the wrapper. See AArch64TTIImpl::getPriorityMask above in llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Perhaps my presentation helps https://www.youtube.com/watch?v=k54MFimPz-A&t=867s (slide 10 onwards)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm following your explanation correctly, the theory here is that if the caller is an fmv function, we already did feature detection, and detection for the callee has to come up with the same result. So we have two groups of functions: test_explicit_priority.*, and caller8.*. We prove they're all related to each other: test_explicit_priority.* are all variants of the ifunc test_explicit_priority, and caller8.* are all variants of the ifunc caller8. Given that, we not only prove that certain features are enabled in caller8.*, but also that certain features are disabled (because the resolver would have chosen a different implementation if the feature were enabled).

That makes sense, I guess, but I see a few significant issues here:

  • You don't actually have any code that proves that the caller8.* definitions are related, as far as I can tell. There's no resolver function, and you don't have any code to demangle the functions, so you're just blindly assuming that every fmv function which calls test_explicit_priority() is related.
  • The testcase is confusing because it seems to assume that sve2 doesn't imply sve.
  • What happens if priorities aren't in the same order? Just sorting based on the priority means the FeatureMasks aren't sorted, so I'm not sure just iterating over the callers works the way you want it to if the priorities are reversed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't actually have any code that proves that the caller8.* definitions are related, as far as I can tell.

The callers follow the mangling rules as defined by the ACLE https://arm-software.github.io/acle/main/acle.html#name-mangling. Clang is responsible for generating those correctly, that's why I originally added the integration test in clang/test/CodeGen/AArch64/fmv-explicit-priority.c compiled with -O3 so that GlobalOpt is tested there.

Oh wait! You are saying GlobalOpt itself is not checking that the callers are associated with each other (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/GlobalOpt.cpp#L2572), like it does for callees (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/GlobalOpt.cpp#L2548). Ouch! Thanks for catching this, it is indeed an issue we have to fix, unrelated to this patch.

The testcase is confusing because it seems to assume that sve2 doesn't imply sve.

I am not sure I am following. Please elaborate. Perhaps you are asking why caller8._MmopsMsve2 calls test_explicit_priority._Msve2 instead of test_explicit_priority._Msve? Because originally in ACLE an FMV feature has higher priority than the features it implies, but we are bending this rule with the explicit override. Hmm, good point, not sure how to properly deal with this. Maybe the algorithm needs adjustment.

What happens if priorities aren't in the same order? Just sorting based on the priority means the FeatureMasks aren't sorted, so I'm not sure just iterating over the callers works the way you want it to if the priorities are reversed.

Is this the same issue as the previous bulletpoint, or do you mean something else? You mean that the premise https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/GlobalOpt.cpp#L2618 no longer holds? For example caller8._Mmops should be calling test_explicit_priority._Mmops on the basis that since caller8._Msve was not selected it means sve2 is also unavailable so test_explicit_priority._Msve2 ought to be skipped?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've raised #160011 to address the said issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the last two points are similar, yes; the sorting doesn't really work properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AArch64 backend:ARM backend:RISC-V clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms tablegen

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants