[IR] Introduce the ptrtoaddr instruction #139357

Merged

arichardson
Member

This introduces a new ptrtoaddr instruction which is similar to
ptrtoint but has two differences:

  1. Unlike ptrtoint, ptrtoaddr does not capture provenance
  2. ptrtoaddr only extracts (and then extends/truncates) the low
    index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
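
As a rough illustration of difference 2), the following standalone C++ sketch models both casts on a pointer representation held in a 64-bit integer. It is not LLVM code: the helper names (`lowBits`, `modelPtrToInt`, `modelPtrToAddr`) are hypothetical, the 64-bit representation with a 32-bit index width is an assumed layout, and provenance (difference 1) is not modeled at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low `bits` bits of `v` (bits in 1..64).
static std::uint64_t lowBits(std::uint64_t v, unsigned bits) {
  return bits >= 64 ? v : (v & ((std::uint64_t{1} << bits) - 1));
}

// Model of ptrtoint: take all `ptrBits` representation bits, then
// zero-extend or truncate to the `dstBits`-wide destination integer.
static std::uint64_t modelPtrToInt(std::uint64_t repr, unsigned ptrBits,
                                   unsigned dstBits) {
  return lowBits(lowBits(repr, ptrBits), dstBits);
}

// Model of ptrtoaddr: take only the low `indexBits` bits of the
// representation, then zero-extend or truncate to `dstBits`.
static std::uint64_t modelPtrToAddr(std::uint64_t repr, unsigned indexBits,
                                    unsigned dstBits) {
  return lowBits(lowBits(repr, indexBits), dstBits);
}
```

Under these assumptions, a representation of `0xAABBCCDD11223344` with a 32-bit index width yields `0x11223344` from `modelPtrToAddr(repr, 32, 64)`, whereas `modelPtrToInt(repr, 64, 64)` returns all 64 bits. When the index width equals the pointer width, the two models agree.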

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Created using spr 1.3.6-beta.1

[skip ci]
@llvmbot added the backend:X86, llvm:globalisel, llvm:SelectionDAG, llvm:ir, llvm:analysis, and llvm:SandboxIR labels on May 10, 2025
@llvmbot
Member

llvmbot commented May 10, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Alexander Richardson (arichardson)

Patch is 26.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139357.diff

24 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+55)
  • (modified) llvm/include/llvm-c/Core.h (+1)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+8)
  • (modified) llvm/include/llvm/AsmParser/LLToken.h (+1)
  • (modified) llvm/include/llvm/Bitcode/LLVMBitCodes.h (+2-1)
  • (modified) llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h (+3)
  • (modified) llvm/include/llvm/IR/InstVisitor.h (+1)
  • (modified) llvm/include/llvm/IR/Instruction.def (+27-26)
  • (modified) llvm/include/llvm/IR/Instructions.h (+34-1)
  • (modified) llvm/include/llvm/SandboxIR/Instruction.h (+1)
  • (modified) llvm/lib/AsmParser/LLLexer.cpp (+1)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+2)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+1)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+4)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h (+1)
  • (modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+1)
  • (modified) llvm/lib/IR/Instruction.cpp (+1)
  • (modified) llvm/lib/IR/Instructions.cpp (+16-2)
  • (modified) llvm/lib/SandboxIR/Context.cpp (+1)
  • (added) llvm/test/Assembler/ptrtoaddr-const.ll (+15)
  • (added) llvm/test/Assembler/ptrtoaddr.ll (+6)
  • (added) llvm/test/CodeGen/X86/GlobalISel/ptrtoaddr.ll (+66)
  • (added) llvm/test/CodeGen/X86/ptrtoaddr.ll (+66)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index f971c5a32c61f..2d18d0d97aaee 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -12432,6 +12432,61 @@ Example:
       %Y = ptrtoint ptr %P to i64                        ; yields zero extension on 32-bit architecture
       %Z = ptrtoint <4 x ptr> %P to <4 x i64>; yields vector zero extension for a vector of addresses on 32-bit architecture
 
+.. _i_ptrtoaddr:
+
+'``ptrtoaddr .. to``' Instruction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      <result> = ptrtoaddr <ty> <value> to <ty2>             ; yields ty2
+
+Overview:
+"""""""""
+
+The '``ptrtoaddr``' instruction converts the pointer or a vector of
+pointers ``value`` to the underlying integer address (or vector of integers) of
+type ``ty2``. This is different from :ref:`ptrtoint <i_ptrtoint>` in that it
+only operates on the index bits of the pointer and ignores all other bits.
+
+Arguments:
+""""""""""
+
+The '``ptrtoaddr``' instruction takes a ``value`` to cast, which must be
+a value of type :ref:`pointer <t_pointer>` or a vector of pointers, and a
+type to cast it to ``ty2``, which must be an :ref:`integer <t_integer>` or
+a vector of integers type.
+
+Semantics:
+""""""""""
+
+The '``ptrtoaddr``' instruction converts ``value`` to integer type
+``ty2`` by interpreting the lowest index-width pointer representation bits as an
+integer and either truncating or zero extending that value to the size of the
+integer type.
+If the address of ``value`` is smaller than ``ty2`` then a zero extension is
+done. If the address of ``value`` is larger than ``ty2`` then a truncation is
+done. If the address size and the pointer representation size are the same and
+``value`` and ``ty2`` are the same size, then nothing is done (*no-op cast*)
+other than a type change.
+
+The ``ptrtoaddr`` always :ref:`captures the address (but not provenance) <pointercapture>`
+of the pointer argument.
+
+Example:
+""""""""
+This example assumes pointers in address space 1 are 64 bits in size with an
+address width of 32 bits (``p1:64:64:64:32`` :ref:`datalayout string<langref_datalayout>`)
+.. code-block:: llvm
+
+      %X = ptrtoaddr ptr addrspace(1) %P to i8  ; extracts low 32 bits and truncates
+      %Y = ptrtoaddr ptr addrspace(1) %P to i64 ; extracts low 32 bits and zero extends
+      %Z = ptrtoaddr <4 x ptr addrspace(1)> %P to <4 x i64>; yields vector zero extension of low 32 bits for each pointer
+
+
 .. _i_inttoptr:
 
 '``inttoptr .. to``' Instruction
diff --git a/llvm/include/llvm-c/Core.h b/llvm/include/llvm-c/Core.h
index 6857944e6875f..dd74c68bd20c1 100644
--- a/llvm/include/llvm-c/Core.h
+++ b/llvm/include/llvm-c/Core.h
@@ -110,6 +110,7 @@ typedef enum {
   LLVMFPTrunc        = 37,
   LLVMFPExt          = 38,
   LLVMPtrToInt       = 39,
+  LLVMPtrToAddr      = 69,
   LLVMIntToPtr       = 40,
   LLVMBitCast        = 41,
   LLVMAddrSpaceCast  = 60,
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index a440b6484e94d..d6e194fb21b88 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -732,6 +732,13 @@ class TargetTransformInfoImplBase {
         return 0;
       break;
     }
+    case Instruction::PtrToAddr: {
+      unsigned DstSize = Dst->getScalarSizeInBits();
+      if (DL.isLegalInteger(DstSize) &&
+          DstSize >= DL.getPointerAddressSizeInBits(Src))
+        return 0;
+      break;
+    }
     case Instruction::PtrToInt: {
       unsigned DstSize = Dst->getScalarSizeInBits();
       if (DL.isLegalInteger(DstSize) &&
@@ -1441,6 +1448,7 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
                                                Op2Info, Operands, I);
     }
     case Instruction::IntToPtr:
+    case Instruction::PtrToAddr:
     case Instruction::PtrToInt:
     case Instruction::SIToFP:
     case Instruction::UIToFP:
diff --git a/llvm/include/llvm/AsmParser/LLToken.h b/llvm/include/llvm/AsmParser/LLToken.h
index c7e4bdf3ff811..13ff9e773dfbe 100644
--- a/llvm/include/llvm/AsmParser/LLToken.h
+++ b/llvm/include/llvm/AsmParser/LLToken.h
@@ -318,6 +318,7 @@ enum Kind {
   kw_fptoui,
   kw_fptosi,
   kw_inttoptr,
+  kw_ptrtoaddr,
   kw_ptrtoint,
   kw_bitcast,
   kw_addrspacecast,
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index b362a88963f6c..e00c9eeb2fd0e 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -456,7 +456,8 @@ enum CastOpcodes {
   CAST_PTRTOINT = 9,
   CAST_INTTOPTR = 10,
   CAST_BITCAST = 11,
-  CAST_ADDRSPACECAST = 12
+  CAST_ADDRSPACECAST = 12,
+  CAST_PTRTOADDR = 13,
 };
 
 /// UnaryOpcodes - These are values used in the bitcode files to encode which
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h b/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
index 6fd05c8fddd5f..fcdc733d92c7f 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
@@ -486,6 +486,9 @@ class IRTranslator : public MachineFunctionPass {
   bool translatePtrToInt(const User &U, MachineIRBuilder &MIRBuilder) {
     return translateCast(TargetOpcode::G_PTRTOINT, U, MIRBuilder);
   }
+  bool translatePtrToAddr(const User &U, MachineIRBuilder &MIRBuilder) {
+    return translatePtrToInt(U, MIRBuilder);
+  }
   bool translateTrunc(const User &U, MachineIRBuilder &MIRBuilder) {
     return translateCast(TargetOpcode::G_TRUNC, U, MIRBuilder);
   }
diff --git a/llvm/include/llvm/IR/InstVisitor.h b/llvm/include/llvm/IR/InstVisitor.h
index b4eb729c7ce38..181e490b66a85 100644
--- a/llvm/include/llvm/IR/InstVisitor.h
+++ b/llvm/include/llvm/IR/InstVisitor.h
@@ -183,6 +183,7 @@ class InstVisitor {
   RetTy visitUIToFPInst(UIToFPInst &I)            { DELEGATE(CastInst);}
   RetTy visitSIToFPInst(SIToFPInst &I)            { DELEGATE(CastInst);}
   RetTy visitPtrToIntInst(PtrToIntInst &I)        { DELEGATE(CastInst);}
+  RetTy visitPtrToAddrInst(PtrToAddrInst &I)      { DELEGATE(PtrToIntInst);}
   RetTy visitIntToPtrInst(IntToPtrInst &I)        { DELEGATE(CastInst);}
   RetTy visitBitCastInst(BitCastInst &I)          { DELEGATE(CastInst);}
   RetTy visitAddrSpaceCastInst(AddrSpaceCastInst &I) { DELEGATE(CastInst);}
diff --git a/llvm/include/llvm/IR/Instruction.def b/llvm/include/llvm/IR/Instruction.def
index a5ad92f58f94e..face6a93ec7d5 100644
--- a/llvm/include/llvm/IR/Instruction.def
+++ b/llvm/include/llvm/IR/Instruction.def
@@ -190,35 +190,36 @@ HANDLE_CAST_INST(43, UIToFP  , UIToFPInst  )  // UInt -> floating point
 HANDLE_CAST_INST(44, SIToFP  , SIToFPInst  )  // SInt -> floating point
 HANDLE_CAST_INST(45, FPTrunc , FPTruncInst )  // Truncate floating point
 HANDLE_CAST_INST(46, FPExt   , FPExtInst   )  // Extend floating point
-HANDLE_CAST_INST(47, PtrToInt, PtrToIntInst)  // Pointer -> Integer
-HANDLE_CAST_INST(48, IntToPtr, IntToPtrInst)  // Integer -> Pointer
-HANDLE_CAST_INST(49, BitCast , BitCastInst )  // Type cast
-HANDLE_CAST_INST(50, AddrSpaceCast, AddrSpaceCastInst)  // addrspace cast
-  LAST_CAST_INST(50)
+HANDLE_CAST_INST(47, PtrToInt, PtrToIntInst)  // Pointer -> Integer (bitcast)
+HANDLE_CAST_INST(48, PtrToAddr, PtrToAddrInst) // Pointer -> Address
+HANDLE_CAST_INST(49, IntToPtr, IntToPtrInst)  // Integer -> Pointer
+HANDLE_CAST_INST(50, BitCast , BitCastInst )  // Type cast
+HANDLE_CAST_INST(51, AddrSpaceCast, AddrSpaceCastInst)  // addrspace cast
+  LAST_CAST_INST(51)
 
- FIRST_FUNCLETPAD_INST(51)
-HANDLE_FUNCLETPAD_INST(51, CleanupPad, CleanupPadInst)
-HANDLE_FUNCLETPAD_INST(52, CatchPad  , CatchPadInst)
-  LAST_FUNCLETPAD_INST(52)
+ FIRST_FUNCLETPAD_INST(52)
+HANDLE_FUNCLETPAD_INST(52, CleanupPad, CleanupPadInst)
+HANDLE_FUNCLETPAD_INST(53, CatchPad  , CatchPadInst)
+  LAST_FUNCLETPAD_INST(53)
 
 // Other operators...
- FIRST_OTHER_INST(53)
-HANDLE_OTHER_INST(53, ICmp   , ICmpInst   )  // Integer comparison instruction
-HANDLE_OTHER_INST(54, FCmp   , FCmpInst   )  // Floating point comparison instr.
-HANDLE_OTHER_INST(55, PHI    , PHINode    )  // PHI node instruction
-HANDLE_OTHER_INST(56, Call   , CallInst   )  // Call a function
-HANDLE_OTHER_INST(57, Select , SelectInst )  // select instruction
-HANDLE_USER_INST (58, UserOp1, Instruction)  // May be used internally in a pass
-HANDLE_USER_INST (59, UserOp2, Instruction)  // Internal to passes only
-HANDLE_OTHER_INST(60, VAArg  , VAArgInst  )  // vaarg instruction
-HANDLE_OTHER_INST(61, ExtractElement, ExtractElementInst)// extract from vector
-HANDLE_OTHER_INST(62, InsertElement, InsertElementInst)  // insert into vector
-HANDLE_OTHER_INST(63, ShuffleVector, ShuffleVectorInst)  // shuffle two vectors.
-HANDLE_OTHER_INST(64, ExtractValue, ExtractValueInst)// extract from aggregate
-HANDLE_OTHER_INST(65, InsertValue, InsertValueInst)  // insert into aggregate
-HANDLE_OTHER_INST(66, LandingPad, LandingPadInst)  // Landing pad instruction.
-HANDLE_OTHER_INST(67, Freeze, FreezeInst) // Freeze instruction.
-  LAST_OTHER_INST(67)
+ FIRST_OTHER_INST(54)
+HANDLE_OTHER_INST(54, ICmp   , ICmpInst   )  // Integer comparison instruction
+HANDLE_OTHER_INST(55, FCmp   , FCmpInst   )  // Floating point comparison instr.
+HANDLE_OTHER_INST(56, PHI    , PHINode    )  // PHI node instruction
+HANDLE_OTHER_INST(57, Call   , CallInst   )  // Call a function
+HANDLE_OTHER_INST(58, Select , SelectInst )  // select instruction
+HANDLE_USER_INST (59, UserOp1, Instruction)  // May be used internally in a pass
+HANDLE_USER_INST (60, UserOp2, Instruction)  // Internal to passes only
+HANDLE_OTHER_INST(61, VAArg  , VAArgInst  )  // vaarg instruction
+HANDLE_OTHER_INST(62, ExtractElement, ExtractElementInst)// extract from vector
+HANDLE_OTHER_INST(63, InsertElement, InsertElementInst)  // insert into vector
+HANDLE_OTHER_INST(64, ShuffleVector, ShuffleVectorInst)  // shuffle two vectors.
+HANDLE_OTHER_INST(65, ExtractValue, ExtractValueInst)// extract from aggregate
+HANDLE_OTHER_INST(66, InsertValue, InsertValueInst)  // insert into aggregate
+HANDLE_OTHER_INST(67, LandingPad, LandingPadInst)  // Landing pad instruction.
+HANDLE_OTHER_INST(68, Freeze, FreezeInst) // Freeze instruction.
+  LAST_OTHER_INST(68)
 
 #undef  FIRST_TERM_INST
 #undef HANDLE_TERM_INST
diff --git a/llvm/include/llvm/IR/Instructions.h b/llvm/include/llvm/IR/Instructions.h
index c164f76eb335b..81f93aa1eb77d 100644
--- a/llvm/include/llvm/IR/Instructions.h
+++ b/llvm/include/llvm/IR/Instructions.h
@@ -4881,6 +4881,9 @@ class PtrToIntInst : public CastInst {
   /// Clone an identical PtrToIntInst.
   PtrToIntInst *cloneImpl() const;
 
+  PtrToIntInst(unsigned Op, Value *S, Type *Ty, const Twine &NameStr,
+               InsertPosition InsertBefore);
+
 public:
   /// Constructor with insert-before-instruction semantics
   PtrToIntInst(Value *S,                  ///< The value to be converted
@@ -4904,13 +4907,43 @@ class PtrToIntInst : public CastInst {
 
   // Methods for support type inquiry through isa, cast, and dyn_cast:
   static bool classof(const Instruction *I) {
-    return I->getOpcode() == PtrToInt;
+    return I->getOpcode() == PtrToInt || I->getOpcode() == PtrToAddr;
+  }
+  static bool classof(const Value *V) {
+    return isa<Instruction>(V) && classof(cast<Instruction>(V));
+  }
+};
+
+/// This class represents a cast from a pointer to an address (non-capturing
+/// ptrtoint). Inherits from PtrToIntInst since it is a less restrictive version
+/// of ptrtoint, so treating it as ptrtoint is conservatively correct.
+class PtrToAddrInst : public PtrToIntInst {
+protected:
+  // Note: Instruction needs to be a friend here to call cloneImpl.
+  friend class Instruction;
+
+  /// Clone an identical PtrToIntInst.
+  PtrToAddrInst *cloneImpl() const;
+
+public:
+  /// Constructor with insert-before-instruction semantics
+  PtrToAddrInst(Value *S,                  ///< The value to be converted
+                Type *Ty,                  ///< The type to convert to
+                const Twine &NameStr = "", ///< A name for the new instruction
+                InsertPosition InsertBefore =
+                    nullptr ///< Where to insert the new instruction
+  );
+
+  // Methods for support type inquiry through isa, cast, and dyn_cast:
+  static bool classof(const Instruction *I) {
+    return I->getOpcode() == PtrToAddr;
   }
   static bool classof(const Value *V) {
     return isa<Instruction>(V) && classof(cast<Instruction>(V));
   }
 };
 
+
 //===----------------------------------------------------------------------===//
 //                             BitCastInst Class
 //===----------------------------------------------------------------------===//
diff --git a/llvm/include/llvm/SandboxIR/Instruction.h b/llvm/include/llvm/SandboxIR/Instruction.h
index ce5a2cbec85bd..14ca7fb20c787 100644
--- a/llvm/include/llvm/SandboxIR/Instruction.h
+++ b/llvm/include/llvm/SandboxIR/Instruction.h
@@ -2264,6 +2264,7 @@ class CastInst : public UnaryInstruction {
       return Opcode::FPToSI;
     case llvm::Instruction::FPExt:
       return Opcode::FPExt;
+    case llvm::Instruction::PtrToAddr:
     case llvm::Instruction::PtrToInt:
       return Opcode::PtrToInt;
     case llvm::Instruction::IntToPtr:
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index ce813e1d7b1c4..ce75a75905b21 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -927,6 +927,7 @@ lltok::Kind LLLexer::LexIdentifier() {
   INSTKEYWORD(fptoui,      FPToUI);
   INSTKEYWORD(fptosi,      FPToSI);
   INSTKEYWORD(inttoptr,    IntToPtr);
+  INSTKEYWORD(ptrtoaddr,   PtrToAddr);
   INSTKEYWORD(ptrtoint,    PtrToInt);
   INSTKEYWORD(bitcast,     BitCast);
   INSTKEYWORD(addrspacecast, AddrSpaceCast);
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index 2dfc8254d5885..8b8ee00e3c46b 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -4275,6 +4275,7 @@ bool LLParser::parseValID(ValID &ID, PerFunctionState *PFS, Type *ExpectedTy) {
   case lltok::kw_bitcast:
   case lltok::kw_addrspacecast:
   case lltok::kw_inttoptr:
+  // ptrtoaddr not supported in constant exprs (yet?).
   case lltok::kw_ptrtoint: {
     unsigned Opc = Lex.getUIntVal();
     Type *DestTy = nullptr;
@@ -7237,6 +7238,7 @@ int LLParser::parseInstruction(Instruction *&Inst, BasicBlock *BB,
   case lltok::kw_fptoui:
   case lltok::kw_fptosi:
   case lltok::kw_inttoptr:
+  case lltok::kw_ptrtoaddr:
   case lltok::kw_ptrtoint:
     return parseCast(Inst, PFS, KeywordVal);
   case lltok::kw_fptrunc:
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index a7fbb0c74cb1e..c2cee9b73566b 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -1278,6 +1278,7 @@ static int getDecodedCastOpcode(unsigned Val) {
   case bitc::CAST_SITOFP  : return Instruction::SIToFP;
   case bitc::CAST_FPTRUNC : return Instruction::FPTrunc;
   case bitc::CAST_FPEXT   : return Instruction::FPExt;
+  case bitc::CAST_PTRTOADDR: return Instruction::PtrToAddr;
   case bitc::CAST_PTRTOINT: return Instruction::PtrToInt;
   case bitc::CAST_INTTOPTR: return Instruction::IntToPtr;
   case bitc::CAST_BITCAST : return Instruction::BitCast;
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index ef397879a132c..4f2428f0ac629 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -640,6 +640,7 @@ static unsigned getEncodedCastOpcode(unsigned Opcode) {
   case Instruction::SIToFP  : return bitc::CAST_SITOFP;
   case Instruction::FPTrunc : return bitc::CAST_FPTRUNC;
   case Instruction::FPExt   : return bitc::CAST_FPEXT;
+  case Instruction::PtrToAddr: return bitc::CAST_PTRTOADDR;
   case Instruction::PtrToInt: return bitc::CAST_PTRTOINT;
   case Instruction::IntToPtr: return bitc::CAST_INTTOPTR;
   case Instruction::BitCast : return bitc::CAST_BITCAST;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 9d138d364bad7..bf3297e9b5961 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3877,6 +3877,10 @@ void SelectionDAGBuilder::visitSIToFP(const User &I) {
   setValue(&I, DAG.getNode(ISD::SINT_TO_FP, getCurSDLoc(), DestVT, N));
 }
 
+void SelectionDAGBuilder::visitPtrToAddr(const User &I) {
+  visitPtrToInt(I);
+}
+
 void SelectionDAGBuilder::visitPtrToInt(const User &I) {
   // What to do depends on the size of the integer and the size of the pointer.
   // We can either truncate, zero extend, or no-op, accordingly.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index 35c15bc269d4b..108c0141495e3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -575,6 +575,7 @@ class SelectionDAGBuilder {
   void visitFPToSI(const User &I);
   void visitUIToFP(const User &I);
   void visitSIToFP(const User &I);
+  void visitPtrToAddr(const User &I);
   void visitPtrToInt(const User &I);
   void visitIntToPtr(const User &I);
   void visitBitCast(const User &I);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index c85f0c71ef25f..3d937c4127b82 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1835,6 +1835,7 @@ int TargetLoweringBase::InstructionOpcodeToISD(unsigned Opcode) const {
   case SIToFP:         return ISD::SINT_TO_FP;
   case FPTrunc:        return ISD::FP_ROUND;
   case FPExt:          return ISD::FP_EXTEND;
+  case PtrToAddr:      return ISD::BITCAST;
   case PtrToInt:       return ISD::BITCAST;
   case IntToPtr:       return ISD::BITCAST;
   case BitCast:        return ISD::BITCAST;
diff --git a/llvm/lib/IR/Instruction.cpp b/llvm/lib/IR/Instruction.cpp
index 54e5e6d53e791..27392e4ba6122 100644
--- a/llvm/lib/IR/Instruction.cpp
+++ b/llvm/lib/IR/Instruction.cpp
@@ -818,6 +818,7 @@ const char *Instruction::getOpcodeName(unsigned OpCode) {
   case UIToFP:        return "uitofp";
   case SIToFP:        return "sitofp";
   case IntToPtr:      return "inttoptr";
+  case PtrToAddr:     return "ptrtoaddr";
   case PtrToInt:      return "ptrtoint";
   case BitCast:       return "bitcast";
   case AddrSpaceCast: return "addrspacecast";
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index f404e11b9c0f0..99dd5df3e632a 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -3046,6 +3046,7 @@ CastInst *CastInst::Create(Instruction::CastOps op, Value *S, Type *Ty,
   case SIToFP:        return new SIToFPInst        (S, Ty, Name, InsertBefore);
   case FPToUI:        return new FPToUIInst        (S, Ty, Name, InsertBefore);
   case FPToSI:        return new FPToSIInst        (S, Ty, Name, InsertBefore);
+  case PtrToAddr:     return new PtrToAddrInst     (S, Ty, Name, InsertBefore);
   case PtrToInt:      return new PtrToIntInst      (S, Ty, Name, InsertBefore);
   case IntToPtr:      return new IntToPtrInst      (S, Ty, Name, InsertBefore);
   case BitCast:
@@ -3347,6 +3348,7 @@ CastInst::castIsValid(Instruction::CastOps op, Type *SrcTy, Type *DstTy) {
   case Instruction::FPToSI:
     return SrcTy->isFPOrFPVectorTy() && DstTy->isIntOrIntVectorTy() &&
            SrcEC == DstEC;
+  case Instruction::PtrToAddr:
   case Instruction::PtrToInt:
     if (SrcEC != DstEC)
       return false;
@@ -3454,12 +3456,20 @@ FPToSIInst::FPToSIInst(Value *S, Type *Ty, const Twine &Name,
   assert(castIsValid(getOpcode(), S, Ty) && "Illegal FPToSI");
 }
 
-PtrToIntInst::PtrToIntInst(Value *S, Type *Ty, const Twine &Name,
+PtrToIntInst::PtrToIntInst(unsigned Op, Value *S, Type *Ty, const Twine &Name,
                            InsertPosition InsertBefore)
-    : CastInst(Ty, PtrToInt, S, Name, InsertBefore) {
+    : CastInst(Ty, Op, S, Name, InsertBefore) {
   assert(castIsValid(getOpcode(), S, Ty) && "Illegal PtrToInt");
 }
 
+PtrToIntInst::PtrToIntInst(Value *S, Type *Ty, const Twine &Name,
+                           InsertPosition InsertBefore)
+    : PtrToIntInst(PtrToInt, ...
[truncated]
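
The `classof` changes in the truncated diff above can be sketched without LLVM as follows. This is only a model: the `Opcode` enum, `Inst` struct, and helper functions are hypothetical stand-ins, with opcode values copied from the `Instruction.def` hunk. It shows why code that checks for a ptrtoint-like instruction also matches `ptrtoaddr` at this revision, while code that cares about provenance capture can single out the new opcode.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cast opcodes; the numeric values follow
// the Instruction.def hunk in this patch.
enum class Opcode { PtrToInt = 47, PtrToAddr = 48, IntToPtr = 49 };

struct Inst {
  Opcode op;
};

// Mirrors PtrToIntInst::classof in the diff: accepts both opcodes.
inline bool isPtrToIntLike(const Inst &i) {
  return i.op == Opcode::PtrToInt || i.op == Opcode::PtrToAddr;
}

// Mirrors PtrToAddrInst::classof in the diff: accepts only the new opcode.
inline bool isPtrToAddr(const Inst &i) { return i.op == Opcode::PtrToAddr; }

// Hypothetical pass-side query (not part of the patch): per the PR
// description, only a plain ptrtoint captures provenance.
inline bool capturesProvenance(const Inst &i) {
  return isPtrToIntLike(i) && !isPtrToAddr(i);
}
```

A pass that only calls the ptrtoint-like check treats `ptrtoaddr` conservatively as a `ptrtoint`, which matches the class comment in `Instructions.h`; a pass updated in a follow-up could use the narrower check to drop the provenance-capture assumption.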

github-actions bot commented May 10, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- llvm/include/llvm-c/Core.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/AsmParser/LLToken.h llvm/include/llvm/Bitcode/LLVMBitCodes.h llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h llvm/include/llvm/IR/Constants.h llvm/include/llvm/IR/IRBuilder.h llvm/include/llvm/IR/InstVisitor.h llvm/include/llvm/IR/Instructions.h llvm/include/llvm/IR/Operator.h llvm/include/llvm/SandboxIR/Instruction.h llvm/lib/Analysis/ConstantFolding.cpp llvm/lib/AsmParser/LLLexer.cpp llvm/lib/AsmParser/LLParser.cpp llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/Bitcode/Writer/BitcodeWriter.cpp llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/IR/ConstantFold.cpp llvm/lib/IR/ConstantRange.cpp llvm/lib/IR/Constants.cpp llvm/lib/IR/Globals.cpp llvm/lib/IR/Instruction.cpp llvm/lib/IR/Instructions.cpp llvm/lib/IR/Verifier.cpp llvm/lib/SandboxIR/Context.cpp llvm/lib/SandboxIR/Instruction.cpp llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp llvm/lib/Transforms/Vectorize/SandboxVectorizer/Legality.cpp llvm/unittests/Analysis/IR2VecTest.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm-c/Core.h b/llvm/include/llvm-c/Core.h
index 9879d0d3a..f8a531e32 100644
--- a/llvm/include/llvm-c/Core.h
+++ b/llvm/include/llvm-c/Core.h
@@ -60,91 +60,91 @@ LLVM_C_EXTERN_C_BEGIN
 /// to reorder them.
 typedef enum {
   /* Terminator Instructions */
-  LLVMRet            = 1,
-  LLVMBr             = 2,
-  LLVMSwitch         = 3,
-  LLVMIndirectBr     = 4,
-  LLVMInvoke         = 5,
+  LLVMRet = 1,
+  LLVMBr = 2,
+  LLVMSwitch = 3,
+  LLVMIndirectBr = 4,
+  LLVMInvoke = 5,
   /* removed 6 due to API changes */
-  LLVMUnreachable    = 7,
-  LLVMCallBr         = 67,
+  LLVMUnreachable = 7,
+  LLVMCallBr = 67,
 
   /* Standard Unary Operators */
-  LLVMFNeg           = 66,
+  LLVMFNeg = 66,
 
   /* Standard Binary Operators */
-  LLVMAdd            = 8,
-  LLVMFAdd           = 9,
-  LLVMSub            = 10,
-  LLVMFSub           = 11,
-  LLVMMul            = 12,
-  LLVMFMul           = 13,
-  LLVMUDiv           = 14,
-  LLVMSDiv           = 15,
-  LLVMFDiv           = 16,
-  LLVMURem           = 17,
-  LLVMSRem           = 18,
-  LLVMFRem           = 19,
+  LLVMAdd = 8,
+  LLVMFAdd = 9,
+  LLVMSub = 10,
+  LLVMFSub = 11,
+  LLVMMul = 12,
+  LLVMFMul = 13,
+  LLVMUDiv = 14,
+  LLVMSDiv = 15,
+  LLVMFDiv = 16,
+  LLVMURem = 17,
+  LLVMSRem = 18,
+  LLVMFRem = 19,
 
   /* Logical Operators */
-  LLVMShl            = 20,
-  LLVMLShr           = 21,
-  LLVMAShr           = 22,
-  LLVMAnd            = 23,
-  LLVMOr             = 24,
-  LLVMXor            = 25,
+  LLVMShl = 20,
+  LLVMLShr = 21,
+  LLVMAShr = 22,
+  LLVMAnd = 23,
+  LLVMOr = 24,
+  LLVMXor = 25,
 
   /* Memory Operators */
-  LLVMAlloca         = 26,
-  LLVMLoad           = 27,
-  LLVMStore          = 28,
-  LLVMGetElementPtr  = 29,
+  LLVMAlloca = 26,
+  LLVMLoad = 27,
+  LLVMStore = 28,
+  LLVMGetElementPtr = 29,
 
   /* Cast Operators */
-  LLVMTrunc          = 30,
-  LLVMZExt           = 31,
-  LLVMSExt           = 32,
-  LLVMFPToUI         = 33,
-  LLVMFPToSI         = 34,
-  LLVMUIToFP         = 35,
-  LLVMSIToFP         = 36,
-  LLVMFPTrunc        = 37,
-  LLVMFPExt          = 38,
-  LLVMPtrToInt       = 39,
-  LLVMPtrToAddr      = 69,
-  LLVMIntToPtr       = 40,
-  LLVMBitCast        = 41,
-  LLVMAddrSpaceCast  = 60,
+  LLVMTrunc = 30,
+  LLVMZExt = 31,
+  LLVMSExt = 32,
+  LLVMFPToUI = 33,
+  LLVMFPToSI = 34,
+  LLVMUIToFP = 35,
+  LLVMSIToFP = 36,
+  LLVMFPTrunc = 37,
+  LLVMFPExt = 38,
+  LLVMPtrToInt = 39,
+  LLVMPtrToAddr = 69,
+  LLVMIntToPtr = 40,
+  LLVMBitCast = 41,
+  LLVMAddrSpaceCast = 60,
 
   /* Other Operators */
-  LLVMICmp           = 42,
-  LLVMFCmp           = 43,
-  LLVMPHI            = 44,
-  LLVMCall           = 45,
-  LLVMSelect         = 46,
-  LLVMUserOp1        = 47,
-  LLVMUserOp2        = 48,
-  LLVMVAArg          = 49,
+  LLVMICmp = 42,
+  LLVMFCmp = 43,
+  LLVMPHI = 44,
+  LLVMCall = 45,
+  LLVMSelect = 46,
+  LLVMUserOp1 = 47,
+  LLVMUserOp2 = 48,
+  LLVMVAArg = 49,
   LLVMExtractElement = 50,
-  LLVMInsertElement  = 51,
-  LLVMShuffleVector  = 52,
-  LLVMExtractValue   = 53,
-  LLVMInsertValue    = 54,
-  LLVMFreeze         = 68,
+  LLVMInsertElement = 51,
+  LLVMShuffleVector = 52,
+  LLVMExtractValue = 53,
+  LLVMInsertValue = 54,
+  LLVMFreeze = 68,
 
   /* Atomic operators */
-  LLVMFence          = 55,
-  LLVMAtomicCmpXchg  = 56,
-  LLVMAtomicRMW      = 57,
+  LLVMFence = 55,
+  LLVMAtomicCmpXchg = 56,
+  LLVMAtomicRMW = 57,
 
   /* Exception Handling Operators */
-  LLVMResume         = 58,
-  LLVMLandingPad     = 59,
-  LLVMCleanupRet     = 61,
-  LLVMCatchRet       = 62,
-  LLVMCatchPad       = 63,
-  LLVMCleanupPad     = 64,
-  LLVMCatchSwitch    = 65
+  LLVMResume = 58,
+  LLVMLandingPad = 59,
+  LLVMCleanupRet = 61,
+  LLVMCatchRet = 62,
+  LLVMCatchPad = 63,
+  LLVMCleanupPad = 64,
+  LLVMCatchSwitch = 65
 } LLVMOpcode;
 
 typedef enum {
diff --git a/llvm/include/llvm/IR/InstVisitor.h b/llvm/include/llvm/IR/InstVisitor.h
index 8e4dc647e..53942a0be 100644
--- a/llvm/include/llvm/IR/InstVisitor.h
+++ b/llvm/include/llvm/IR/InstVisitor.h
@@ -183,7 +183,7 @@ public:
   RetTy visitUIToFPInst(UIToFPInst &I)            { DELEGATE(CastInst);}
   RetTy visitSIToFPInst(SIToFPInst &I)            { DELEGATE(CastInst);}
   RetTy visitPtrToIntInst(PtrToIntInst &I)        { DELEGATE(CastInst);}
-  RetTy visitPtrToAddrInst(PtrToAddrInst &I)      { DELEGATE(CastInst);}
+  RetTy visitPtrToAddrInst(PtrToAddrInst &I) { DELEGATE(CastInst); }
   RetTy visitIntToPtrInst(IntToPtrInst &I)        { DELEGATE(CastInst);}
   RetTy visitBitCastInst(BitCastInst &I)          { DELEGATE(CastInst);}
   RetTy visitAddrSpaceCastInst(AddrSpaceCastInst &I) { DELEGATE(CastInst);}
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index 3d5bd6155..0039a41a5 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -928,7 +928,7 @@ lltok::Kind LLLexer::LexIdentifier() {
   INSTKEYWORD(fptoui,      FPToUI);
   INSTKEYWORD(fptosi,      FPToSI);
   INSTKEYWORD(inttoptr,    IntToPtr);
-  INSTKEYWORD(ptrtoaddr,   PtrToAddr);
+  INSTKEYWORD(ptrtoaddr, PtrToAddr);
   INSTKEYWORD(ptrtoint,    PtrToInt);
   INSTKEYWORD(bitcast,     BitCast);
   INSTKEYWORD(addrspacecast, AddrSpaceCast);
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 22a0d0ffd..487b47f2c 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -1283,7 +1283,8 @@ static int getDecodedCastOpcode(unsigned Val) {
   case bitc::CAST_SITOFP  : return Instruction::SIToFP;
   case bitc::CAST_FPTRUNC : return Instruction::FPTrunc;
   case bitc::CAST_FPEXT   : return Instruction::FPExt;
-  case bitc::CAST_PTRTOADDR: return Instruction::PtrToAddr;
+  case bitc::CAST_PTRTOADDR:
+    return Instruction::PtrToAddr;
   case bitc::CAST_PTRTOINT: return Instruction::PtrToInt;
   case bitc::CAST_INTTOPTR: return Instruction::IntToPtr;
   case bitc::CAST_BITCAST : return Instruction::BitCast;
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index a3f825408..c72abad7a 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -647,7 +647,8 @@ static unsigned getEncodedCastOpcode(unsigned Opcode) {
   case Instruction::SIToFP  : return bitc::CAST_SITOFP;
   case Instruction::FPTrunc : return bitc::CAST_FPTRUNC;
   case Instruction::FPExt   : return bitc::CAST_FPEXT;
-  case Instruction::PtrToAddr: return bitc::CAST_PTRTOADDR;
+  case Instruction::PtrToAddr:
+    return bitc::CAST_PTRTOADDR;
   case Instruction::PtrToInt: return bitc::CAST_PTRTOINT;
   case Instruction::IntToPtr: return bitc::CAST_INTTOPTR;
   case Instruction::BitCast : return bitc::CAST_BITCAST;
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index d80a229b2..20d226932 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1893,7 +1893,8 @@ int TargetLoweringBase::InstructionOpcodeToISD(unsigned Opcode) const {
   case SIToFP:         return ISD::SINT_TO_FP;
   case FPTrunc:        return ISD::FP_ROUND;
   case FPExt:          return ISD::FP_EXTEND;
-  case PtrToAddr:      return ISD::BITCAST;
+  case PtrToAddr:
+    return ISD::BITCAST;
   case PtrToInt:       return ISD::BITCAST;
   case IntToPtr:       return ISD::BITCAST;
   case BitCast:        return ISD::BITCAST;
diff --git a/llvm/lib/IR/Instruction.cpp b/llvm/lib/IR/Instruction.cpp
index 4540268e9..6b343f32e 100644
--- a/llvm/lib/IR/Instruction.cpp
+++ b/llvm/lib/IR/Instruction.cpp
@@ -817,7 +817,8 @@ const char *Instruction::getOpcodeName(unsigned OpCode) {
   case UIToFP:        return "uitofp";
   case SIToFP:        return "sitofp";
   case IntToPtr:      return "inttoptr";
-  case PtrToAddr:     return "ptrtoaddr";
+  case PtrToAddr:
+    return "ptrtoaddr";
   case PtrToInt:      return "ptrtoint";
   case BitCast:       return "bitcast";
   case AddrSpaceCast: return "addrspacecast";
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index a1751c0ee..eb6a4d463 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -3050,7 +3050,8 @@ CastInst *CastInst::Create(Instruction::CastOps op, Value *S, Type *Ty,
   case SIToFP:        return new SIToFPInst        (S, Ty, Name, InsertBefore);
   case FPToUI:        return new FPToUIInst        (S, Ty, Name, InsertBefore);
   case FPToSI:        return new FPToSIInst        (S, Ty, Name, InsertBefore);
-  case PtrToAddr:     return new PtrToAddrInst     (S, Ty, Name, InsertBefore);
+  case PtrToAddr:
+    return new PtrToAddrInst(S, Ty, Name, InsertBefore);
   case PtrToInt:      return new PtrToIntInst      (S, Ty, Name, InsertBefore);
   case IntToPtr:      return new IntToPtrInst      (S, Ty, Name, InsertBefore);
   case BitCast:

Contributor

@arsenm arsenm left a comment

Missing test/Bitcode compatibility tests

Contributor

@krzysz00 krzysz00 left a comment

I think the RFC's at consensus and that things are in a decent state.

One comment I have is that the documentation mentions vectors of pointers and I don't see any tests for that.

But that minor issue aside, I'd lay ... one last call for comments before this gets out of limbo? Let's give it until next Wednesday, PDT afternoon?

kparzysz and others added 2 commits July 21, 2025 09:36
@arichardson
Member Author

Now that 21 has branched, I plan to submit this by end of week unless there are any new comments.

@krzysz00
Contributor

I'd like to re-raise my comment about a lack of tests for vectors of pointers (unless I missed some new ones)

@arichardson
Member Author

> I'd like to re-raise my comment about a lack of tests for vectors of pointers (unless I missed some new ones)

Ah sorry I missed that one, will add tests!

@davidchisnall
Contributor

If we're going to add this, it would be good to add the other one of the pair—the set-address instruction—at the same time.

The Rust strict provenance model (and, hopefully, C++29) and CHERI both have these as parts of the provenance model: an instruction / function that takes a pointer and an address and propagates the provenance from the pointer but not the address. This makes provenance-based alias analysis simpler, but just adding one of these two operations in isolation doesn't give that benefit.

@nikic
Contributor

nikic commented Jul 23, 2025

> If we're going to add this, it would be good to add the other one of the pair—the set-address instruction—at the same time.
>
> The Rust strict provenance model (and, hopefully, C++29) and CHERI both have these as parts of the provenance model: an instruction / function that takes a pointer and an address and propagates the provenance from the pointer but not the address. This makes provenance-based alias analysis simpler, but just adding one of these two operations in isolation doesn't give that benefit.

The set address operation can already be encoded by ptradd(p, new_addr - ptrtoaddr(p)). I doubt we're going to add an instruction for that. We might add an intrinsic. But I don't think this needs to be coupled to the ptrtoaddr introduction.
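
For readers less familiar with the IR, the arithmetic behind that encoding can be sketched in plain C++. This is only a rough analogue, not LLVM API: `set_address` and `set_address_demo` are hypothetical names, `reinterpret_cast<std::uintptr_t>` stands in for `ptrtoaddr`, and `char *` arithmetic stands in for `ptradd`. C++ casts do not express the provenance distinction that `ptrtoaddr` makes, so the sketch only shows that the result is computed as an offset from `p`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an LLVM or standard-library API): returns a
// pointer derived from `p` whose address is `new_addr`.
// Mirrors ptradd(p, new_addr - ptrtoaddr(p)).
char *set_address(char *p, std::uintptr_t new_addr) {
  // Stand-in for ptrtoaddr: only the address of `p` is read here.
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  // Unsigned subtraction wraps; the signed conversion recovers the delta.
  std::ptrdiff_t delta = static_cast<std::ptrdiff_t>(new_addr - addr);
  // Stand-in for ptradd: the result is an offset from `p`, not a fresh
  // pointer materialized from an integer.
  return p + delta;
}

// Small usage example: move a pointer to offset 5 within the same buffer.
void set_address_demo() {
  char buf[16] = {};
  std::uintptr_t target = reinterpret_cast<std::uintptr_t>(buf) + 5;
  assert(set_address(buf, target) == buf + 5);
}
```

The point of the shape is that the new address enters only as an integer offset, while the pointer operand is the sole source of the result's provenance.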

Member Author

@arichardson arichardson left a comment

Updated based on @nikic's feedback. Now the type needs to match and constant expressions are supported.

Thanks for all the review - I'll wait another week for any further comments in case I missed something.

arichardson added a commit that referenced this pull request Jul 28, 2025
Instead make the members of Vocabulary public. This was causing test
failures with #139357.

Reviewed By: svkeerthy, boomanaiden154

Pull Request: #150878
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 28, 2025
Instead make the members of Vocabulary public. This was causing test
failures with llvm/llvm-project#139357.

Reviewed By: svkeerthy, boomanaiden154

Pull Request: llvm/llvm-project#150878
Contributor

@nikic nikic left a comment

LGTM

hekota and others added 2 commits August 7, 2025 16:24
@arichardson arichardson changed the base branch from users/arichardson/spr/main.ir-introduce-the-ptrtoaddr-instruction to main August 8, 2025 17:12
@arichardson arichardson merged commit 3a4b351 into main Aug 8, 2025
15 of 17 checks passed
@arichardson arichardson deleted the users/arichardson/spr/ir-introduce-the-ptrtoaddr-instruction branch August 8, 2025 17:12
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 8, 2025
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
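
Difference 2) can be made concrete with a small standalone C++ model. This is a sketch under stated assumptions rather than LLVM code: the names `low_bits_mask`, `model_ptrtoint`, and `model_ptrtoaddr` are invented for illustration, a pointer representation is modeled as a 64-bit pattern whose low `index_bits` bits hold the address, and widening is assumed to be zero-extension. Note that a later revision of this PR requires the result type to match the address width, in which case the extend/truncate step does nothing.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only; none of these names exist in LLVM.
// Returns a mask with the low `bits` bits set (all ones for bits >= 64).
std::uint64_t low_bits_mask(unsigned bits) {
  return bits >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << bits) - 1);
}

// Models ptrtoint: the whole pointer representation is converted, then
// truncated to the result width (zero-extension is implicit in uint64_t).
std::uint64_t model_ptrtoint(std::uint64_t repr, unsigned result_bits) {
  return repr & low_bits_mask(result_bits);
}

// Models ptrtoaddr as described above: keep only the low index-width bits,
// then zero-extend or truncate to the result width.
std::uint64_t model_ptrtoaddr(std::uint64_t repr, unsigned index_bits,
                              unsigned result_bits) {
  std::uint64_t addr = repr & low_bits_mask(index_bits);
  return addr & low_bits_mask(result_bits);
}

// Usage example with a hypothetical pointer layout: 64-bit representation,
// 32-bit index width, upper 32 bits holding non-address data.
void ptrtoaddr_model_demo() {
  std::uint64_t repr = 0xABCD000012345678ULL;
  assert(model_ptrtoint(repr, 64) == 0xABCD000012345678ULL);
  assert(model_ptrtoaddr(repr, 32, 64) == 0x12345678ULL);
}
```

When `index_bits` equals the representation width, the two functions agree, which corresponds to the "most architectures" case in the text.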

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: llvm/llvm-project#139357
mtrofin added a commit that referenced this pull request Aug 8, 2025
…152813)

We'll remove the size estimator after, this change is to get the `ml-*`
build bots green after the aforementioned PR.

We never used the size estimator again after the initial DQN-based
training. Should we want to again, we now have IR2Vec, which the old
estimator was approximating in functionality.