Skip to content

Conversation

@kuilpd
Copy link
Contributor

@kuilpd kuilpd commented Jul 4, 2025

A draft to discuss scalar literal node implementation

@kuilpd kuilpd added the lldb label Jul 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 4, 2025

@llvm/pr-subscribers-lldb

Author: Ilia Kuklin (kuilpd)

Changes

A draft to discuss scalar literal node implementation


Patch is 26.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147064.diff

8 Files Affected:

  • (modified) lldb/include/lldb/ValueObject/DILAST.h (+58)
  • (modified) lldb/include/lldb/ValueObject/DILEval.h (+15)
  • (modified) lldb/include/lldb/ValueObject/DILLexer.h (+1)
  • (modified) lldb/include/lldb/ValueObject/DILParser.h (+3)
  • (modified) lldb/source/ValueObject/DILAST.cpp (+21)
  • (modified) lldb/source/ValueObject/DILEval.cpp (+405)
  • (modified) lldb/source/ValueObject/DILLexer.cpp (+6-4)
  • (modified) lldb/source/ValueObject/DILParser.cpp (+72-2)
diff --git a/lldb/include/lldb/ValueObject/DILAST.h b/lldb/include/lldb/ValueObject/DILAST.h
index 709f0639135f1..a9d244031d55f 100644
--- a/lldb/include/lldb/ValueObject/DILAST.h
+++ b/lldb/include/lldb/ValueObject/DILAST.h
@@ -9,6 +9,7 @@
 #ifndef LLDB_VALUEOBJECT_DILAST_H
 #define LLDB_VALUEOBJECT_DILAST_H
 
+#include "lldb/ValueObject/DILLexer.h"
 #include "lldb/ValueObject/ValueObject.h"
 #include "llvm/Support/Error.h"
 #include <cstdint>
@@ -19,10 +20,12 @@ namespace lldb_private::dil {
 /// The various types DIL AST nodes (used by the DIL parser).
 enum class NodeKind {
   eArraySubscriptNode,
+  eBinaryOpNode,
   eBitExtractionNode,
   eErrorNode,
   eIdentifierNode,
   eMemberOfNode,
+  eScalarLiteralNode,
   eUnaryOpNode,
 };
 
@@ -32,6 +35,14 @@ enum class UnaryOpKind {
   Deref,  // "*"
 };
 
+enum class BinaryOpKind {
+  Add, // "+"
+  Sub, // "-"
+};
+
+/// Translates DIL tokens to BinaryOpKind.
+BinaryOpKind GetBinaryOpKindFromToken(Token::Kind token_kind);
+
 /// Forward declaration, for use in DIL AST nodes. Definition is at the very
 /// end of this file.
 class Visitor;
@@ -178,6 +189,49 @@ class BitFieldExtractionNode : public ASTNode {
   int64_t m_last_index;
 };
 
+class ScalarLiteralNode : public ASTNode {
+public:
+  ScalarLiteralNode(uint32_t location, lldb::BasicType type, Scalar value)
+      : ASTNode(location, NodeKind::eScalarLiteralNode), m_type(type),
+        m_value(value) {}
+
+  llvm::Expected<lldb::ValueObjectSP> Accept(Visitor *v) const override;
+
+  lldb::BasicType GetType() const { return m_type; }
+  Scalar GetValue() const & { return m_value; }
+
+  static bool classof(const ASTNode *node) {
+    return node->GetKind() == NodeKind::eScalarLiteralNode;
+  }
+
+private:
+  lldb::BasicType m_type;
+  Scalar m_value;
+};
+
+class BinaryOpNode : public ASTNode {
+public:
+  BinaryOpNode(uint32_t location, BinaryOpKind kind, ASTNodeUP lhs,
+               ASTNodeUP rhs)
+      : ASTNode(location, NodeKind::eBinaryOpNode), m_kind(kind),
+        m_lhs(std::move(lhs)), m_rhs(std::move(rhs)) {}
+
+  llvm::Expected<lldb::ValueObjectSP> Accept(Visitor *v) const override;
+
+  BinaryOpKind GetKind() const { return m_kind; }
+  ASTNode *GetLHS() const { return m_lhs.get(); }
+  ASTNode *GetRHS() const { return m_rhs.get(); }
+
+  static bool classof(const ASTNode *node) {
+    return node->GetKind() == NodeKind::eBinaryOpNode;
+  }
+
+private:
+  BinaryOpKind m_kind;
+  ASTNodeUP m_lhs;
+  ASTNodeUP m_rhs;
+};
+
 /// This class contains one Visit method for each specialized type of
 /// DIL AST node. The Visit methods are used to dispatch a DIL AST node to
 /// the correct function in the DIL expression evaluator for evaluating that
@@ -195,6 +249,10 @@ class Visitor {
   Visit(const ArraySubscriptNode *node) = 0;
   virtual llvm::Expected<lldb::ValueObjectSP>
   Visit(const BitFieldExtractionNode *node) = 0;
+  virtual llvm::Expected<lldb::ValueObjectSP>
+  Visit(const ScalarLiteralNode *node) = 0;
+  virtual llvm::Expected<lldb::ValueObjectSP>
+  Visit(const BinaryOpNode *node) = 0;
 };
 
 } // namespace lldb_private::dil
diff --git a/lldb/include/lldb/ValueObject/DILEval.h b/lldb/include/lldb/ValueObject/DILEval.h
index 45e29b3ddcd7b..9e9028f6122fd 100644
--- a/lldb/include/lldb/ValueObject/DILEval.h
+++ b/lldb/include/lldb/ValueObject/DILEval.h
@@ -54,6 +54,21 @@ class Interpreter : Visitor {
   Visit(const ArraySubscriptNode *node) override;
   llvm::Expected<lldb::ValueObjectSP>
   Visit(const BitFieldExtractionNode *node) override;
+  llvm::Expected<lldb::ValueObjectSP>
+  Visit(const ScalarLiteralNode *node) override;
+  llvm::Expected<lldb::ValueObjectSP> Visit(const BinaryOpNode *node) override;
+
+  lldb::ValueObjectSP
+  ConvertValueObjectToTypeSystem(lldb::ValueObjectSP valobj,
+                                 lldb::TypeSystemSP type_system);
+
+  llvm::Error PrepareBinaryAddition(lldb::ValueObjectSP &lhs,
+                                    lldb::ValueObjectSP &rhs,
+                                    uint32_t location);
+
+  llvm::Expected<lldb::ValueObjectSP>
+  EvaluateBinaryAddition(lldb::ValueObjectSP lhs, lldb::ValueObjectSP rhs,
+                         uint32_t location);
 
   // Used by the interpreter to create objects, perform casts, etc.
   lldb::TargetSP m_target;
diff --git a/lldb/include/lldb/ValueObject/DILLexer.h b/lldb/include/lldb/ValueObject/DILLexer.h
index 9c1ba97680253..58059657bf3a5 100644
--- a/lldb/include/lldb/ValueObject/DILLexer.h
+++ b/lldb/include/lldb/ValueObject/DILLexer.h
@@ -34,6 +34,7 @@ class Token {
     minus,
     numeric_constant,
     period,
+    plus,
     r_paren,
     r_square,
     star,
diff --git a/lldb/include/lldb/ValueObject/DILParser.h b/lldb/include/lldb/ValueObject/DILParser.h
index 9eda7bac4a364..c57ef0cf28022 100644
--- a/lldb/include/lldb/ValueObject/DILParser.h
+++ b/lldb/include/lldb/ValueObject/DILParser.h
@@ -87,6 +87,7 @@ class DILParser {
   ASTNodeUP Run();
 
   ASTNodeUP ParseExpression();
+  ASTNodeUP ParseAdditiveExpression();
   ASTNodeUP ParseUnaryExpression();
   ASTNodeUP ParsePostfixExpression();
   ASTNodeUP ParsePrimaryExpression();
@@ -96,6 +97,8 @@ class DILParser {
   std::string ParseIdExpression();
   std::string ParseUnqualifiedId();
   std::optional<int64_t> ParseIntegerConstant();
+  ASTNodeUP ParseNumericLiteral();
+  ASTNodeUP ParseNumericConstant();
 
   void BailOut(const std::string &error, uint32_t loc, uint16_t err_len);
 
diff --git a/lldb/source/ValueObject/DILAST.cpp b/lldb/source/ValueObject/DILAST.cpp
index b1cd824c2299e..7aee692f2b28a 100644
--- a/lldb/source/ValueObject/DILAST.cpp
+++ b/lldb/source/ValueObject/DILAST.cpp
@@ -11,6 +11,18 @@
 
 namespace lldb_private::dil {
 
+BinaryOpKind GetBinaryOpKindFromToken(Token::Kind token_kind) {
+  switch (token_kind) {
+  case Token::minus:
+    return BinaryOpKind::Sub;
+  case Token::plus:
+    return BinaryOpKind::Add;
+  default:
+    break;
+  }
+  llvm_unreachable("Unknown binary operator kind.");
+}
+
 llvm::Expected<lldb::ValueObjectSP> ErrorNode::Accept(Visitor *v) const {
   llvm_unreachable("Attempting to Visit a DIL ErrorNode.");
 }
@@ -37,4 +49,13 @@ BitFieldExtractionNode::Accept(Visitor *v) const {
   return v->Visit(this);
 }
 
+llvm::Expected<lldb::ValueObjectSP>
+ScalarLiteralNode::Accept(Visitor *v) const {
+  return v->Visit(this);
+}
+
+llvm::Expected<lldb::ValueObjectSP> BinaryOpNode::Accept(Visitor *v) const {
+  return v->Visit(this);
+}
+
 } // namespace lldb_private::dil
diff --git a/lldb/source/ValueObject/DILEval.cpp b/lldb/source/ValueObject/DILEval.cpp
index 8ca9b4215985d..2e42b57e93042 100644
--- a/lldb/source/ValueObject/DILEval.cpp
+++ b/lldb/source/ValueObject/DILEval.cpp
@@ -7,7 +7,9 @@
 //===----------------------------------------------------------------------===//
 
 #include "lldb/ValueObject/DILEval.h"
+#include "lldb/Core/Module.h"
 #include "lldb/Symbol/CompileUnit.h"
+#include "lldb/Symbol/TypeSystem.h"
 #include "lldb/Symbol/VariableList.h"
 #include "lldb/Target/RegisterContext.h"
 #include "lldb/ValueObject/DILAST.h"
@@ -401,4 +403,407 @@ Interpreter::Visit(const BitFieldExtractionNode *node) {
   return child_valobj_sp;
 }
 
+static CompilerType GetBasicTypeFromCU(std::shared_ptr<StackFrame> ctx,
+                                       lldb::BasicType basic_type) {
+  SymbolContext symbol_context =
+      ctx->GetSymbolContext(lldb::eSymbolContextCompUnit);
+  auto language = symbol_context.comp_unit->GetLanguage();
+
+  symbol_context = ctx->GetSymbolContext(lldb::eSymbolContextModule);
+  auto type_system =
+      symbol_context.module_sp->GetTypeSystemForLanguage(language);
+
+  if (type_system)
+    if (auto compiler_type = type_system.get()->GetBasicTypeFromAST(basic_type))
+      return compiler_type;
+
+  CompilerType empty_type;
+  return empty_type;
+}
+
+llvm::Expected<lldb::ValueObjectSP>
+Interpreter::Visit(const ScalarLiteralNode *node) {
+  CompilerType result_type =
+      GetBasicTypeFromCU(m_exe_ctx_scope, node->GetType());
+  Scalar value = node->GetValue();
+
+  // Scalar later could be float or bool
+  if (result_type.IsInteger() || result_type.IsNullPtrType() ||
+      result_type.IsPointerType()) {
+    llvm::APInt val = value.GetAPSInt();
+    return ValueObject::CreateValueObjectFromAPInt(m_target, val, result_type,
+                                                   "result");
+  }
+
+  return lldb::ValueObjectSP();
+}
+
+static CompilerType GetBasicType(lldb::TypeSystemSP type_system,
+                                 lldb::BasicType basic_type) {
+  if (type_system)
+    if (auto compiler_type = type_system.get()->GetBasicTypeFromAST(basic_type))
+      return compiler_type;
+
+  CompilerType empty_type;
+  return empty_type;
+}
+
+static CompilerType DoIntegralPromotion(CompilerType from,
+                                        std::shared_ptr<StackFrame> ctx) {
+  if (!from.IsInteger() && !from.IsUnscopedEnumerationType())
+    return from;
+
+  if (!from.IsPromotableIntegerType())
+    return from;
+
+  if (from.IsUnscopedEnumerationType())
+    return DoIntegralPromotion(from.GetEnumerationIntegerType(), ctx);
+  lldb::BasicType builtin_type =
+      from.GetCanonicalType().GetBasicTypeEnumeration();
+  lldb::TypeSystemSP type_system = from.GetTypeSystem().GetSharedPointer();
+
+  uint64_t from_size = 0;
+  if (builtin_type == lldb::eBasicTypeWChar ||
+      builtin_type == lldb::eBasicTypeSignedWChar ||
+      builtin_type == lldb::eBasicTypeUnsignedWChar ||
+      builtin_type == lldb::eBasicTypeChar16 ||
+      builtin_type == lldb::eBasicTypeChar32) {
+    // Find the type that can hold the entire range of values for our type.
+    bool is_signed = from.IsSigned();
+    if (auto temp = from.GetByteSize(ctx.get()))
+      from_size = *temp;
+
+    CompilerType promote_types[] = {
+        GetBasicType(type_system, lldb::eBasicTypeInt),
+        GetBasicType(type_system, lldb::eBasicTypeUnsignedInt),
+        GetBasicType(type_system, lldb::eBasicTypeLong),
+        GetBasicType(type_system, lldb::eBasicTypeUnsignedLong),
+        GetBasicType(type_system, lldb::eBasicTypeLongLong),
+        GetBasicType(type_system, lldb::eBasicTypeUnsignedLongLong),
+    };
+    for (auto &type : promote_types) {
+      uint64_t byte_size = 0;
+      if (auto temp = type.GetByteSize(ctx.get()))
+        byte_size = *temp;
+      if (from_size < byte_size ||
+          (from_size == byte_size &&
+           is_signed == (bool)(type.GetTypeInfo() & lldb::eTypeIsSigned))) {
+        return type;
+      }
+    }
+
+    llvm_unreachable("char type should fit into long long");
+  }
+
+  // Here we can promote only to "int" or "unsigned int".
+  CompilerType int_type = GetBasicType(type_system, lldb::eBasicTypeInt);
+  uint64_t int_byte_size = 0;
+  if (auto temp = int_type.GetByteSize(ctx.get()))
+    int_byte_size = *temp;
+
+  // Signed integer types can be safely promoted to "int".
+  if (from.IsSigned()) {
+    return int_type;
+  }
+  // Unsigned integer types are promoted to "unsigned int" if "int" cannot hold
+  // their entire value range.
+  return (from_size == int_byte_size)
+             ? GetBasicType(type_system, lldb::eBasicTypeUnsignedInt)
+             : int_type;
+}
+
+static lldb::ValueObjectSP UnaryConversion(lldb::ValueObjectSP valobj,
+                                           std::shared_ptr<StackFrame> ctx) {
+  // Perform usual conversions for unary operators.
+  CompilerType in_type = valobj->GetCompilerType();
+  CompilerType result_type;
+
+  if (valobj->GetCompilerType().IsInteger() ||
+      valobj->GetCompilerType().IsUnscopedEnumerationType()) {
+    CompilerType promoted_type =
+        DoIntegralPromotion(valobj->GetCompilerType(), ctx);
+    if (!promoted_type.CompareTypes(valobj->GetCompilerType()))
+      return valobj->CastToBasicType(promoted_type);
+  }
+
+  return valobj;
+}
+
+static size_t ConversionRank(CompilerType type) {
+  // Get integer conversion rank
+  // https://eel.is/c++draft/conv.rank
+  switch (type.GetCanonicalType().GetBasicTypeEnumeration()) {
+  case lldb::eBasicTypeBool:
+    return 1;
+  case lldb::eBasicTypeChar:
+  case lldb::eBasicTypeSignedChar:
+  case lldb::eBasicTypeUnsignedChar:
+    return 2;
+  case lldb::eBasicTypeShort:
+  case lldb::eBasicTypeUnsignedShort:
+    return 3;
+  case lldb::eBasicTypeInt:
+  case lldb::eBasicTypeUnsignedInt:
+    return 4;
+  case lldb::eBasicTypeLong:
+  case lldb::eBasicTypeUnsignedLong:
+    return 5;
+  case lldb::eBasicTypeLongLong:
+  case lldb::eBasicTypeUnsignedLongLong:
+    return 6;
+
+    // The ranks of char16_t, char32_t, and wchar_t are equal to the
+    // ranks of their underlying types.
+  case lldb::eBasicTypeWChar:
+  case lldb::eBasicTypeSignedWChar:
+  case lldb::eBasicTypeUnsignedWChar:
+    return 3;
+  case lldb::eBasicTypeChar16:
+    return 3;
+  case lldb::eBasicTypeChar32:
+    return 4;
+
+  default:
+    break;
+  }
+  return 0;
+}
+
+static lldb::BasicType BasicTypeToUnsigned(lldb::BasicType basic_type) {
+  switch (basic_type) {
+  case lldb::eBasicTypeInt:
+    return lldb::eBasicTypeUnsignedInt;
+  case lldb::eBasicTypeLong:
+    return lldb::eBasicTypeUnsignedLong;
+  case lldb::eBasicTypeLongLong:
+    return lldb::eBasicTypeUnsignedLongLong;
+  default:
+    return basic_type;
+  }
+}
+
+static void PerformIntegerConversions(std::shared_ptr<StackFrame> ctx,
+                                      lldb::ValueObjectSP &lhs,
+                                      lldb::ValueObjectSP &rhs,
+                                      bool convert_lhs, bool convert_rhs) {
+  CompilerType l_type = lhs->GetCompilerType();
+  CompilerType r_type = rhs->GetCompilerType();
+  if (r_type.IsSigned() && !l_type.IsSigned()) {
+    uint64_t l_size = 0;
+    uint64_t r_size = 0;
+    if (auto temp = l_type.GetByteSize(ctx.get()))
+      l_size = *temp;
+    ;
+    if (auto temp = r_type.GetByteSize(ctx.get()))
+      r_size = *temp;
+    if (l_size <= r_size) {
+      if (r_size == l_size) {
+        lldb::TypeSystemSP type_system =
+            l_type.GetTypeSystem().GetSharedPointer();
+        auto r_type_unsigned = GetBasicType(
+            type_system,
+            BasicTypeToUnsigned(
+                r_type.GetCanonicalType().GetBasicTypeEnumeration()));
+        if (convert_rhs)
+          rhs = rhs->CastToBasicType(r_type_unsigned);
+      }
+    }
+  }
+  if (convert_lhs)
+    lhs = lhs->CastToBasicType(rhs->GetCompilerType());
+}
+
+static CompilerType ArithmeticConversions(lldb::ValueObjectSP &lhs,
+                                          lldb::ValueObjectSP &rhs,
+                                          std::shared_ptr<StackFrame> ctx) {
+  // Apply unary conversion (e.g. intergal promotion) for both operands.
+  lhs = UnaryConversion(lhs, ctx);
+  rhs = UnaryConversion(rhs, ctx);
+
+  CompilerType lhs_type = lhs->GetCompilerType();
+  CompilerType rhs_type = rhs->GetCompilerType();
+
+  if (lhs_type.CompareTypes(rhs_type))
+    return lhs_type;
+
+  // If either of the operands is not arithmetic (e.g. pointer), we're done.
+  if (!lhs_type.IsScalarType() || !rhs_type.IsScalarType()) {
+    CompilerType bad_type;
+    return bad_type;
+  }
+
+  // Removed floating point conversions
+  if (lhs_type.IsFloat() || rhs_type.IsFloat()) {
+    CompilerType bad_type;
+    return bad_type;
+  }
+
+  if (lhs_type.IsInteger() && rhs_type.IsInteger()) {
+    using Rank = std::tuple<size_t, bool>;
+    Rank l_rank = {ConversionRank(lhs_type), !lhs_type.IsSigned()};
+    Rank r_rank = {ConversionRank(rhs_type), !rhs_type.IsSigned()};
+
+    if (l_rank < r_rank) {
+      PerformIntegerConversions(ctx, lhs, rhs, true, true);
+    } else if (l_rank > r_rank) {
+      PerformIntegerConversions(ctx, rhs, lhs, true, true);
+    }
+  }
+
+  return rhs_type;
+}
+
+llvm::Error Interpreter::PrepareBinaryAddition(lldb::ValueObjectSP &lhs,
+                                               lldb::ValueObjectSP &rhs,
+                                               uint32_t location) {
+  // Operation '+' works for:
+  //
+  //  {scalar,unscoped_enum} <-> {scalar,unscoped_enum}
+  //  {integer,unscoped_enum} <-> pointer
+  //  pointer <-> {integer,unscoped_enum}
+  auto orig_lhs_type = lhs->GetCompilerType();
+  auto orig_rhs_type = rhs->GetCompilerType();
+  auto result_type = ArithmeticConversions(lhs, rhs, m_exe_ctx_scope);
+
+  if (result_type.IsScalarType())
+    return llvm::Error::success();
+
+  // Removed pointer arithmetics
+  return llvm::make_error<DILDiagnosticError>(m_expr, "unimplemented",
+                                              location);
+}
+
+static lldb::ValueObjectSP EvaluateArithmeticOpInteger(lldb::TargetSP target,
+                                                       BinaryOpKind kind,
+                                                       lldb::ValueObjectSP lhs,
+                                                       lldb::ValueObjectSP rhs,
+                                                       CompilerType rtype) {
+  assert(lhs->GetCompilerType().IsInteger() &&
+         rhs->GetCompilerType().IsInteger() &&
+         "invalid ast: both operands must be integers");
+
+  auto wrap = [target, rtype](auto value) {
+    return ValueObject::CreateValueObjectFromAPInt(target, value, rtype,
+                                                   "result");
+  };
+
+  llvm::Expected<llvm::APSInt> l_value = lhs->GetValueAsAPSInt();
+  llvm::Expected<llvm::APSInt> r_value = rhs->GetValueAsAPSInt();
+
+  if (l_value && r_value) {
+    llvm::APSInt l = *l_value;
+    llvm::APSInt r = *r_value;
+
+    switch (kind) {
+    case BinaryOpKind::Add:
+      return wrap(l + r);
+
+    default:
+      assert(false && "invalid ast: invalid arithmetic operation");
+      return lldb::ValueObjectSP();
+    }
+  } else {
+    return lldb::ValueObjectSP();
+  }
+}
+
+static lldb::ValueObjectSP EvaluateArithmeticOp(lldb::TargetSP target,
+                                                BinaryOpKind kind,
+                                                lldb::ValueObjectSP lhs,
+                                                lldb::ValueObjectSP rhs,
+                                                CompilerType rtype) {
+  assert((rtype.IsInteger() || rtype.IsFloat()) &&
+         "invalid ast: result type must either integer or floating point");
+
+  // Evaluate arithmetic operation for two integral values.
+  if (rtype.IsInteger()) {
+    return EvaluateArithmeticOpInteger(target, kind, lhs, rhs, rtype);
+  }
+
+  // Removed floating point arithmetics
+
+  return lldb::ValueObjectSP();
+}
+
+llvm::Expected<lldb::ValueObjectSP> Interpreter::EvaluateBinaryAddition(
+    lldb::ValueObjectSP lhs, lldb::ValueObjectSP rhs, uint32_t location) {
+  // Addition of two arithmetic types.
+  if (lhs->GetCompilerType().IsScalarType() &&
+      rhs->GetCompilerType().IsScalarType()) {
+    return EvaluateArithmeticOp(m_target, BinaryOpKind::Add, lhs, rhs,
+                                lhs->GetCompilerType().GetCanonicalType());
+  }
+
+  // Removed pointer arithmetics
+  return llvm::make_error<DILDiagnosticError>(m_expr, "unimplemented",
+                                              location);
+}
+
+lldb::ValueObjectSP
+Interpreter::ConvertValueObjectToTypeSystem(lldb::ValueObjectSP valobj,
+                                            lldb::TypeSystemSP type_system) {
+  auto apsint = valobj->GetValueAsAPSInt();
+  if (apsint) {
+    llvm::APInt value = *apsint;
+    if (type_system) {
+      lldb::BasicType basic_type = valobj->GetCompilerType()
+                                       .GetCanonicalType()
+                                       .GetBasicTypeEnumeration();
+      if (auto compiler_type =
+              type_system.get()->GetBasicTypeFromAST(basic_type)) {
+        valobj->GetValue().SetCompilerType(compiler_type);
+        return ValueObject::CreateValueObjectFromAPInt(m_target, value,
+                                                       compiler_type, "result");
+      }
+    }
+  }
+
+  return lldb::ValueObjectSP();
+}
+
+llvm::Expected<lldb::ValueObjectSP>
+Interpreter::Visit(const BinaryOpNode *node) {
+  auto lhs_or_err = Evaluate(node->GetLHS());
+  if (!lhs_or_err)
+    return lhs_or_err;
+  lldb::ValueObjectSP lhs = *lhs_or_err;
+  auto rhs_or_err = Evaluate(node->GetRHS());
+  if (!rhs_or_err)
+    return rhs_or_err;
+  lldb::ValueObjectSP rhs = *rhs_or_err;
+
+  lldb::TypeSystemSP lhs_system =
+      lh...
[truncated]

@kuilpd
Copy link
Contributor Author

kuilpd commented Jul 4, 2025

@labath @cmtice @jimingham @asl
This is a draft with some potential changes I'd like to discuss regarding how to handle the type system of constant literals.
I'm also trying to demonstrate how they interact further with other values in a binary addition node. A lot of the code is cut out for better visibility of the important parts, but it's still quite a lot, I will later split this into separate PRs if anything comes out of this.

Things I've tried to address:

  1. DILEval.cpp:406-439: Upon creation of ScalarLiteralNode it will default to the current compile unit's type system, instead of iterating through target_sp->GetScratchTypeSystems() (old code). I think it's a good bet to assume that we want the type system of where we're currently stopped at. The node has to have some type in case it needs to be output as is.
  2. DILEval.cpp:775-791: However, when it is used in another node, like a + b, we check if operands' type systems match and if not, we check if one of them is a scalar literal, in which case they can be converted to the type system of another operand.

The rest of the code in DILEval.cpp:441-740 is mostly the type promotion and conversion implementation from our downstream, except I changed the compiler type to be retrieved from operands' type system, instead of the same scratch type system iteration. There's no need to review the entire logic here yet, just the part how it handles the types.

Here is the problem with types: initial scalar literal type, all the type promotions and conversions are done using lldb::BasicType enumeration types, and later converted to CompilerType using the TypeSystem. In C++, this works perfectly well, because in TypeSystemClang.cpp:5468 it is implemented, but in TypeSystemSwift:304 it is not.

Do we assume that this was just never needed before, so it can now be implemented and used? Or is there some sort of inherent incompatibility between LLDB's basic type system and Swift type system (talking about primitive types) that this cannot be implemented correctly? If so, what would be an alternative way to abstract the type promotion/conversion from the specific compiler type and type system?

@kuilpd kuilpd requested review from jimingham and labath July 7, 2025 09:47
@cmtice cmtice requested a review from adrian-prantl July 8, 2025 05:25
@augusto2112
Copy link
Contributor

Here is the problem with types: initial scalar literal type, all the type promotions and conversions are done using lldb::BasicType enumeration types, and later converted to CompilerType using the TypeSystem. In C++, this works perfectly well, because in TypeSystemClang.cpp:5468 it is implemented, but in TypeSystemSwift:304 it is not.

Do we assume that this was just never needed before, so it can now be implemented and used? Or is there some sort of inherent incompatibility between LLDB's basic type system and Swift type system (talking about primitive types) that this cannot be implemented correctly? If so, what would be an alternative way to abstract the type promotion/conversion from the specific compiler type and type system?

I don't think there's any inherent reason why the Swift type systems can't implement GetBasicTypeEnumeration and GetBasicTypeFromAST, it's most likely something that just wasn't needed before.

@labath
Copy link
Collaborator

labath commented Jul 9, 2025

We could, but I'm not sure if that's the most important question right now. The whole BasicType concept is very C-like. There is no "unsigned long long" basic type in swift (or pretty much any other language). While you could say that the function returns the langauge's equivalent (same bit width and signedness?) of the named C type, by doing that, and also encoding all of the type promotion rules inside common code, you're basically hardcoding everything that makes the swift type system unique (when it comes to numbers at least). Maybe that's okay, but at that point, I'm wondering if there's any point in using a non-C type system.

OTOH, if we say that addition of two swift variables should behave "like in swift", then I think all of the addition code (including type ranking, promotion, and whatnot) needs to happen inside a plugin (because swift doesn't have type promotion). At that point, the question of using GetBasicType may become moot.

@kuilpd
Copy link
Contributor Author

kuilpd commented Jul 9, 2025

We could, but I'm not sure if that's the most important question right now. The whole BasicType concept is very C-like. There is no "unsigned long long" basic type in swift (or pretty much any other language). While you could say that the function returns the langauge's equivalent (same bit width and signedness?) of the named C type, by doing that, and also encoding all of the type promotion rules inside common code, you're basically hardcoding everything that makes the swift type system unique (when it comes to numbers at least). Maybe that's okay, but at that point, I'm wondering if there's any point in using a non-C type system.

OTOH, if we say that addition of two swift variables should behave "like in swift", then I think all of the addition code (including type ranking, promotion, and whatnot) needs to happen inside a plugin (because swift doesn't have type promotion). At that point, the question of using GetBasicType may become moot.

This is what I was thinking as well. On the one hand, we want DIL to produce the same expression results as the language it tries to emulate. So if there is some fundamental difference in evaluating C++ and Swift in a certain scenario, we will have to write different code for each of them. Like for example (from my limited understanding), when adding Int8 and Int32 in Swift we're supposed to return an error and not attempt any type promotions. On the other hand, we're making our own language, which can behave any way we want, so we can just add Swift numbers with C logic and get some result, which is potentially more useful than an error.

@labath
Copy link
Collaborator

labath commented Jul 9, 2025

Personally, I'd be fine with saying that anytime we start doing arithmetic on something (maybe, whenever we do a lvalue-to-rvalue conversion?) we switch to a "DIL" type system. We could make that "type system" work mostly like C, but I can also imagine doing something like python's "arbitrary width" integers. However, I think we need to hear from Swift folks, as they're the ones most impacted by this.

@kuilpd
Copy link
Contributor Author

kuilpd commented Jul 9, 2025

but I can also imagine doing something like python's "arbitrary width" integers.

Isn't this basically llvm::APInt class? We could just disregard every integer's type, add their values as APInt and return the result with a maximum width integer type. I kind of like this path, makes things way simpler. The only thing is that if the values of shorter types are expected to overflow and truncate, we would get a wrong result, and I'm not sure how important this is in the context of data inspection.

@jimingham
Copy link
Collaborator

I am also fine with arithmetic in the DIL being "DIL" arithmetic, and something like the arbitrary width integers seems the most straightforward way to do that. If you require language accuracy, that's what the expression evaluator is for. At the point in the future where lldb is supporting 10 languages with slightly different rules for arithmetic, I think the benefits of "DIL" expressions all do arithmetic "this way" will be clearer...

@labath
Copy link
Collaborator

labath commented Jul 10, 2025

but I can also imagine doing something like python's "arbitrary width" integers.

Isn't this basically llvm::APInt class?

Sort of, but not quite. APInt has an arbitrary but fixed width. Python integers scale their width dynamically to fit the result (so they sort of have "infinite width").

We could emulate that using APInt by scaling its width before an operation sufficiently so that it does not lose precision, but I would actually rather not -- mainly because we would have to implement wrapper class for that, but also because it avoids some pathological cases if a user accidentally does a 1<<UINT_MAX or something.

However, that means we still need to define what happens in case of overflows, width mismatches, and such. Two easy ways to do that would be:

  1. say that values retain their width and signedness, but then get converted into the wider width using C-like rules. We already have this implemented inside the Scalar class (which wraps APInt), so this would be best in terms of code reuse, and it would produce few surprises to people used to C arithmetic. The Scalar class also supports floating point numbers (via APFloat), so this would automatically get us floating point support.

  2. Say that all operations happen on large-but-fixed-width types. For example, 128-bit integers and IEEE quad floats? We could still use the Scalar class for this, but promote the value to the desired type before doing anything.

@kuilpd
Copy link
Contributor Author

kuilpd commented Jul 16, 2025

say that values retain their width and signedness, but then get converted into the wider width using C-like rules. We already have this implemented inside the Scalar class (which wraps APInt), so this would be best in terms of code reuse, and it would produce few surprises to people used to C arithmetic. The Scalar class also supports floating point numbers (via APFloat), so this would automatically get us floating point support.

@labath
I did an experimental implementation of this, basically relying on Scalar to do all arithmetics for me, including type conversions and promotions. It works, even when adding Scalar(APInt) and Scalar(APFloat), but a couple of things are weird:

  1. Constant wrapping/unwrapping of Scalar(APInt). For literals, it goes: parse as APInt -> wrap to Scalar to create a node -> unwrap APInt to create ValueObjectSP -> wrap to Scalar to use arithmetic operation -> unwrap APInt to create ValueObjectSP. Not sure if it's a problem or not, but there's no functionality to store a Scalar in ValueObject.
  2. Still not sure how to sensibly get a compiler type for a given number. I just did it by whatever size fits first. Also, it looks like bit size of APInt that comes from a ValueObject is not reliable, since even a variable of type char has an APInt with 32 bit size. The best I could come up with is to recreate APInt with the size taken from compiler type. After that, type promotion in Scalar works as expected.

@cmtice cmtice requested a review from werat July 17, 2025 15:04
@labath
Copy link
Collaborator

labath commented Jul 24, 2025

  1. Constant wrapping/unwrapping of Scalar(APInt). For literals, it goes: parse as APInt -> wrap to Scalar to create a node -> unwrap APInt to create ValueObjectSP -> wrap to Scalar to use arithmetic operation -> unwrap APInt to create ValueObjectSP. Not sure if it's a problem or not, but there's no functionality to store a Scalar in ValueObject.

We can definitely add functions to covert (directly) between Scalars and ValueObjects (in fact, I think the only reason that the APInt overloads currently exists is because of the initial attempt to upstream the DIL implementation). However, I'm still not sure if it's necessary to ValueObjects for values internal to the expression. I don't think the different result type should be a problem since the result type for a given ast node kind should be always the same. Scalar nodes and all arithmetic operation should return a scalar, while other operations return a value object. If you need to use a value object in a scalar context (e.g. in a[b] where b is a variable) you add new node type (LValueToRValueConversionNode ?), which performs the extraction.

2. Still not sure how to sensibly get a compiler type for a given number. I just did it by whatever size fits first.

That's probably okay, though if we go down this path, it might not be unreasonable to create our own type system for representing these values.

Also, it looks like bit size of APInt that comes from a ValueObject is not reliable, since even a variable of type char has an APInt with 32 bit size.

That might be something we can fix. What's the API you're using and what happens if you change it to do what you need?

@kuilpd
Copy link
Contributor Author

kuilpd commented Jul 24, 2025

We can definitely add functions to covert (directly) between Scalars and ValueObjects (in fact, I think the only reason that the APInt overloads currently exists is because of the initial attempt to upstream the DIL implementation).

Ah, I didn't know that, I think I joined the upstream implementation right after that.

However, I'm still not sure if it's necessary to ValueObjects for values internal to the expression. I don't think the different result type should be a problem since the result type for a given ast node kind should be always the same. Scalar nodes and all arithmetic operation should return a scalar, while other operations return a value object. If you need to use a value object in a scalar context (e.g. in a[b] where b is a variable) you add new node type (LValueToRValueConversionNode ?), which performs the extraction.

I did an internal value for a subscript node to make the switch to DIL faster, but in the full implementation none of the nodes really have any internal values (if I understood correctly what you are proposing). We just have incoming AST nodes for arguments, and there is a single interface to traverse the AST tree: Evaluate(node) that returns a ValueObject of the entire subtree of that node. So when evaluating b in a[b] we don't know what b is until we evaluate it: a literal, variable, or a whole another expression. We specifically moved away from analyzing types of nodes during parsing, the way it was done in lldb-eval.

Also, it looks like bit size of APInt that comes from a ValueObject is not reliable, since even a variable of type char has an APInt with 32 bit size.

That might be something we can fix. What's the API you're using and what happens if you change it to do what you need?

I looked at this again, and realized that APInt having 32 bit width by default is actually okay, since according to C99 whenever we use a lower width variable in any arithmetic operation it gets promoted to int anyway. Although my question now is how exactly and where is that 32 bit size picked: is it some default value or is it based on a target machine? I'm not sure at which point is an instance of APInt created for a variable, I'll look into it further.

@labath
Copy link
Collaborator

labath commented Jul 24, 2025

single interface to traverse the AST tree:

That's the part that would change in this setup. Instead of one API, we'd have two: EvaluateValueNode() and EvaluateScalarNode(), and we'd have two node hierarchies: ValueNode and ScalarNode. MemberOfNode would be a ValueNode because it produces a value object. AddNode would be a ScalarNode because it produces a number. ArraySubscriptNode would be a ValueNode (the result is a value), but its index member would be a ScalarNode, because we need to use it for indexing.

The correct types would be ensured when parsing, possibly with the addition of the "rvalue" nodes. a[1] would be parsed as Subscript(Identifier(a), Scalar(1)), while a[b] would turn into Subscript(Identifier(a), RValue(Identifier(b))). a[b+1] would be something like Subscript(Identifier(a), Add(RValue(Identifier(b)), Scalar(1))). The evaluator would have access to the right types, so it could invoke the right function. This has nothing to do with the types of the variables, its about the types (kinds) of AST nodes.

I'm not saying you have to do this, but I find the idea of not constantly wrapping and unwrapping scalars appealing.

@kuilpd
Copy link
Contributor Author

kuilpd commented Jul 25, 2025

single interface to traverse the AST tree:

That's the part that would change in this setup. Instead of one API, we'd have two: EvaluateValueNode() and EvaluateScalarNode(), and we'd have two node hierarchies: ValueNode and ScalarNode. MemberOfNode would be a ValueNode because it produces a value object. AddNode would be a ScalarNode because it produces a number. ArraySubscriptNode would be a ValueNode (the result is a value), but its index member would be a ScalarNode, because we need to use it for indexing.

The correct types would be ensured when parsing, possibly with the addition of the "rvalue" nodes. a[1] would be parsed as Subscript(Identifier(a), Scalar(1)), while a[b] would turn into Subscript(Identifier(a), RValue(Identifier(b))). a[b+1] would be something like Subscript(Identifier(a), Add(RValue(Identifier(b)), Scalar(1))). The evaluator would have access to the right types, so it could invoke the right function. This has nothing to do with the types of the variables, its about the types (kinds) of AST nodes.

I'm not saying you have to do this, but I find the idea of not constantly wrapping and unwrapping scalars appealing.

I don't know about this... wouldn't this RValue(ValueNode) function be pretty much the same process of unwrapping Scalar from ValueObject? We will still need to wrap/unwrap Scalar from ValueObject whenever they interact, so this will only cut down the wraps when Scalar/Scalar interact, which is pretty much just arithmetics with constant literals and subscript with a constant index.

I think for now it's better to implement a clean interface in ValueObject to create it from a Scalar, and we can optimize the arithmetics the way you suggested later, if this ends up being a performance bottleneck.

@kuilpd kuilpd closed this Aug 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants