
Dump host_ir in HostIrJit, resolve clang-tidy errors #6010

Merged
Priya2698 merged 3 commits into main from pm/debug_host_ir
Feb 25, 2026

Conversation

@Priya2698
Collaborator

No description provided.

@Priya2698
Collaborator Author

!test

github-actions bot commented Feb 24, 2026

Review updated until commit 47d2f08

Description

  • Initialize main_func_ member to nullptr to prevent undefined behavior

  • Fix template function parameter to avoid dangling reference issues

  • Convert member references to pointers with getter methods for better safety

  • Add debug dump option for HostIR in compilation process

  • Apply clang-tidy fixes including performance and safety annotations

Changes walkthrough

Relevant files (Enhancement)

csrc/host_ir/jit.cpp: HostIR debug dump and clang-tidy fixes

  • Initialize main_func_ member variable to nullptr
  • Fix throwIfError template function parameter type
  • Convert reference members to pointers with getter methods in
    HostIrCompileDispatcher
  • Add debug dump functionality for HostIR compilation
  • Apply clang-tidy fixes with NOLINTNEXTLINE annotations
  • Fix function call to use valueOrError with proper casting
  • +121/-100

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review
    Null Pointer Safety

    The refactoring changes the member variables builder_ and val_to_value_ from references to pointers, which could lead to null-pointer dereferences if they are ever left uninitialized. Current usage appears safe, but the change widens the risk surface.

          : builder_(&builder), val_to_value_(&val_to_value) {}
      using OptInDispatch::handle;
    
      void handle(ReshapeOp* vop) final {
        auto* in_tv = vop->in()->as<TensorView>();
        auto* out_tv = vop->out()->as<TensorView>();
        llvm::Value* in_tensor = getOrDefault(valToValue(), in_tv);
    NVF_ERROR(in_tensor != nullptr);
        llvm::Value* out_tensor = getOrDefault(valToValue(), out_tv);
    NVF_ERROR(out_tensor == nullptr);
    
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::LLVMContext& context = builder().getContext();
    
        llvm::SmallVector<llvm::Value*, kMaxTensorDim> tensor_sizes;
        llvm::SmallVector<llvm::Value*, kMaxTensorDim> tensor_strides;
        inferTensorShapesAndStrides(
            out_tv, valToValue(), builder(), tensor_sizes, tensor_strides);
    
        const std::vector<IterDomain*>& logical_domain =
            TensorDomain::noReductions(out_tv->getLogicalDomain());
    
        NVF_ERROR_EQ(tensor_sizes.size(), logical_domain.size());
    
        llvm::ArrayType* sizes_type = getInt64StaticArrayType(
            context, static_cast<int64_t>(tensor_sizes.size()));
        llvm::Value* sizes_array =
            builder().CreateAlloca(sizes_type, nullptr, "sizes");
        for (auto [i, tensor_size] : enumerate(tensor_sizes)) {
          llvm::Value* gep = builder().CreateInBoundsGEP(
              sizes_type,
              sizes_array,
              {builder().getInt32(0), builder().getInt32(i)});
          builder().CreateStore(tensor_size, gep);
        }
    
        llvm::Value* sizes_ptr =
            builder().CreateBitCast(sizes_array, getInt64PtrType(context));
        out_tensor = builder().CreateCall(
            module->getFunction(kReshapeFuncName),
            {in_tensor,
             sizes_ptr,
             builder().getInt64(static_cast<int64_t>(tensor_sizes.size()))});
        valToValue()[out_tv] = out_tensor;
      }
    
      void handle(LoadStoreOp* load_store_op) final {
        NVF_ERROR(
            load_store_op->opType() == LoadStoreOpType::Set ||
            load_store_op->opType() == LoadStoreOpType::SegmenterSet);
        NVF_ERROR(
            load_store_op->out()->isA<TensorView>(), "out must be a TensorView");
        auto* in_tv = load_store_op->in()->as<TensorView>();
        auto* out_tv = load_store_op->out()->as<TensorView>();
        llvm::Value* in_tensor = getOrDefault(valToValue(), in_tv);
    NVF_ERROR(in_tensor != nullptr);
        // we assume all output tensors are already created, either through new or
        // allocated
        llvm::Value* out_tensor = getOrDefault(valToValue(), out_tv);
    NVF_ERROR(out_tensor == nullptr);
    
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::LLVMContext& context = builder().getContext();
    
        if (out_tv->hasRoot()) {
          std::optional<std::vector<int64_t>> permutation =
              ir_utils::computePermutation(
                  out_tv->getRootDomain(), out_tv->getLogicalDomain());
          NVF_ERROR(
              permutation.has_value(),
              "The logical domain of a Set.Permute is supposed to be a permutation"
              " of the root domain: ",
              out_tv);
    
          // Create array of permutation values
          const auto& perm = valueOrError(permutation);
          llvm::ArrayType* perm_array_type =
              getInt64StaticArrayType(context, static_cast<int64_t>(perm.size()));
          llvm::Value* perm_array =
              builder().CreateAlloca(perm_array_type, nullptr, "permutation");
    
          for (auto [i, extent] : enumerate(perm)) {
            llvm::Value* gep = builder().CreateInBoundsGEP(
                perm_array_type,
                perm_array,
                {builder().getInt32(0), builder().getInt32(i)});
            builder().CreateStore(builder().getInt64(extent), gep);
          }
    
          llvm::Type* int64_ptr_type = getInt64PtrType(context);
          llvm::Value* perm_ptr =
              builder().CreateBitCast(perm_array, int64_ptr_type);
          llvm::Value* perm_size =
              builder().getInt64(static_cast<int64_t>(perm.size()));
          out_tensor = builder().CreateCall(
              module->getFunction(kPermuteFuncName),
              {in_tensor, perm_ptr, perm_size},
              "permute");
          valToValue()[out_tv] = out_tensor;
          return;
        }
        out_tensor = builder().CreateCall(
            module->getFunction(kSetTensorFuncName), {in_tensor}, "set");
        valToValue()[out_tv] = out_tensor;
      }
    
      void handle(MatmulOp* matmul_op) final {
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
    
        llvm::Value* a = getOrDefault(valToValue(), matmul_op->inA());
        llvm::Value* b = getOrDefault(valToValue(), matmul_op->inB());
        llvm::Value* out = getOrDefault(valToValue(), matmul_op->out());
        NVF_ERROR(out != nullptr);
        builder().CreateCall(module->getFunction(kMatmulOutFuncName), {out, a, b});
      }
    
      void handle(LinearOp* linear_op) final {
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::LLVMContext& context = builder().getContext();
    
        llvm::Value* in = getOrDefault(valToValue(), linear_op->inA());
    NVF_ERROR(in != nullptr);
        llvm::Value* weight = getOrDefault(valToValue(), linear_op->inB());
    NVF_ERROR(weight != nullptr);
        llvm::Value* out = getOrDefault(valToValue(), linear_op->out());
    
        llvm::Value* bias = nullptr;
        if (linear_op->hasBias()) {
          bias = getOrDefault(valToValue(), linear_op->bias());
      NVF_ERROR(bias != nullptr);
        } else {
          // Create a proper null pointer for LLVM
          auto* tensor_type = getTensorPtrType(context);
          bias = llvm::ConstantPointerNull::get(
              llvm::cast<llvm::PointerType>(tensor_type));
        }
        NVF_ERROR(out != nullptr);
        builder().CreateCall(
            module->getFunction(kLinearOutFuncName), {out, in, weight, bias});
      }
    
      void handle(hir::LaunchKernel* launch_kernel) final {
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::LLVMContext& context = builder().getContext();
        auto* void_ptr_type = getInt8PtrType(context);
        auto* void_array_ptr_type = getInt8PtrDynamicArrayType(context);
    
        // Get index type from CompiledKernel
        PrimDataType index_type =
            launch_kernel->compiledKernel()->kernel()->indexType();
    
        // Pack each input/output argument using LLVM IR
        llvm::SmallVector<llvm::Value*, 16> packed_buffers;
    
        // Helper lambda to pack a single Val (tensor or scalar)
        auto packArgument = [&](Val* val) {
          if (auto* tv = dynamic_cast<TensorView*>(val)) {
            // Pack tensor argument
            llvm::Value* tensor = getOrDefault(valToValue(), tv);
            NVF_ERROR(
                tensor != nullptr, "Tensor not found in val_to_value map: ", val);
            packed_buffers.push_back(packTensorArgument(
                tensor, tv, index_type, valToValue(), builder()));
          } else {
            // Pack scalar argument
            llvm::Value* scalar = getOrDefault(valToValue(), val);
            NVF_ERROR(
                scalar != nullptr, "Scalar not found in val_to_value map: ", val);
    
            // For scalars, we need to create a stack allocation and get its pointer
            // The scalar value is already an LLVM value (e.g., i64)
            // We need to store it in memory and pass a pointer to that memory
            llvm::Value* scalar_alloca = builder().CreateAlloca(scalar->getType());
            builder().CreateStore(scalar, scalar_alloca);
    
            // Cast to i8* (void*)
            llvm::Value* scalar_ptr =
                builder().CreateBitCast(scalar_alloca, void_ptr_type);
            packed_buffers.push_back(scalar_ptr);
          }
        };
    
        // Pack inputs
        for (auto* in : launch_kernel->inputs()) {
          packArgument(in);
        }
    
        // Pack outputs
        for (auto* out : launch_kernel->outputs()) {
          packArgument(out);
        }
    
        // Create kernel_args array (void**)
        auto* args_array_type =
            llvm::ArrayType::get(void_ptr_type, packed_buffers.size());
        llvm::Value* args_array =
            builder().CreateAlloca(args_array_type, nullptr, "kernel_args_array");
    
        for (auto [i, packed_buffer] : enumerate(packed_buffers)) {
          llvm::Value* gep = builder().CreateInBoundsGEP(
              args_array_type,
              args_array,
              {builder().getInt32(0), builder().getInt32(i)});
          builder().CreateStore(packed_buffer, gep);
        }
    
        // Cast to void**
        llvm::Value* args_array_ptr =
            builder().CreateBitCast(args_array, void_array_ptr_type);
    
        // Get launch parameters from LaunchParams (compile-time constants)
        const LaunchParams& lp = launch_kernel->launchParams();
    
        llvm::Value* gdimx = builder().getInt64(lp.gdimx());
        llvm::Value* gdimy = builder().getInt64(lp.gdimy());
        llvm::Value* gdimz = builder().getInt64(lp.gdimz());
        llvm::Value* bdimx = builder().getInt64(lp.bdimx());
        llvm::Value* bdimy = builder().getInt64(lp.bdimy());
        llvm::Value* bdimz = builder().getInt64(lp.bdimz());
        llvm::Value* smem = builder().getInt64(lp.smem());
    
        // Get CUDA function pointer from CompiledKernel
        CUfunction cuda_function =
            launch_kernel->compiledKernel()->cudaExecutable()->function;
        llvm::Value* function_ptr = builder().CreateIntToPtr(
            builder().getInt64(reinterpret_cast<uintptr_t>(cuda_function)),
            void_ptr_type);
    
        // Call launch_kernel_direct with all parameters
        builder().CreateCall(
            module->getFunction(kLaunchKernelDirectFuncName),
            {args_array_ptr,
             function_ptr,
             gdimx,
             gdimy,
             gdimz,
             bdimx,
             bdimy,
             bdimz,
             smem});
      }
    
      void handle(kir::Allocate* allocate) final {
        llvm::LLVMContext& context = builder().getContext();
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
    
        llvm::Type* int64_ptr_type = getInt64PtrType(context);
    
        llvm::SmallVector<llvm::Value*, kMaxTensorDim> tensor_sizes;
        llvm::SmallVector<llvm::Value*, kMaxTensorDim> tensor_strides;
        inferTensorShapesAndStrides(
            allocate->buffer()->as<TensorView>(),
            valToValue(),
            builder(),
            tensor_sizes,
            tensor_strides);
    
        const std::vector<IterDomain*>& logical_domain = TensorDomain::noReductions(
            allocate->buffer()->as<TensorView>()->getLogicalDomain());
    
        NVF_ERROR_EQ(tensor_sizes.size(), logical_domain.size());
    
        llvm::ArrayType* sizes_type = getInt64StaticArrayType(
            context, static_cast<int64_t>(tensor_sizes.size()));
        llvm::ArrayType* strides_type = getInt64StaticArrayType(
            context, static_cast<int64_t>(tensor_strides.size()));
    
        llvm::Value* sizes = builder().CreateAlloca(sizes_type, nullptr, "sizes");
        llvm::Value* strides =
            builder().CreateAlloca(strides_type, nullptr, "strides");
    
        for (const auto [i, size] : enumerate(tensor_sizes)) {
          llvm::Value* gep = builder().CreateInBoundsGEP(
              sizes_type, sizes, {builder().getInt32(0), builder().getInt32(i)});
          builder().CreateStore(size, gep);
        }
    
        for (const auto [i, stride] : enumerate(tensor_strides)) {
          llvm::Value* gep = builder().CreateInBoundsGEP(
              strides_type,
              strides,
              {builder().getInt32(0), builder().getInt32(i)});
          builder().CreateStore(stride, gep);
        }
    
        // Convert arrays to pointers
        llvm::Value* sizes_arg = builder().CreateBitCast(sizes, int64_ptr_type);
        llvm::Value* strides_arg = builder().CreateBitCast(strides, int64_ptr_type);
    
        // Create array size arguments
        llvm::Value* shape_ndim_arg =
            builder().getInt64(static_cast<int64_t>(tensor_sizes.size()));
        llvm::Value* strides_ndim_arg =
            builder().getInt64(static_cast<int64_t>(tensor_strides.size()));
    
        // Create output tensor
        llvm::Value* out_tensor = builder().CreateCall(
            module->getFunction(kNewTensorFuncName), {}, "out_tensor");
    
        // Create constants for type and device from params
        at::ScalarType data_type = data_type_to_aten(
            allocate->buffer()->dtype() == DataType::Index
                ? PrimDataType::Int
                : allocate->buffer()->dtype());
        llvm::Value* dtype_constant =
            builder().getInt32(static_cast<int32_t>(data_type));
        llvm::Value* device_index_constant =
            builder().getInt64(Communicator::getInstance().deviceId());
    
        // Configure output tensor
        llvm::Function* at_empty_strided_cuda_func =
            module->getFunction(kAtEmptyStridedCudaWrapper);
    
        // Call at::native::empty_strided_cuda with the computed arguments
        builder().CreateCall(
            at_empty_strided_cuda_func,
            {sizes_arg,
             shape_ndim_arg,
             strides_arg,
             strides_ndim_arg,
             dtype_constant,
             device_index_constant,
             out_tensor});
        valToValue()[allocate->buffer()] = out_tensor;
      }
    
      void handle(hir::Deallocate* deallocate) final {
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::Function* delete_tensor_func =
            module->getFunction(kDeleteTensorFuncName);
        builder().CreateCall(
            delete_tensor_func, {valToValue().at(deallocate->buffer())});
      }
    
     private:
      llvm::IRBuilder<>& builder() {
        return *builder_;
      }
      std::unordered_map<Val*, llvm::Value*>& valToValue() {
        return *val_to_value_;
      }
    
      llvm::IRBuilder<>* builder_;
      std::unordered_map<Val*, llvm::Value*>* val_to_value_;
    };
    Error Handling Robustness

    The change from .value() to valueOrError() on line 984 improves error handling. The static_cast<int64_t>() could truncate the value if the cache ID ever exceeded the int64_t range, though that is unlikely in practice.

    static_cast<int64_t>(valueOrError(args.getCacheId())),
    Template Parameter Change

    The change from an rvalue-reference parameter (llvm::Expected<T>&&) to a by-value parameter (llvm::Expected<T>) in the throwIfError template should be reviewed to ensure it does not hurt performance or move semantics, though it likely resolves a clang-tidy warning about unnecessary perfect forwarding.

    template <typename T>
    T throwIfError(llvm::Expected<T> E) {
      if (!E) {
        throwIfError(E.takeError());
      }
      return std::move(*E);
    }


    greptile-apps bot commented Feb 24, 2026

    Greptile Summary

    This PR adds debug output capability for host_ir and resolves multiple clang-tidy warnings through defensive coding improvements.

    Key changes:

    • Added debug dump for host_ir when DebugDumpOption::HostIr is enabled (lines 928-930)
    • Fixed implicit type conversions by adding static_cast<int64_t> for all size_t to int64_t conversions
    • Initialized main_func_ member to nullptr for safer default state
    • Removed rvalue reference from throwIfError template to avoid move-related issues
    • Refactored HostIrCompileDispatcher to use pointers instead of references with accessor methods
    • Replaced .value() with valueOrError() for better error messaging
    • Added NOLINTNEXTLINE suppressions for intentional performance-no-int-to-ptr casts where reinterpret_cast is required

    All clang-tidy warnings appear to have been properly addressed with consistent application of fixes throughout the file.

    Confidence Score: 5/5

    • This PR is safe to merge - all changes are defensive improvements and clang-tidy fixes
    • PR contains only code quality improvements: explicit type casts to resolve warnings, better initialization, improved error handling with valueOrError, and addition of debug output. All changes follow best practices and maintain existing behavior.
    • No files require special attention

    Important Files Changed

    Filename Overview
    csrc/host_ir/jit.cpp: Clang-tidy fixes for type conversions and initialization, plus debug dump for host_ir; all changes are safe

    Last reviewed commit: 47d2f08

    @greptile-apps greptile-apps bot left a comment

    1 file reviewed, 1 comment

    @greptile-apps greptile-apps bot left a comment

    1 file reviewed, no comments

    @greptile-apps greptile-apps bot left a comment

    1 file reviewed, no comments

    @Priya2698 Priya2698 requested a review from wujingyue February 24, 2026 22:44
    @Priya2698
    Collaborator Author

    !test

    @Priya2698 Priya2698 merged commit 005f7e3 into main Feb 25, 2026
    52 of 53 checks passed
    @Priya2698 Priya2698 deleted the pm/debug_host_ir branch February 25, 2026 03:06