
Dump host_ir in HostIrJit, resolve clang-tidy errors #6010

Merged
Priya2698 merged 3 commits into main from pm/debug_host_ir
Feb 25, 2026

Conversation

@Priya2698
Collaborator

No description provided.

@Priya2698
Collaborator Author

!test

github-actions bot commented Feb 24, 2026

Review updated until commit 47d2f08

Description

  • Initialize main_func_ member to nullptr to prevent undefined behavior

  • Fix template function parameter to avoid dangling reference issues

  • Convert member references to pointers with getter methods for better safety

  • Add debug dump option for HostIR in compilation process

  • Apply clang-tidy fixes including performance and safety annotations

Changes walkthrough

Relevant files (Enhancement)

csrc/host_ir/jit.cpp: HostIR debug dump and clang-tidy fixes

  • Initialize main_func_ member variable to nullptr
  • Fix throwIfError template function parameter type
  • Convert reference members to pointers with getter methods in
    HostIrCompileDispatcher
  • Add debug dump functionality for HostIR compilation
  • Apply clang-tidy fixes with NOLINTNEXTLINE annotations
  • Fix function call to use valueOrError with proper casting
  • +121/-100

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review
    Null Pointer Safety

    The refactoring changes the member variables builder_ and val_to_value_ from references to pointers, which could lead to null-pointer dereferences if they are ever left uninitialized. Current usage appears safe, but the change widens the risk surface.

          : builder_(&builder), val_to_value_(&val_to_value) {}
      using OptInDispatch::handle;
    
      void handle(ReshapeOp* vop) final {
        auto* in_tv = vop->in()->as<TensorView>();
        auto* out_tv = vop->out()->as<TensorView>();
        llvm::Value* in_tensor = getOrDefault(valToValue(), in_tv);
    NVF_ERROR(in_tensor != nullptr);
        llvm::Value* out_tensor = getOrDefault(valToValue(), out_tv);
    NVF_ERROR(out_tensor == nullptr);
    
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::LLVMContext& context = builder().getContext();
    
        llvm::SmallVector<llvm::Value*, kMaxTensorDim> tensor_sizes;
        llvm::SmallVector<llvm::Value*, kMaxTensorDim> tensor_strides;
        inferTensorShapesAndStrides(
            out_tv, valToValue(), builder(), tensor_sizes, tensor_strides);
    
        const std::vector<IterDomain*>& logical_domain =
            TensorDomain::noReductions(out_tv->getLogicalDomain());
    
        NVF_ERROR_EQ(tensor_sizes.size(), logical_domain.size());
    
        llvm::ArrayType* sizes_type = getInt64StaticArrayType(
            context, static_cast<int64_t>(tensor_sizes.size()));
        llvm::Value* sizes_array =
            builder().CreateAlloca(sizes_type, nullptr, "sizes");
        for (auto [i, tensor_size] : enumerate(tensor_sizes)) {
          llvm::Value* gep = builder().CreateInBoundsGEP(
              sizes_type,
              sizes_array,
              {builder().getInt32(0), builder().getInt32(i)});
          builder().CreateStore(tensor_size, gep);
        }
    
        llvm::Value* sizes_ptr =
            builder().CreateBitCast(sizes_array, getInt64PtrType(context));
        out_tensor = builder().CreateCall(
            module->getFunction(kReshapeFuncName),
            {in_tensor,
             sizes_ptr,
             builder().getInt64(static_cast<int64_t>(tensor_sizes.size()))});
        valToValue()[out_tv] = out_tensor;
      }
    
      void handle(LoadStoreOp* load_store_op) final {
        NVF_ERROR(
            load_store_op->opType() == LoadStoreOpType::Set ||
            load_store_op->opType() == LoadStoreOpType::SegmenterSet);
        NVF_ERROR(
            load_store_op->out()->isA<TensorView>(), "out must be a TensorView");
        auto* in_tv = load_store_op->in()->as<TensorView>();
        auto* out_tv = load_store_op->out()->as<TensorView>();
        llvm::Value* in_tensor = getOrDefault(valToValue(), in_tv);
    NVF_ERROR(in_tensor != nullptr);
        // we assume all output tensors are already created, either through new or
        // allocated
        llvm::Value* out_tensor = getOrDefault(valToValue(), out_tv);
    NVF_ERROR(out_tensor == nullptr);
    
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::LLVMContext& context = builder().getContext();
    
        if (out_tv->hasRoot()) {
          std::optional<std::vector<int64_t>> permutation =
              ir_utils::computePermutation(
                  out_tv->getRootDomain(), out_tv->getLogicalDomain());
          NVF_ERROR(
              permutation.has_value(),
              "The logical domain of a Set.Permute is supposed to be a permutation"
              " of the root domain: ",
              out_tv);
    
          // Create array of permutation values
          const auto& perm = valueOrError(permutation);
          llvm::ArrayType* perm_array_type =
              getInt64StaticArrayType(context, static_cast<int64_t>(perm.size()));
          llvm::Value* perm_array =
              builder().CreateAlloca(perm_array_type, nullptr, "permutation");
    
          for (auto [i, extent] : enumerate(perm)) {
            llvm::Value* gep = builder().CreateInBoundsGEP(
                perm_array_type,
                perm_array,
                {builder().getInt32(0), builder().getInt32(i)});
            builder().CreateStore(builder().getInt64(extent), gep);
          }
    
          llvm::Type* int64_ptr_type = getInt64PtrType(context);
          llvm::Value* perm_ptr =
              builder().CreateBitCast(perm_array, int64_ptr_type);
          llvm::Value* perm_size =
              builder().getInt64(static_cast<int64_t>(perm.size()));
          out_tensor = builder().CreateCall(
              module->getFunction(kPermuteFuncName),
              {in_tensor, perm_ptr, perm_size},
              "permute");
          valToValue()[out_tv] = out_tensor;
          return;
        }
        out_tensor = builder().CreateCall(
            module->getFunction(kSetTensorFuncName), {in_tensor}, "set");
        valToValue()[out_tv] = out_tensor;
      }
    
      void handle(MatmulOp* matmul_op) final {
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
    
        llvm::Value* a = getOrDefault(valToValue(), matmul_op->inA());
        llvm::Value* b = getOrDefault(valToValue(), matmul_op->inB());
        llvm::Value* out = getOrDefault(valToValue(), matmul_op->out());
        NVF_ERROR(out != nullptr);
        builder().CreateCall(module->getFunction(kMatmulOutFuncName), {out, a, b});
      }
    
      void handle(LinearOp* linear_op) final {
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::LLVMContext& context = builder().getContext();
    
        llvm::Value* in = getOrDefault(valToValue(), linear_op->inA());
    NVF_ERROR(in != nullptr);
        llvm::Value* weight = getOrDefault(valToValue(), linear_op->inB());
    NVF_ERROR(weight != nullptr);
        llvm::Value* out = getOrDefault(valToValue(), linear_op->out());
    
        llvm::Value* bias = nullptr;
        if (linear_op->hasBias()) {
          bias = getOrDefault(valToValue(), linear_op->bias());
      NVF_ERROR(bias != nullptr);
        } else {
          // Create a proper null pointer for LLVM
          auto* tensor_type = getTensorPtrType(context);
          bias = llvm::ConstantPointerNull::get(
              llvm::cast<llvm::PointerType>(tensor_type));
        }
        NVF_ERROR(out != nullptr);
        builder().CreateCall(
            module->getFunction(kLinearOutFuncName), {out, in, weight, bias});
      }
    
      void handle(hir::LaunchKernel* launch_kernel) final {
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::LLVMContext& context = builder().getContext();
        auto* void_ptr_type = getInt8PtrType(context);
        auto* void_array_ptr_type = getInt8PtrDynamicArrayType(context);
    
        // Get index type from CompiledKernel
        PrimDataType index_type =
            launch_kernel->compiledKernel()->kernel()->indexType();
    
        // Pack each input/output argument using LLVM IR
        llvm::SmallVector<llvm::Value*, 16> packed_buffers;
    
        // Helper lambda to pack a single Val (tensor or scalar)
        auto packArgument = [&](Val* val) {
          if (auto* tv = dynamic_cast<TensorView*>(val)) {
            // Pack tensor argument
            llvm::Value* tensor = getOrDefault(valToValue(), tv);
            NVF_ERROR(
                tensor != nullptr, "Tensor not found in val_to_value map: ", val);
            packed_buffers.push_back(packTensorArgument(
                tensor, tv, index_type, valToValue(), builder()));
          } else {
            // Pack scalar argument
            llvm::Value* scalar = getOrDefault(valToValue(), val);
            NVF_ERROR(
                scalar != nullptr, "Scalar not found in val_to_value map: ", val);
    
            // For scalars, we need to create a stack allocation and get its pointer
            // The scalar value is already an LLVM value (e.g., i64)
            // We need to store it in memory and pass a pointer to that memory
            llvm::Value* scalar_alloca = builder().CreateAlloca(scalar->getType());
            builder().CreateStore(scalar, scalar_alloca);
    
            // Cast to i8* (void*)
            llvm::Value* scalar_ptr =
                builder().CreateBitCast(scalar_alloca, void_ptr_type);
            packed_buffers.push_back(scalar_ptr);
          }
        };
    
        // Pack inputs
        for (auto* in : launch_kernel->inputs()) {
          packArgument(in);
        }
    
        // Pack outputs
        for (auto* out : launch_kernel->outputs()) {
          packArgument(out);
        }
    
        // Create kernel_args array (void**)
        auto* args_array_type =
            llvm::ArrayType::get(void_ptr_type, packed_buffers.size());
        llvm::Value* args_array =
            builder().CreateAlloca(args_array_type, nullptr, "kernel_args_array");
    
        for (auto [i, packed_buffer] : enumerate(packed_buffers)) {
          llvm::Value* gep = builder().CreateInBoundsGEP(
              args_array_type,
              args_array,
              {builder().getInt32(0), builder().getInt32(i)});
          builder().CreateStore(packed_buffer, gep);
        }
    
        // Cast to void**
        llvm::Value* args_array_ptr =
            builder().CreateBitCast(args_array, void_array_ptr_type);
    
        // Get launch parameters from LaunchParams (compile-time constants)
        const LaunchParams& lp = launch_kernel->launchParams();
    
        llvm::Value* gdimx = builder().getInt64(lp.gdimx());
        llvm::Value* gdimy = builder().getInt64(lp.gdimy());
        llvm::Value* gdimz = builder().getInt64(lp.gdimz());
        llvm::Value* bdimx = builder().getInt64(lp.bdimx());
        llvm::Value* bdimy = builder().getInt64(lp.bdimy());
        llvm::Value* bdimz = builder().getInt64(lp.bdimz());
        llvm::Value* smem = builder().getInt64(lp.smem());
    
        // Get CUDA function pointer from CompiledKernel
        CUfunction cuda_function =
            launch_kernel->compiledKernel()->cudaExecutable()->function;
        llvm::Value* function_ptr = builder().CreateIntToPtr(
            builder().getInt64(reinterpret_cast<uintptr_t>(cuda_function)),
            void_ptr_type);
    
        // Call launch_kernel_direct with all parameters
        builder().CreateCall(
            module->getFunction(kLaunchKernelDirectFuncName),
            {args_array_ptr,
             function_ptr,
             gdimx,
             gdimy,
             gdimz,
             bdimx,
             bdimy,
             bdimz,
             smem});
      }
    
      void handle(kir::Allocate* allocate) final {
        llvm::LLVMContext& context = builder().getContext();
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
    
        llvm::Type* int64_ptr_type = getInt64PtrType(context);
    
        llvm::SmallVector<llvm::Value*, kMaxTensorDim> tensor_sizes;
        llvm::SmallVector<llvm::Value*, kMaxTensorDim> tensor_strides;
        inferTensorShapesAndStrides(
            allocate->buffer()->as<TensorView>(),
            valToValue(),
            builder(),
            tensor_sizes,
            tensor_strides);
    
        const std::vector<IterDomain*>& logical_domain = TensorDomain::noReductions(
            allocate->buffer()->as<TensorView>()->getLogicalDomain());
    
        NVF_ERROR_EQ(tensor_sizes.size(), logical_domain.size());
    
        llvm::ArrayType* sizes_type = getInt64StaticArrayType(
            context, static_cast<int64_t>(tensor_sizes.size()));
        llvm::ArrayType* strides_type = getInt64StaticArrayType(
            context, static_cast<int64_t>(tensor_strides.size()));
    
        llvm::Value* sizes = builder().CreateAlloca(sizes_type, nullptr, "sizes");
        llvm::Value* strides =
            builder().CreateAlloca(strides_type, nullptr, "strides");
    
        for (const auto [i, size] : enumerate(tensor_sizes)) {
          llvm::Value* gep = builder().CreateInBoundsGEP(
              sizes_type, sizes, {builder().getInt32(0), builder().getInt32(i)});
          builder().CreateStore(size, gep);
        }
    
        for (const auto [i, stride] : enumerate(tensor_strides)) {
          llvm::Value* gep = builder().CreateInBoundsGEP(
              strides_type,
              strides,
              {builder().getInt32(0), builder().getInt32(i)});
          builder().CreateStore(stride, gep);
        }
    
        // Convert arrays to pointers
        llvm::Value* sizes_arg = builder().CreateBitCast(sizes, int64_ptr_type);
        llvm::Value* strides_arg = builder().CreateBitCast(strides, int64_ptr_type);
    
        // Create array size arguments
        llvm::Value* shape_ndim_arg =
            builder().getInt64(static_cast<int64_t>(tensor_sizes.size()));
        llvm::Value* strides_ndim_arg =
            builder().getInt64(static_cast<int64_t>(tensor_strides.size()));
    
        // Create output tensor
        llvm::Value* out_tensor = builder().CreateCall(
            module->getFunction(kNewTensorFuncName), {}, "out_tensor");
    
        // Create constants for type and device from params
        at::ScalarType data_type = data_type_to_aten(
            allocate->buffer()->dtype() == DataType::Index
                ? PrimDataType::Int
                : allocate->buffer()->dtype());
        llvm::Value* dtype_constant =
            builder().getInt32(static_cast<int32_t>(data_type));
        llvm::Value* device_index_constant =
            builder().getInt64(Communicator::getInstance().deviceId());
    
        // Configure output tensor
        llvm::Function* at_empty_strided_cuda_func =
            module->getFunction(kAtEmptyStridedCudaWrapper);
    
        // Call at::native::empty_strided_cuda with the computed arguments
        builder().CreateCall(
            at_empty_strided_cuda_func,
            {sizes_arg,
             shape_ndim_arg,
             strides_arg,
             strides_ndim_arg,
             dtype_constant,
             device_index_constant,
             out_tensor});
        valToValue()[allocate->buffer()] = out_tensor;
      }
    
      void handle(hir::Deallocate* deallocate) final {
        llvm::Module* module = builder().GetInsertBlock()->getParent()->getParent();
        llvm::Function* delete_tensor_func =
            module->getFunction(kDeleteTensorFuncName);
        builder().CreateCall(
            delete_tensor_func, {valToValue().at(deallocate->buffer())});
      }
    
     private:
      llvm::IRBuilder<>& builder() {
        return *builder_;
      }
      std::unordered_map<Val*, llvm::Value*>& valToValue() {
        return *val_to_value_;
      }
    
      llvm::IRBuilder<>* builder_;
      std::unordered_map<Val*, llvm::Value*>* val_to_value_;
    };
    Error Handling Robustness

    The change from .value() to valueOrError() on line 984 improves error handling. The static_cast<int64_t>() could truncate the value if the cache ID ever exceeded the int64_t range, though that is unlikely in practice.

    static_cast<int64_t>(valueOrError(args.getCacheId())),
    Template Parameter Change

    The change from an rvalue-reference parameter (llvm::Expected<T>&&) to a by-value parameter (llvm::Expected<T>) in the throwIfError template should be reviewed to ensure it does not hurt performance or move semantics, though it likely resolves a clang-tidy warning about unnecessary perfect forwarding.

    template <typename T>
    T throwIfError(llvm::Expected<T> E) {
      if (!E) {
        throwIfError(E.takeError());
      }
      return std::move(*E);
    }


    greptile-apps bot commented Feb 24, 2026

    Greptile Summary

    This PR adds debug output capability for host_ir and resolves multiple clang-tidy warnings through defensive coding improvements.

    Key changes:

    • Added debug dump for host_ir when DebugDumpOption::HostIr is enabled (lines 928-930)
    • Fixed implicit type conversions by adding static_cast<int64_t> for all size_t to int64_t conversions
    • Initialized main_func_ member to nullptr for safer default state
    • Removed rvalue reference from throwIfError template to avoid move-related issues
    • Refactored HostIrCompileDispatcher to use pointers instead of references with accessor methods
    • Replaced .value() with valueOrError() for better error messaging
    • Added NOLINTNEXTLINE suppressions for intentional performance-no-int-to-ptr casts where reinterpret_cast is required

    All clang-tidy warnings appear to have been properly addressed with consistent application of fixes throughout the file.

    Confidence Score: 5/5

    • This PR is safe to merge - all changes are defensive improvements and clang-tidy fixes
    • PR contains only code quality improvements: explicit type casts to resolve warnings, better initialization, improved error handling with valueOrError, and addition of debug output. All changes follow best practices and maintain existing behavior.
    • No files require special attention

    Important Files Changed

    Filename Overview
    csrc/host_ir/jit.cpp: Clang-tidy fixes for type conversions and initialization, plus debug dump for host_ir; all changes are safe

    Last reviewed commit: 47d2f08

    @greptile-apps greptile-apps bot left a comment

    1 file reviewed, 1 comment

    @greptile-apps greptile-apps bot left a comment

    1 file reviewed, no comments

    @greptile-apps greptile-apps bot left a comment

    1 file reviewed, no comments

    @Priya2698 Priya2698 requested a review from wujingyue February 24, 2026 22:44
    @Priya2698
    Collaborator Author

    !test

    @Priya2698 Priya2698 merged commit 005f7e3 into main Feb 25, 2026
    52 of 53 checks passed
    @Priya2698 Priya2698 deleted the pm/debug_host_ir branch February 25, 2026 03:06