
Conversation

@JoeLin2333

Added NVIDIA implementations for question 15.


Copilot AI left a comment


Pull request overview

This pull request adds NVIDIA/CUDA implementations for 5 mathematical operators to support "question 15". The implementation includes complete test infrastructure, operator registration, CUDA kernels, CPU fallbacks, and Python bindings.

Key changes:

  • Added 5 new operators: atanh, addcmul, cdist, binary_cross_entropy_with_logits, and reciprocal
  • Implemented CUDA kernels for NVIDIA devices with CPU fallbacks
  • Added comprehensive test suites for both infiniop and infinicore layers
  • Updated Python bindings to expose new operators

Reviewed changes

Copilot reviewed 88 out of 88 changed files in this pull request and generated 8 comments.

File Description
test/infiniop/*.py Test files for all 5 operators with test cases and tolerance maps
test/infiniop/libinfiniop/op_register.py Registration of operator API bindings
src/infiniop/ops/*/operator.cc Operator dispatchers for device-specific implementations
src/infiniop/ops/*/nvidia/*.cu CUDA kernel implementations
src/infiniop/ops/*/cpu/*.cc CPU implementations
src/infinicore/ops/*.cc InfiniCore operator implementations
src/infinicore/pybind11/ops/*.hpp Python binding definitions
python/infinicore/ops/*.py Python wrapper functions
include/infiniop/ops/*.h C API header files
include/infinicore/ops/*.hpp C++ API header files


@@ -0,0 +1,163 @@
import torch
import ctypes

Copilot AI Jan 9, 2026


Module 'ctypes' is imported with both 'import' and 'import from'.

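The duplicate-import findings above can be resolved by consolidating to a single import style. A minimal sketch (the specific names used below are illustrative, not taken from the PR's test files):

```python
import ctypes

# Qualify ctypes names through the module instead of also using
# `from ctypes import ...`; a single import style keeps the linter
# quiet and makes unused names easy to spot and remove.
workspace_size = ctypes.c_uint64(0)
device_name = ctypes.c_char_p(b"nvidia")

assert workspace_size.value == 0
assert device_name.value == b"nvidia"
```

Either style works on its own; the warning fires only when both are mixed in one module.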
@@ -0,0 +1,171 @@
import torch
import ctypes

Copilot AI Jan 9, 2026


Module 'ctypes' is imported with both 'import' and 'import from'.

@@ -0,0 +1,154 @@
import torch
import ctypes

Copilot AI Jan 9, 2026


Module 'ctypes' is imported with both 'import' and 'import from'.

@@ -0,0 +1,156 @@
import torch
import ctypes

Copilot AI Jan 9, 2026


Module 'ctypes' is imported with both 'import' and 'import from'.

@@ -0,0 +1,165 @@
import torch
import ctypes

Copilot AI Jan 9, 2026


Module 'ctypes' is imported with both 'import' and 'import from'.

@@ -0,0 +1,154 @@
import torch
import ctypes
from ctypes import c_uint64, c_float, c_char_p

Copilot AI Jan 9, 2026


Import of 'c_float' is not used.

@@ -0,0 +1,156 @@
import torch
import ctypes
from ctypes import c_uint64, c_float, c_double

Copilot AI Jan 9, 2026


Import of 'c_float' is not used.

Comment on lines +17 to +31
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
size_t ndim;
size_t shape[BCE_MAX_DIMS];
ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
BCETensorInfoDevice dev{};
dev.ndim = info.ndim;
for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
dev.shape[i] = info.dims[i];
dev.strides[i] = info.stride[i];
}

Copilot AI Jan 9, 2026


BCETensorInfoDevice stores shape and strides in fixed-size arrays of length BCE_MAX_DIMS, but make_device_info sets ndim directly from info.ndim and only truncates the arrays when copying. If info.ndim exceeds BCE_MAX_DIMS, later calls to indexToOffset(idx, logits_info.ndim, logits_info.shape, logits_info.strides) will index past the end of these arrays, producing out-of-bounds offsets and causing GPU memory corruption or unintended reads/writes on the logits/targets/outputs buffers. To harden this, either enforce info.ndim <= BCE_MAX_DIMS at descriptor creation or clamp and validate ndim before launching the CUDA kernel so that higher-rank tensors are rejected or handled safely.

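One way to implement the suggested hardening, sketched as host-side C++: reject ranks above `BCE_MAX_DIMS` before launching the kernel instead of silently truncating the copy. The struct field names mirror the snippet under review, but the `BCETensorInfo` layout and the checked helper's `bool` return are assumptions, not the PR's actual API:

```cpp
#include <cassert>
#include <cstddef>

constexpr int BCE_MAX_DIMS = 8;

// Host-side tensor metadata; field names follow the snippet under review,
// array capacity here is illustrative.
struct BCETensorInfo {
    size_t ndim;
    size_t dims[12];
    ptrdiff_t stride[12];
};

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

// Checked variant of make_device_info: refuse ranks above BCE_MAX_DIMS
// rather than writing a truncated copy whose ndim still claims the full
// rank, which is what lets indexToOffset walk off the fixed arrays.
static bool make_device_info_checked(const BCETensorInfo &info,
                                     BCETensorInfoDevice &dev) {
    if (info.ndim > static_cast<size_t>(BCE_MAX_DIMS)) {
        return false; // surface as a bad-shape error before kernel launch
    }
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim; ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
    return true;
}
```

Alternatively, the same `ndim` check can live at descriptor creation so every downstream kernel can trust the bound.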