yucai-intel
Contributor

Fixes the following issues exposed by test/test_nn.py::TestNNDeviceTypeXPU::test_nll_loss_large_tensor_reduction_mean_xpu and test_nll_loss_large_tensor_reduction_sum_xpu:

  1. Segmentation faults caused by incorrect pointer type conversions that produce invalid memory addresses.
  2. Kernel call errors caused by incorrect conditional checks.

@yucai-intel
Contributor Author

Issue link: #2008

@yucai-intel
Contributor Author

image

@CuiYifeng CuiYifeng requested a review from Copilot September 30, 2025 01:37
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes segmentation faults and kernel call errors in the NLLLoss kernel implementation for XPU devices. The changes refactor the kernel functors to use safer memory access patterns and more consistent parameter ordering.

Key changes include:

  • Complete rewrite of kernel functors with improved memory safety and bounds checking
  • Simplified function signatures with reordered parameters for better consistency
  • Addition of proper index validation and overflow protection
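As a point of reference for the index validation mentioned above, here is a minimal CPU sketch of an unweighted NLL loss with sum reduction. It is not the PR's kernel: the function name `nll_loss_sum_reference` and its signature are hypothetical, and it assumes a row-major `[batch, n_classes]` matrix of log-probabilities. It skips targets equal to `ignore_index`, rejects out-of-range classes, and does the flat-index arithmetic in 64 bits.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical CPU reference, not the XPU kernel from this PR.
// log_probs is a row-major [batch, n_classes] matrix; target has batch entries.
inline double nll_loss_sum_reference(const std::vector<double>& log_probs,
                                     const std::vector<int64_t>& target,
                                     int64_t n_classes,
                                     int64_t ignore_index) {
  const int64_t batch = static_cast<int64_t>(target.size());
  if (n_classes <= 0 ||
      static_cast<int64_t>(log_probs.size()) != batch * n_classes) {
    throw std::invalid_argument("log_probs must have batch * n_classes entries");
  }
  double total = 0.0;
  for (int64_t i = 0; i < batch; ++i) {
    const int64_t t = target[i];
    if (t == ignore_index) {
      continue;  // ignored samples contribute nothing to the loss
    }
    if (t < 0 || t >= n_classes) {
      throw std::out_of_range("target class index out of range");
    }
    // 64-bit flat index into the row-major matrix.
    total -= log_probs[i * n_classes + t];
  }
  return total;
}
```

A mean reduction would divide this sum by the number of non-ignored samples.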

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/ATen/native/xpu/sycl/LossNLLKernel.h | Updated function signatures to reorder parameters and use consistent naming |
| src/ATen/native/xpu/sycl/LossNLLKernel.cpp | Major refactor of kernel implementations with improved memory safety and bounds checking |
| src/ATen/native/xpu/sycl/KernelUtils.h | Added utility constants and functions for kernel execution |
| src/ATen/native/xpu/LossNLL.cpp | Updated function calls to match new kernel signatures |

const index_t t = target[i];
if (t != ignore_index) {
CHECK_INDEX_IN_CLASS(t, n_classes);
const bwd_index_t index = static_cast<bwd_index_t>(i) * ndim + t;

Copilot AI Sep 30, 2025

The index calculation static_cast<bwd_index_t>(i) * ndim + t could potentially overflow for large tensors. Consider adding overflow checks or using safer arithmetic operations.
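One possible shape for such a check, sketched on the host side. The helpers `checked_flat_index` and `fits_in_int32` are hypothetical and not part of this PR; the sketch assumes non-negative operands and 64-bit index arithmetic, and it tests for overflow before multiplying rather than after.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: computes row * stride + col in 64-bit arithmetic and
// returns std::nullopt instead of wrapping around on overflow.
inline std::optional<int64_t> checked_flat_index(int64_t row,
                                                 int64_t stride,
                                                 int64_t col) {
  if (row < 0 || stride < 0 || col < 0) {
    return std::nullopt;  // negative indices are treated as invalid here
  }
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  // row * stride + col <= kMax  <=>  row <= (kMax - col) / stride
  if (stride != 0 && row > (kMax - col) / stride) {
    return std::nullopt;
  }
  return row * stride + col;
}

// Hypothetical helper: true if a 64-bit index could be stored in a 32-bit
// index type without truncation.
inline bool fits_in_int32(int64_t value) {
  return value >= std::numeric_limits<int32_t>::min() &&
         value <= std::numeric_limits<int32_t>::max();
}
```

For example, a 70000 x 70000 input already yields flat indices above the 32-bit limit, so `fits_in_int32` could be used to choose between a 32-bit and a 64-bit index type before launching.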

Comment on lines 497 to 498
int64_t local_size =
syclMaxWorkGroupSize<NllLossForwardReduce2DKernel>();

Copilot AI Sep 30, 2025

The local_size is calculated but then overridden by nthreads on lines 513-514. This could lead to inefficient kernel launches if nthreads doesn't match optimal work group sizes.
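A minimal sketch of one way to reconcile the two values, assuming the device limit is already known as a plain integer. `pick_local_size` is a hypothetical helper, not a function from this PR or from SYCL; it simply caps the requested thread count at the maximum work-group size.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: caps the requested thread count at the device's
// maximum work-group size and never returns less than 1.
inline int64_t pick_local_size(int64_t requested_threads,
                               int64_t max_work_group_size) {
  const int64_t capped = std::min(requested_threads, max_work_group_size);
  return std::max<int64_t>(1, capped);
}
```

With this shape, the value queried from the device acts as an upper bound instead of being discarded.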

Comment on lines +631 to +632
nll_loss_threads(input.size(0)),
nll_loss_threads(input.size(0)),

Copilot AI Sep 30, 2025

The function nll_loss_threads() is called twice with the same argument. Consider storing the result in a variable to avoid redundant computation.
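A sketch of the suggested change. `pick_threads` is a hypothetical stand-in for `nll_loss_threads()`, whose real definition is not shown in this conversation, and `LaunchConfig` is likewise illustrative.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for nll_loss_threads(); the clamp bounds are
// illustrative and not taken from the PR.
inline int64_t pick_threads(int64_t batch) {
  return std::clamp<int64_t>(batch, 32, 1024);
}

// Illustrative pair of launch sizes that both use the same thread count.
struct LaunchConfig {
  int64_t global_size;
  int64_t local_size;
};

inline LaunchConfig make_launch_config(int64_t batch) {
  // Compute the thread count once and reuse it for both arguments.
  const int64_t nthreads = pick_threads(batch);
  return LaunchConfig{nthreads, nthreads};
}
```

Besides avoiding the repeated call, naming the value makes it explicit that both launch arguments are meant to be identical.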

@yucai-intel
Contributor Author

yucai-intel commented Oct 15, 2025

Perf
image
