Fix segmentation fault in NLLLoss kernel #2111
base: main
Issue link: #2008
Pull Request Overview
This PR fixes segmentation faults and kernel call errors in the NLLLoss kernel implementation for XPU devices. The changes refactor the kernel functors to use safer memory access patterns and more consistent parameter ordering.
Key changes include:
- Complete rewrite of kernel functors with improved memory safety and bounds checking
- Simplified function signatures with reordered parameters for better consistency
- Addition of proper index validation and overflow protection
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
src/ATen/native/xpu/sycl/LossNLLKernel.h | Updated function signatures to reorder parameters and use consistent naming |
src/ATen/native/xpu/sycl/LossNLLKernel.cpp | Major refactor of kernel implementations with improved memory safety and bounds checking |
src/ATen/native/xpu/sycl/KernelUtils.h | Added utility constants and functions for kernel execution |
src/ATen/native/xpu/LossNLL.cpp | Updated function calls to match new kernel signatures |
const index_t t = target[i];
if (t != ignore_index) {
  CHECK_INDEX_IN_CLASS(t, n_classes);
  const bwd_index_t index = static_cast<bwd_index_t>(i) * ndim + t;
Copilot AI, Sep 30, 2025
The index calculation `static_cast<bwd_index_t>(i) * ndim + t` could potentially overflow for large tensors. Consider adding overflow checks or using safer arithmetic operations.
Copilot uses AI. Check for mistakes.
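One way to act on the overflow concern is a checked index helper. The sketch below is plain C++ and assumes a 64-bit signed index type; `checked_flat_index` is a hypothetical name that does not appear in the PR. If `bwd_index_t` is already 64-bit, the cast before the multiplication widens the operands, so overflow would only occur for index products beyond the `int64_t` range.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the PR): computes row * stride + col in
// int64_t. Returns false, leaving *out untouched, if any operand is negative
// or the result would not fit in int64_t.
inline bool checked_flat_index(int64_t row, int64_t stride, int64_t col,
                               int64_t* out) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  if (row < 0 || stride < 0 || col < 0) {
    return false;
  }
  // row * stride overflows exactly when row > kMax / stride (for stride > 0).
  if (stride != 0 && row > kMax / stride) {
    return false;
  }
  const int64_t base = row * stride;
  // base + col overflows exactly when col > kMax - base.
  if (col > kMax - base) {
    return false;
  }
  *out = base + col;
  return true;
}
```

In the reviewed line, `i`, `ndim`, and `t` would take the roles of `row`, `stride`, and `col`. Whether a check like this is worth its cost inside a device kernel is a separate question; validating the tensor sizes once on the host before launch may be enough.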
int64_t local_size =
    syclMaxWorkGroupSize<NllLossForwardReduce2DKernel>();
Copilot AI, Sep 30, 2025
The `local_size` is calculated but then overridden by `nthreads` on lines 513-514. This could lead to inefficient kernel launches if `nthreads` doesn't match optimal work group sizes.
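A possible way to reconcile the two values is to clamp the requested thread count to the queried maximum instead of discarding it. The helper below is a plain C++ sketch with a hypothetical name, `pick_local_size`; it does not call any SYCL API, and the code on lines 513-514 is not shown here, so this may not match what the PR needs.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of the PR): picks a work-group size that
// honours the requested thread count but never exceeds the device/kernel
// maximum, and is always at least 1.
inline int64_t pick_local_size(int64_t requested_threads,
                               int64_t max_work_group_size) {
  const int64_t capped = std::min(requested_threads, max_work_group_size);
  return std::max<int64_t>(1, capped);
}
```

Here `requested_threads` would play the role of `nthreads`, and `max_work_group_size` the value returned by `syclMaxWorkGroupSize<...>()`.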
nll_loss_threads(input.size(0)),
nll_loss_threads(input.size(0)),
Copilot AI, Sep 30, 2025
The function `nll_loss_threads()` is called twice with the same argument. Consider storing the result in a variable to avoid redundant computation.
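The suggestion amounts to computing the thread count once and reusing it for both launch dimensions. In the sketch below, `nll_loss_threads_sketch` is a made-up stand-in (the real `nll_loss_threads()` is not shown in this review and may compute something different), and `LaunchRange` and `make_launch_range` are hypothetical names for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for nll_loss_threads(); the real function may differ.
// Doubles from 32 until it covers nframe, capped at 1024.
inline int64_t nll_loss_threads_sketch(int64_t nframe) {
  int64_t threads = 32;
  while (threads < nframe && threads < 1024) {
    threads *= 2;
  }
  return threads;
}

// Hypothetical pair of launch sizes for a single work-group launch.
struct LaunchRange {
  int64_t global_size;
  int64_t local_size;
};

inline LaunchRange make_launch_range(int64_t batch) {
  // Compute once, then reuse for both the global and local size.
  const int64_t nthreads = nll_loss_threads_sketch(batch);
  return LaunchRange{nthreads, nthreads};
}
```

A compiler will often fold the duplicate call anyway when the function is small and visible, so the main gain is readability: one named value makes it obvious that both sizes are meant to be equal.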
Fixed the following issues found by test/test_nn.py::TestNNDeviceTypeXPU::test_nll_loss_large_tensor_reduction_mean_xpu and test_nll_loss_large_tensor_reduction_sum_xpu