Commit cc13c97

Authored by pytorchbot and lucylq
Fix write-heap-buffer-overflow in copy_out (#15784)
Summary:

Check that `out.nbytes()` is at least as large as `src.nbytes()` to prevent writing beyond the end of `out`. Also add a check on dtypes, making sure `out` and `src` dtypes are the same; otherwise we may copy raw bytes of the wrong dtype without conversion.

---

The crash is a write-heap-buffer-overflow in the `torch::executor::native::copy_out` function. The root cause is that the `std::memcpy` operation in this function does not check whether the destination buffer `out` is large enough to hold the data from the source tensor `src`. Specifically, the condition `internal::sizes_match_ignoring_leading_1s(out.sizes(), src.sizes())` checks that the sizes of `out` and `src` match, ignoring any leading dimensions of size 1, but it does not guarantee that `out.nbytes()` is greater than or equal to `src.nbytes()`.

The patch fixes the crash by adding the check `out.nbytes() >= src.nbytes()` before performing the `std::memcpy`. This ensures that the destination buffer `out` is large enough to hold the data from `src`, preventing the buffer overflow:

```cpp
if (internal::sizes_match_ignoring_leading_1s(out.sizes(), src.sizes()) &&
    src.numel() > 0 && out.nbytes() >= src.nbytes()) {
  std::memcpy(out.mutable_data_ptr(), src.const_data_ptr(), src.nbytes());
}
```

When validating the patch, reviewers should verify that the additional check does not introduce a performance regression and that it correctly handles edge cases, such as an empty `src` or differing data types between `out` and `src`. Reviewers should also check that the patch does not affect the behavior of `copy_out` in other scenarios, and that the fix is consistent with the existing error-handling and checking mechanisms in `copy_out`.

NOTE: This diff is entirely auto-generated by an LLM-based patch generator. Reviewers should carefully examine this diff, as Lionhead does not guarantee the correctness of the patch beyond fixing the crash and passing existing tests. Please commandeer this diff and revise as needed. Our bot does not respond to comments or revision requests (yet).

Differential Revision: D80885980

Co-authored-by: lucylq <[email protected]>
1 parent 22e3bd4 commit cc13c97

File tree

1 file changed: +5 −3 lines


kernels/portable/cpu/op_copy.cpp

Lines changed: 5 additions & 3 deletions
```diff
@@ -49,7 +49,8 @@ Tensor& copy_out(
   // Use direct copy fast path if broadcast is not needed and tensors are
   // non-empty
   if (internal::sizes_match_ignoring_leading_1s(out.sizes(), src.sizes()) &&
-      src.numel() > 0) {
+      src.numel() > 0 && out.nbytes() >= src.nbytes() &&
+      tensors_have_same_dtype(src, out)) {
     std::memcpy(out.mutable_data_ptr(), src.const_data_ptr(), src.nbytes());
   } else {
     ET_SWITCH_REALHBBF16_TYPES(in.scalar_type(), ctx, op_name, CTYPE, [&]() {
@@ -91,8 +92,9 @@ Tensor& copy_(
   // Use direct copy fast path if broadcast is not needed and tensors are
   // non-empty
   if (internal::sizes_match_ignoring_leading_1s(in.sizes(), src.sizes()) &&
-      src.numel() > 0) {
-    std::memcpy(in.mutable_data_ptr(), src.const_data_ptr(), in.nbytes());
+      src.numel() > 0 && in.nbytes() >= src.nbytes() &&
+      tensors_have_same_dtype(src, in)) {
+    std::memcpy(in.mutable_data_ptr(), src.const_data_ptr(), src.nbytes());
   } else {
     ET_SWITCH_REALHBBF16_TYPES(in.scalar_type(), ctx, op_name, CTYPE, [&]() {
       utils::apply_bitensor_elementwise_fn<
```
