Normalize bool tensor raw_data to {0, 1} on unpack #29238

Open
jiafatom wants to merge 1 commit into microsoft:main from jiafatom:jiafatom/compress-bool-normalize

@jiafatom

Contributor

Description

Bool initializers supplied via TensorProto raw_data are copied verbatim by UnpackTensor<bool>, so their bytes are not guaranteed to be the canonical {0, 1}. The int32_data path normalizes via static_cast<bool>, but the raw_data path does not. Kernels across the codebase assume bool tensors hold {0, 1}.

The CUDA Compress kernel is concretely affected: its output-sizing path sign-extends the condition bytes (int8_t -> int32_t) through cub::DeviceScan::InclusiveSum, while _CompressKernel selects elements using bool truthiness (condition_data[div]). For condition bytes outside {0, 1} the two interpretations disagree and the output is sized inconsistently with how elements are written. The CPU kernel uses truthiness for both sizing and selection and is unaffected.
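The disagreement described above can be reproduced on the host with plain loops. The following is a minimal sketch, not the CUDA code: `SizeBySignExtendedSum` and `CountByTruthiness` are hypothetical names standing in for the sizing path and the selection predicate, and the bytes are held as `uint8_t` so the example never reads a non-canonical `bool` object.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the sizing path: each condition byte is
// reinterpreted as int8_t and widened to int32_t before being summed.
// (The uint8_t -> int8_t conversion wraps modulo 2^8 on mainstream
// compilers and is guaranteed to do so from C++20 on.)
int32_t SizeBySignExtendedSum(const std::vector<uint8_t>& condition) {
  int32_t total = 0;
  for (uint8_t byte : condition) {
    total += static_cast<int32_t>(static_cast<int8_t>(byte));
  }
  return total;
}

// Hypothetical stand-in for the selection path: any nonzero byte selects
// the corresponding element.
int32_t CountByTruthiness(const std::vector<uint8_t>& condition) {
  int32_t selected = 0;
  for (uint8_t byte : condition) {
    if (byte != 0) {
      ++selected;
    }
  }
  return selected;
}
```

For canonical bytes such as {1, 0, 1, 1} both functions return 3. For {0x02, 0x00, 0xFF} the sum is 2 + 0 + (-1) = 1 while two elements are truthy, so a buffer sized from the sum would not match the number of elements written.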

Changes

  • UnpackTensor<bool> (tensorprotoutils.cc): normalize raw_data bytes to {0, 1} after copy, so every consumer observes a single consistent value. This is the root-cause fix and applies to all EPs and all bool-consuming kernels.
  • CUDA Compress CastToInt32 (compress_impl.cu): normalize to {0, 1} while still returning int32_t, which preserves the accumulator-widening intent of #9295 ("Fix inclusive sum overflow when applied on int8_t buffer in Compress"). The sizing path then matches the kernel's write predicate, the CPU kernel, and the CUDA NonZero bool(x) convention.
  • Add a unit test in tensorutils_test.cc for bool raw_data with non-canonical bytes. A Compress OpTester test cannot reproduce this because the test harness itself normalizes bool during input construction, so coverage is placed at the deserialization layer. The test uses only Status returns and gtest assertions, so it builds and runs in no-exception builds.
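The normalization in the changes above can be sketched roughly as follows. This is an illustration under assumptions, not the code in tensorprotoutils.cc or compress_impl.cu: `UnpackBoolBytesNormalized` and `NormalizeToInt32` are hypothetical names, and the output is a `uint8_t` vector rather than a `bool` buffer to keep the example free of non-canonical `bool` reads.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the raw bytes verbatim, then collapse every
// nonzero byte to 1 so all consumers observe only {0, 1}.
std::vector<uint8_t> UnpackBoolBytesNormalized(const void* raw_data, size_t raw_len) {
  std::vector<uint8_t> out(raw_len);
  if (raw_len > 0) {
    std::memcpy(out.data(), raw_data, raw_len);
  }
  for (uint8_t& byte : out) {
    byte = (byte != 0) ? 1 : 0;
  }
  return out;
}

// Hypothetical host-side analogue of a normalizing cast functor: the result
// is still int32_t (wide accumulator input), but only ever 0 or 1.
struct NormalizeToInt32 {
  int32_t operator()(uint8_t value) const { return value != 0 ? 1 : 0; }
};
```

Feeding the bytes {0x00, 0x01, 0x02, 0x7F, 0x80, 0xFF} through `UnpackBoolBytesNormalized` yields {0, 1, 1, 1, 1, 1}, which is the kind of check a deserialization-layer test for non-canonical bytes would make.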

Motivation and Context

CastToInt32 was introduced in #9295 to widen the cub::InclusiveSum accumulator (an int8 overflow fix); it did not normalize the bool interpretation. The accumulator-width and bool-normalization concerns are independent. This change addresses the latter at the source and hardens the CUDA Compress kernel.
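To illustrate why accumulator width and bool normalization are independent concerns, here is a small host-side sketch. `SumCondition` is a hypothetical helper, not part of ONNX Runtime; it sums condition bytes with a caller-chosen accumulator type and an optional normalization step.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: sum condition bytes into an accumulator of type Acc,
// optionally collapsing nonzero bytes to 1 first.
template <typename Acc>
Acc SumCondition(const std::vector<int8_t>& condition, bool normalize) {
  Acc total = 0;
  for (int8_t byte : condition) {
    const int value = normalize ? (byte != 0 ? 1 : 0) : static_cast<int>(byte);
    // The addition happens in int; the cast back models the accumulator width.
    total = static_cast<Acc>(total + value);
  }
  return total;
}
```

With 200 canonical ones, an `int8_t` accumulator cannot represent the count while an `int32_t` accumulator returns 200; that is the width concern. With the bytes {1, 2, -1}, an `int32_t` accumulator without normalization returns 2 even though three elements are truthy, and returns 3 once normalization is enabled; that is the normalization concern.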

Bool initializers provided via TensorProto raw_data are copied verbatim, so
their bytes are not guaranteed to be the canonical {0, 1}. Kernels assume bool
tensors hold {0, 1}, and the CUDA Compress sizing path in particular sign-extends
the condition bytes (int8 -> int32) to size the output while the kernel selects
elements using bool truthiness. For bytes outside {0, 1} the two interpretations
disagree, producing an incorrectly sized output.

Normalize bool raw_data to {0, 1} in UnpackTensor<bool> so every consumer sees a
consistent value, and harden the CUDA Compress CastToInt32 functor to normalize
as well so its sizing path matches its write predicate.

Add a unit test covering bool raw_data with non-canonical bytes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Contributor

Pull request overview

This PR hardens ONNX Runtime’s deserialization and CUDA Compress behavior by ensuring boolean tensors always use canonical byte values {0, 1} when coming from TensorProto.raw_data. This prevents downstream kernels (notably CUDA Compress sizing vs. selection) from disagreeing when encountering non-canonical nonzero bytes.

Changes:

  • Normalize unpacked bool tensor raw_data bytes to {0, 1} in UnpackTensor<bool> after copying.
  • Normalize CUDA Compress prefix-sum input values to {0, 1} (while still widening to int32_t) so sizing matches the write predicate.
  • Add a unit test verifying non-canonical raw_data bytes are normalized on unpack.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| onnxruntime/core/framework/tensorprotoutils.cc | Normalizes bool raw_data bytes to {0, 1} after unpacking so all consumers observe consistent canonical bool storage. |
| onnxruntime/core/providers/cuda/tensor/compress_impl.cu | Makes CastToInt32 treat any nonzero condition byte as 1 to keep CUDA Compress sizing consistent with its boolean predicate. |
| onnxruntime/test/framework/tensorutils_test.cc | Adds coverage for bool TensorProto.raw_data containing non-canonical bytes, validating normalization at the deserialization layer. |
