Normalize bool tensor raw_data to {0, 1} on unpack by jiafatom · Pull Request #29238 · microsoft/onnxruntime

jiafatom · 2026-06-23T20:50:45Z

Description

Bool initializers supplied via TensorProto raw_data are copied verbatim by UnpackTensor<bool>, so their bytes are not guaranteed to be the canonical {0, 1} (the int32_data path normalizes via static_cast<bool>, but the raw_data path does not). Kernels across the codebase assume bool tensors hold {0, 1}.

The CUDA Compress kernel is concretely affected: its output-sizing path sign-extends the condition bytes (int8_t → int32_t) through cub::DeviceScan::InclusiveSum, while _CompressKernel selects elements using bool truthiness (condition_data[div]). For condition bytes outside {0, 1} the two interpretations disagree and the output is sized inconsistently with how elements are written. The CPU kernel uses truthiness for both sizing and selection and is unaffected.
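The disagreement described above can be reproduced on the CPU. The following is a minimal sketch, not the kernel code: `SizeBySignExtendedSum` and `CountByTruthiness` are hypothetical names standing in for the sizing path and the write predicate, and the condition bytes are passed as `uint8_t` so that non-canonical values can be inspected without reading them through `bool`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizing interpretation (hypothetical stand-in): each condition byte is read
// as int8_t and sign-extended to int32_t before being summed.
int32_t SizeBySignExtendedSum(const uint8_t* cond, size_t n) {
  assert(cond != nullptr || n == 0);
  int32_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    // 0xFF becomes -1 here on two's-complement targets.
    sum += static_cast<int32_t>(static_cast<int8_t>(cond[i]));
  }
  return sum;
}

// Selection interpretation (hypothetical stand-in): any nonzero byte selects
// the element, which is what bool truthiness amounts to.
int32_t CountByTruthiness(const uint8_t* cond, size_t n) {
  assert(cond != nullptr || n == 0);
  int32_t count = 0;
  for (size_t i = 0; i < n; ++i) {
    if (cond[i] != 0) ++count;
  }
  return count;
}
```

For the condition bytes `{0x02, 0xFF, 0x00, 0x01}` the sign-extended sum is 2 + (-1) + 0 + 1 = 2, while three bytes are truthy, so the output would be sized for two elements and written with three. For canonical bytes the two functions agree.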

Changes

UnpackTensor<bool> (tensorprotoutils.cc): normalize raw_data bytes to {0, 1} after copy, so every consumer observes a single consistent value. This is the root-cause fix and applies to all EPs and all bool-consuming kernels.
CUDA Compress CastToInt32 (compress_impl.cu): normalize to {0, 1} (still returns int32_t, preserving the accumulator-widening intent of #9295, "Fix inclusive sum overflow when applied on int8_t buffer in Compress") so the sizing path matches the kernel's write predicate, matching the CPU kernel and the CUDA NonZero bool(x) convention.
Add a unit test in tensorutils_test.cc for bool raw_data with non-canonical bytes. A Compress OpTester test cannot reproduce this because the test harness itself normalizes bool during input construction, so coverage is placed at the deserialization layer. The test uses only Status returns and gtest assertions, so it builds and runs in no-exception builds.
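The "normalize after copy" step in the first item could look roughly like the sketch below. This is an illustration under assumptions, not the code in tensorprotoutils.cc: `CopyAndNormalizeBoolBytes` is a hypothetical helper, and it writes into a `uint8_t` buffer rather than a `bool` buffer so the example stays well defined for non-canonical input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy raw bool bytes verbatim, then collapse every
// nonzero byte to 1 so all later readers see only {0, 1}.
void CopyAndNormalizeBoolBytes(const uint8_t* raw, size_t n, uint8_t* out) {
  assert((raw != nullptr && out != nullptr) || n == 0);
  if (n == 0) return;  // avoid calling memcpy with null pointers
  std::memcpy(out, raw, n);  // the verbatim copy the description refers to
  for (size_t i = 0; i < n; ++i) {
    out[i] = (out[i] != 0) ? 1 : 0;  // normalization pass
  }
}
```

With input bytes `{0x00, 0x01, 0x02, 0x80, 0xFF}` the output buffer holds `{0, 1, 1, 1, 1}`. Normalizing once at the deserialization layer means individual kernels do not each have to defend against non-canonical bytes.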

Motivation and Context

CastToInt32 was introduced in #9295 to widen the cub::InclusiveSum accumulator (an int8 overflow fix); it did not normalize the bool interpretation. The accumulator-width and bool-normalization concerns are independent. This change addresses the latter at the source and hardens the CUDA Compress kernel.
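To illustrate why accumulator width and bool normalization are separate concerns, here is a CPU-only sketch. `WidenOnly`, `WidenAndNormalize`, and `InclusiveSumInt32` are hypothetical names; the loop is a plain stand-in for the device-side inclusive sum, not the cub API, and the functors only approximate the shape of CastToInt32 before and after this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed pre-change shape: widens to int32_t but keeps the signed byte value.
struct WidenOnly {
  int32_t operator()(int8_t v) const { return static_cast<int32_t>(v); }
};

// Assumed post-change shape: still returns int32_t, but maps nonzero to 1.
struct WidenAndNormalize {
  int32_t operator()(int8_t v) const { return v != 0 ? 1 : 0; }
};

// CPU stand-in for an inclusive prefix sum with a 32-bit accumulator.
template <typename Op>
std::vector<int32_t> InclusiveSumInt32(const std::vector<int8_t>& in, Op op) {
  std::vector<int32_t> out;
  out.reserve(in.size());
  int32_t running = 0;
  for (int8_t v : in) {
    running += op(v);
    out.push_back(running);
  }
  assert(out.size() == in.size());
  return out;
}
```

With 200 canonical `1` bytes both functors end at 200, a total that an 8-bit accumulator could not hold; that is the widening concern. With the bytes `{-1, -1, 2}` the final prefix sum is 0 under `WidenOnly` and 3 under `WidenAndNormalize`; that is the normalization concern, and widening alone does not address it.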

Bool initializers provided via TensorProto raw_data are copied verbatim, so their bytes are not guaranteed to be the canonical {0, 1}. Kernels assume bool tensors hold {0, 1}, and the CUDA Compress sizing path in particular sign-extends the condition bytes (int8 -> int32) to size the output while the kernel selects elements using bool truthiness. For bytes outside {0, 1} the two interpretations disagree, producing an incorrectly sized output.

Normalize bool raw_data to {0, 1} in UnpackTensor<bool> so every consumer sees a consistent value, and harden the CUDA Compress CastToInt32 functor to normalize as well so its sizing path matches its write predicate. Add a unit test covering bool raw_data with non-canonical bytes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR hardens ONNX Runtime’s deserialization and CUDA Compress behavior by ensuring boolean tensors always use canonical byte values {0, 1} when coming from TensorProto.raw_data. This prevents downstream kernels (notably CUDA Compress sizing vs. selection) from disagreeing when encountering non-canonical nonzero bytes.

Changes:

Normalize unpacked bool tensor raw_data bytes to {0, 1} in UnpackTensor<bool> after copying.
Normalize CUDA Compress prefix-sum input values to {0, 1} (while still widening to int32_t) so sizing matches the write predicate.
Add a unit test verifying non-canonical raw_data bytes are normalized on unpack.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| onnxruntime/core/framework/tensorprotoutils.cc | Normalizes `bool` `raw_data` bytes to `{0, 1}` after unpacking so all consumers observe consistent canonical bool storage. |
| onnxruntime/core/providers/cuda/tensor/compress_impl.cu | Makes `CastToInt32` treat any nonzero condition byte as `1` to keep CUDA `Compress` sizing consistent with its boolean predicate. |
| onnxruntime/test/framework/tensorutils_test.cc | Adds coverage for bool `TensorProto.raw_data` containing non-canonical bytes, validating normalization at the deserialization layer. |

jiafatom requested a review from Copilot June 24, 2026 04:00

Copilot started reviewing on behalf of jiafatom June 24, 2026 04:00

Copilot AI reviewed Jun 24, 2026

jiafatom wants to merge 1 commit into microsoft:main from jiafatom:jiafatom/compress-bool-normalize
