Normalize bool tensor raw_data to {0, 1} on unpack#29238
Open
jiafatom wants to merge 1 commit into
Open
Conversation
Bool initializers provided via TensorProto raw_data are copied verbatim, so
their bytes are not guaranteed to be the canonical {0, 1}. Kernels assume bool
tensors hold {0, 1}, and the CUDA Compress sizing path in particular sign-extends
the condition bytes (int8 -> int32) to size the output while the kernel selects
elements using bool truthiness. For bytes outside {0, 1} the two interpretations
disagree, producing an incorrectly sized output.
Normalize bool raw_data to {0, 1} in UnpackTensor<bool> so every consumer sees a
consistent value, and harden the CUDA Compress CastToInt32 functor to normalize
as well so its sizing path matches its write predicate.
Add a unit test covering bool raw_data with non-canonical bytes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens ONNX Runtime’s deserialization and CUDA Compress behavior by ensuring boolean tensors always use canonical byte values {0, 1} when coming from TensorProto.raw_data. This prevents downstream kernels (notably CUDA Compress sizing vs. selection) from disagreeing when encountering non-canonical nonzero bytes.
Changes:
- Normalize unpacked
booltensorraw_databytes to{0, 1}inUnpackTensor<bool>after copying. - Normalize CUDA
Compressprefix-sum input values to{0, 1}(while still widening toint32_t) so sizing matches the write predicate. - Add a unit test verifying non-canonical
raw_databytes are normalized on unpack.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/core/framework/tensorprotoutils.cc | Normalizes bool raw_data bytes to {0, 1} after unpacking so all consumers observe consistent canonical bool storage. |
| onnxruntime/core/providers/cuda/tensor/compress_impl.cu | Makes CastToInt32 treat any nonzero condition byte as 1 to keep CUDA Compress sizing consistent with its boolean predicate. |
| onnxruntime/test/framework/tensorutils_test.cc | Adds coverage for bool TensorProto.raw_data containing non-canonical bytes, validating normalization at the deserialization layer. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Bool initializers supplied via
TensorProtoraw_dataare copied verbatim byUnpackTensor<bool>, so their bytes are not guaranteed to be the canonical{0, 1}(theint32_datapath normalizes viastatic_cast<bool>, but theraw_datapath did not). Kernels across the codebase assume bool tensors hold{0, 1}.The CUDA
Compresskernel is concretely affected: its output-sizing path sign-extends the condition bytes (int8_t→int32_t) throughcub::DeviceScan::InclusiveSum, while_CompressKernelselects elements using bool truthiness (condition_data[div]). For condition bytes outside{0, 1}the two interpretations disagree and the output is sized inconsistently with how elements are written. The CPU kernel uses truthiness for both sizing and selection and is unaffected.Changes
UnpackTensor<bool>(tensorprotoutils.cc): normalizeraw_databytes to{0, 1}after copy, so every consumer observes a single consistent value. This is the root-cause fix and applies to all EPs and all bool-consuming kernels.CompressCastToInt32(compress_impl.cu): normalize to{0, 1}(still returnsint32_t, preserving the accumulator-widening intent of Fix inclusive sum overlfow when applied on int8_t buffer in Compress #9295) so the sizing path matches the kernel's write predicate, matching the CPU kernel and the CUDANonZerobool(x)convention.tensorutils_test.ccfor boolraw_datawith non-canonical bytes. ACompressOpTestertest cannot reproduce this because the test harness itself normalizes bool during input construction, so coverage is placed at the deserialization layer. The test uses only Status returns and gtest assertions, so it builds and runs in no-exception builds.Motivation and Context
CastToInt32was introduced in #9295 to widen thecub::InclusiveSumaccumulator (an int8 overflow fix); it did not normalize the bool interpretation. The accumulator-width and bool-normalization concerns are independent. This change addresses the latter at the source and hardens the CUDACompresskernel.