diff --git a/IMPROVED_FIX.md b/IMPROVED_FIX.md
new file mode 100644
index 00000000..8ff5c073
--- /dev/null
+++ b/IMPROVED_FIX.md
@@ -0,0 +1,141 @@
# Improved Fix for Issue #4603

## Problem Analysis

The original fix using `broadcastTensors()` is **insufficient** because:

1. `broadcastTensors()` only handles **rank mismatch** (e.g., `[3]` vs `[2, 3]`)
2. It does NOT handle **dimension size mismatch** (e.g., `[1]` vs `[2]`)
3. The DINOv3 error reports shapes `[2]` and `[1]`: same rank, different sizes

## Root Cause

TensorRT's `IIfConditionalOutputLayer` requires an **exact shape match**, but:
- ONNX allows broadcasting, where a dimension of size 1 can expand to any size N
- Example: `[1]` can broadcast to `[2]` by repeating the element

## Correct Solution

We need to implement full ONNX-style broadcasting, using the pattern from the `Expand` operator:

### Algorithm
1. Align ranks by prepending dimensions of size 1
2. For each dimension, compute the broadcast size: `max(dim1, dim2)`
3. For each tensor, if `dim[i] == 1` and `broadcast_dim[i] > 1`, use stride 0 to expand
4. Use `ISliceLayer` with the computed strides to perform the expansion

### Implementation Options

#### Option 1: Create a new `broadcastTensorsForConditional()` function
Add to `importerUtils.cpp`:

```cpp
void broadcastTensorsForConditional(ImporterContext* ctx, nvinfer1::ITensor*& t1, nvinfer1::ITensor*& t2)
{
    // First, align ranks
    broadcastTensors(ctx, t1, t2);

    // Now both tensors have the same rank
    auto shape1 = shapeOf(*t1);
    auto shape2 = shapeOf(*t2);

    // Compute broadcast shape: max(shape1, shape2) element-wise
    ShapeTensor broadcastShape = broadcast(ctx, shape1, shape2);

    // Expand t1 if needed
    if (!shape1.allValuesKnown() || !broadcastShape.allValuesKnown() ||
        !std::equal(shape1.begin(), shape1.end(), broadcastShape.begin()))
    {
        // Compute strides: stride[i] = (shape1[i] > 1) ? 1 : 0
        ShapeTensor const zero = similar(ctx, shape1, 0);
        ShapeTensor const one = similar(ctx, shape1, 1);
        ShapeTensor const strides1 = min(ctx, one, max(ctx, sub(ctx, shape1, one), zero));

        nvinfer1::ISliceLayer* slice1 = addSlice(ctx, *t1, zero, broadcastShape, strides1);
        ctx->registerLayer(slice1, "ONNXTRT_BroadcastForConditional", nullptr);
        t1 = N_CHECK(slice1->getOutput(0));
    }

    // Expand t2 if needed
    if (!shape2.allValuesKnown() || !broadcastShape.allValuesKnown() ||
        !std::equal(shape2.begin(), shape2.end(), broadcastShape.begin()))
    {
        // Compute strides: stride[i] = (shape2[i] > 1) ? 1 : 0
        ShapeTensor const zero = similar(ctx, shape2, 0);
        ShapeTensor const one = similar(ctx, shape2, 1);
        ShapeTensor const strides2 = min(ctx, one, max(ctx, sub(ctx, shape2, one), zero));

        nvinfer1::ISliceLayer* slice2 = addSlice(ctx, *t2, zero, broadcastShape, strides2);
        ctx->registerLayer(slice2, "ONNXTRT_BroadcastForConditional", nullptr);
        t2 = N_CHECK(slice2->getOutput(0));
    }
}
```

#### Option 2: Simpler approach using the element-wise trick
Since TensorRT's `IElementWiseLayer` handles broadcasting automatically, we could:
1. Add a dummy element-wise operation (e.g., multiply by 1)
2. Let TensorRT handle the broadcasting
3. Extract the broadcast result

However, this is a hack and may not work reliably; a rough sketch follows below.
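For concreteness, this is roughly what the element-wise trick would look like. It is illustrative only, not existing importer code: `expandViaElementwise` is a hypothetical helper, and it assumes a constant tensor of ones with the target broadcast shape (same dtype and rank as `t`) has already been materialized elsewhere.

```cpp
#include <NvInfer.h>

// Hypothetical helper illustrating Option 2 (NOT existing importer code).
// `ones` must be a constant tensor of 1s that already has the target
// broadcast shape, the same dtype as `t`, and the same rank as `t`.
nvinfer1::ITensor* expandViaElementwise(nvinfer1::INetworkDefinition& network,
    nvinfer1::ITensor& t, nvinfer1::ITensor& ones)
{
    // IElementWiseLayer broadcasts size-1 dimensions automatically (both
    // inputs must already have equal rank), so multiplying by 1 yields a
    // tensor with the fully broadcast shape.
    nvinfer1::IElementWiseLayer* layer =
        network.addElementWise(t, ones, nvinfer1::ElementWiseOperation::kPROD);
    return layer->getOutput(0);
}
```

Even when this builds, it inserts a layer whose only purpose is shape expansion, relies on the optimizer to fold the multiplication away, and breaks down for element types where `kPROD` does not apply (e.g., BOOL tensors), which is why this option is not pursued further.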
#### Option 3: Use the existing Expand logic inline
Apply the Expand operator's logic directly in the If operator:

```cpp
for (size_t i = 0; i < thenSubgraphTensors.size(); i++)
{
    auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
    auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);

    // Align ranks first
    broadcastTensors(ctx, thenOut, elseOut);

    // Now handle dimension size broadcasting
    auto thenShape = shapeOf(*thenOut);
    auto elseShape = shapeOf(*elseOut);
    ShapeTensor broadcastShape = broadcast(ctx, thenShape, elseShape);

    // Expand thenOut if needed
    {
        ShapeTensor const zero = similar(ctx, thenShape, 0);
        ShapeTensor const one = similar(ctx, thenShape, 1);
        ShapeTensor const strides = min(ctx, one, max(ctx, sub(ctx, thenShape, one), zero));
        nvinfer1::ISliceLayer* slice = addSlice(ctx, *thenOut, zero, broadcastShape, strides);
        ctx->registerLayer(slice, "ONNXTRT_BroadcastThen", nullptr);
        thenOut = N_CHECK(slice->getOutput(0));
    }

    // Expand elseOut if needed
    {
        ShapeTensor const zero = similar(ctx, elseShape, 0);
        ShapeTensor const one = similar(ctx, elseShape, 1);
        ShapeTensor const strides = min(ctx, one, max(ctx, sub(ctx, elseShape, one), zero));
        nvinfer1::ISliceLayer* slice = addSlice(ctx, *elseOut, zero, broadcastShape, strides);
        ctx->registerLayer(slice, "ONNXTRT_BroadcastElse", nullptr);
        elseOut = N_CHECK(slice->getOutput(0));
    }

    auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));
    ctx->registerLayer(outputLayer, std::string(conditional->getName()) + "_OutputLayer", &node);
    graphOutputs.emplace_back(N_CHECK(outputLayer->getOutput(0)));
}
```

## Recommended Approach

**Option 3** is recommended because:
1. It is self-contained in the If operator
2. It reuses proven logic from the Expand operator
3. It handles both rank and dimension size broadcasting
4. It requires no changes to `importerUtils.cpp`

## Testing

Test cases needed:
1. `[1]` vs `[2]` - dimension size broadcast
2. `[1, 3]` vs `[2, 3]` - partial dimension broadcast
3. `[3]` vs `[2, 3]` - rank broadcast
4. `[1]` vs `[2, 3]` - combined rank and dimension broadcast
5. `[2, 3]` vs `[2, 3]` - no broadcast needed (existing case)
diff --git a/TECHNICAL_DETAILS.md b/TECHNICAL_DETAILS.md
new file mode 100644
index 00000000..e13c7b84
--- /dev/null
+++ b/TECHNICAL_DETAILS.md
@@ -0,0 +1,121 @@
# Technical Details: If Operator Shape Broadcasting Fix

## Background

### ONNX If Operator
The ONNX `If` operator implements conditional execution with two subgraphs:
- **then_branch**: executed when the condition is true
- **else_branch**: executed when the condition is false

Both branches must produce the same number of outputs, and corresponding outputs should have compatible types and shapes.

### TensorRT IIfConditional
TensorRT's `IIfConditional` layer requires:
- A boolean scalar condition tensor
- Input layers for external tensors used in the subgraphs
- Output layers that merge results from both branches

**Critical Requirement:** `IIfConditional::addOutput()` creates an `IIfConditionalOutputLayer` whose two input tensors must have **exactly the same shape** (same rank and same dimensions). A minimal sketch of this API follows below.
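For reference, here is a minimal sketch (not the importer's actual code) of how the conditional API fits together; the input tensors are assumed to be built elsewhere, and error handling is elided.

```cpp
#include <NvInfer.h>

// Minimal sketch of TensorRT's conditional API (illustrative only).
// The condition must be a boolean scalar (0-D) tensor.
nvinfer1::ITensor* buildIfConditional(nvinfer1::INetworkDefinition& network,
    nvinfer1::ITensor& condition,
    nvinfer1::ITensor& thenOut,  // output produced inside the then-branch
    nvinfer1::ITensor& elseOut)  // output produced inside the else-branch
{
    nvinfer1::IIfConditional* conditional = network.addIfConditional();
    conditional->setCondition(condition);

    // Tensors computed outside the conditional would enter the subgraphs
    // through conditional->addInput(...); omitted here for brevity.

    // Each merged result comes from an output layer that takes one tensor
    // per branch; this addOutput() call enforces the exact-shape requirement.
    nvinfer1::IIfConditionalOutputLayer* outputLayer =
        conditional->addOutput(thenOut, elseOut);
    return outputLayer->getOutput(0);
}
```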
## The Problem

### Error Message Analysis
```
Invalid Node - /bb/rope_embeddings/If
/bb/rope_embeddings/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape.
Shapes are [2] and [1].
```

This error indicates:
1. Node name: `/bb/rope_embeddings/If`
2. Then-branch output shape: `[2]` (1D tensor with 2 elements)
3. Else-branch output shape: `[1]` (1D tensor with 1 element)
4. TensorRT rejects this because the shapes don't match exactly

### Why This Happens in DINOv3

DINOv3 (a Vision Transformer with rotary position embeddings) uses conditional logic for its rope embeddings:

```python
# Simplified pseudocode
if condition:
    return [freq1, freq2]   # Shape: [2]
else:
    return [default_freq]   # Shape: [1]
```

Under ONNX broadcasting rules, `[1]` can be broadcast to `[2]`, making these shapes compatible. However, TensorRT does not apply this broadcasting automatically.

## The Solution

### Broadcasting Function
The codebase already includes `broadcastTensors()` in `importerUtils.cpp`:

```cpp
void broadcastTensors(ImporterContext* ctx, nvinfer1::ITensor*& t1, nvinfer1::ITensor*& t2)
{
    int const t1Dims = t1->getDimensions().nbDims;
    int const t2Dims = t2->getDimensions().nbDims;

    if (t1Dims == t2Dims)
    {
        return; // Already the same rank
    }

    if (t1Dims > t2Dims)
    {
        return broadcastTensor(ctx, t2, t1Dims); // Broadcast t2 to match t1
    }
    return broadcastTensor(ctx, t1, t2Dims); // Broadcast t1 to match t2
}
```

This function:
1. Checks whether the tensors already have the same number of dimensions
2. If not, calls `broadcastTensor()` to add leading dimensions of size 1
3. Uses `IShuffleLayer` to reshape the lower-rank tensor

### Implementation
The fix adds one line before creating the output layer:

```cpp
// Before fix:
auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);
auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));

// After fix:
auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);
broadcastTensors(ctx, thenOut, elseOut); // ← NEW LINE
auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));
```

## How It Works

### Example: DINOv3 rope_embeddings

**Before broadcasting:**
- `thenOut`: shape `[2]`, nbDims = 1
- `elseOut`: shape `[1]`, nbDims = 1

Both have the same rank (1D), so `broadcastTensors()` returns immediately without changes.

**Wait, what?** If they have the same rank, why does the error occur?

The issue is that having the same **number of dimensions** (rank) does not mean having the same **shape**. The error message shows shapes `[2]` and `[1]`: both are 1D tensors, but with different sizes.

### Deeper Analysis

Looking more carefully at the error and the broadcasting function, it is clear that the current `broadcastTensors()` only handles **rank mismatch**, not **dimension size mismatch**.

For example:
- `[1]` vs `[2]`: same rank (1), different sizes → NOT handled by the current `broadcastTensors()`
- `[1]` vs `[2, 3]`: different ranks → handled by `broadcastTensors()`

### The Real Fix Needed

The actual issue requires **element-wise broadcasting**, not just rank alignment. We need to handle cases where:
- Tensors have the same rank but different dimension sizes
- One tensor has size 1 in a dimension that must be broadcast to match the other

The remaining question is whether a more suitable helper already exists, or whether the fix needs the Expand-style logic explored in IMPROVED_FIX.md; the sketch below pins down the broadcasting semantics any such fix must implement.
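To make the required element-wise semantics concrete, here is a small, self-contained sketch of the ONNX multidirectional broadcasting rule for static dimensions. `broadcastShape` is a hypothetical helper written for illustration, not importer code:

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical illustration (not importer code) of ONNX multidirectional
// broadcasting for static shapes: two dimensions are compatible iff they
// are equal or one of them is 1; the result dimension is the larger one.
std::optional<std::vector<int64_t>> broadcastShape(
    std::vector<int64_t> a, std::vector<int64_t> b)
{
    // Step 1: align ranks by prepending dimensions of size 1.
    while (a.size() < b.size()) a.insert(a.begin(), 1);
    while (b.size() < a.size()) b.insert(b.begin(), 1);

    // Step 2: per dimension, the broadcast size is max(a[i], b[i]).
    std::vector<int64_t> out(a.size());
    for (size_t i = 0; i < a.size(); ++i)
    {
        if (a[i] != b[i] && a[i] != 1 && b[i] != 1)
        {
            return std::nullopt; // Incompatible, e.g. [2] vs [3].
        }
        out[i] = std::max(a[i], b[i]);
    }
    return out; // e.g. broadcastShape({1}, {2}) yields {2}
}
```

Any correct fix (rank alignment followed by `ISliceLayer` expansion, as in IMPROVED_FIX.md) must reproduce exactly this shape computation at network-build time.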