Enable arithmetic operations between device tensors/batches and scalars #6143
base: main
Conversation
Greptile Summary
This PR fixes a critical bug that prevented arithmetic operations between GPU tensors/batches and scalars, and refactors the arithmetic operation handling logic for better maintainability.
Key Changes:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Tensor/Batch
participant _arithm_op
participant as_tensor
participant _arithmetic_generic_op
User->>Tensor/Batch: x + scalar (e.g., gpu_tensor + 5)
Tensor/Batch->>_arithm_op: __add__(self, scalar)
Note over _arithm_op: Detect device type from<br/>Tensor/Batch arguments
_arithm_op->>_arithm_op: gpu = any(arg.device == "gpu")
alt scalar argument (not Tensor/Batch)
_arithm_op->>_arithm_op: Check _implicitly_convertible(scalar)
alt gpu is True and not implicitly convertible
_arithm_op-->>User: ValueError: Not implicitly copyable to GPU
else convertible
_arithm_op->>as_tensor: as_tensor(scalar, device="gpu" if gpu else None)
as_tensor-->>_arithm_op: Tensor on correct device
end
end
_arithm_op->>_arithm_op: Validate all args on same device
alt mixed GPU/CPU inputs
_arithm_op-->>User: ValueError: Cannot mix GPU and CPU inputs
else all same device
_arithm_op->>_arithmetic_generic_op: Call with tensor args
_arithmetic_generic_op-->>User: Result tensor
end
Additional Comments (3)
- `dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py`, line 36 (link), syntax: Typo: "implictly" should be "implicitly".
- `dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py`, lines 30-46 (link), logic: Logic bug: when a scalar appears before a GPU tensor (e.g., `3 + gpu_tensor`), the scalar gets converted to CPU before the GPU tensor is detected, causing the final check to fail with "Cannot mix GPU and CPU inputs." The algorithm needs two passes (a sketch of this appears after these comments):
  - First pass: scan all args to detect whether any GPU tensor exists
  - Second pass: convert scalars to the appropriate device

  Example failure case:
  ```python
  gpu_tensor = ndd.tensor([1, 2, 3], device="gpu")
  result = 5 + gpu_tensor  # Will raise ValueError
  ```
- `dali/test/python/experimental_mode/test_arithm_ops.py`, line 128 (link), style: Test only covers `tensor + scalar` but not `scalar + tensor`. Add reverse operation tests:
  ```python
  # Also test scalar + tensor
  result_reversed = ndd.as_tensor(apply_bin_op(op, scalar, x))
  ref_reversed = apply_bin_op(op, scalar, tensor)
  if not np.allclose(result_reversed.cpu(), ref_reversed):
      msg = f"{scalar} {op} {tensor} = \n{result_reversed}\n!=\n{ref_reversed}"
      raise AssertionError(msg)
  ```
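A minimal sketch of the suggested two-pass approach, reusing the `Tensor`, `Batch`, `as_tensor`, and `_arithmetic_generic_op` names from this PR (not the actual patch; the real `_arithm_op` signature may differ):

```python
from numbers import Real

def _arithm_op(op, *args):
    # First pass: detect whether any Tensor/Batch argument lives on the GPU.
    gpu = any(isinstance(arg, (Tensor, Batch)) and arg.device == "gpu" for arg in args)

    # Second pass: convert scalars, lists, and tuples to tensors on the detected device.
    converted = []
    for arg in args:
        if not isinstance(arg, (Tensor, Batch)):
            if gpu and not isinstance(arg, (Real, list, tuple)):
                raise ValueError(f"Type {type(arg)} is not implicitly copyable to the GPU.")
            arg = as_tensor(arg, device="gpu" if gpu else None)
        converted.append(arg)
    return _arithmetic_generic_op(op, *converted)
```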
5 files reviewed, 3 comments
Additional Comments (1)
- `dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py`, line 20 (link), syntax: Using the `|` operator in `isinstance()` requires Python 3.10+, but `pyproject.toml` targets Python 3.8+. Use tuple syntax instead for compatibility.
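For reference, a generic illustration of the two forms (not the actual line from the PR):

```python
x = 3.5
isinstance(x, int | float)   # union syntax: raises TypeError on Python 3.8/3.9
isinstance(x, (int, float))  # tuple syntax: works on Python 3.8+
```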
5 files reviewed, 1 comment
Force-pushed c0cc95a to 2265d73
Force-pushed 2265d73 to 8d39fde
```python
def _implicitly_convertible(value: Any):
    return isinstance(value, (SupportsInt, SupportsFloat, list, tuple))
```
Hmmm.... I don't think this'll work:
```python
>>> isinstance(cp.array([0, 0]), typing.SupportsFloat)      # CuPy, not a scalar
True
>>> isinstance(np.array([0, 0]), typing.SupportsFloat)      # NumPy, not a scalar
True
>>> isinstance(np.array(["a", "b"]), typing.SupportsFloat)  # NumPy, not a scalar, not even a numeric type
True
```
Fixed by 9319fa9. Now:
```python
>>> isinstance(cp.array([0, 0]), numbers.Real)
False
>>> isinstance(np.array([0, 0]), numbers.Real)
False
>>> isinstance(np.int64(0), numbers.Real)
True
```
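Presumably the updated check is roughly equivalent to this sketch (inferred from the REPL output above, not the exact diff):

```python
import numbers

def _implicitly_convertible(value):
    # numbers.Real covers Python ints/floats and NumPy scalar types,
    # but not np.ndarray or cp.ndarray, which merely "support" float().
    return isinstance(value, (numbers.Real, list, tuple))
```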
Signed-off-by: Rostan Tabet <[email protected]>
Force-pushed 8d39fde to 9319fa9
!build
CI MESSAGE: [41073770]: BUILD STARTED
CI MESSAGE: [41073770]: BUILD PASSED
```python
if gpu and not _implicitly_convertible(arg):
    raise ValueError(f"Type {type(arg)} is not implicitly copyable to the GPU.")
```
What happens if arg is a CuPy array or a torch cuda tensor? Normally they would get converted to a GPU Tensor, but now we'd throw.
Fixed by bcefb0d. This also raises an exception earlier in cases like (ndd GPU tensor) + (PyTorch CPU tensor).
Force-pushed 03f9013 to fcf79a4
Greptile Overview
Greptile Summary
Overview
This PR addresses a critical bug preventing arithmetic operations between GPU tensors/batches and scalars. It refactors arithmetic handling into a dedicated module and fixes a backend implementation bug.
Key Changes
1. Critical C++ Backend Fix (backend_impl.cc)
Fixed TensorListGPU.broadcast() method which was incorrectly using CPUBackend type parameters instead of GPUBackend. This prevented GPU tensor broadcasting from working at all. The fix is minimal and correct.
2. New _arithmetic.py Module
Extracted arithmetic operation logic into a centralized module to reduce duplication. The module implements:
- `_implicitly_convertible()`: Validates scalar types (numbers, lists, tuples)
- `_arithm_op()`: Handles argument conversion and device consistency checking
3. Refactored _tensor.py and _batch.py
Both files now delegate arithmetic operations to the new _arithm_op() function instead of inline implementations.
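The delegation pattern looks roughly like this (an illustrative sketch; the actual operator set and `_arithm_op` signature in the PR may differ):

```python
from . import _arithmetic

class Tensor:
    def __add__(self, other):
        return _arithmetic._arithm_op("add", self, other)

    def __radd__(self, other):
        # Reflected form so that `5 + gpu_tensor` also routes through _arithm_op.
        return _arithmetic._arithm_op("add", other, self)
```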
4. Test Coverage
Added comprehensive tests (test_binary_scalars) covering:
- Scalar operations on both CPU and GPU tensors
- Scalar operations on GPU batches with different batch sizes
- Device mixing detection and error handling
- PyTorch GPU tensor interoperability
Critical Issue Found
Logic Bug in _arithmetic.py (Lines 34-38)
The scalar-to-tensor conversion logic contains a critical structural flaw:
```python
if not isinstance(arg, (Tensor, Batch)):
    if gpu and _implicitly_convertible(arg):
        arg = as_tensor(arg, device="gpu")
    arg = as_tensor(arg)  # ← ALWAYS EXECUTES (wrong!)
```
Line 38 should be in an else clause. Currently, every scalar is converted twice - once to the correct device, then again to the default device. The code works by accident because as_tensor() preserves the device for existing Tensor objects, but this is fragile and violates the principle of least surprise.
Positive Aspects
- C++ backend fix is correct and necessary
- Refactoring improves code maintainability
- Test coverage for new functionality is good
- Device validation logic correctly prevents mixed CPU/GPU operations
- Batch broadcasting now works correctly for GPU tensors
Recommendations
The _arithmetic.py logic should be corrected to explicitly handle the scalar conversion decision, making the intent clear and eliminating accidental reliance on as_tensor() device preservation semantics.
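A minimal sketch of the recommended structure (same names as in the snippet above; not the actual patch):

```python
if not isinstance(arg, (Tensor, Batch)):
    if gpu and _implicitly_convertible(arg):
        # A GPU operand is present: copy the scalar/list/tuple straight to the GPU.
        arg = as_tensor(arg, device="gpu")
    else:
        # Otherwise use the default (CPU) conversion.
        arg = as_tensor(arg)
```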
Confidence Score: 2/5
- This PR should NOT be merged in its current state due to a critical logic bug in _arithmetic.py that violates code quality standards, despite the bug working accidentally in practice.
- The C++ backend fix (TensorListGPU.broadcast) is correct and critical. However, the _arithmetic.py module contains a structural logic error where line 38 always executes regardless of the condition at line 36. This means scalars are converted twice - once to the correct device, then again unconditionally. While this works due to as_tensor() preserving device for existing Tensor objects, it's fragile, poorly written, and could break if as_tensor() behavior changes. The code violates Python style expectations and makes the intent unclear. This is a moderate issue that doesn't cause runtime failures in current code but represents poor engineering practice and technical debt. The test coverage added is good, but doesn't validate the implementation quality.
- dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py - Lines 34-38 need restructuring to clarify intent and eliminate accidental reliance on as_tensor() device preservation behavior
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py | 2/5 | New module that centralizes arithmetic operation logic. Contains a critical logic bug at line 38 where scalar conversion happens twice due to incorrect indentation - line 38 should be in an else clause. The code works by accident due to as_tensor() preserving device for existing Tensor objects, but is fragile and error-prone. |
| dali/python/backend_impl.cc | 5/5 | Critical bug fix in TensorListGPU.broadcast method (lines 1612-1613). Corrected type parameters from CPUBackend to GPUBackend, allowing GPU tensor broadcasting to work correctly. Fix is correct and minimal. |
| dali/python/nvidia/dali/experimental/dynamic/_tensor.py | 5/5 | Updated to import and use the new _arithmetic module for all arithmetic operations. Changes are clean refactoring from inline implementation to delegating to _arithm_op. No new issues introduced. |
| dali/python/nvidia/dali/experimental/dynamic/_batch.py | 5/5 | Updated to import and use the new _arithmetic module for all arithmetic operations. The broadcast method correctly detects device type and uses appropriate TensorList backend. Changes are clean refactoring with no new issues. |
| dali/test/python/experimental_mode/test_arithm_ops.py | 4/5 | New tests added for scalar arithmetic operations on both CPU and GPU tensors/batches, and device compatibility checks. Tests are comprehensive but could benefit from additional edge case coverage (e.g., complex numbers, boolean operations with scalars). |
Sequence Diagram
sequenceDiagram
participant User
participant Tensor as GPU Tensor
participant _arithm_op as _arithm_op()
participant as_tensor as as_tensor()
participant Backend as Backend
User->>Tensor: tensor + 3 (scalar)
Tensor->>_arithm_op: __add__(self, 3)
_arithm_op->>_arithm_op: gpu = True (detect GPU)
_arithm_op->>_arithm_op: arg = 3 (scalar)
Note over _arithm_op: Check: gpu and<br/>_implicitly_convertible(3)?
_arithm_op->>as_tensor: as_tensor(3, device='gpu')
as_tensor-->>_arithm_op: GPU Tensor
Note over _arithm_op: BUG: Line 38<br/>ALWAYS executes
_arithm_op->>as_tensor: as_tensor(GPU_Tensor)
as_tensor-->>_arithm_op: GPU Tensor (device preserved)
_arithm_op->>_arithm_op: Validate: GPU == GPU ✓
_arithm_op->>Backend: _arithmetic_generic_op(tensor, scalar)
Backend-->>_arithm_op: Result Tensor
_arithm_op-->>Tensor: Result
Tensor-->>User: GPU Tensor result
rect rgb(255, 200, 200)
Note over _arithm_op: Works by accident:<br/>as_tensor() preserves<br/>device for Tensor objects
end
Greptile Overview
Greptile Summary
This PR successfully enables arithmetic operations between GPU tensors/batches and scalars, fixing a critical bug and improving code organization.
Key Changes:
- Fixed critical bug in C++ backend where `TensorListGPU.broadcast` incorrectly used `CPUBackend` types
- Extracted arithmetic operation logic into new `_arithmetic.py` module, eliminating code duplication between `_tensor.py` and `_batch.py`
- Enhanced device handling to properly detect GPU context and convert scalars/lists/tuples to GPU tensors when operating with GPU data
- Added comprehensive test coverage for scalar operations on both CPU and GPU, plus device compatibility validation
Improvements Over Previous Implementation:
The new _arithm_op function correctly handles the case where scalars appear before GPU tensors in operations (e.g., 5 + gpu_tensor). The old implementation would convert the scalar to CPU before detecting the GPU tensor, leading to device mismatch errors or incorrect behavior.
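For example (assuming the `ndd` namespace used in the tests above):

```python
gpu_tensor = ndd.tensor([1, 2, 3], device="gpu")

a = gpu_tensor + 5   # scalar on the right
b = 5 + gpu_tensor   # scalar on the left: previously the scalar was converted to CPU
                     # before the GPU operand was detected, and the operation failed
```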
Confidence Score: 5/5
- This PR is safe to merge - it fixes critical bugs, improves code organization, and has comprehensive test coverage
- The changes fix a clear copy-paste bug in the C++ backend, successfully refactor duplicated Python code into a shared module, and properly handle GPU/CPU device detection. The only minor issue is a small code structure inefficiency that doesn't affect correctness. Comprehensive tests validate the new functionality.
- No files require special attention - all changes are well-implemented
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| dali/python/backend_impl.cc | 5/5 | Fixed critical bug in TensorListGPU.broadcast that incorrectly used CPUBackend types instead of GPUBackend types |
| dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py | 4/5 | New module extracting arithmetic logic; correctly handles GPU/CPU device detection and scalar-to-tensor conversion, with minor code structure inefficiency |
| dali/python/nvidia/dali/experimental/dynamic/_batch.py | 5/5 | Refactored to use centralized _arithm_op from _arithmetic module, removing duplicated code |
| dali/python/nvidia/dali/experimental/dynamic/_tensor.py | 5/5 | Refactored to use centralized _arithm_op from _arithmetic module, removing duplicated code |
| dali/test/python/experimental_mode/test_arithm_ops.py | 5/5 | Added comprehensive tests for scalar operations on GPU/CPU tensors and batches, plus device compatibility validation |
Sequence Diagram
sequenceDiagram
participant User
participant Tensor/Batch
participant _arithm_op
participant as_tensor
participant _arithmetic_generic_op
participant Backend
User->>Tensor/Batch: gpu_tensor + scalar
Tensor/Batch->>_arithm_op: __add__(scalar)
Note over _arithm_op: Detect GPU from args
_arithm_op->>_arithm_op: Check if any arg is GPU tensor
_arithm_op->>_arithm_op: gpu = True
Note over _arithm_op: Convert scalar to tensor
_arithm_op->>as_tensor: as_tensor(scalar, device="gpu")
as_tensor->>Backend: TensorListGPU.broadcast(tensor, 1)
Backend-->>as_tensor: GPU tensor
as_tensor-->>_arithm_op: GPU scalar tensor
_arithm_op->>_arithm_op: Validate devices match
_arithm_op->>_arithmetic_generic_op: Execute operation
_arithmetic_generic_op->>Backend: Run GPU arithmetic op
Backend-->>_arithmetic_generic_op: Result tensor
_arithmetic_generic_op-->>User: GPU result tensor
Signed-off-by: Rostan Tabet <[email protected]>
Force-pushed fcf79a4 to bcefb0d
Greptile Overview
Greptile Summary
This PR fixes a critical bug preventing arithmetic operations between GPU tensors/batches and scalars, and refactors the arithmetic operation logic for better maintainability.
Key Changes:
- Fixed `TensorListGPU.broadcast` in the C++ backend, which incorrectly used `CPUBackend` types instead of `GPUBackend` types; this was the root cause preventing GPU tensor operations
- Extracted arithmetic operation handling into new `_arithmetic.py` module, eliminating code duplication between `_tensor.py` and `_batch.py`
- Implemented smart device detection: when any operand is a GPU tensor, implicitly convertible scalars (numbers, lists, tuples) are automatically converted to GPU tensors
- Added validation to prevent mixing CPU and GPU operands, providing clear error messages
- Added comprehensive test coverage for scalar operations on both devices and device compatibility checks
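For instance (the `ndd` alias and default CPU placement are assumptions; the error text matches the check shown in the diagrams above):

```python
cpu_tensor = ndd.tensor([1, 2, 3])               # assumed to default to the CPU
gpu_tensor = ndd.tensor([1, 2, 3], device="gpu")

gpu_tensor + cpu_tensor  # expected to raise ValueError: Cannot mix GPU and CPU inputs
```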
Issues Found:
- One syntax error in `_tensor.py` line 71: missing `f` prefix for f-string formatting
Confidence Score: 4/5
- This PR is safe to merge after fixing the f-string syntax error - it addresses a critical bug and improves code quality
- Score of 4 reflects one syntax error that must be fixed before merging. The core logic is sound: the C++ backend fix is correct, the arithmetic refactoring properly handles device placement, and tests comprehensively cover the new functionality. The syntax error on line 71 is in an error path that likely wasn't hit by tests.
- Fix the f-string syntax error in `dali/python/nvidia/dali/experimental/dynamic/_tensor.py` line 71 before merging
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| dali/python/backend_impl.cc | 5/5 | Critical bug fix: corrected TensorListGPU.broadcast to use GPUBackend instead of CPUBackend - this was preventing GPU tensor broadcasting from working |
| dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py | 5/5 | New module extracting arithmetic operation logic with proper GPU/CPU device handling and implicit scalar-to-tensor conversion |
| dali/python/nvidia/dali/experimental/dynamic/_tensor.py | 4/5 | Refactored to use shared _arithm_op from _arithmetic.py; contains one f-string syntax error on line 71 |
| dali/python/nvidia/dali/experimental/dynamic/_batch.py | 5/5 | Refactored to use shared _arithm_op from _arithmetic.py, removing duplicate code and improving maintainability |
| dali/test/python/experimental_mode/test_arithm_ops.py | 5/5 | Added comprehensive tests for scalar operations with tensors/batches on both CPU and GPU, plus device compatibility validation tests |
Sequence Diagram
sequenceDiagram
participant User
participant Tensor/Batch
participant _arithm_op
participant as_tensor
participant _arithmetic_generic_op
participant Backend
User->>Tensor/Batch: gpu_tensor + scalar
Tensor/Batch->>_arithm_op: __add__(scalar)
Note over _arithm_op: Detect GPU device from<br/>existing Tensor/Batch args
_arithm_op->>_arithm_op: Check if scalar is<br/>implicitly_convertible<br/>(Real, list, tuple)
alt Scalar is implicitly convertible & GPU detected
_arithm_op->>as_tensor: Convert scalar to GPU tensor
as_tensor-->>_arithm_op: GPU Tensor
else Otherwise
_arithm_op->>as_tensor: Convert scalar to CPU tensor
as_tensor-->>_arithm_op: CPU Tensor
end
_arithm_op->>_arithm_op: Validate all args<br/>are on same device
alt Device mismatch
_arithm_op-->>User: ValueError: Cannot mix GPU and CPU
else All devices match
_arithm_op->>_arithmetic_generic_op: Execute operation
_arithmetic_generic_op->>Backend: TensorListGPU.broadcast<br/>(now uses GPUBackend)
Backend-->>_arithmetic_generic_op: Result
_arithmetic_generic_op-->>User: Tensor/Batch result
end
```python
    return arr.astype(np.int32), nvidia.dali.types.IMAGE_TYPE
else:
    raise TypeError(f"Unexpected element type {type(item)}")
```
```python
raise TypeError("Unexpected element type f{type(item)}")
```
Missing f prefix for f-string.
| raise TypeError("Unexpected element type f{type(item)}") | |
| raise TypeError(f"Unexpected element type {type(item)}") |
This is not related to this PR and I fixed it in another PR already.
!build
CI MESSAGE: [41436981]: BUILD STARTED
CI MESSAGE: [41436981]: BUILD FAILED
Category: Bug fix (non-breaking change which fixes an issue)
Description:
This PR enables arithmetic operations between tensors/batches and scalars. Previously, `x + n` worked if `x` was a CPU tensor and `n` a scalar, but not if `x` was a GPU tensor. Arithmetic operations between a GPU tensor/batch and a Python list or tuple are also supported. Tensor types need to be explicitly copied.
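A small illustration of the intended behavior (the `ndd` import alias and conversion helper usage are assumptions inferred from the test snippets above, not verbatim from the PR):

```python
import numpy as np
import nvidia.dali.experimental.dynamic as ndd  # assumed import path

x = ndd.tensor([1.0, 2.0, 3.0], device="gpu")

y = x + 2                # scalar: now works for GPU tensors too
z = x * (0.5, 0.5, 0.5)  # Python tuple: implicitly converted to a GPU tensor

arr = np.ones(3, dtype=np.float32)
w = x + ndd.as_tensor(arr, device="gpu")  # tensor-like types must be copied explicitly
```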
Additional information:
Affected modules and functionalities:
Dynamic mode tensors and batches.
Key points relevant for the review:
Do arithmetic operations work the way we intend them to?
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4545