Allow passing tensor arguments in reader constructors#6252

Open
rostan-t wants to merge 19 commits into NVIDIA:main from rostan-t:ndd-reader-tensor-args

Conversation

@rostan-t
Collaborator

Category:

New feature (non-breaking change which adds functionality)

Description:

Currently, tensor arguments can only be passed to readers by invoking them. However, the recommended way to use readers is next_epoch, and the __call__ API is not even documented.

This PR allows constructing readers with tensor arguments.

Additional information:

Affected modules and functionalities:

Dynamic mode.

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-4600

@review-notebook-app

Check out this pull request on ReviewNB.

@greptile-apps
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR extends DALI's experimental dynamic mode to allow tensor (and array-like) arguments to be passed directly in reader constructors, not only at call time. Previously, tensor arguments could only be provided via __call__, which conflicted with the recommended next_epoch() API that is the primary interface for readers.

Key changes:

  • _op_builder.py: The generated __init__ for reader operators now separates non-scalar tensor arguments into _raw_tensor_args (stored for lazy processing) while scalar arguments continue through the normal _scalar_decay path. A _tensor_arg_names set is populated to guard against duplicate argument passing in __call__.
  • _ops.py: Reader gains _process_tensor_args(batch_size) — a caching helper that converts _raw_tensor_args to their processed Batch/Tensor form and passes them through all execution paths (_samples, _batches, get_metadata).
  • _invocation.py / _batch2tensor.py: Default caller_depth unified to 3, removing a now-redundant explicit override.
  • _signatures.py: Reader __init__ type stubs updated to expose tensor kwargs with appropriate type constraints (allow_data_node_kwargs=False, allow_batch_kwargs=False).
  • pytorch/nodes.py: get_metadata updated to the new required batch_size parameter; fixes a positional vs keyword argument call for _stream.
  • New tests cover: ndd.tensor() arg with next_epoch, mixed constructor/call-time scalar split, and the duplicate-arg error path.
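The constructor-side split described in the first bullet can be sketched roughly as follows. This is an illustrative simplification, not DALI's actual code: `_scalar_decay` and `is_tensor_like` are stand-in helpers for the real internals.

```python
import numpy as np

def _scalar_decay(v):
    # Stand-in for DALI's scalar normalization; identity here.
    return v

def is_tensor_like(v):
    # For this sketch, treat non-0-D NumPy arrays as tensor arguments.
    return isinstance(v, np.ndarray) and v.ndim > 0

def split_reader_kwargs(**kwargs):
    # Tensor-like values are kept raw for lazy per-batch processing;
    # everything else continues through the normal scalar path.
    raw_tensor_args = {k: v for k, v in kwargs.items() if is_tensor_like(v)}
    scalar_kwargs = {k: _scalar_decay(v) for k, v in kwargs.items()
                     if k not in raw_tensor_args}
    return raw_tensor_args, scalar_kwargs
```

With `resize_x=np.array([320, 240])` and `resize_y=192`, only `resize_x` lands in the raw tensor dict; `resize_y` stays a scalar kwarg.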

Confidence Score: 3/5

  • Safe to merge after addressing the latent _previous_batch_size = None cache-hit edge case and confirming the _signatures.py stub change is limited to reader schemas.
  • The core logic for separating tensor args at construction time and caching their processed forms is sound, and the three new tests exercise the main paths. Two issues lower confidence: (1) _previous_batch_size is initialized to None, which creates a false cache hit if _process_tensor_args(None) is ever called with non-empty tensor args — currently not triggered but fragile; (2) the removal of include_inputs=False and include_kwarg_inputs=False from the __init__ stub generator could affect non-reader CamelCase class operators if any exist. Neither issue blocks the primary feature, but both warrant a quick fix or confirmation before merging.
  • _ops.py (latent cache-hit in _process_tensor_args) and _signatures.py (stub generation scope change).

Important Files Changed

  • dali/python/nvidia/dali/experimental/dynamic/_ops.py — Core reader class extended with _raw_tensor_args, _tensor_args, and _previous_batch_size fields; adds the _process_tensor_args caching helper and wires tensor args through _samples, _batches, and get_metadata. The _previous_batch_size = None initialization creates a latent false cache hit if _process_tensor_args(None) is ever called with non-empty tensor args.
  • dali/python/nvidia/dali/experimental/dynamic/_op_builder.py — The generated __init__ now separates tensor args from scalar args for reader operators, stores them in _raw_tensor_args/_tensor_arg_names, and validates against Batch usage. The __call__ path correctly merges and deduplicates constructor tensor args with call-time args.
  • dali/python/nvidia/dali/ops/_signatures.py — Reader __init__ type stub updated to include tensor keyword args (constrained to non-DataNode/non-Batch types). Removal of include_inputs=False and include_kwarg_inputs=False could expose positional inputs in __init__ stubs for any non-reader CamelCase class operators.
  • dali/test/python/experimental_mode/test_reader_decoder.py — Adds three new tests: a tensor arg via ndd.tensor() in next_epoch, a partial scalar/call-time arg split, and a duplicate-arg error guard. Also fixes a pre-existing glob pattern missing a wildcard in the shards error test.

Sequence Diagram

sequenceDiagram
    participant User
    participant Init as "Reader.__init__ (generated)"
    participant Base as "Reader base __init__"
    participant PTA as "_process_tensor_args"
    participant Backend as "_init_backend"
    participant Run as "super()._run"

    User->>Init: "Reader(filenames=..., resize_x=ndd.tensor(w), resize_y=192)"
    Init->>Init: "Separate tensor args from scalar args"
    Note over Init: "tensor_args = {resize_x: Tensor}, scalar kwargs = {resize_y: 192}"
    Init->>Base: "__init__(max_batch_size, name, resize_y=192, ...)"
    Base->>Base: "Set _raw_tensor_args={}, _tensor_args={}, _previous_batch_size=None"
    Base-->>Init: "return"
    Init->>Init: "Override _raw_tensor_args={resize_x: Tensor}, _tensor_arg_names={resize_x, resize_y}"

    User->>Init: "reader.next_epoch(batch_size=4)"
    Init->>PTA: "_process_tensor_args(batch_size=4)"
    PTA->>PTA: "Cache miss (4 != None) → _process_params"
    PTA-->>Init: "{resize_x: Batch(tensor, 4)}"
    Init->>Backend: "_init_backend(ctx, (), {resize_x: Batch})"

    loop "Each batch in epoch"
        Init->>PTA: "_process_tensor_args(4)"
        PTA-->>Init: "cached {resize_x: Batch}"
        Init->>Run: "_run(ctx, batch_size=4, resize_x=Batch)"
        Run-->>User: "yield Batch outputs"
    end

Comments Outside Diff (2)

  1. dali/python/nvidia/dali/experimental/dynamic/_ops.py, line 610 (link)

    None initial value creates false cache hit in _process_tensor_args

    _previous_batch_size is initialized to None on line 610. The cache check in _process_tensor_args is:

    if self._previous_batch_size == batch_size:
        return self._tensor_args   # _tensor_args is still {} at this point

    If _process_tensor_args(None) is called while _raw_tensor_args is non-empty, None == None is True and the method silently returns an empty {} instead of processing the tensor arguments.

    In the current code this doesn't manifest — _samples sets _actual_batch_size = 1 before calling _process_tensor_args, _batches rejects None batch_size outright, and get_metadata is typed int. However, the initialization is fragile: any future caller that passes batch_size=None (e.g., calling get_metadata(None) despite the type hint) would receive silently incorrect results.

    A safer sentinel value would avoid the ambiguity. Alternatively, guard the initial state explicitly in _process_tensor_args.
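A minimal sketch of the sentinel approach, heavily simplified from the real class (the `_process_params` body here is a stand-in, not DALI's actual processing):

```python
_UNSET = object()  # sentinel: never compares equal to any real batch size

class ReaderSketch:
    def __init__(self, raw_tensor_args):
        self._raw_tensor_args = raw_tensor_args
        self._tensor_args = {}
        self._previous_batch_size = _UNSET  # instead of None

    def _process_params(self, batch_size, **raw):
        # Stand-in for the real per-batch processing step.
        return {k: (v, batch_size) for k, v in raw.items()}

    def _process_tensor_args(self, batch_size):
        # _UNSET never equals a real batch size (or None), so the first
        # call always processes, even for batch_size=None.
        if self._previous_batch_size == batch_size:
            return self._tensor_args
        self._tensor_args = self._process_params(batch_size, **self._raw_tensor_args)
        self._previous_batch_size = batch_size
        return self._tensor_args
```

With this initialization, `_process_tensor_args(None)` on a fresh instance processes the raw args instead of silently returning the empty cache.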

  2. dali/python/nvidia/dali/ops/_signatures.py, line 797-806 (link)

    include_inputs=False removal may expose positional inputs in non-reader class stubs

    The previous code explicitly passed include_inputs=False and include_kwarg_inputs=False to _call_signature for the __init__ stub of all class-based operators. The new code relies on defaults (include_inputs=True, include_kwarg_inputs=True) while constraining annotations via allow_data_node_kwargs=False and allow_batch_kwargs=False.

    For readers (which have MinNumInput() == 0) this is fine — no positional inputs appear. But for any non-reader CamelCase operator that has positional inputs, the generated __init__ stub would now incorrectly include those inputs, even though the actual runtime build_constructor skips tensor args for non-readers (if is_tensor and not is_reader: continue).

    If all CamelCase class operators in the dynamic API are exclusively readers, this is a non-issue. But if that assumption breaks in the future, the generated .pyi stubs would mislead type-checkers and IDEs into accepting inputs in __init__ that are rejected at runtime. Consider confirming that _gen_dynamic_cls_signature is only reachable for reader schemas, or restoring an explicit guard.

Last reviewed commit: e145ef5

@rostan-t rostan-t force-pushed the ndd-reader-tensor-args branch 2 times, most recently from ed9a066 to c498545 Compare March 11, 2026 11:08
@rostan-t
Collaborator Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [45875629]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [45875629]: BUILD FAILED

kwargs = {k: _scalar_decay(v) for k, v in kwargs.items()}
op_class.__base__.__init__(self, max_batch_size, name, **kwargs)
if is_reader:
    self._tensor_args = {k: v for k, v in tensor_kwargs.items() if v is not None}
Contributor

This is potentially quite inefficient, as we'd aggressively promote arguments to tensors - and we should avoid it in __init__. We should only add tensor types as tensors - and possibly even demote 0D tensors back to scalar values. This will have considerable benefits at run-time, as we won't have to pass the tensors to each call (which, in case of readers, will mean many, many times).

Contributor

Also - if we get a foreign tensor in kwargs, we should make a copy - otherwise the user might get a nasty surprise:

shape = np.array([480, 640])
r = readers.Numpy(files=[...], shape=shape)
shape[0] = 320  # oops, the reader will now see that
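A defensive copy at construction time avoids this aliasing. The sketch below is illustrative only, using NumPy to stand in for whatever array types the constructor accepts:

```python
import numpy as np

def snapshot_tensor_args(tensor_args):
    # Copy array-like arguments so later in-place mutation by the caller
    # does not leak into the stored reader state.
    return {k: v.copy() if isinstance(v, np.ndarray) else v
            for k, v in tensor_args.items()}

shape = np.array([480, 640])
stored = snapshot_tensor_args({"shape": shape})
shape[0] = 320  # caller mutates the original array...
# ...but stored["shape"][0] is still 480
```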

Comment on lines +675 to +677
_, kwargs = self._process_params(
    self._backend, self._device, batch_size, **self._tensor_args
)
Contributor

I'd recommend at least caching the processed kwargs when the batch size is constant.
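One possible shape for that cache, keyed on the last seen batch size (a hypothetical helper, not the actual DALI implementation):

```python
class TensorArgCache:
    def __init__(self, raw_args, process_fn):
        self._raw = raw_args
        self._process = process_fn    # e.g. a bound _process_params
        self._cached = None
        self._cached_bs = object()    # sentinel: never equals a real batch size

    def get(self, batch_size):
        # Reprocess only when the batch size actually changes.
        if batch_size != self._cached_bs:
            self._cached = self._process(batch_size, **self._raw)
            self._cached_bs = batch_size
        return self._cached
```

With a constant batch size across an epoch, the processing function runs once and every subsequent call returns the cached kwargs.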

Contributor

@mzient mzient left a comment

To change:

  • make a copy of tensor arguments
  • do not promote non-tensor arguments to tensors
  • simplify caller_depth logic
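The first two points could be combined into a single normalization step at construction, sketched here with NumPy as a stand-in for the accepted tensor types (illustrative, not the actual patch):

```python
import numpy as np

def normalize_ctor_arg(v):
    if isinstance(v, np.ndarray):
        if v.ndim == 0:
            return v.item()   # demote a 0-D tensor back to a plain scalar
        return v.copy()       # defensive copy of genuine tensor arguments
    return v                  # non-tensor arguments stay scalar (no promotion)
```

Demoted scalars then follow the cheap scalar path and need not be re-passed to every call.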

rostan-t added 11 commits March 13, 2026 15:23
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
… traces

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
…essing

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
…al tensors

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
@rostan-t rostan-t force-pushed the ndd-reader-tensor-args branch from 51fb904 to f283da0 Compare March 13, 2026 17:00
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
@rostan-t
Collaborator Author

> the init stub change must be scoped to reader operators only

@greptileai This is not necessary. Readers are now the only operators publicly exposed as classes.

@greptile-apps
Contributor

greptile-apps bot commented Mar 16, 2026

That makes sense - if readers are the only operators exposed as classes, then the change is already naturally scoped correctly. Thanks for the clarification!

@rostan-t
Collaborator Author

!build

@rostan-t rostan-t requested a review from mzient March 16, 2026 10:11
@dali-automaton
Collaborator

CI MESSAGE: [46240320]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [46240320]: BUILD PASSED
