
invalid argument Current pipeline object is no longer valid. #6149

@zjliu-cs

Description

Describe the question.

Hello,
I am encountering an intermittent error when loading data with NVIDIA DALI: the same code sometimes runs successfully and sometimes fails with a critical pipeline error. I would like to understand what could cause this behavior.
In addition, the failure location is not consistent: across runs, the error is reported in different operators or in different parts of the code.

Critical error in pipeline:
Error in CPU operator 'nvidia.dali.fn.decoders.image',
which was used in the pipeline definition with the following traceback:

  File ".../dali_loader.py", line 36, in create_lmdb_dali_train_pipeline
    images = fn.decoders.image(images, device='cpu', output_type=types.RGB)

encountered:

Error in thread 0: CUDA runtime API error cudaErrorInvalidValue (1):
invalid argument
Current pipeline object is no longer valid.
Error in CPU operator 'nvidia.dali.fn.ones',

  File ".../dali_loader.py", line 64, in create_lmdb_dali_train_pipeline
    mask = fn.ones(shape=(crop, crop)).gpu()

CUDA runtime API error cudaErrorInvalidValue (1):
invalid argument
Current pipeline object is no longer valid.
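To check whether the failing operator really changes from run to run, the saved error text of each failed run can be tallied. The sketch below is a hypothetical helper (not part of DALI). It assumes every message keeps the `Error in <backend> operator '<name>'` format seen in the log above; the function names `failing_operators` and `tally_failures` are illustrative only.

```python
import re
from collections import Counter

# Assumed message format (taken from the log above), e.g.
# "Error in CPU operator 'nvidia.dali.fn.decoders.image',"
# The backend is captured as any word so CPU/GPU/other spellings all match.
_OP_PATTERN = re.compile(r"Error in (\w+) operator '([^']+)'")


def failing_operators(log_text):
    """Return (backend, operator) pairs in the order they appear in one error log."""
    return _OP_PATTERN.findall(log_text)


def tally_failures(logs):
    """Count how many runs named each (backend, operator) pair."""
    counts = Counter()
    for text in logs:
        # Count each operator at most once per run so one long log does not dominate.
        counts.update(set(failing_operators(text)))
    return counts
```

Calling `tally_failures` on the error text of several failed runs shows whether the failure is always reported by one operator or moves between them. Lines such as `Error in thread 0: ...` do not match the pattern, because they lack the `operator '<name>'` part.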

Environment
OS: Ubuntu
GPU: A100
CUDA version: 12.2
NVIDIA DALI version: 1.51.2
PyTorch version: 2.4.1
Python version: 3.11.13
Data format: LMDB (ImageNet-style)

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report

Metadata

Labels

question (Further information is requested)