Description
After putting a comment on #1488 , I decided to open my own issue.
Environment
TensorRT Version: 8.5.2.2
NVIDIA GPU: NVIDIA RTX 2000 Ada Generation Laptop GPU
NVIDIA Driver Version: 570.169
CUDA Version: 11.5
CUDNN Version: 8.9.2.26
Operating System: Linux 6.8.0-65-generic
Python Version: 3.10.12
PyTorch Version: 2.3.0+cu121
Baremetal or Container: Both
I am having the same problem, but only in a specific situation. I created a TensorRT inference engine class which runs properly on its own, both on baremetal and in the container, when used like this:
import time

import numpy as np
import torch

if __name__ == "__main__":
    import pycuda.autoinit  # creates the CUDA context

    engine_path = "model_engine_fp16.trt"
    calibration_images_path = "calib_images.pt"
    calibration_tensors = torch.load(calibration_images_path, map_location='cpu')
    calibration_images = [img.numpy().astype(np.float32) for img in calibration_tensors]

    print("Testing inference with variable batch sizes...")
    inference_engine = TensorRTInference(engine_path, max_batch_size=6)

    # Test with random batches
    for i in range(100):
        batch_size = np.random.randint(1, 6)  # Random batch size 1-5
        print(f"Batch size for run #{i}: {batch_size}")

        # Select random images
        indices = np.random.choice(len(calibration_images), batch_size, replace=False)
        batch_images = [calibration_images[idx] for idx in indices]

        # Stack to batch
        batch_input = np.stack(batch_images, axis=0)

        t_start = time.time()
        results = inference_engine.infer(batch_input)
        t_total = time.time() - t_start
        time.sleep(1. / 6.)

        print(f"Batch {i}: size={batch_size}, time={t_total:.4f}s")
        print(f"  Input shape: {batch_input.shape}")
        print(f"  Output shapes: {[r.shape for r in results]}")

But when I import the same engine in a class which mixes the TensorRT model with a YOLO model, the behaviour changes: on baremetal it still runs well, but in the container self.engine.create_execution_context() returns a NoneType.
Code for the constructor of the TensorRTInference class:
import tensorrt as trt
import pycuda.driver as cuda


class TensorRTInference:
    """Handle TensorRT inference with variable batch sizes"""

    def __init__(self, engine_path, max_batch_size=32):
        try:
            current_context = cuda.Context.get_current()
            print(f"Current CUDA context: {current_context}")
            print(f"Context device: {current_context.get_device()}")
            print(f"Context API version: {current_context.get_api_version()}")
        except cuda.LogicError as e:
            print(f"No CUDA context: {e}")
            import pycuda.autoinit
            current_context = cuda.Context.get_current()
            print(f"After autoinit - Context: {current_context}")

        # Test that the context is working
        try:
            cuda.Context.synchronize()
            print("CUDA context is active and synchronized")
        except Exception as e:
            print(f"CUDA context sync failed: {e}")

        # Load TensorRT engine
        self.logger = trt.Logger(trt.Logger.WARNING)
        with open(engine_path, "rb") as f:
            runtime = trt.Runtime(self.logger)
            self.engine = runtime.deserialize_cuda_engine(f.read())
        print(f"Engine deserialized successfully: {self.engine is not None}")

        if self.engine is None:
            raise Exception("Failed to deserialize TensorRT engine")

        self.context = self.engine.create_execution_context()
        if self.context is None:
            raise Exception("Execution context is None!")

        self.max_batch_size = max_batch_size

        # Get tensor info
        self.input_names = []
        self.output_names = []
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            if self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
                self.input_names.append(name)
            else:
                self.output_names.append(name)

        # Pre-allocate buffers for maximum batch size
        self.buffers = {}
        self.stream = cuda.Stream()
        self._allocate_max_buffers()

Code for the constructor of the class which loads the TensorRTInference class and the YOLO model:
import torch
from ultralytics import YOLO


class UseTwoModels:
    def __init__(self,
                 detection_model_path: str,
                 other_model_path: str,
                 device: str = 'cuda:0',
                 min_conf: float = 0.5) -> None:
        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')
        torch.cuda.set_device(device)
        print(device)

        # Warm up CUDA context
        dummy_tensor = torch.zeros(1).to(device)
        del dummy_tensor
        print(f"Using CUDA device: {torch.cuda.get_device_name(0)}")

        try:
            self.detector_model = YOLO(detection_model_path, task='detect')
        except Exception as e:
            print(f"Error loading YOLO model: {e}")
        else:
            print("YOLO detector properly loaded.")

        self.keypoint_model = TensorRTInference(other_model_path, max_batch_size=6)
        self.min_conf = min_conf
        self.person_cls = 0

Just to clarify my problem: everything works well on baremetal, but when using the UseTwoModels class in the container I get the Exception: Execution context is None! that I raise in the TensorRTInference constructor.
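For completeness, here is a stripped-down sketch of the sequence that fails in the container. It is not my exact code, the paths are placeholders, and it simply loads YOLO first and then creates the TensorRT execution context the same way my class does:

# Hypothetical minimal repro, mirroring the order of operations in UseTwoModels.
import torch
import tensorrt as trt
import pycuda.autoinit  # creates the CUDA context up front
from ultralytics import YOLO

torch.cuda.set_device('cuda:0')
detector = YOLO('detector.pt', task='detect')  # placeholder path; this step succeeds

logger = trt.Logger(trt.Logger.WARNING)
with open('model_engine_fp16.trt', 'rb') as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
print(f"Engine deserialized: {engine is not None}")  # True in my trace

context = engine.create_execution_context()
print(f"Execution context: {context}")  # None, but only inside the container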
Here is the trace when trying to load the model from UseTwoModels:
# Here we are inside UseTwoModels
cuda:0
Using CUDA device: NVIDIA RTX 2000 Ada Generation Laptop GPU
Current CUDA context Det: <pycuda._driver.Context object at 0x7c80388b5540>
YOLO detector properly loaded.
# Here we are inside TensorRTInference
Current CUDA context: <pycuda._driver.Context object at 0x7c8038360890>
Context device: <pycuda._driver.Device object at 0x7c803889ece0>
Context API version: 3020
CUDA context is active and synchronized
Engine deserialized successfully: True
# This is the error
Traceback (most recent call last):
...
raise Exception("Execution context is None!")
Exception: Execution context is None!
I have also tried rebuilding the engine directly inside the container and using that, but I keep getting the same problem.
My intuition tells me it is not a TensorRT version problem, given that the first snippet runs well both on baremetal and in the container.
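If it helps narrow this down, I can rerun inside the container with more verbose logging around the context creation. A rough sketch of what I would add (assuming a VERBOSE logger and a free-memory check are enough to surface the underlying error):

# Hypothetical diagnostic variant, not my production code.
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context

logger = trt.Logger(trt.Logger.VERBOSE)  # much noisier than WARNING
runtime = trt.Runtime(logger)
with open("model_engine_fp16.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Check free device memory right before creating the execution context
free_mem, total_mem = cuda.mem_get_info()
print(f"Free GPU memory: {free_mem / 1e6:.1f} MB of {total_mem / 1e6:.1f} MB")

context = engine.create_execution_context()
print(f"Execution context: {context}")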
Please let me know if I should provide other information. Thanks!