Add a tensorrt backend #33
@@ -0,0 +1,274 @@
import numpy as np

import pycuda.driver as cuda
import tensorrt as trt
import cv2

from typing import Dict, List, Tuple, Optional, Any

from vcap import (
    Resize,
    DETECTION_NODE_TYPE,
    OPTION_TYPE,
    BaseStreamState,
    BaseBackend,
    rect_to_coords,
    DetectionNode,
)


class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


class AllocatedBuffer:
    def __init__(self, inputs_, outputs_, bindings_, stream_):
        self.inputs = inputs_
        self.outputs = outputs_
        self.bindings = bindings_
        self.stream = stream_


class BaseTensorRTBackend(BaseBackend):
    def __init__(self, engine_bytes, width, height, device_id):
        super().__init__()
        gpu_device_id = int(device_id[4:])
        cuda.init()
        dev = cuda.Device(gpu_device_id)
        self.ctx = dev.make_context()
        TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
        self.trt_runtime = trt.Runtime(TRT_LOGGER)
        # Load the engine
        self.trt_engine = self.trt_runtime.deserialize_cuda_engine(engine_bytes)
        # Create the execution context
        self.context = self.trt_engine.create_execution_context()
        # Create buffers for inference
        self.buffers = {}
        for batch_size in range(1, self.trt_engine.max_batch_size + 1):
I'm leaving a comment here to remind me: we need to do some memory measurement to figure out whether all of these buffers are necessary. I wonder if allocating a buffer for batch sizes [1, 2, 5, 10] or other combinations might be better. Things to test:

Another question we'll have to figure out: should this be configurable via the init?

I'll do some tests to figure out how much memory is needed for those buffers. Another thought: if we don't get a performance improvement with a larger batch size, we don't have to do that. Based on my tests, a larger batch size improves inference time by 10% but lowers preprocessing performance, so the overall performance is even a little lower than with a small batch size.
            inputs, outputs, bindings, stream = self.allocate_buffers(
                batch_size=batch_size)
            self.buffers[batch_size] = AllocatedBuffer(inputs, outputs, bindings,
                                                       stream)

        self.engine_width = width
        self.engine_height = height

        # Preallocate resources for post-processing
        # TODO: Post-processing is only needed for detectors
        self._prepare_post_process()

    def batch_predict(self, input_data_list: List[Any]) -> List[Any]:
        task_size = len(input_data_list)
        curr_index = 0
        while curr_index < task_size:
This logic may need to be revisited if we decide not to have buffers [0->10], and instead have combinations of [1, 2, 5, 10], for example.
            if curr_index + self.trt_engine.max_batch_size <= task_size:
                end_index = curr_index + self.trt_engine.max_batch_size
            else:
                end_index = task_size
            batch = input_data_list[curr_index:end_index]
            curr_index = end_index
            for result in self._process_batch(batch):
                yield result
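As the comment above notes, this chunking logic would change if buffers were only allocated for a few batch sizes. A minimal sketch of that variant, assuming a hypothetical helper and a sorted set of allocated sizes such as [1, 2, 5, 10] (not part of this PR):

# Hypothetical chunking helper for when only selected batch sizes have buffers.
# Not part of this PR; the name and behavior are illustrative only.
def _split_into_available_batches(input_data_list, available_batch_sizes):
    remaining = list(input_data_list)
    while remaining:
        # Pick the largest allocated batch size that still fits the remaining work.
        usable = [size for size in available_batch_sizes if size <= len(remaining)]
        batch_size = max(usable) if usable else min(available_batch_sizes)
        # A final chunk smaller than `batch_size` would need padding before being
        # handed to the engine's fixed-size buffer.
        yield remaining[:batch_size]
        remaining = remaining[batch_size:]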

    def _process_batch(self, input_data: List[np.array]) -> List[List[float]]:
        batch_size = len(input_data)
        prepared_buffer = self.buffers[batch_size]
        inputs = prepared_buffer.inputs
        outputs = prepared_buffer.outputs
        bindings = prepared_buffer.bindings
        stream = prepared_buffer.stream
        # TODO: Get the dtype from the engine
        inputs[0].host = np.ascontiguousarray(input_data, dtype=np.float32)

        detections = self.do_inference(
            bindings=bindings, inputs=inputs, outputs=outputs, stream=stream,
            batch_size=batch_size
        )
        return detections

    def process_frame(self, frame: np.ndarray, detection_node: DETECTION_NODE_TYPE,
                      options: Dict[str, OPTION_TYPE],
                      state: BaseStreamState) -> DETECTION_NODE_TYPE:
        pass

    def prepare_inputs(self, frame: np.ndarray, transpose: bool, normalize: bool,
                       mean_subtraction: Optional[Tuple] = None) -> \
            Tuple[np.array, Resize]:
        resize = Resize(frame).resize(self.engine_width, self.engine_height,
                                      Resize.ResizeType.EXACT)
        if transpose:
            resize.frame = np.transpose(resize.frame, (2, 0, 1))
        if normalize:
            resize.frame = (1.0 / 255.0) * resize.frame
        if mean_subtraction is not None:
            if len(mean_subtraction) != 3:
                raise RuntimeError("Invalid mean subtraction")
            resize.frame = resize.frame.astype("float64")
            resize.frame[..., 0] -= mean_subtraction[0]
            resize.frame[..., 1] -= mean_subtraction[1]
            resize.frame[..., 2] -= mean_subtraction[2]
        return resize.frame, resize
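For reference, a hypothetical call from a subclass, with example parameter values only (this usage is not shown in the PR):

# Hypothetical usage inside a detector capsule; parameter values are examples only.
frame_data, resize = backend.prepare_inputs(
    frame,
    transpose=True,         # HWC -> CHW layout
    normalize=True,         # scale pixel values into [0, 1]
    mean_subtraction=None,  # or a 3-tuple of per-channel means
)
# `resize` records how the frame was scaled, so detections produced by the
# engine can later be mapped back to the original frame's coordinates.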

    def allocate_buffers(self, batch_size: int = 1) -> \
            Tuple[List[HostDeviceMem], List[HostDeviceMem], List[int], cuda.Stream]:
        """Allocates host and device buffers for TRT engine inference.

        Args:
            batch_size: Batch size for the input/output memory

        Returns:
            inputs [HostDeviceMem]: Engine input memory
            outputs [HostDeviceMem]: Engine output memory
            bindings [int]: Buffer-to-device bindings
            stream (cuda.Stream): CUDA stream for engine inference synchronization
        """
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()
        for binding in self.trt_engine:
            size = trt.volume(self.trt_engine.get_binding_shape(binding)) * batch_size
            dtype = trt.nptype(self.trt_engine.get_binding_dtype(binding))
            # Allocate host and device buffers
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            # Append the device buffer to device bindings.
            bindings.append(int(device_mem))
            # Append to the appropriate list.
            if self.trt_engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
        return inputs, outputs, bindings, stream
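The review discussion on the pre-allocation loop in __init__ asks whether a buffer for every batch size from 1 to max_batch_size is worth the memory, and whether the set of sizes should be configurable. A minimal sketch of that idea, assuming a hypothetical buffer_batch_sizes argument that is not part of this PR:

# Hypothetical replacement for the pre-allocation loop in __init__, assuming an
# extra constructor argument such as buffer_batch_sizes=(1, 2, 5, 10).
def _allocate_selected_buffers(self, buffer_batch_sizes=(1, 2, 5, 10)):
    self.buffers = {}
    for batch_size in sorted(set(buffer_batch_sizes)):
        if batch_size > self.trt_engine.max_batch_size:
            raise ValueError(
                f"Buffer batch size {batch_size} exceeds the engine's "
                f"max_batch_size of {self.trt_engine.max_batch_size}")
        inputs, outputs, bindings, stream = self.allocate_buffers(batch_size=batch_size)
        self.buffers[batch_size] = AllocatedBuffer(inputs, outputs, bindings, stream)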

    def do_inference(self, bindings: List[int], inputs: List[HostDeviceMem], outputs: List[HostDeviceMem],

Suggested change:

    def _do_inference(self, bindings: List[int],
                      inputs: List[HostDeviceMem],
                      outputs: List[HostDeviceMem],
                      stream: cuda.Stream,
                      batch_size: int = 1) -> List[List[float]]:
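The body of do_inference is not included in this excerpt. For context, a sketch of the usual pattern from NVIDIA's TensorRT Python samples, which may or may not match what this PR does:

# Sketch of a typical pycuda/TensorRT inference call; not code from this PR.
def do_inference(self, bindings, inputs, outputs, stream, batch_size=1):
    # Copy input data from pagelocked host memory to the device.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference on the whole batch asynchronously.
    self.context.execute_async(batch_size=batch_size, bindings=bindings,
                               stream_handle=stream.handle)
    # Copy predictions back from the device into host memory.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Wait for all queued transfers and kernels on this stream to finish.
    stream.synchronize()
    # Return only the host-side output arrays.
    return [out.host for out in outputs]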
I'm starting to think that there are too many constants and GridNet-specific functions here, and it might be easier to make a separate class specifically for parsing GridNet bounding boxes. For now, let's clean up the rest of the code first, then discuss how that would work.

These constants are only necessary for detectors. Maybe we need another parameter like is_detector in the constructor to indicate whether this capsule is a detector or a classifier?

Or we can check whether these constants exist before we call the post-processing function.

Yeah, but I'm thinking that this is super duper specific to GridNet detectors particularly. Maybe we can just offer a function for parsing GridNet detector outputs, and name it as such.

class GridNetParser:
    def __init__(self, parameters):
        ...

    def parse_detection_results(self, prediction):
        ...

class BaseTensorRTBackend:
    ...

The benefit would be to separate all of these GridNet-specific parameters out of the BaseTensorRTBackend 🤔

Great idea, we should have separate parsers for different architectures.
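A rough sketch of how that separation could look, with every name and parameter hypothetical (none of this is code from the PR):

# Hypothetical split between a GridNet-specific parser and the generic backend.
# All names and parameters below are illustrative only.
class GridNetParser:
    def __init__(self, grid_size, num_classes, confidence_threshold):
        self.grid_size = grid_size
        self.num_classes = num_classes
        self.confidence_threshold = confidence_threshold

    def parse_detection_results(self, prediction) -> List[DetectionNode]:
        """Turn raw GridNet output tensors into DetectionNode objects."""
        ...


class BaseTensorRTBackend(BaseBackend):
    def __init__(self, engine_bytes, width, height, device_id, parser=None):
        # Detector capsules pass in a parser; classifier capsules leave it as
        # None and skip post-processing entirely.
        self.parser = parser
        ...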