
Commit 7fb1d11

prod_env_mat: allocate GPU memory out of frame loop (#2832)
Allocating GPU memory is not a cheap operation. This PR allocates the memory for `int_temp`, `uint64_temp`, and `tensor_list[0, 1, 3, 4, 5, 6]` outside the frame loop, so these buffers are reused in every iteration instead of being allocated again for each frame.

As in the original code, `tensor_list[3]`, `tensor_list[4]`, and `tensor_list[6]` may still need to be reallocated when the existing memory is not large enough. The shape of `tensor_list[2]` is dynamic, so it is not refactored in this PR.

With CUDA enabled, the C++ and Python unit tests pass, and the examples run. The speedup is observable when the number of frames (samples) in a batch is not small.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
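The allocation pattern described above (allocate once before the frame loop, reuse across frames, and reallocate only when a frame needs more memory than is already available) can be sketched as follows. This is a host-side analogy, not the code from this commit: `std::vector` stands in for a device buffer, and `ScratchBuffer`, `process_batch`, `BatchStats`, and the per-frame sizes are hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reusable scratch buffer. It grows only when a frame needs more
// capacity than any earlier frame did, mirroring the "reallocate if the memory
// is not enough" case. std::vector is a stand-in for a GPU allocation.
template <typename T>
class ScratchBuffer {
 public:
  // Ensure room for at least n elements; returns true if memory was (re)allocated.
  bool reserve(std::size_t n) {
    if (n <= data_.size()) return false;  // enough memory already: reuse it
    data_.resize(n);                      // stand-in for a device (re)allocation
    ++allocations_;
    return true;
  }
  T* data() { return data_.data(); }
  std::size_t allocations() const { return allocations_; }

 private:
  std::vector<T> data_;
  std::size_t allocations_ = 0;
};

// Number of allocations each buffer performed while processing one batch.
struct BatchStats {
  std::size_t int_allocs;
  std::size_t u64_allocs;
};

// Hypothetical batch driver: the buffers are created once, before the frame
// loop, instead of being allocated inside every iteration.
BatchStats process_batch(const std::vector<std::size_t>& atoms_per_frame) {
  ScratchBuffer<int> int_temp;
  ScratchBuffer<std::uint64_t> uint64_temp;
  for (std::size_t nloc : atoms_per_frame) {
    int_temp.reserve(nloc * 2);  // illustrative size requirement per frame
    uint64_temp.reserve(nloc);
    // Per-frame kernels would read/write int_temp.data() and uint64_temp.data().
  }
  return {int_temp.allocations(), uint64_temp.allocations()};
}
```

With three equally sized frames, each buffer is allocated once rather than three times; with frames of 100, 50, 200, and 150 atoms, each buffer is allocated twice, because only the third frame exceeds the capacity reserved by the first.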
1 parent fa2c0b6 commit 7fb1d11


1 file changed (+245, -176 lines)
