Commit 7fb1d11

prod_env_mat: allocate GPU memory out of frame loop (#2832)

Allocating GPU memory is not a cheap operation. This PR allocates the memory for `int_temp`, `uint64_temp`, and `tensor_list[0, 1, 3, 4, 5, 6]` outside the frame loop, so the buffers are reused in every iteration instead of being allocated repeatedly.

In the original code, `tensor_list[3]`, `tensor_list[4]`, and `tensor_list[6]` may need to be reallocated if the memory is not enough; this behavior still exists. The shape of `tensor_list[2]` is dynamic, so it is not refactored in this PR.

With CUDA enabled, the C++ and Python unit tests pass and the examples run. A speedup can be observed when the number of frames (samples) in a batch is not small.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
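The pattern described in this commit message (allocate once before the frame loop, grow only when a frame needs more room) can be illustrated with a minimal host-side sketch. This is not the code from `prod_env_mat_multi_device.cc`: `ScratchBuffer`, `process_frames_naive`, and `process_frames_hoisted` are hypothetical names, and ordinary host memory stands in for a GPU allocation so the example runs without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device scratch buffer. A real implementation
// would wrap a GPU allocation; host memory is used here for illustration.
struct ScratchBuffer {
  std::unique_ptr<int[]> data;
  std::size_t capacity = 0;
  int n_allocations = 0;  // counts how often memory was (re)allocated

  // Reuse the existing memory when it is large enough; reallocate otherwise.
  void ensure(std::size_t n) {
    if (n > capacity) {
      data = std::make_unique<int[]>(n);
      capacity = n;
      ++n_allocations;
    }
    assert(capacity >= n);
  }
};

// Naive version: a fresh buffer is allocated inside every frame iteration.
int process_frames_naive(const std::vector<std::size_t>& frame_sizes) {
  int n_allocations = 0;
  for (std::size_t n : frame_sizes) {
    ScratchBuffer buf;
    buf.ensure(n);
    // ... per-frame work on buf.data would go here ...
    n_allocations += buf.n_allocations;
  }
  return n_allocations;
}

// Hoisted version: one buffer lives outside the loop and is reused,
// growing only when a frame needs more space than is already available.
int process_frames_hoisted(const std::vector<std::size_t>& frame_sizes) {
  ScratchBuffer buf;
  for (std::size_t n : frame_sizes) {
    buf.ensure(n);
    // ... per-frame work on buf.data would go here ...
  }
  return buf.n_allocations;
}
```

In this sketch, frames of equal size cost one allocation in the hoisted version rather than one per frame. A later frame that is larger than any before it still triggers a reallocation, matching the remark that reallocation can still occur when the memory is not enough.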

1 parent fa2c0b6, commit 7fb1d11

1 file changed, +245 -176 lines changed

source/op/prod_env_mat_multi_device.cc

Comments (0)
