The problem
When I run `detectnet` on a Jetson Nano (JetPack 4.6.6) as instructed by the tutorial:

```
detectnet --model=peoplenet pedestrians.mp4 pedestrians_peoplenet.mp4
```

the output video has a very low framerate of ~3 FPS. I would like the framerate to be exactly the same as that of the input video, even if it means longer processing.
My insight
The same kind of situation seems to happen with all tutorial examples involving videos; `peoplenet` just has a particularly high processing time, which makes the issue more noticeable. Some investigation led me to believe that the `videoSource` class works in the following way: after the first `input.Capture()` call, the input stream (a video file or camera feed) starts playing in a separate thread at its natural speed. The frames keep playing whether or not they are consumed, so the main loop must keep calling `input.Capture()` at least as frequently, or else the frames that are not captured in time get dropped.
This makes perfect sense to me when the input is a camera feed: if processing cannot keep up with the supply, some frames must be dropped to avoid a cumulative delay. But when the input is a static video file, I expected the application to take as much time as necessary to process the whole video frame by frame.
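To check my understanding, here is a toy model of that behaviour in plain Python (no Jetson libraries; the 30 FPS and ~3 FPS figures are taken from my run above, and the "one capture per inference cycle" assumption is mine): the source advances in real time in its own thread, each `Capture()` only ever receives the newest available frame, and everything in between is lost.

```python
# Toy model of the drop behaviour: the source plays at INPUT_FPS in its
# own thread, and the main loop only grabs the newest frame once per
# inference cycle (~NET_FPS captures per second).
INPUT_FPS = 30          # framerate of pedestrians.mp4
NET_FPS = 3             # approximate PeopleNet inference rate on the Nano
VIDEO_SECONDS = 13

total_frames = INPUT_FPS * VIDEO_SECONDS    # 390 frames in the file
stride = INPUT_FPS // NET_FPS               # source frames elapsed per Capture() -> 10

# Indices of the frames the main loop actually receives:
captured = list(range(0, total_frames, stride))

dropped = total_frames - len(captured)
print(f"captured {len(captured)} of {total_frames} frames")   # captured 39 of 390 frames
print(f"dropped {dropped / total_frames:.0%}")                # dropped 90%
```

This reproduces the symptom exactly: only every 10th frame survives, which matches the ~3 FPS output I am seeing.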
The desired behaviour
In the example above, the video lasts 13 seconds at 30 FPS. Assuming the neural network can only process 3 frames per second, I would like the application to run for 130 seconds, process every frame, and output a 13-second video at 30 FPS. Instead, it seems to run for 13 seconds, drop 9 out of every 10 frames, and output a 13-second video that effectively contains only 3 FPS worth of frames.
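The frame accounting behind those numbers (plain arithmetic; the 3 FPS processing rate is my assumption from the observed output):

```python
# Desired behaviour: process every frame offline, however long it takes,
# and keep the original timeline in the output.
INPUT_FPS = 30
NET_FPS = 3                    # assumed processing rate
VIDEO_SECONDS = 13

total_frames = INPUT_FPS * VIDEO_SECONDS     # 390 frames to process
wall_clock = total_frames / NET_FPS          # seconds of processing needed
print(wall_clock)                            # 130.0

# The output should still contain all 390 frames at 30 FPS,
# i.e. a 13-second video:
print(total_frames / INPUT_FPS)              # 13.0
```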
Is the desired behaviour possible to achieve with `jetson-inference`?
PS The following error message seems to get printed exactly once per dropped frame (i.e. per frame not `Capture()`'d in time):

```
nvbuf_utils: dmabuf_fd 1266 mapped entry NOT found
nvbuf_utils: NvReleaseFd Failed... Exiting...
```

but as I read in another topic, it is probably spurious and doesn't matter.