Slow processing with batched dlpreproc #188

The processing time for an N-buffer batched dlpreproc is much higher than the processing time for N individual dlpreproc elements running in parallel:

```bash
IN_CAPS="video/x-raw, width=320, height=320, format=RGB"

GST_DEBUG="2,*tiovxsiso*:6,*perf*:6" gst-launch-1.0 \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
videotestsrc ! $IN_CAPS ! mux. \
tiovxmux name=mux ! \
tiovxdlpreproc  ! "application/x-tensor-tiovx(memory:batched)" ! perf ! \
tiovxdemux name=demux \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink \
demux. ! queue ! fakesink
```
This first pipeline runs at around 5 fps.

```bash
IN_CAPS="video/x-raw, width=320, height=320, format=RGB"

GST_DEBUG="2,*perf*:6" gst-launch-1.0 \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink \
videotestsrc ! $IN_CAPS ! tiovxdlpreproc ! perf ! fakesink
```
Each of the 6 individual pipelines runs at 40 fps.


All the delay in the batched pipeline appears to be in the processing time:
```
0:00:02.118839548  1771     0x171faf70 LOG                tiovxsiso gsttiovxsiso.c:864:gst_tiovx_siso_process_graph:<tiovxdlpreproc0> Enqueueing parameters
0:00:02.118885055  1771     0x171faf70 LOG                tiovxsiso gsttiovxsiso.c:883:gst_tiovx_siso_process_graph:<tiovxdlpreproc0> Processing graph
0:00:02.298343493  1771     0x171faf70 LOG                tiovxsiso gsttiovxsiso.c:896:gst_tiovx_siso_process_graph:<tiovxdlpreproc0> Dequeueing parameters
```
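
To put a number on that gap, the timestamps can be subtracted directly. The following is a minimal sketch, not part of the plugin: `graph_times` is a hypothetical helper name, and it assumes the debug log is plain text without colour codes (for example, captured with `GST_DEBUG_NO_COLOR=1`). It pairs each "Processing graph" message with the next "Dequeueing parameters" message and prints the elapsed time, plus the frame rate that this duration alone would allow.

```bash
# Print the time spent between "Processing graph" and "Dequeueing parameters"
# for every iteration found in a GStreamer debug log read from stdin.
# graph_times is a hypothetical helper, not something the plugin provides.
graph_times() {
  awk '
    # Convert the leading H:MM:SS.nnnnnnnnn timestamp to seconds.
    function to_s(ts,    p) {
      split(ts, p, /[:.]/)
      return p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1e9
    }
    /Processing graph/ { start = to_s($1); have_start = 1 }
    /Dequeueing parameters/ && have_start {
      d = to_s($1) - start
      if (d > 0)
        printf "%.1f ms (%.2f fps)\n", d * 1000, 1 / d
      have_start = 0
    }
  '
}

# Usage with the two relevant lines from the log above:
graph_times <<'EOF'
0:00:02.118885055  1771     0x171faf70 LOG                tiovxsiso gsttiovxsiso.c:883:gst_tiovx_siso_process_graph:<tiovxdlpreproc0> Processing graph
0:00:02.298343493  1771     0x171faf70 LOG                tiovxsiso gsttiovxsiso.c:896:gst_tiovx_siso_process_graph:<tiovxdlpreproc0> Dequeueing parameters
EOF
# prints: 179.5 ms (5.57 fps)
```

For the excerpt above this gives roughly 180 ms per batch, which is in line with the ~5 fps reported by `perf`.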
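The log messages above come from the following code: https://github.com/TexasInstruments/edgeai-gst-plugins/blob/develop/gst-libs/gst/tiovx/gsttiovxsiso.c#L882

With the error handling removed, it can be summarized as shown here. The roughly 180 ms gap between "Processing graph" and "Dequeueing parameters" is therefore spent in `vxScheduleGraph` and `vxWaitGraph`: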
```c
  /* Enqueue the input and output references on their graph parameters */
  GST_LOG_OBJECT (self, "Enqueueing parameters");
  status =
      vxGraphParameterEnqueueReadyRef (priv->graph, INPUT_PARAMETER_INDEX,
      (vx_reference *) priv->input, priv->num_channels);
  status =
      vxGraphParameterEnqueueReadyRef (priv->graph, OUTPUT_PARAMETER_INDEX,
      (vx_reference *) priv->output, priv->num_channels);

  /* Schedule the graph and block until it completes */
  GST_LOG_OBJECT (self, "Processing graph");
  status = vxScheduleGraph (priv->graph);
  status = vxWaitGraph (priv->graph);

  /* Dequeue the processed references */
  GST_LOG_OBJECT (self, "Dequeueing parameters");
  status =
      vxGraphParameterDequeueDoneRef (priv->graph, INPUT_PARAMETER_INDEX,
      (vx_reference *) priv->input, priv->num_channels, &in_refs);
  status =
      vxGraphParameterDequeueDoneRef (priv->graph, OUTPUT_PARAMETER_INDEX,
      (vx_reference *) priv->output, priv->num_channels, &out_refs);
```
