Commit b79edc1

Merge pull request #793 from luxonis/nn_latency_docs

Updated nn latency docs

2 parents: 32a4974 + a96b4fd
1 file changed (+29 -1)
docs/source/tutorials/low-latency.rst

Lines changed: 29 additions & 1 deletion
@@ -111,7 +111,7 @@ Encoded frames
 You can also reduce frame latency by using the `Zero-Copy <https://github.com/luxonis/depthai-core/tree/message_zero_copy>`__
 branch of DepthAI. This will pass pointers (at the XLink level) to cv2.Mat instead of doing a memcopy (as it currently does),
 so the performance improvement will depend on the image sizes you are using.
-(Note: API differs and not all functionality is available as is on the `message_zero_copy` branch)
+(Note: the API differs and not all functionality is available as-is on the `message_zero_copy` branch)
 
 
 Reducing latency when running NN
@@ -120,6 +120,34 @@ Reducing latency when running NN
 In the examples above we were only streaming frames, without doing anything else on the OAK camera. This section will focus
 on how to reduce latency when also running a NN model on the OAK.
 
+Resource utilization
+--------------------
+
+Configuring `hardware resources <https://docs.luxonis.com/projects/hardware/en/latest/pages/rvc/rvc2.html#hardware-blocks-and-accelerators>`__
+on RVC will result in lower latency, but also in lower FPS.
+
+By default, NN nodes run 2 threads with 1 NCE per thread, and we suggest compiling the model for half of the
+SHAVE cores available to the pipeline. This configuration provides the best throughput, as all threads can run freely
+(a sketch of this default setup follows the list below). Compiling the model for more SHAVE cores only yields a marginal improvement, because:
+
+1. The `Model optimizer <https://docs.luxonis.com/en/latest/pages/model_conversion/#model-optimizer>`__ already does a great job of optimizing the model
+2. On-device parallelization of NN operations (splitting an operation between multiple SHAVEs) doesn't scale linearly, due to the "`memory wall <https://en.wikipedia.org/wiki/Random-access_memory#Memory_wall>`__"
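For comparison, the default throughput-oriented setup can be spelled out explicitly. The snippet below is a minimal sketch, not part of this commit: the `blobconverter` helper and the `shaves=6` value (roughly half of what a typical pipeline has free) are illustrative assumptions.

.. code-block:: python

    import blobconverter
    import depthai as dai

    pipeline = dai.Pipeline()

    nn = pipeline.create(dai.node.NeuralNetwork)
    # Spell out the defaults: 2 inference threads, 1 NCE per thread
    nn.setNumInferenceThreads(2)
    nn.setNumNCEPerInferenceThread(1)
    # Blob compiled for about half of the available SHAVE cores;
    # the model name and shaves=6 are assumed values for illustration
    nn.setBlobPath(blobconverter.from_zoo(name='mobilenet-ssd', shaves=6))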
+
+To minimize latency, though, we should allocate all resources to a single inference. To get the lowest latency (at the cost of much lower FPS),
+we suggest the following:
+
+- Setting the number of inference threads to 1
+- Setting the number of NCEs per inference thread to 2
+- Compiling the model for all available SHAVE cores (`documentation here <https://docs.luxonis.com/en/latest/pages/model_conversion/#compile-tool>`__)
+
+.. code-block:: python
+
+    nn = pipeline.create(dai.node.NeuralNetwork)
+    # The same settings apply to the Yolo/MobileNet (Spatial) Detection nodes
+    nn.setNumNCEPerInferenceThread(2)
+    nn.setNumInferenceThreads(1)
+    nn.setBlobPath('path/to/compiled/model_max_shaves.blob')
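The `model_max_shaves.blob` path above is a placeholder. One way to obtain such a blob is sketched below, reusing the `nn` node from the snippet above; the `blobconverter` call and the `shaves=13` value (often the maximum left free on an RVC2 pipeline) are assumptions, so check the device logs for the number your pipeline actually has available.

.. code-block:: python

    import blobconverter

    # Blob compiled for every SHAVE core the pipeline has available;
    # shaves=13 is an assumed maximum, confirm it in the device logs
    nn.setBlobPath(blobconverter.from_zoo(name='mobilenet-ssd', shaves=13))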
+
 Lowering camera FPS to match NN FPS
 -----------------------------------
 