docs/source/components/nodes/video_encoder.rst (+1, -1)

VideoEncoder
============

The VideoEncoder node is used to encode :ref:`ImgFrame` into H264, H265, or MJPEG streams. Only the NV12 or GRAY8 (which gets converted to NV12) format is
supported as an input. All codecs are lossy (except lossless MJPEG); for more information, please see the `encoding quality docs <https://github.com/luxonis/depthai-experiments/tree/master/gen2-record-replay/encoding_quality>`__.

docs/source/tutorials/low-latency.rst (+89, -5)

On PoE, the latency can vary quite a bit due to a number of factors:

* 100% OAK Leon CSS (CPU) usage. The Leon CSS core handles the PoE communication (`see docs here <https://docs.luxonis.com/projects/hardware/en/latest/pages/rvc/rvc2.html#hardware-blocks-and-accelerators>`__), and if the CPU is 100% used, it will not be able to handle the communication as fast as it should.
* Another potential way to improve PoE latency is to fine-tune network settings, such as the MTU and TCP window size (see `here <https://docs.luxonis.com/projects/hardware/en/latest/pages/guides/getting-started-with-poe.html#advance-network-settings>`__ for more info).

Bandwidth
#########

With large, unencoded frames, one can quickly saturate the bandwidth even at 30 FPS, especially on PoE devices (1 Gbps link).
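
As a back-of-the-envelope check (plain Python; 1080p NV12 at 30 FPS is an illustrative configuration), a single unencoded stream already consumes most of a 1 Gbps link:

```python
# NV12 stores 12 bits (1.5 bytes) per pixel: full-resolution luma
# plus 2x2-subsampled chroma planes
width, height = 1920, 1080
bytes_per_frame = width * height * 1.5   # 3,110,400 bytes per frame
fps = 30

bits_per_second = bytes_per_frame * fps * 8
print(f"{bits_per_second / 1e6:.0f} Mbps")   # 746 Mbps of a 1000 Mbps link
```

Encoding the stream (H.264/H.265/MJPEG) typically reduces this by an order of magnitude or more, which is why the VideoEncoder node helps on PoE links.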

In the examples above we were only streaming frames, without doing anything else on the OAK camera. This section focuses
on how to reduce latency when also running an NN model on the OAK.

1. Increasing NN resources
--------------------------

One option to reduce latency is to increase the NN resources. This can be done by changing the number of allocated NCEs and SHAVEs (see HW accelerator `docs here <https://docs.luxonis.com/projects/hardware/en/latest/pages/rvc/rvc2.html#hardware-blocks-and-accelerators>`__).
The `Compile Tool <https://docs.luxonis.com/en/latest/pages/model_conversion/#compile-tool>`__ can compile a model for more SHAVE cores. To allocate more NCEs, you can use the API below:

.. code-block:: python

    import depthai as dai

    pipeline = dai.Pipeline()
    # nn = pipeline.createNeuralNetwork()
    # nn = pipeline.create(dai.node.MobileNetDetectionNetwork)
    nn = pipeline.create(dai.node.YoloDetectionNetwork)
    nn.setNumInferenceThreads(1)  # By default, 2 threads are used
    nn.setNumNCEPerInferenceThread(2)  # By default, 1 NCE is used per thread

Models usually run at **max FPS** when using 2 threads (1 NCE/Thread), and compiling the model for ``AVAILABLE_SHAVES / 2``.

Example of FPS & latency comparison for YoloV7-tiny:

.. list-table::
   :header-rows: 1

   * - NN resources
     - Camera FPS
     - Latency
     - NN FPS
   * - **6 SHAVEs, 2x Threads (1NCE/Thread)**
     - 15
     - 155 ms
     - 15
   * - 6 SHAVEs, 2x Threads (1NCE/Thread)
     - 14
     - 149 ms
     - 14
   * - 6 SHAVEs, 2x Threads (1NCE/Thread)
     - 13
     - 146 ms
     - 13
   * - 6 SHAVEs, 2x Threads (1NCE/Thread)
     - 10
     - 141 ms
     - 10
   * - **13 SHAVEs, 1x Thread (2NCE/Thread)**
     - 30
     - 145 ms
     - 11.6
   * - 13 SHAVEs, 1x Thread (2NCE/Thread)
     - 12
     - 128 ms
     - 12
   * - 13 SHAVEs, 1x Thread (2NCE/Thread)
     - 10
     - 118 ms
     - 10

2. Lowering camera FPS to match NN FPS
--------------------------------------

Lowering FPS to not exceed NN capabilities typically provides the best latency performance, since the NN is able to
start the inference as soon as a new frame is available.

This time includes the following:

- And finally, eventual extra latency until it reaches the app

Note: if the FPS is increased slightly more, towards 19..21 FPS, an extra latency of about 10 ms appears, which we believe
is related to firmware. We are actively looking for improvements for lower latencies.

3. NN input queue size and blocking behavior
--------------------------------------------

If the app has ``detNetwork.input.setBlocking(False)``, but the queue size doesn't change, the following adjustment