|
1 | 1 |
|
2 | 2 | == Under the Hood
|
3 | 3 |
|
| 4 | +=== Overview |
| 5 | + |
| 6 | +The Raspberry Pi AI Camera works rather differently to more traditional AI-based camera image processing systems, as shown in the diagram below. |
| 7 | + |
| 8 | +image::images/imx500-comparison.svg[Traditional versus IMX500 AI camera systems] |
| 9 | + |
| 10 | +On the left is a diagram of a more traditional AI camera system. Here, the camera delivers only images to the Raspberry Pi. The Raspberry Pi processes the images and is then responsible for performing AI inferencing. This may use an optional external AI accelerator, as shown, or it may happen (more slowly) in the CPU. |
| 11 | + |
| 12 | +On the right we have the IMX500-based system. The camera module contains a small ISP which turns the raw camera image data into an _input tensor_ which is fed directly to the AI accelerator within the camera. In turn, this produces an _output tensor_, containing the inferencing results, which is fed back to the Raspberry Pi itself. There is no need for an external accelerator, nor for the Raspberry Pi to run neural network software on the CPU. |
| 13 | + |
| 14 | +Some concepts that may be helpful to understand include: |
| 15 | + |
| 16 | +==== The _Input Tensor_ |
| 17 | + |
| 18 | +This is the part of the sensor image that is passed to the AI engine for inferencing. It is produced by a small on-board ISP which also crops and scales the camera image to the dimensions expected by the neural network that has been loaded. The input tensor is not normally made available to applications, though it is possible to access it for debugging purposes. |
| 19 | + |
| 20 | +==== The _Region of Interest_ |
| 21 | + |
| 22 | +The Region of Interest (or _ROI_) specifies exactly which part of the sensor image is cropped out before being rescaled to the size demanded by the neural network. It can be queried and set by an application. The units used are always pixels in the full resolution sensor output. |
| 23 | + |
| 24 | +By default, the ROI is set to be the full image received from the sensor (that is, nothing is actually cropped out). |
| 25 | + |
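The relationship between an aspect ratio and an ROI rectangle can be illustrated with a short sketch. This is not the Picamera2 implementation: `centred_roi` is a hypothetical helper, and the sensor resolution used here is an assumed value for illustration (in an application it would come from `IMX500.get_full_sensor_resolution()`). It computes the largest centred crop with a given aspect ratio, expressed in full-resolution sensor pixels.

```python
def centred_roi(sensor_size, aspect_ratio):
    """Return (x_offset, y_offset, width, height) of the largest centred crop
    with the requested aspect ratio, in full-resolution sensor pixels."""
    sensor_w, sensor_h = sensor_size
    ratio_w, ratio_h = aspect_ratio
    if sensor_w * ratio_h > sensor_h * ratio_w:
        # Sensor is wider than the target: keep full height, trim the width.
        height = sensor_h
        width = sensor_h * ratio_w // ratio_h
    else:
        # Sensor is taller than (or matches) the target: keep full width.
        width = sensor_w
        height = sensor_w * ratio_h // ratio_w
    x_offset = (sensor_w - width) // 2
    y_offset = (sensor_h - height) // 2
    return (x_offset, y_offset, width, height)


# Assumed full sensor resolution, for illustration only.
FULL_RES = (4056, 3040)
print(centred_roi(FULL_RES, (1, 1)))    # square network input
print(centred_roi(FULL_RES, FULL_RES))  # same shape as the sensor: no crop
```

A tuple in this form matches the `(x_offset, y_offset, width, height)` layout that the ROI is described in, so a square network input would crop equal margins from the left and right of the sensor image.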
| 26 | +==== The _Output Tensor_ |
| 27 | + |
| 28 | +This contains the results of inferencing performed by the neural network. The precise number and shape of the outputs depend on the neural network, and application code needs to understand how to handle them. |
| 29 | + |
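As an illustration of what "handling" an output tensor can involve, the sketch below assumes a classification network whose single output is a flat list of per-class scores. That layout is an assumption, not something every network provides, and `top_k` is a hypothetical helper rather than part of any library.

```python
def top_k(scores, k=3):
    """Return the k highest-scoring (class_index, score) pairs, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for a five-class network.
scores = [0.05, 0.70, 0.10, 0.02, 0.13]
for index, score in top_k(scores, k=2):
    print(f"class {index}: {score:.2f}")
```

Networks with other output layouts, such as object detection networks that return boxes, scores and class indices as separate outputs, need correspondingly different post-processing.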
| 30 | +=== System Architecture |
| 31 | + |
4 | 32 | The diagram below shows the various camera software components (in green) used during our imaging/inference use case with the Raspberry Pi AI Camera module hardware (in red).
|
5 | 33 |
|
6 | 34 | image::images/imx500-block-diagram.svg[IMX500 block diagram]
|
@@ -204,39 +232,39 @@ The IMX500 class in Picamera2 provides the following helper functions:
|
204 | 232 | | `IMX500.get_full_sensor_resolution()`
|
205 | 233 | | Return the full sensor resolution of the IMX500.
|
206 | 234 |
|
207 |
| -| `IMX500.config()` |
| 235 | +| `IMX500.config` |
208 | 236 | | Returns a dictionary of the neural network configuration.
|
209 | 237 |
|
210 |
| -| `IMX500.convert_inference_coords()` |
211 |
| -| Converts from the input tensor coordinate space to the final ISP output image space. |
| 238 | +| `IMX500.convert_inference_coords(coords, metadata, picamera2)` |
| 239 | +| Converts the coordinates _coords_ from the input tensor coordinate space to the final ISP output image space. Must be passed Picamera2's image metadata for the image, and the Picamera2 object. |
212 | 240 |
|
213 | 241 | There are a number of scaling/cropping/translation operations occurring from the original sensor image to the fully processed ISP output image. This function converts coordinates provided by the output tensor to the equivalent coordinates after performing these operations.
|
214 | 242 |
|
215 | 243 | | `IMX500.show_network_fw_progress_bar()`
|
216 | 244 | | Displays a progress bar on the console showing the progress of the neural network firmware upload to the IMX500.
|
217 | 245 |
|
218 |
| -| `IMX500.get_roi_scaled()` |
219 |
| -| Returns the region of interest (ROI) in the ISP output coordinate space. |
| 246 | +| `IMX500.get_roi_scaled(request)` |
| 247 | +| Returns the region of interest (ROI) in the ISP output image coordinate space. |
220 | 248 |
|
221 |
| -| `IMX500.get_isp_output_size()` |
| 249 | +| `IMX500.get_isp_output_size(picamera2)` |
222 | 250 | | Returns the ISP output image size.
|
223 | 251 |
|
224 |
| -| `IMX5000.get_input_w_h()` |
| 252 | +| `IMX500.get_input_size()` |
225 | 253 | | Returns the input tensor size based on the neural network model used.
|
226 | 254 |
|
227 |
| -| `IMX500.get_outputs()` |
228 |
| -| Returns the output tensors for a given frame request. |
| 255 | +| `IMX500.get_outputs(metadata)` |
| 256 | +| Returns the output tensors from the given Picamera2 image metadata. |
229 | 257 |
|
230 |
| -| `IMX500.get_output_shapes()` |
231 |
| -| Returns the shape of the output tensors for the neural network model used. |
| 258 | +| `IMX500.get_output_shapes(metadata)` |
| 259 | +| Returns the shape of the output tensors from the Picamera2 image metadata for the neural network model used. |
232 | 260 |
|
233 |
| -| `IMX500.set_inference_roi_abs()` |
234 |
| -| Sets an absolute region of interest (ROI) crop rectangle on the sensor image to use for inferencing on the IMX500. |
| 261 | +| `IMX500.set_inference_roi_abs(rectangle)` |
| 262 | +| Sets the region of interest (ROI) crop rectangle which determines which part of the sensor image is converted to the input tensor that is used for inferencing on the IMX500. The region of interest should be specified in units of pixels at the full sensor resolution, as a `(x_offset, y_offset, width, height)` tuple. |
235 | 263 |
|
236 |
| -| `IMX500.set_inference_aspect_ratio()` |
237 |
| -| Automatically calculates region of interest (ROI) crop rectangle on the sensor image to preserve the input tensor aspect ratio for a given neural network. |
| 264 | +| `IMX500.set_inference_aspect_ratio(aspect_ratio)` |
| 265 | +| Automatically calculates the region of interest (ROI) crop rectangle on the sensor image to preserve the given aspect ratio. To make the ROI aspect ratio exactly match the input tensor for this network, use `imx500.set_inference_aspect_ratio(imx500.get_input_size())`. |
238 | 266 |
|
239 |
| -| `IMX500.get_kpi_info()` |
240 |
| -| Returns the frame level performance indicators logged by the IMX500. |
| 267 | +| `IMX500.get_kpi_info(metadata)` |
| 268 | +| Returns the frame level performance indicators logged by the IMX500 for the given image metadata. |
241 | 269 |
|
242 | 270 | |===
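The chain of scaling, cropping and translation that sits between input tensor coordinates and ISP output coordinates can be sketched in plain Python. This is a simplified illustration, not the code behind `IMX500.convert_inference_coords()`: the function name, the use of normalised (0 to 1) tensor coordinates, and the assumption of axis-aligned crops with no flips or rotation are all choices made for this example.

```python
def tensor_to_output(point, roi, scaler_crop, output_size):
    """Map a normalised (x, y) point in the input tensor to ISP output pixels.

    roi and scaler_crop are (x_offset, y_offset, width, height) rectangles in
    full-resolution sensor pixels; output_size is (width, height) in pixels.
    """
    x_norm, y_norm = point
    roi_x, roi_y, roi_w, roi_h = roi
    crop_x, crop_y, crop_w, crop_h = scaler_crop
    out_w, out_h = output_size
    # Step 1: undo the ROI crop and rescale to reach full-sensor coordinates.
    sensor_x = roi_x + x_norm * roi_w
    sensor_y = roi_y + y_norm * roi_h
    # Step 2: apply the ISP crop and scale to reach output image coordinates.
    out_x = (sensor_x - crop_x) * out_w / crop_w
    out_y = (sensor_y - crop_y) * out_h / crop_h
    return (round(out_x), round(out_y))


# Illustrative values: a square ROI, an uncropped ISP, a quarter-size output.
roi = (508, 0, 3040, 3040)
scaler_crop = (0, 0, 4056, 3040)
print(tensor_to_output((0.5, 0.5), roi, scaler_crop, (1014, 760)))
```

With these values, the centre of the input tensor lands at the centre of the output image, because the ROI is centred on the sensor and the ISP output covers the whole sensor area.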
|