
Commit cebfc6b

AI-Camera initial draft
1 parent 385d10d commit cebfc6b

File tree

10 files changed: +410 -0 lines changed
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
include::ai-camera/about.adoc[]

include::ai-camera/getting-started.adoc[]

include::ai-camera/details.adoc[]

include::ai-camera/model-conversion.adoc[]

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
[[ai-camera]]
== About

The Raspberry Pi AI Camera, built around the Sony IMX500 imaging sensor, is a great way to add low-latency, high-performance AI capabilities to any camera application. Tight integration with https://www.raspberrypi.com/documentation/computers/camera_software.html[Raspberry Pi's camera software stack] allows users to deploy their own neural network models with minimal effort.

This tutorial walks through the steps necessary to run either a pre-packaged or custom neural network model on the camera. It also covers the steps required to interpret the inference data generated by neural networks running on the IMX500 in https://github.com/raspberrypi/rpicam-apps[`rpicam-apps`] and https://github.com/raspberrypi/picamera2[`Picamera2`].

Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
== Under the Hood

The diagram below shows the various camera software components (in green) used during the imaging/inference use case with the Raspberry Pi AI Camera module hardware (in red).

image::images/imx500-block-diagram.svg[IMX500 block diagram]

On startup, the IMX500 sensor module is loaded with firmware to run a particular neural network model. During streaming, the IMX500 generates an image stream together with an inference stream. This inference stream holds the inputs and outputs of the neural network model, also known as input/output tensors.

=== Device drivers

At the lowest level, the camera module is configured over the I2C bus by the IMX500 sensor kernel driver. The CSI2 driver (`CFE` on Pi 5, `Unicam` on all other Pi platforms) sets up the receiver to write the image data stream into a frame buffer, and the embedded data and inference data streams into another buffer in memory.

The firmware files are also transferred over the I2C bus, using either the standard I2C protocol or, on Pi 5, a custom high-speed protocol. This is handled through the RP2040 SPI driver in the kernel. The RP2040 microcontroller is responsible for managing the firmware transfer operation on the camera module. It bridges the I2C transfers from the kernel to the IMX500 via a SPI bus. The RP2040 also caches firmware files on the on-board flash chip for fast upload to the IMX500, avoiding the need to transfer the entire firmware blob over the I2C bus.

=== libcamera

Once `libcamera` dequeues the image and inference data buffers from the kernel, the IMX500-specific `cam-helper` library (part of the Raspberry Pi IPA within `libcamera`) parses the inference buffer to obtain the input/output tensors. These tensors are packaged as Raspberry Pi vendor-specific https://libcamera.org/api-html/namespacelibcamera_1_1controls.html[`libcamera` controls] and returned to the application for consumption (see the sketch after the table below). The following controls are returned:

[%header,cols="a,a"]
|===
| Control
| Description

| `CnnOutputTensor`
| Floating point array storing the output tensor.

| `CnnInputTensor`
| Floating point array storing the input tensor.

| `CnnOutputTensorInfo`
| Network-specific parameters describing the structure of the output tensors:

[source,c]
----
struct OutputTensorInfo {
    uint32_t tensorDataNum;
    uint32_t numDimensions;
    uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
    char networkName[NetworkNameLen];
    uint32_t numTensors;
    OutputTensorInfo info[MaxNumTensors];
};
----

| `CnnInputTensorInfo`
| Network-specific parameters describing the structure of the input tensor:

[source,c]
----
struct CnnInputTensorInfo {
    char networkName[NetworkNameLen];
    uint32_t width;
    uint32_t height;
    uint32_t numChannels;
};
----

|===

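To give a feel for how an application consumes these controls, here is a minimal `Picamera2` sketch that prints the output tensor shapes for a frame. The model path is a placeholder (substitute any `.rpk` file shipped in the `imx500-models` package), and the `IMX500` helper class described later parses `CnnOutputTensorInfo` internally, so the script does not need to unpack the structs by hand:

[source,python]
----
from picamera2 import Picamera2
from picamera2.devices.imx500 import IMX500

# Placeholder model path: substitute any .rpk file from the imx500-models package.
imx500 = IMX500("/usr/share/imx500-models/imx500_network_ssd_mobilenetv2_fpnlite_320x320_pp.rpk")

picam2 = Picamera2()
picam2.start()

# Each frame's metadata carries the vendor-specific controls listed above.
metadata = picam2.capture_metadata()
raw = metadata.get("CnnOutputTensor")  # flat floating point array
if raw is not None:
    # The helper reshapes the flat array using CnnOutputTensorInfo.
    outputs = imx500.get_outputs(metadata)
    print([output.shape for output in outputs])
----
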
=== rpicam-apps

`rpicam-apps` provides an IMX500 postprocessing stage base class that implements helpers for IMX500 postprocessing stages: https://github.com/raspberrypi/rpicam-apps/blob/post_processing_stages/imx500_post_processing_stage.hpp[`IMX500PostProcessingStage`]. This base class can be used to derive a new postprocessing stage for any neural network model running on the IMX500. For example, https://github.com/raspberrypi/rpicam-apps/blob/post_processing_stages/imx500_mobilenet_ssd.cpp[`imx500_mobilenet_ssd.cpp`] instantiates the following derived class (not the complete implementation):

[source,cpp]
----
class ObjectInference : public IMX500PostProcessingStage
{
public:
    ObjectInference(RPiCamApp *app) : IMX500PostProcessingStage(app) {}

    char const *Name() const override;

    void Read(boost::property_tree::ptree const &params) override;

    void Configure() override;

    bool Process(CompletedRequestPtr &completed_request) override;
};
----

On every frame received by the application, the `Process()` function is called (`ObjectInference::Process()` in the above case). In this function, you can extract the output tensor for further processing and/or analysis by the stage:

[source,cpp]
----
auto output = completed_request->metadata.get(controls::rpi::CnnOutputTensor);
if (!output)
{
    LOG_ERROR("No output tensor found in metadata!");
    return false;
}

std::vector<float> output_tensor(output->data(), output->data() + output->size());
----

Once this is completed, the final results can either be visualised or saved in metadata, to be consumed by another downstream stage or by the top-level application itself. In the object inference case:

[source,cpp]
----
if (objects.size())
    completed_request->post_process_metadata.Set("object_detect.results", objects);
----

The `object_detect_draw_cv` postprocessing stage running downstream fetches these results from the metadata and draws the bounding boxes onto the image in the `ObjectDetectDrawCvStage::Process()` function:

[source,cpp]
----
std::vector<Detection> detections;
completed_request->post_process_metadata.Get("object_detect.results", detections);
----

The full list of helper functions provided by `IMX500PostProcessingStage` is given below:

[%header,cols="a,a"]
|===
| Function
| Description

| `Read()`
| Typically called from `<Derived Class>::Read()`, this function reads the config parameters for input tensor parsing and saving.

This function also reads the neural network model file string (`"network_file"`) and sets up the firmware to be loaded on camera open.

| `Process()`
| Typically called from `<Derived Class>::Process()`, this function processes and saves the input tensor to a file if required by the JSON config file.

| `SetInferenceRoiAbs()`
| Sets an absolute region of interest (ROI) crop rectangle on the sensor image to use for inferencing on the IMX500.

| `SetInferenceRoiAuto()`
| Automatically calculates the region of interest (ROI) crop rectangle on the sensor image to preserve the input tensor aspect ratio for a given neural network.

| `ShowFwProgressBar()`
| Displays a progress bar on the console showing the progress of the neural network firmware upload to the IMX500.

| `ConvertInferenceCoordinates()`
| Converts from the input tensor coordinate space to the final ISP output image space.

There are a number of scaling/cropping/translation operations occurring from the original sensor image to the fully processed ISP output image. This function converts coordinates provided by the output tensor to the equivalent coordinates after performing these operations.

|===

=== Picamera2

IMX500 integration in `Picamera2` is very similar to what is available in `rpicam-apps`. `Picamera2` has an IMX500 helper class that provides the same functionality as the `rpicam-apps` `IMX500PostProcessingStage` base class. This can be imported into any Python script with:

[source,python]
----
from picamera2.devices.imx500 import IMX500

# This must be called before instantiation of Picamera2
imx500 = IMX500(model_file)
----

To retrieve the output tensors, fetch them from the controls and use them for further processing and/or analysis in the Python script.

For example, in an object inference use case such as https://github.com/raspberrypi/picamera2/tree/main/examples/imx500/imx500_object_detection_demo.py[imx500_object_detection_demo.py], the object bounding boxes and confidence values are extracted in `parse_detections()`, and the boxes are drawn on the image in `draw_detections()`:

[source,python]
----
class Detection:
    def __init__(self, coords, category, conf, metadata):
        """Create a Detection object, recording the bounding box, category and confidence."""
        self.category = category
        self.conf = conf
        # Scale the inference coordinates into the final ISP output coordinate space.
        obj_scaled = imx500.convert_inference_coords(coords, metadata, picam2)
        self.box = (obj_scaled.x, obj_scaled.y, obj_scaled.width, obj_scaled.height)

def draw_detections(request, detections, stream="main"):
    """Draw the detections for this request onto the ISP output."""
    labels = get_labels()
    with MappedArray(request, stream) as m:
        for detection in detections:
            x, y, w, h = detection.box
            label = f"{labels[int(detection.category)]} ({detection.conf:.2f})"
            cv2.putText(m.array, label, (x + 5, y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
            cv2.rectangle(m.array, (x, y), (x + w, y + h), (0, 0, 255, 0))
        if args.preserve_aspect_ratio:
            b = imx500.get_roi_scaled(request)
            cv2.putText(m.array, "ROI", (b.x + 5, b.y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)
            cv2.rectangle(m.array, (b.x, b.y), (b.x + b.width, b.y + b.height), (255, 0, 0, 0))

def parse_detections(request, stream='main'):
    """Parse the output tensor into a number of detected objects, scaled to the ISP output."""
    metadata = request.get_metadata()
    outputs = imx500.get_outputs(metadata)
    boxes, scores, classes = outputs[0][0], outputs[1][0], outputs[2][0]
    detections = [Detection(box, category, score, metadata)
                  for box, score, category in zip(boxes, scores, classes) if score > threshold]
    draw_detections(request, detections, stream)
----

Note that there is no additional hysteresis or temporal filtering applied to the output, unlike in the `rpicam-apps` example. However, such filtering should be easy enough to add if needed.

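If you did want something similar, a simple majority-vote filter along the following lines would do (an illustrative sketch only, not the algorithm `rpicam-apps` uses): a detection category is reported only once it has been seen in several recent frames.

[source,python]
----
from collections import deque

class TemporalFilter:
    """Report a category only if it appeared in at least `required`
    of the last `history` frames (illustrative majority-vote filter)."""

    def __init__(self, history=5, required=3):
        self.frames = deque(maxlen=history)
        self.required = required

    def update(self, detections):
        # Record the set of categories seen in this frame.
        self.frames.append({int(d.category) for d in detections})
        # Count how many recent frames each category appeared in.
        counts = {}
        for frame in self.frames:
            for category in frame:
                counts[category] = counts.get(category, 0) + 1
        return [d for d in detections if counts[int(d.category)] >= self.required]
----

`parse_detections()` could then pass its list through a module-level `TemporalFilter` instance before calling `draw_detections()`.
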
The IMX500 class in `Picamera2` provides the following helper functions:

[%header,cols="a,a"]
|===
| Function
| Description

| `IMX500.get_full_sensor_resolution()`
| Returns the full sensor resolution of the IMX500.

| `IMX500.config()`
| Returns a dictionary of the neural network configuration.

| `IMX500.convert_inference_coords()`
| Converts from the input tensor coordinate space to the final ISP output image space.

There are a number of scaling/cropping/translation operations occurring from the original sensor image to the fully processed ISP output image. This function converts coordinates provided by the output tensor to the equivalent coordinates after performing these operations.

| `IMX500.show_network_fw_progress_bar()`
| Displays a progress bar on the console showing the progress of the neural network firmware upload to the IMX500.

| `IMX500.get_roi_scaled()`
| Returns the region of interest (ROI) in the ISP output coordinate space.

| `IMX500.get_isp_output_size()`
| Returns the ISP output image size.

| `IMX500.get_input_w_h()`
| Returns the input tensor size based on the neural network model used.

| `IMX500.get_outputs()`
| Returns the output tensors for a given frame request.

| `IMX500.get_output_shapes()`
| Returns the shapes of the output tensors for the neural network model used.

| `IMX500.set_inference_roi_abs()`
| Sets an absolute region of interest (ROI) crop rectangle on the sensor image to use for inferencing on the IMX500.

| `IMX500.set_inference_aspect_ratio()`
| Automatically calculates the region of interest (ROI) crop rectangle on the sensor image to preserve the input tensor aspect ratio for a given neural network.

| `IMX500.get_kpi_info()`
| Returns the frame-level performance indicators logged by the IMX500.

|===

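The short sketch below strings a few of these helpers together. It is illustrative only: the model path is again a placeholder, and exact call signatures may differ slightly between releases.

[source,python]
----
from picamera2 import Picamera2
from picamera2.devices.imx500 import IMX500

# Placeholder model path: substitute any .rpk file from the imx500-models package.
imx500 = IMX500("/usr/share/imx500-models/imx500_network_ssd_mobilenetv2_fpnlite_320x320_pp.rpk")
picam2 = Picamera2()

# Show a console progress bar while the network firmware uploads to the sensor.
imx500.show_network_fw_progress_bar()
picam2.start()

print("Full sensor resolution:", imx500.get_full_sensor_resolution())
print("Input tensor size:", imx500.get_input_w_h())

# Frame-level performance indicators reported by the IMX500.
metadata = picam2.capture_metadata()
print("KPI info:", imx500.get_kpi_info(metadata))
----
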
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
== Getting Started

The instructions below describe how to run the pre-packaged MobileNet SSD and PoseNet neural network models on the Raspberry Pi AI Camera.

All the following commands must be run in a terminal window.

=== Prerequisites

This tutorial assumes you are using the AI Camera attached to either a Raspberry Pi 4 Model B or a Raspberry Pi 5 board. However, the tutorial also applies to other Raspberry Pi models with a camera connector (e.g. the Raspberry Pi Zero 2 W or Raspberry Pi 3 Model B+).

You must also have the latest Raspberry Pi OS image (Bookworm at the time of writing) flashed onto an SD card and be fully up to date with:

[source,console]
----
sudo apt update
sudo apt full-upgrade -y
----

=== Install the IMX500 firmware

The AI Camera requires run-time firmware to be downloaded onto the IMX500 sensor on startup. These firmware files must be installed on the system with:

[source,console]
----
sudo apt install imx500-firmware imx500-models rpicam-apps-imx500-postprocess python3-opencv
----

This will install the `/lib/firmware/imx500_loader.fpk` and `/lib/firmware/imx500_main.fpk` firmware files required for operating the IMX500 sensor. It will also place a number of neural network model firmware files in `/usr/share/imx500-models/`.

[NOTE]
The IMX500 kernel device driver loads all the firmware files (loader, main, network) when the camera is started. This may take several minutes if the neural network model firmware has not been previously cached. For the demos below, a progress bar will be displayed on the console indicating the loading progress.

=== Reboot

Finally, reboot the Raspberry Pi:

[source,console]
----
sudo reboot
----

== Example Applications

Once all the system packages are updated and the firmware files installed, we can start running some example applications. As mentioned earlier, the Raspberry Pi AI Camera is fully integrated with our camera software (`libcamera`, `rpicam-apps` and `Picamera2`).

=== rpicam-apps

The https://www.raspberrypi.com/documentation/computers/camera_software.html#libcamera-and-rpicam-apps[rpicam-apps camera applications] now include IMX500 object inference and pose estimation stages that can be run in the postprocessing pipeline. Further details on the postprocessing pipeline can be found https://www.raspberrypi.com/documentation/computers/camera_software.html#post-processing[here].

The postprocessing JSON files used by `rpicam-apps` in the demos below can be found in `/usr/share/rpicam-assets/`.

==== Object Inference

The MobileNet SSD neural network performs basic object detection, providing bounding boxes and confidence values for each object found. `imx500_mobilenet_ssd.json` contains the configuration parameters for the IMX500 object inference postprocessing stage using the MobileNet SSD neural network. On a default installation, there should be no reason to change this file; leave all parameters unchanged for now.

Next, run one of the camera applications, for example `rpicam-hello`, with this postprocessing file:

[source,console]
----
rpicam-hello -t 0s --post-process-file /usr/share/rpicam-assets/imx500_mobilenet_ssd.json --viewfinder-width 1920 --viewfinder-height 1080 --framerate 30
----

This will pop up a viewfinder with bounding boxes overlaid on objects the neural network has recognised.

image::images/imx500-mobilenet.jpg[IMX500 MobileNet]

If you want to record this as a video, use `rpicam-vid` instead, for example:

[source,console]
----
rpicam-vid -t 10s -o output.264 --post-process-file /usr/share/rpicam-assets/imx500_mobilenet_ssd.json --width 1920 --height 1080 --framerate 30
----

`imx500_mobilenet_ssd.json` lists two stages to run in the postprocessing pipeline: the `imx500_mobilenet_ssd` stage, which picks out the bounding boxes and confidence values generated by the neural network in the output tensor, and the `object_detect_draw_cv` stage, which draws the boxes and labels on the image.

A number of configuration parameters can be tweaked to alter the behaviour of the `imx500_mobilenet_ssd` stage. For example, `max_detections` gives the maximum number of objects to be detected, and `threshold` gives the minimum confidence value required for an object to be returned. The raw inference output data of this network can be quite noisy, so this stage also performs some temporal filtering and applies hysteresis; this can be disabled by removing the `temporal_filter` config block.

==== Pose Estimation

The PoseNet neural network performs pose estimation and provides key points on the body associated with joints and limbs. To run the PoseNet example, use the same command line as the MobileNet SSD example, replacing `imx500_mobilenet_ssd.json` with `imx500_posenet.json`:

[source,console]
----
rpicam-hello -t 0s --post-process-file /usr/share/rpicam-assets/imx500_posenet.json --viewfinder-width 1920 --viewfinder-height 1080 --framerate 30
----

`imx500_posenet.json` lists two stages to run in the postprocessing pipeline: the `imx500_posenet` stage, which fetches the raw output tensor from the PoseNet neural network, and the `plot_pose_cv` stage, which draws the line overlays on the image. However, unlike the MobileNet SSD case, the PoseNet output tensor must be further postprocessed in software to generate the final output of body key points.

image::images/imx500-posenet.jpg[IMX500 PoseNet]

Again, `imx500_posenet.json` has a number of configuration parameters that can be tweaked to alter the behaviour of the stages' output, e.g. confidence threshold, number of detections, etc.

=== Picamera2

https://datasheets.raspberrypi.com/camera/picamera2-manual.pdf[`Picamera2`] does not have a specific postprocessing framework like `rpicam-apps` does; it does not need one, because implementing such postprocessing is far simpler in Python than in C++.

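For instance, the Python analogue of an `rpicam-apps` drawing stage is simply a callback attached to the `Picamera2` object. The minimal sketch below (the callback name and overlay text are arbitrary) stamps some text on every frame:

[source,python]
----
import cv2
from picamera2 import MappedArray, Picamera2

picam2 = Picamera2()

def annotate(request):
    # Runs for every completed request before the frame is delivered,
    # much like a postprocessing stage in rpicam-apps.
    with MappedArray(request, "main") as m:
        cv2.putText(m.array, "IMX500 demo", (30, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)

picam2.pre_callback = annotate
picam2.start(show_preview=True)
----
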
Example scripts that perform image classification, object inference/segmentation, and pose estimation using `Picamera2` can be found in https://github.com/raspberrypi/picamera2-imx500/blob/main/examples/imx500/. The https://github.com/raspberrypi/picamera2-imx500/blob/main/examples/imx500/README.md[`README.md`] file lists all the neural networks that can be used and the available parameters for the example scripts.

For example, to run the YOLOv8 object inference demo, use the following command:

[source,console]
----
python imx500_object_detection_demo.py --model /usr/share/imx500-models/imx500_network_yolov8n_pp.rpk --ignore-dash-labels -r
----

and for pose estimation:

[source,console]
----
python imx500_pose_estimation_yolov8n_demo.py --model /usr/share/imx500-models/imx500_network_yolov8n_pose.rpk
----

documentation/asciidoc/accessories/ai-camera/images/imx500-block-diagram.svg

Lines changed: 1 addition & 0 deletions
