Closed

Imx500 #3862

25 commits
4d0e960
AI-Camera initial draft
naushir Sep 2, 2024
12ec62a
Fix small issue with Connect Remote Shell docs
gllxflr Aug 31, 2024
00baba2
remove old config option
fuzzybear3 Sep 1, 2024
671a867
Fix list format
ykla Sep 4, 2024
925e81d
Fixed an error in the NFS server setup documentation.
ykla Sep 4, 2024
4d72a6d
Fix glaring typo
lurch Sep 5, 2024
04077e1
Add tabs to the documentation
nathan-contino Sep 6, 2024
a8a54ae
docs: contribute: update Linux kernel contribution guideline
cinghioGithub Jul 26, 2024
fd6e5d0
docs: contribute: addressed PR review comments
cinghioGithub Jul 29, 2024
09aeb2a
correct root of SD card to first partition
solsticedhiver Sep 6, 2024
0aa8f01
Update documentation/asciidoc/computers/configuration/headless.adoc
nathan-contino Sep 7, 2024
6190ad1
Update documentation/asciidoc/computers/configuration/headless.adoc
nathan-contino Sep 7, 2024
b96c461
Enumeration of PCIe devices behind a switch is preliminary supported.
ykla Sep 14, 2024
f2c3976
Revert "Enumeration of PCIe devices behind a switch is preliminary su…
nathan-contino Sep 18, 2024
9e1d36c
Update official_sdk.adoc
nathan-contino Sep 18, 2024
28e408f
remove outdated Ventura warning
nathan-contino Sep 18, 2024
fd76302
Fix github repository link target title
nathan-contino Sep 18, 2024
7aecb67
Generify chip references
nathan-contino Sep 18, 2024
8b395a7
Minor C SDK documentation revisions
nathan-contino Sep 18, 2024
6461a19
Bump asciidoctor from 2.0.20 to 2.0.23
dependabot[bot] Sep 18, 2024
cfefdfc
Bump minima from 2.5.1 to 2.5.2
dependabot[bot] Sep 17, 2024
b26c643
Bump tzinfo-data from 1.2024.1 to 1.2024.2
dependabot[bot] Sep 17, 2024
e633b17
Use tabs for virtualenv options
nathan-contino Sep 18, 2024
12971e7
Copy edit initial ai camera documentation
nathan-contino Sep 20, 2024
d137a5b
Merge branch 'develop' into imx500
nathan-contino Sep 30, 2024
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -96,7 +96,7 @@ GEM
webrick (1.8.1)

PLATFORMS
ruby
x86_64-linux

DEPENDENCIES
asciidoctor
7 changes: 7 additions & 0 deletions documentation/asciidoc/accessories/ai-camera.adoc
@@ -0,0 +1,7 @@
include::ai-camera/about.adoc[]

include::ai-camera/getting-started.adoc[]

include::ai-camera/details.adoc[]

include::ai-camera/model-conversion.adoc[]
9 changes: 9 additions & 0 deletions documentation/asciidoc/accessories/ai-camera/about.adoc
@@ -0,0 +1,9 @@
[[ai-camera]]
== About

The Raspberry Pi AI Camera uses the Sony IMX500 imaging sensor to provide low-latency and high-performance AI capabilities to any camera application. The tight integration with https://www.raspberrypi.com/documentation/computers/camera_software.adoc[Raspberry Pi's camera software stack] allows users to deploy their own neural network models with minimal effort.

image::images/ai-camera.png[The Raspberry Pi AI Camera]

This section demonstrates how to run either a pre-packaged or custom neural network model on the camera. Additionally, this section includes the steps required to interpret inference data generated by neural networks running on the IMX500 in https://github.com/raspberrypi/rpicam-apps[`rpicam-apps`] and https://github.com/raspberrypi/picamera2[Picamera2].

242 changes: 242 additions & 0 deletions documentation/asciidoc/accessories/ai-camera/details.adoc
@@ -0,0 +1,242 @@

== Under the Hood

The diagram below shows the various camera software components (in green) used during our imaging/inference use case with the Raspberry Pi AI Camera module hardware (in red).

image::images/imx500-block-diagram.svg[IMX500 block diagram]

At startup, the IMX500 sensor module loads firmware to run a particular neural network model. During streaming, the IMX500 generates _both_ an image stream and an inference stream. This inference stream holds the inputs and outputs of the neural network model, also known as input/output **tensors**.

=== Device drivers

At the lowest level, the IMX500 sensor kernel driver configures the camera module over the I2C bus. The CSI2 driver (`CFE` on Pi 5, `Unicam` on all other Pi platforms) sets up the receiver to write the image data stream into a frame buffer, and the embedded data and inference data streams into another buffer in memory.

The firmware files also transfer over the I2C bus wires. On most devices, this uses the standard I2C protocol, but Raspberry Pi 5 uses a custom high-speed protocol. The RP2040 SPI driver in the kernel handles firmware file transfer, since the transfer uses the RP2040 microcontroller. The microcontroller bridges the I2C transfers from the kernel to the IMX500 via a SPI bus. Additionally, the RP2040 caches firmware files in on-board storage. This avoids the need to transfer entire firmware blobs over the I2C bus, significantly speeding up firmware loading for firmware you've already used.

=== `libcamera`

Once `libcamera` dequeues the image and inference data buffers from the kernel, the IMX500-specific `cam-helper` library (part of the Raspberry Pi IPA within `libcamera`) parses the inference buffer to access the input/output tensors. These tensors are packaged as Raspberry Pi vendor-specific https://libcamera.org/api-html/namespacelibcamera_1_1controls.html[`libcamera` controls]. `libcamera` returns the following controls:

[%header,cols="a,a"]
|===
| Control
| Description

| `CnnOutputTensor`
| Floating point array storing the output tensor.

| `CnnInputTensor`
| Floating point array storing the input tensor.

| `CnnOutputTensorInfo`
| Network-specific parameters describing the structure of the output tensors:

[source,c]
----
struct OutputTensorInfo {
    uint32_t tensorDataNum;
    uint32_t numDimensions;
    uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
    char networkName[NetworkNameLen];
    uint32_t numTensors;
    OutputTensorInfo info[MaxNumTensors];
};
----

| `CnnInputTensorInfo`
| Network-specific parameters describing the structure of the input tensor:

[source,c]
----
struct CnnInputTensorInfo {
    char networkName[NetworkNameLen];
    uint32_t width;
    uint32_t height;
    uint32_t numChannels;
};
----

|===
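
The relationship between the flat `CnnOutputTensor` array and the per-tensor sizes in `CnnOutputTensorInfo` can be illustrated with a short sketch. It assumes that the output tensors are stored back to back in the flat array, in the order listed in `info`, and that each tensor's element count is the product of its dimension sizes. The function name `split_output_tensor` and the sample values are hypothetical and are not part of `libcamera`.

```python
from math import prod


def split_output_tensor(flat, shapes):
    """Split a flat list of floats into one list per output tensor.

    `shapes` holds one tuple of dimension sizes per tensor, mirroring the
    `size` arrays in `CnnOutputTensorInfo` (assumed layout, see above).
    """
    tensors = []
    offset = 0
    for shape in shapes:
        count = prod(shape)  # number of elements in this tensor
        if offset + count > len(flat):
            raise ValueError("flat tensor is shorter than the shapes require")
        tensors.append(flat[offset:offset + count])
        offset += count
    return tensors


# Made-up data for a hypothetical network: 2 boxes of 4 coordinates,
# followed by 2 scores and 2 class indices.
flat = [0.1, 0.2, 0.5, 0.6, 0.3, 0.3, 0.9, 0.8, 0.92, 0.41, 1.0, 17.0]
boxes, scores, classes = split_output_tensor(flat, [(2, 4), (2,), (2,)])
print(scores)
print(classes)
```

Each returned list is still flat. Reshaping it into rows (for example, four coordinates per box) depends on the element ordering used by the specific network, which this sketch does not attempt to model.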

=== `rpicam-apps`

`rpicam-apps` provides an IMX500 post-processing stage base class that implements helpers for IMX500 post-processing stages: https://github.com/raspberrypi/rpicam-apps/blob/post_processing_stages/imx500_post_processing_stage.hpp[`IMX500PostProcessingStage`]. Use this base class to derive a new post-processing stage for any neural network model running on the IMX500. For an example, see https://github.com/raspberrypi/rpicam-apps/blob/post_processing_stages/imx500_mobilenet_ssd.cpp[`imx500_mobilenet_ssd.cpp`]:

[source,cpp]
----
class ObjectInference : public IMX500PostProcessingStage
{
public:
    ObjectInference(RPiCamApp *app) : IMX500PostProcessingStage(app) {}

    char const *Name() const override;

    void Read(boost::property_tree::ptree const &params) override;

    void Configure() override;

    bool Process(CompletedRequestPtr &completed_request) override;
};
----

For every frame received by the application, the `Process()` function is called (`ObjectInference::Process()` in the above case). In this function, you can extract the output tensor for further processing or analysis:

[source,cpp]
----
auto output = completed_request->metadata.get(controls::rpi::CnnOutputTensor);
if (!output)
{
    LOG_ERROR("No output tensor found in metadata!");
    return false;
}

std::vector<float> output_tensor(output->data(), output->data() + output->size());
----

Once completed, the final results can either be visualised or saved in metadata and consumed by another downstream stage or by the top-level application itself. In the object inference case:

[source,cpp]
----
if (objects.size())
    completed_request->post_process_metadata.Set("object_detect.results", objects);
----

The `object_detect_draw_cv` post-processing stage running downstream fetches these results from the metadata and draws the bounding boxes onto the image in the `ObjectDetectDrawCvStage::Process()` function:

[source,cpp]
----
std::vector<Detection> detections;
completed_request->post_process_metadata.Get("object_detect.results", detections);
----

The following table contains a full list of helper functions provided by `IMX500PostProcessingStage`:

[%header,cols="a,a"]
|===
| Function
| Description

| `Read()`
| Typically called from `<Derived Class>::Read()`, this function reads the config parameters for input tensor parsing and saving.

This function also reads the neural network model file string (`"network_file"`) and sets up the firmware to be loaded on camera open.

| `Process()`
| Typically called from `<Derived Class>::Process()`, this function processes and saves the input tensor to a file if required by the JSON config file.

| `SetInferenceRoiAbs()`
| Sets an absolute region of interest (ROI) crop rectangle on the sensor image to use for inferencing on the IMX500.

| `SetInferenceRoiAuto()`
| Automatically calculates region of interest (ROI) crop rectangle on the sensor image to preserve the input tensor aspect ratio for a given neural network.

| `ShowFwProgressBar()`
| Displays a progress bar on the console showing the progress of the neural network firmware upload to the IMX500.

| `ConvertInferenceCoordinates()`
| Converts from the input tensor coordinate space to the final ISP output image space.

There are a number of scaling/cropping/translation operations occurring from the original sensor image to the fully processed ISP output image. This function converts coordinates provided by the output tensor to the equivalent coordinates after performing these operations.

|===
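
To make the idea behind `ConvertInferenceCoordinates()` more concrete, here is a simplified sketch of the two mappings involved: from input tensor space to sensor coordinates, and from sensor coordinates to ISP output pixels. This is not the `rpicam-apps` implementation. It assumes that boxes are normalised to the range 0 to 1, that the input tensor covers the inference ROI exactly, and that the ISP output shows a single crop of the sensor. The `Rect` type, the `tensor_to_output` function, and the 4000 × 3000 sensor size are hypothetical values chosen for easy arithmetic.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def tensor_to_output(box, roi, crop, out_size):
    """Map a normalised box (0..1 in input tensor space) to ISP output pixels."""
    out_w, out_h = out_size
    # Step 1: input tensor space -> sensor coordinates, using the inference ROI.
    sx = roi.x + box.x * roi.width
    sy = roi.y + box.y * roi.height
    sw = box.width * roi.width
    sh = box.height * roi.height
    # Step 2: sensor coordinates -> ISP output pixels, using the ISP crop.
    return Rect(
        round((sx - crop.x) * out_w / crop.width),
        round((sy - crop.y) * out_h / crop.height),
        round(sw * out_w / crop.width),
        round(sh * out_h / crop.height),
    )


# Hypothetical 4000x3000 sensor, inference on the full frame,
# and a centred 16:9 crop scaled to a 1920x1080 output.
roi = Rect(0, 0, 4000, 3000)
crop = Rect(0, 375, 4000, 2250)
box = Rect(0.25, 0.5, 0.5, 0.25)
print(tensor_to_output(box, roi, crop, (1920, 1080)))
```

A real pipeline may apply further transforms, such as flips or multiple scaling steps, which is why the helper function exists.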

=== Picamera2

IMX500 integration in Picamera2 is very similar to what is available in `rpicam-apps`. Picamera2 has an IMX500 helper class that provides the same functionality as the `rpicam-apps` `IMX500PostProcessingStage` base class. You can import this class into any Python script with:

[source,python]
----
from picamera2.devices.imx500 import IMX500

# This must be called before instantiation of Picamera2
imx500 = IMX500(model_file)
----

To retrieve the output tensors, fetch them from the controls. You can then apply additional processing in your Python script.

For example, in an object inference use case such as https://github.com/raspberrypi/picamera2/tree/main/examples/imx500/imx500_object_detection_demo.py[imx500_object_detection_demo.py], `parse_detections()` extracts the object bounding boxes and confidence values, and `draw_detections()` draws the boxes on the image:

[source,python]
----
import cv2
from picamera2 import MappedArray

# Excerpt: `imx500`, `picam2`, `args`, `threshold` and `get_labels()`
# are defined elsewhere in the demo script.

class Detection:
    def __init__(self, coords, category, conf, metadata):
        """Create a Detection object, recording the bounding box, category and confidence."""
        self.category = category
        self.conf = conf
        obj_scaled = imx500.convert_inference_coords(coords, metadata, picam2)
        self.box = (obj_scaled.x, obj_scaled.y, obj_scaled.width, obj_scaled.height)

def draw_detections(request, detections, stream="main"):
    """Draw the detections for this request onto the ISP output."""
    labels = get_labels()
    with MappedArray(request, stream) as m:
        for detection in detections:
            x, y, w, h = detection.box
            label = f"{labels[int(detection.category)]} ({detection.conf:.2f})"
            cv2.putText(m.array, label, (x + 5, y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
            cv2.rectangle(m.array, (x, y), (x + w, y + h), (0, 0, 255, 0))
        if args.preserve_aspect_ratio:
            b = imx500.get_roi_scaled(request)
            cv2.putText(m.array, "ROI", (b.x + 5, b.y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)
            cv2.rectangle(m.array, (b.x, b.y), (b.x + b.width, b.y + b.height), (255, 0, 0, 0))

def parse_detections(request, stream='main'):
    """Parse the output tensor into a number of detected objects, scaled to the ISP out."""
    # Fetch the metadata once so it can be passed to each Detection below.
    metadata = request.get_metadata()
    outputs = imx500.get_outputs(metadata)
    boxes, scores, classes = outputs[0][0], outputs[1][0], outputs[2][0]
    detections = [Detection(box, category, score, metadata)
                  for box, score, category in zip(boxes, scores, classes) if score > threshold]
    draw_detections(request, detections, stream)
----

Unlike the `rpicam-apps` example, this example applies no additional hysteresis or temporal filtering.
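
If you want to experiment with hysteresis in your own Python script, the sketch below shows one simple form of it. This is not the filter used by `rpicam-apps`, and it is not part of Picamera2. It assumes one confidence value per category per frame, and the `HysteresisFilter` class and its `on_threshold` and `off_threshold` parameters are hypothetical names. A category becomes visible once its confidence reaches the upper threshold and stays visible until the confidence falls below the lower threshold, which reduces flicker around a single cut-off.

```python
class HysteresisFilter:
    """Track which categories are visible, using two confidence thresholds."""

    def __init__(self, on_threshold=0.6, off_threshold=0.4):
        if off_threshold > on_threshold:
            raise ValueError("off_threshold must not exceed on_threshold")
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.active = set()

    def update(self, scores_by_category):
        """Update state from a dict mapping category to confidence for one frame."""
        for category, score in scores_by_category.items():
            if score >= self.on_threshold:
                self.active.add(category)
            elif score < self.off_threshold:
                self.active.discard(category)
            # Scores between the two thresholds leave the state unchanged.
        # Categories missing from this frame are treated as no longer visible.
        self.active &= set(scores_by_category)
        return sorted(self.active)


# Made-up confidence values for one category over four frames.
frames = [{"cat": 0.7}, {"cat": 0.5}, {"cat": 0.3}, {"cat": 0.5}]
hysteresis = HysteresisFilter()
visible = [hysteresis.update(frame) for frame in frames]
print(visible)
```

In the second frame the category stays visible even though its confidence is below the upper threshold. In the fourth frame it stays hidden even though its confidence is above the lower threshold.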

The IMX500 class in Picamera2 provides the following helper functions:

[%header,cols="a,a"]
|===
| Function
| Description

| `IMX500.get_full_sensor_resolution()`
| Return the full sensor resolution of the IMX500.

| `IMX500.config()`
| Returns a dictionary of the neural network configuration.

| `IMX500.convert_inference_coords()`
| Converts from the input tensor coordinate space to the final ISP output image space.

There are a number of scaling/cropping/translation operations occurring from the original sensor image to the fully processed ISP output image. This function converts coordinates provided by the output tensor to the equivalent coordinates after performing these operations.

| `IMX500.show_network_fw_progress_bar()`
| Displays a progress bar on the console showing the progress of the neural network firmware upload to the IMX500.

| `IMX500.get_roi_scaled()`
| Returns the region of interest (ROI) in the ISP output coordinate space.

| `IMX500.get_isp_output_size()`
| Returns the ISP output image size.

| `IMX500.get_input_w_h()`
| Returns the input tensor size based on the neural network model used.

| `IMX500.get_outputs()`
| Returns the output tensors for a given frame request.

| `IMX500.get_output_shapes()`
| Returns the shape of the output tensors for the neural network model used.

| `IMX500.set_inference_roi_abs()`
| Sets an absolute region of interest (ROI) crop rectangle on the sensor image to use for inferencing on the IMX500.

| `IMX500.set_inference_aspect_ratio()`
| Automatically calculates region of interest (ROI) crop rectangle on the sensor image to preserve the input tensor aspect ratio for a given neural network.

| `IMX500.get_kpi_info()`
| Returns the frame level performance indicators logged by the IMX500.

|===
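
As an illustration of what an aspect-ratio-preserving ROI such as the one set by `IMX500.set_inference_aspect_ratio()` looks like, the sketch below computes the largest centred crop of a sensor that matches a given input tensor shape. This is a geometric sketch only, not the Picamera2 implementation. The `centred_roi` function is hypothetical, and the 4056 × 3040 sensor resolution and the tensor sizes are assumed example values.

```python
def centred_roi(sensor_size, tensor_size):
    """Return (x, y, width, height) of the largest centred crop of the
    sensor that has the same aspect ratio as the input tensor."""
    sensor_w, sensor_h = sensor_size
    tensor_w, tensor_h = tensor_size
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if sensor_w * tensor_h > sensor_h * tensor_w:
        # Sensor is wider than the tensor: keep full height, trim the width.
        roi_h = sensor_h
        roi_w = sensor_h * tensor_w // tensor_h
    else:
        # Sensor is taller than (or matches) the tensor: keep full width.
        roi_w = sensor_w
        roi_h = sensor_w * tensor_h // tensor_w
    x = (sensor_w - roi_w) // 2
    y = (sensor_h - roi_h) // 2
    return (x, y, roi_w, roi_h)


# A square input tensor on an assumed 4056x3040 sensor.
print(centred_roi((4056, 3040), (320, 320)))
# A 16:9 input tensor on the same sensor.
print(centred_roi((4056, 3040), (640, 360)))
```

Integer division rounds the crop size down, so the result may differ by a pixel from a crop computed with floating-point arithmetic.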
128 changes: 128 additions & 0 deletions documentation/asciidoc/accessories/ai-camera/getting-started.adoc
@@ -0,0 +1,128 @@
== Getting started

The instructions below describe how to run the pre-packaged MobileNet SSD and PoseNet neural network models on the Raspberry Pi AI Camera.

=== Prerequisites

These instructions assume you are using the AI Camera attached to either a Raspberry Pi 4 Model B or Raspberry Pi 5 board. With minor changes, you can follow these instructions on other Raspberry Pi models with a camera connector, including the Raspberry Pi Zero 2 W and Raspberry Pi 3 Model B+.

First, ensure that your Raspberry Pi runs the latest software. Run the following command to update:

[source,console]
----
$ sudo apt update && sudo apt full-upgrade
----

=== Install the IMX500 firmware

The AI Camera must download runtime firmware onto the IMX500 sensor during startup. To install these firmware files onto your Raspberry Pi, run the following command:

[source,console]
----
$ sudo apt install imx500-firmware imx500-models rpicam-apps-imx500-postprocess python3-opencv
----

This command:

* installs the `/lib/firmware/imx500_loader.fpk` and `/lib/firmware/imx500_main.fpk` firmware files required to operate the IMX500 sensor
* places a number of neural network model firmware files in `/usr/share/imx500-models/`

NOTE: The IMX500 kernel device driver loads all the firmware files (loader, main, and network) when the camera starts. This may take several minutes if the neural network model firmware has not been previously cached. The demos below display a progress bar on the console to indicate firmware loading progress.

=== Reboot

Now that you've installed the prerequisites, restart your Raspberry Pi:

[source,console]
----
$ sudo reboot
----

== Run example applications

Once all the system packages are updated and firmware files installed, we can start running some example applications. As mentioned earlier, the Raspberry Pi AI Camera integrates fully with `libcamera`, `rpicam-apps`, and `Picamera2`.

=== `rpicam-apps`

The xref:../computers/camera_software.adoc#rpicam-apps[`rpicam-apps` camera applications] include IMX500 object inference and pose estimation stages that can be run in the post-processing pipeline. For more information about the post-processing pipeline, see xref:../computers/camera_software.adoc#post-process-file[the post-processing documentation].

The examples on this page use post-processing JSON files located in `/usr/share/rpicam-assets/`.

==== Object inference

The MobileNet SSD neural network performs basic object detection, providing bounding boxes and confidence values for each object found. `imx500_mobilenet_ssd.json` contains the configuration parameters for the IMX500 object inferencing post-processing stage using the MobileNet SSD neural network.

`imx500_mobilenet_ssd.json` declares a post-processing pipeline that contains two stages:

. `imx500_mobilenet_ssd`, which picks out bounding boxes and confidence values generated by the neural network in the output tensor
. `object_detect_draw_cv`, which draws bounding boxes and labels on the image

The MobileNet SSD tensor requires no significant post-processing on your Raspberry Pi to generate the final output of bounding boxes. All object detection runs directly on the AI Camera.
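
To check which stages a post-processing file declares, you can parse it with a short script. The sketch below assumes that each top-level key in the JSON object names one stage and maps to that stage's parameters. The `stages_in` function is hypothetical, and the sample text uses made-up parameter values rather than the real contents of `imx500_mobilenet_ssd.json`.

```python
import json


def stages_in(config_text):
    """Return a dict mapping each stage name to its sorted parameter names."""
    config = json.loads(config_text)
    return {
        name: sorted(params)
        for name, params in config.items()
        if isinstance(params, dict)
    }


# Illustrative sample only: the values here are assumptions.
SAMPLE = """
{
    "imx500_mobilenet_ssd": {"max_detections": 5, "threshold": 0.6},
    "object_detect_draw_cv": {"line_thickness": 2}
}
"""

stages = stages_in(SAMPLE)
for name, params in stages.items():
    print(name, params)
```

To inspect an installed file instead, read its contents with `open()` and pass the text to `stages_in()`.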

The following command runs `rpicam-hello` with object detection post-processing:

[source,console]
----
$ rpicam-hello -t 0s --post-process-file /usr/share/rpicam-assets/imx500_mobilenet_ssd.json --viewfinder-width 1920 --viewfinder-height 1080 --framerate 30
----

After running the command, you should see a viewfinder that overlays bounding boxes on objects recognised by the neural network:

image::images/imx500-mobilenet.jpg[IMX500 MobileNet]

To record video with object detection overlays, use `rpicam-vid` instead. The following command runs `rpicam-vid` with object detection post-processing:

[source,console]
----
$ rpicam-vid -t 10s -o output.264 --post-process-file /usr/share/rpicam-assets/imx500_mobilenet_ssd.json --width 1920 --height 1080 --framerate 30
----

You can configure the `imx500_mobilenet_ssd` stage in many ways.

For example, `max_detections` defines the maximum number of objects that the pipeline will detect at any given time. `threshold` defines the minimum confidence value required for the pipeline to consider any input as an object.

The raw inference output data of this network can be quite noisy, so this stage also performs some temporal filtering and applies hysteresis. To disable this filtering, remove the `temporal_filter` config block.
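
The combined effect of `threshold` and `max_detections` can be sketched in a few lines of Python. This is an illustration of the two parameters as described above, not the code used by the post-processing stage. The `select_detections` function, the dictionary layout, and the sample values are all hypothetical, and the sketch assumes that a detection exactly at the threshold is kept.

```python
def select_detections(detections, threshold, max_detections):
    """Keep the most confident detections at or above the threshold."""
    confident = [d for d in detections if d["confidence"] >= threshold]
    # Highest confidence first, so the cap drops the weakest candidates.
    confident.sort(key=lambda d: d["confidence"], reverse=True)
    return confident[:max_detections]


# Made-up detections for a single frame.
candidates = [
    {"label": "cat", "confidence": 0.91},
    {"label": "dog", "confidence": 0.35},
    {"label": "cup", "confidence": 0.62},
    {"label": "chair", "confidence": 0.77},
]

kept = select_detections(candidates, threshold=0.6, max_detections=2)
print([d["label"] for d in kept])
```

Here `dog` is rejected by the threshold, and `cup` passes the threshold but is dropped by the cap of two detections.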

==== Pose estimation

The PoseNet neural network performs pose estimation, labelling key points on the body associated with joints and limbs. `imx500_posenet.json` contains the configuration parameters for the IMX500 pose estimation post-processing stage using the PoseNet neural network.

`imx500_posenet.json` declares a post-processing pipeline that contains two stages:

* `imx500_posenet`, which fetches the raw output tensor from the PoseNet neural network
* `plot_pose_cv`, which draws line overlays on the image

The AI Camera performs basic detection, but the output tensor requires additional post-processing on your host Raspberry Pi to produce final output.

The following command runs `rpicam-hello` with pose estimation post-processing:

[source,console]
----
$ rpicam-hello -t 0s --post-process-file /usr/share/rpicam-assets/imx500_posenet.json --viewfinder-width 1920 --viewfinder-height 1080 --framerate 30
----

image::images/imx500-posenet.jpg[IMX500 PoseNet]

You can configure the `imx500_posenet` stage in many ways.

For example, `max_detections` defines the maximum number of body points that the pipeline will detect at any given time. `threshold` defines the minimum confidence value required for the pipeline to consider input as a body point.

=== Picamera2

For examples of image classification, object inference, object segmentation, and pose estimation using Picamera2, see https://github.com/raspberrypi/picamera2-imx500/blob/main/examples/imx500/[the `picamera2-imx500` GitHub repository].

Download the repository to your Raspberry Pi to run the examples. You'll find example files in the root directory, with additional information in the `README.md` file.

Run the following script from the repository to run YOLOv8 object inference:

[source,console]
----
$ python imx500_object_detection_demo.py --model /usr/share/imx500-models/imx500_network_yolov8n_pp.rpk --ignore-dash-labels -r
----

To try pose estimation in Picamera2, run the following script from the repository:

[source,console]
----
$ python imx500_pose_estimation_yolov8n_demo.py --model /usr/share/imx500-models/imx500_network_yolov8n_pose.rpk
----