AI camera documentation updates

davidplowman · naushir · commit fecad7aabf0e · 2024-09-29T09:28:56.000+01:00
Principally:

A new "Overview" to start the "Under the Hood" section.

Model Conversion substantially overhauled, though more still needed.
diff --git a/documentation/asciidoc/accessories/ai-camera/details.adoc b/documentation/asciidoc/accessories/ai-camera/details.adoc
@@ -1,6 +1,34 @@
 
 == Under the Hood
 
+=== Overview
+
+The Raspberry Pi AI Camera works rather differently to more traditional AI-based camera image processing systems, as shown in the diagram below.
+
+image::images/imx500-comparison.svg[Traditional versus IMX500 AI camera systems]
+
+On the left is a diagram of a more traditional AI camera system. Here, the camera delivers only images to the Raspberry Pi. The Raspberry Pi processes the images and is then responsible for performing AI inferencing. This may use an optional external AI accelerator, as shown, or it may happen (more slowly) in the CPU.
+
+On the right we have the IMX500-based system. The camera module contains a small ISP which turns the raw camera image data into an _input tensor_ which is fed directly to the AI accelerator within the camera. In turn, this produces an _output tensor_, containing the inferencing results, which is fed back to the Raspberry Pi itself. There is no need for an external accelerator, nor for the Raspberry Pi to run neural network software on the CPU.
+
+Some concepts that may be helpful to understand include:
+
+==== The _Input Tensor_
+
+This is the part of the sensor image that is passed to the AI engine for inferencing. It is produced by a small on-board ISP which also crops and scales the camera image to the dimensions expected by the neural network that has been loaded. The input tensor is not normally made available to applications, though it is possible to access it for debugging purposes.
+
+==== The _Region of Interest_
+
+The Region of Interest (or _ROI_) specifies exactly which part of the sensor image is cropped out before being rescaled to the size demanded by the neural network. It can be queried and set by an application. The units used are always pixels in the full resolution sensor output.
+
+By default, the ROI is set to be the full image received from the sensor (that is, nothing is actually cropped out).
+
+==== The _Output Tensor_
+
+These are the results of inferencing performed by the neural network. The precise number and shape of the outputs will depend on the neural network, and application code will need to understand how to handle them.
+
+=== System Architecture
+
 The diagram below shows the various camera software components (in green) used during our imaging/inference use case with the Raspberry Pi AI Camera module hardware (in red).
 
 image::images/imx500-block-diagram.svg[IMX500 block diagram]
@@ -204,39 +232,39 @@ The IMX500 class in Picamera2 provides the following helper functions:
 | `IMX500.get_full_sensor_resolution()`
 | Return the full sensor resolution of the IMX500.
 
-| `IMX500.config()`
+| `IMX500.config`
 | Returns a dictionary of the neural network configuration.
 
-| `IMX500.convert_inference_coords()`
-| Converts from the input tensor coordinate space to the final ISP output image space.
+| `IMX500.convert_inference_coords(coords, metadata, picamera2)`
+| Converts the coordinates _coords_ from the input tensor coordinate space to the final ISP output image space. Must be passed Picamera2's image metadata for the image, and the Picamera2 object.
 
 There are a number of scaling/cropping/translation operations occurring from the original sensor image to the fully processed ISP output image. This function converts coordinates provided by the output tensor to the equivalent coordinates after performing these operations.
 
 | `IMX500.show_network_fw_progress_bar()`
 | Displays a progress bar on the console showing the progress of the neural network firmware upload to the IMX500.
 
-| `IMX500.get_roi_scaled()`
-| Returns the region of interest (ROI) in the ISP output coordinate space.
+| `IMX500.get_roi_scaled(request)`
+| Returns the region of interest (ROI) in the ISP output image coordinate space.
 
-| `IMX500.get_isp_output_size()`
+| `IMX500.get_isp_output_size(picamera2)`
 | Returns the ISP output image size.
 
-| `IMX5000.get_input_w_h()`
+| `IMX500.get_input_size()`
 | Returns the input tensor size based on the neural network model used.
 
-| `IMX500.get_outputs()`
-| Returns the output tensors for a given frame request.
+| `IMX500.get_outputs(metadata)`
+| Returns the output tensors from the Picamera2 image metadata.
 
-| `IMX500.get_output_shapes()`
-| Returns the shape of the output tensors for the neural network model used.
+| `IMX500.get_output_shapes(metadata)`
+| Returns the shape of the output tensors from the Picamera2 image metadata for the neural network model used.
 
-| `IMX500.set_inference_roi_abs()`
-| Sets an absolute region of interest (ROI) crop rectangle on the sensor image to use for inferencing on the IMX500.
+| `IMX500.set_inference_roi_abs(rectangle)`
+| Sets the region of interest (ROI) crop rectangle which determines which part of the sensor image is converted to the input tensor that is used for inferencing on the IMX500. The region of interest should be specified in units of pixels at the full sensor resolution, as a `(x_offset, y_offset, width, height)` tuple.
 
-| `IMX500.set_inference_aspect_ratio()`
-| Automatically calculates region of interest (ROI) crop rectangle on the sensor image to preserve the input tensor aspect ratio for a given neural network.
+| `IMX500.set_inference_aspect_ratio(aspect_ratio)`
+| Automatically calculates a region of interest (ROI) crop rectangle on the sensor image that preserves the given aspect ratio. To make the ROI aspect ratio exactly match the input tensor for this network, use `imx500.set_inference_aspect_ratio(imx500.get_input_size())`.
 
-| `IMX500.get_kpi_info()`
-| Returns the frame level performance indicators logged by the IMX500.
+| `IMX500.get_kpi_info(metadata)`
+| Returns the frame level performance indicators logged by the IMX500 for the given image metadata.
 
 |===
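
The ROI helpers in the table above can be illustrated with a minimal sketch. This is not the Picamera2 implementation: `centred_roi` and `tensor_to_sensor` are hypothetical names, the 4056x3040 sensor size is an assumption used only for illustration (query it with `IMX500.get_full_sensor_resolution()` in real code), and the further ISP scaling that `convert_inference_coords` accounts for is ignored here.

```python
from typing import Tuple

# (x_offset, y_offset, width, height), in full-resolution sensor pixels
Rect = Tuple[int, int, int, int]

# Assumed full sensor resolution, for illustration only.
FULL_SENSOR = (4056, 3040)


def centred_roi(sensor_size: Tuple[int, int], aspect_ratio: Tuple[int, int]) -> Rect:
    """Return the largest centred crop of the sensor with the given aspect ratio."""
    sensor_w, sensor_h = sensor_size
    aspect_w, aspect_h = aspect_ratio
    # Compare sensor_w / sensor_h with aspect_w / aspect_h by cross-multiplying,
    # which keeps the arithmetic in integers.
    if sensor_w * aspect_h > sensor_h * aspect_w:
        # Sensor is wider than the target: keep the full height, trim the width.
        height = sensor_h
        width = sensor_h * aspect_w // aspect_h
    else:
        # Sensor is taller than (or equal to) the target: keep the full width.
        width = sensor_w
        height = sensor_w * aspect_h // aspect_w
    x_offset = (sensor_w - width) // 2
    y_offset = (sensor_h - height) // 2
    return (x_offset, y_offset, width, height)


def tensor_to_sensor(point: Tuple[float, float],
                     input_size: Tuple[int, int],
                     roi: Rect) -> Tuple[float, float]:
    """Map an (x, y) point in input tensor pixels to full-resolution sensor pixels."""
    x, y = point
    in_w, in_h = input_size
    roi_x, roi_y, roi_w, roi_h = roi
    # Undo the rescale from ROI size to tensor size, then undo the crop offset.
    return (roi_x + x * roi_w / in_w, roi_y + y * roi_h / in_h)
```

The tuple returned by `centred_roi` uses the same `(x_offset, y_offset, width, height)` layout that `set_inference_roi_abs` expects.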
diff --git a/documentation/asciidoc/accessories/ai-camera/getting-started.adoc b/documentation/asciidoc/accessories/ai-camera/getting-started.adoc
@@ -19,15 +19,17 @@ The AI camera must download runtime firmware onto the IMX500 sensor during start
 
 [source,console]
 ----
-$ sudo apt install imx500-firmware imx500-models rpicam-apps-imx500-postprocess python3-opencv
+$ sudo apt install imx500-all
 ----
 
 This command:
 
-* installs the `/lib/firmware/imx500_loader.fpk` and `/lib/firmware/imx500_main.fpk` firmware files required to operate the IMX500 sensor
+* installs the `/lib/firmware/imx500_loader.fpk` and `/lib/firmware/imx500_firmware.fpk` firmware files required to operate the IMX500 sensor
 * places a number of neural network model firmware files in `/usr/share/imx500-models/`
+* installs the IMX500 post-processing software stages in `rpicam-apps`
+* installs the Sony network model packaging tools
 
-NOTE: The IMX500 kernel device driver loads all the firmware files (loader, main, and network) when the camera starts. This may take several minutes if the neural network model firmware has not been previously cached. The demos below display a progress bar on the console to indicate firmware loading progress.
+NOTE: The IMX500 kernel device driver loads all the firmware files when the camera starts. This may take several minutes if the neural network model firmware has not been previously cached. The demos below display a progress bar on the console to indicate firmware loading progress.
 
 === Reboot
 
@@ -44,13 +46,13 @@ Once all the system packages are updated and firmware files installed, we can st
 
 === `rpicam-apps`
 
-The xref:../computers/camera_software.adoc#rpicam-apps[`rpicam-apps` camera applications] include IMX500 object inference and pose estimation stages that can be run in the post-processing pipeline. For more information about the post-processing pipeline, see xref:../computers/camera_software.adoc#post-process-file[the post-processing documentation].
+The xref:../computers/camera_software.adoc#rpicam-apps[`rpicam-apps` camera applications] include IMX500 object detection and pose estimation stages that can be run in the post-processing pipeline. For more information about the post-processing pipeline, see xref:../computers/camera_software.adoc#post-process-file[the post-processing documentation].
 
 The examples on this page use post-processing JSON files located in `/usr/share/rpicam-assets/`.
 
-==== Object inference
+==== Object detection
 
-The MobileNet SSD neural network performs basic object detection, providing bounding boxes and confidence values for each object found. `imx500_mobilenet_ssd.json` contains the configuration parameters for the IMX500 object inferencing post-processing stage using the MobileNet SSD neural network.
+The MobileNet SSD neural network performs basic object detection, providing bounding boxes and confidence values for each object found. `imx500_mobilenet_ssd.json` contains the configuration parameters for the IMX500 object detection post-processing stage using the MobileNet SSD neural network.
 
 `imx500_mobilenet_ssd.json` declares a post-processing pipeline that contains two stages:
 
@@ -77,7 +79,7 @@ To record video with object detection overlays, use `rpicam-vid` instead. The fo
 $ rpicam-vid -t 10s -o output.264 --post-process-file /usr/share/rpicam-assets/imx500_mobilenet_ssd.json --width 1920 --height 1080 --framerate 30
 ----
 
-You can configure the `imx500_object_inference` stage in many ways.
+You can configure the `imx500_object_detection` stage in many ways.
 
 For example, `max_detections` defines the maximum number of objects that the pipeline will detect at any given time. `threshold` defines the minimum confidence value required for the pipeline to consider any input as an object.
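
As a minimal sketch of adjusting these two parameters, the helper below copies a post-processing file and overrides `max_detections` and `threshold`. It assumes the JSON file is a top-level object keyed by stage name, with each stage's parameters nested beneath it; the function name `tune_object_detection` is hypothetical.

```python
import json
from pathlib import Path


def tune_object_detection(src, dst, max_detections, threshold,
                          stage="imx500_object_detection"):
    """Write a copy of a post-processing JSON file with new detection limits.

    Assumes the file is a JSON object keyed by stage name (an assumption
    about the layout, not something this helper can verify beyond the
    presence of the stage key).
    """
    config = json.loads(Path(src).read_text())
    if stage not in config:
        raise KeyError(f"stage {stage!r} not found in {src}")
    # Override only the two parameters; every other setting is preserved.
    config[stage]["max_detections"] = max_detections
    config[stage]["threshold"] = threshold
    Path(dst).write_text(json.dumps(config, indent=4) + "\n")
    return config
```

The modified copy can then be passed to `rpicam-vid` with `--post-process-file` in place of the original file.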
 
@@ -105,15 +107,22 @@ image::images/imx500-posenet.jpg[IMX500 PoseNet]
 
 You can configure the `imx500_posenet` stage in many ways.
 
-For example, `max_detections` defines the maximum number of body points that the pipeline will detect at any given time. `threshold` defines the minimum confidence value required for the pipeline to consider input as a body point.
+For example, `max_detections` defines the maximum number of bodies that the pipeline will detect at any given time. `threshold` defines the minimum confidence value required for the pipeline to consider input as a body.
 
 === Picamera2
 
-For examples of image classification, object inference, object segmentation, and pose estimation using Picamera2, see https://github.com/raspberrypi/picamera2-imx500/blob/main/examples/imx500/[the `picamera2-imx500` GitHub repository].
+For examples of image classification, object detection, object segmentation, and pose estimation using Picamera2, see https://github.com/raspberrypi/picamera2/blob/main/examples/imx500/[the `picamera2` GitHub repository].
 
-Download the repository to your Raspberry Pi to run the examples. You'll find example files in the root directory, with additional information in the `README.md` file.
+Most of the examples use OpenCV for some additional processing, so if you haven't done so previously, please run:
 
-Run the following script from the repository to run YOLOv8 object inference:
+[source,console]
+----
+$ sudo apt install python3-opencv python3-munkres
+----
+
+Now download https://github.com/raspberrypi/picamera2[the `picamera2` repository] to your Raspberry Pi to run the examples. You'll find example files in the root directory, with additional information in the `README.md` file.
+
+Run the following script from the repository to run YOLOv8 object detection:
 
 [source,console]
 ----
@@ -124,5 +133,5 @@ To try pose estimation in Picamera2, run the following script from the repositor
 
 [source,console]
 ----
-$ python imx500_pose_estimation_yolov8n_demo.py --model /usr/share/imx500-models/imx500_network_yolov8n_pose.rpk
+$ python imx500_pose_estimation_higherhrnet_demo.py
 ----
diff --git a/documentation/asciidoc/accessories/ai-camera/images/imx500-comparison.svg b/documentation/asciidoc/accessories/ai-camera/images/imx500-comparison.svg
diff --git a/documentation/asciidoc/accessories/ai-camera/model-conversion.adoc b/documentation/asciidoc/accessories/ai-camera/model-conversion.adoc
@@ -1,48 +1,87 @@
-== Model Conversion
+== Model Deployment
 
-Sony provides tools that enable users to convert pre-existing TensorFlow or PyTorch models to run on the Raspberry Pi AI Camera. Additionally, users can also build and train entirely new models for the IMX500.
+The process of deploying a new neural network model to the Raspberry Pi AI Camera will normally consist of the following steps:
 
-=== Install the IMX500 tools package
+. A neural network model must be provided.
+. The model must be quantised and compressed so that it can be run using the resources available in the IMX500 camera.
+. The compressed model must be converted to IMX500 format.
+. Finally, the model must be packaged into a firmware file that can be loaded at runtime into the camera.
 
-First, install the necessary tools:
+The first three steps will normally be performed on a more powerful computer such as a desktop or server, whilst the final packaging step must be performed on a Raspberry Pi.
+
+=== Model Creation
+
+The creation of neural network models is beyond the scope of this guide. Existing models can be re-used, or new ones created using popular frameworks like TensorFlow or PyTorch.
+
+For more information, readers are referred to the official https://developer.aitrios.sony-semicon.com/en/raspberrypi-ai-camera[AITRIOS Developer] website.
+
+=== Quantisation and Compression
+
+Models are quantised and compressed using Sony's _Model Compression Toolkit_. This can be installed with
 
 [source,console]
 ----
-$ sudo apt install imx500-tools
+$ pip install model_compression_toolkit
 ----
 
-== Convert and package the model
+and information and tutorials can be found at the project's https://github.com/sony/model_optimization[GitHub page].
+
+The _Model Compression Toolkit_ will generate a quantised model in either Keras (for TensorFlow) or ONNX (for PyTorch) format.
+
+=== Conversion
+
+First, we must install the necessary converter tools. If you are using TensorFlow, please run
+
+[source,console]
+----
+$ pip install imx500-converter[tf]
+----
 
-Next, run the tools to convert and package the model.
+TIP: Make sure that the installed version of TensorFlow matches the one you used to compress your model. Otherwise, the command above may install a more recent version of TensorFlow that is not compatible with your model.
 
-The following command converts a model file stored in the `<model-folder>` directory into a converted, IMX500-compatible model stored in `<converted-model-folder>`:
+or if you are using PyTorch, please run
 
 [source,console]
 ----
-$ imx500-convert.sh -i <model-folder> -o <converted-model-folder>
+$ pip install imx500-converter[pt]
 ----
 
+TIP: If you need to install both these packages, we strongly recommend doing so in separate Python virtual environments (for example, using `python -m venv <virtual-environment-name>`). This avoids any problems with TensorFlow and PyTorch causing conflicts with one another.
 
-Then, run the following command to package the converted model stored in the `<converted-model-folder>` directory into a package stored in `<packaged-model-folder>`.
+Next, we can convert the model. For TensorFlow, use
 
 [source,console]
 ----
-$ imx500-package.sh -i <converted-model-folder> -o <packaged-model-folder>
+$ imxconv-tf -i <compressed Keras model> -o <output folder>
 ----
 
-=== Prepare firmware for deployment
+and for PyTorch, use
 
-Finally, prepare the firmware for the Raspberry Pi AI Camera. This preparation swaps the Endian-ness of the byte ordering, then appends some sensor register information provided by the model conversion steps above into the firmware file.
+[source,console]
+----
+$ imxconv-pt -i <compressed ONNX model> -o <output folder>
+----
+
+In both cases, the output folder will be created containing, among other things, a memory usage report, plus a `packerOut.zip` file which is what we will need to copy to the Pi for the final step.
+
+Again, for more information on the model conversion process, please refer to the official https://developer.aitrios.sony-semicon.com/en/raspberrypi-ai-camera/documentation/imx500-converter[IMX500 Converter] documentation.
+
+=== Packaging
+
+The final step, which we run on a Raspberry Pi, is packaging the model into an _RPK_ file. This _RPK_ file is then uploaded to the IMX500 camera when running the neural network model. Before proceeding, we must install the necessary tools:
+
+[source,console]
+----
+$ sudo apt install imx500-tools
+----
 
-Run the following commands to prepare the firmware into a file named `imx500_network.fpk`:
+Now we can run
 
 [source,console]
 ----
-$ objcopy -I binary -O binary --reverse-bytes=4 /<packaged-model>/network.fpk network.fpk.REVERSED
-$ ni_to_reg /<packaged-model>/network_info.txt > registers.bin
-$ cat network.fpk.REVERSED registers.bin > imx500_network.fpk
+$ imx500-package.sh -i <path to packerOut.zip> -o <output folder>
 ----
 
-You can then load the prepared `imx500_network.fpk` file onto your Raspberry Pi AI Camera using the helper functions described above.
+The output folder should finally contain a file `network.rpk`, the name of which is what we pass to our IMX500 camera applications.
 
-For more information about the AI Camera and the tools used to work with it, visit the https://developer.sony.com/imx500/[Sony IMX500 developer website].
+Detailed instructions for these tools, and their constraints, are beyond the scope of this tutorial. For a more comprehensive set of instructions, please see the official https://developer.aitrios.sony-semicon.com/en/raspberrypi-ai-camera/documentation/imx500-packager[IMX500 Packager] documentation.
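
The conversion and packaging commands described in this section can be assembled programmatically, for instance in a build script. The sketch below only builds the argument lists, using the `-i` and `-o` options shown above; the function names and the `CONVERTERS` mapping are hypothetical. Note that the two commands are meant to run on different machines: conversion on the desktop or server, packaging on the Raspberry Pi.

```python
from pathlib import Path

# Converter executable for each framework, as named in the Conversion step.
CONVERTERS = {"tensorflow": "imxconv-tf", "pytorch": "imxconv-pt"}


def convert_command(framework, model, output_dir):
    """Build the argument list for converting a compressed model."""
    try:
        tool = CONVERTERS[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    return [tool, "-i", str(model), "-o", str(output_dir)]


def package_command(convert_output_dir, package_dir):
    """Build the argument list for packaging the converter output on the Pi."""
    # The converter's output folder contains packerOut.zip, which is the
    # input to the packaging step.
    packer_zip = Path(convert_output_dir) / "packerOut.zip"
    return ["imx500-package.sh", "-i", packer_zip.as_posix(), "-o", str(package_dir)]
```

Each list can be passed to `subprocess.run(..., check=True)` on the appropriate machine, so that a failed step stops the script rather than silently continuing.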