== Under the Hood

The diagram below shows the camera software components (in green) used during the imaging/inference use case, together with the Raspberry Pi AI Camera module hardware (in red).

image::images/imx500-block-diagram.svg[IMX500 block diagram]

On startup, the IMX500 sensor module is loaded with firmware to run a particular neural network model. During streaming, the IMX500 generates an image stream together with an inference stream. The inference stream holds the inputs and outputs of the neural network model, also known as the input/output tensors.

=== Device drivers

At the lowest level, the camera module is configured over the I2C bus by the IMX500 sensor kernel driver. The CSI2 driver (`CFE` on Pi 5, `Unicam` on all other Pi platforms) sets up the receiver to write the image data stream into a frame buffer, and the embedded data and inference data streams into another buffer in memory.

The firmware files are also transferred over the I2C bus, using either the standard I2C protocol or, on Pi 5, a custom high-speed protocol. This transfer is handled by the RP2040 SPI driver in the kernel. The RP2040 microcontroller manages the firmware transfer operation on the camera module: it bridges the I2C transfers from the kernel to the IMX500 over a SPI bus. The RP2040 also caches firmware files on the on-board flash chip for fast upload to the IMX500, avoiding the need to transfer the entire firmware blob over the I2C bus.

=== libcamera

Once `libcamera` dequeues the image and inference data buffers from the kernel, the IMX500-specific `cam-helper` library (part of the Raspberry Pi IPA within `libcamera`) parses the inference buffer to obtain the input/output tensors. These tensors are packaged as Raspberry Pi vendor-specific https://libcamera.org/api-html/namespacelibcamera_1_1controls.html[`libcamera` controls] and returned to the application for consumption. The following controls are returned:

[%header,cols="a,a"]
|===
| Control
| Description

| `CnnOutputTensor`
| Floating point array storing the output tensor.

| `CnnInputTensor`
| Floating point array storing the input tensor.

| `CnnOutputTensorInfo`
| Network-specific parameters describing the structure of the output tensors:

[source,c]
----
struct OutputTensorInfo {
    uint32_t tensorDataNum;
    uint32_t numDimensions;
    uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
    char networkName[NetworkNameLen];
    uint32_t numTensors;
    OutputTensorInfo info[MaxNumTensors];
};
----

| `CnnInputTensorInfo`
| Network-specific parameters describing the structure of the input tensor:

[source,c]
----
struct CnnInputTensorInfo {
    char networkName[NetworkNameLen];
    uint32_t width;
    uint32_t height;
    uint32_t numChannels;
};
----

|===
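For illustration, a packed `CnnOutputTensorInfo` blob of the shape described above can be decoded in Python with only the standard `struct` module. This is a sketch, not the real parser: the constants `NETWORK_NAME_LEN`, `MAX_NUM_TENSORS` and `MAX_NUM_DIMENSIONS` are hypothetical placeholder values (the real values live in the libcamera source), and little-endian packing with no padding is assumed.

```python
import struct

# Hypothetical constants for illustration only -- the real values of
# NetworkNameLen, MaxNumTensors and MaxNumDimensions are defined in libcamera.
NETWORK_NAME_LEN = 64
MAX_NUM_TENSORS = 8
MAX_NUM_DIMENSIONS = 8

def parse_output_tensor_info(buf):
    """Decode a CnnOutputTensorInfo-style blob (sketch: little-endian,
    no struct padding, only the first numTensors array entries read)."""
    # Fixed-size, NUL-padded network name string
    name = struct.unpack_from(f"{NETWORK_NAME_LEN}s", buf, 0)[0].split(b"\0")[0].decode()
    (num_tensors,) = struct.unpack_from("<I", buf, NETWORK_NAME_LEN)
    offset = NETWORK_NAME_LEN + 4
    tensors = []
    for _ in range(num_tensors):
        # Each OutputTensorInfo: tensorDataNum, numDimensions, size[MaxNumDimensions]
        data_num, num_dims = struct.unpack_from("<II", buf, offset)
        sizes = struct.unpack_from(f"<{MAX_NUM_DIMENSIONS}H", buf, offset + 8)
        tensors.append({"elements": data_num, "shape": sizes[:num_dims]})
        offset += 8 + 2 * MAX_NUM_DIMENSIONS
    return name, tensors
```

In practice an application does not need to do this by hand; the `cam-helper` performs the parsing inside `libcamera` before the controls are returned.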

=== rpicam-apps

`rpicam-apps` provides an IMX500 postprocessing stage base class that implements helpers for IMX500 postprocessing stages: https://github.com/raspberrypi/rpicam-apps/blob/post_processing_stages/imx500_post_processing_stage.hpp[`IMX500PostProcessingStage`]. This base class can be used to derive a new postprocessing stage for any neural network model running on the IMX500. For example, https://github.com/raspberrypi/rpicam-apps/blob/post_processing_stages/imx500_mobilenet_ssd.cpp[`imx500_mobilenet_ssd.cpp`] instantiates the following derived class (implementation not shown in full):

[source,cpp]
----
class ObjectInference : public IMX500PostProcessingStage
{
public:
    ObjectInference(RPiCamApp *app) : IMX500PostProcessingStage(app) {}

    char const *Name() const override;

    void Read(boost::property_tree::ptree const &params) override;

    void Configure() override;

    bool Process(CompletedRequestPtr &completed_request) override;
};
----

On every frame received by the application, the `Process()` function is called (`ObjectInference::Process()` in the above case). In this function, you can extract the output tensor for further processing and/or analysis by the stage:

[source,cpp]
----
auto output = completed_request->metadata.get(controls::rpi::CnnOutputTensor);
if (!output)
{
    LOG_ERROR("No output tensor found in metadata!");
    return false;
}

std::vector<float> output_tensor(output->data(), output->data() + output->size());
----

Once this is completed, the final results can either be visualised or saved in metadata and consumed by another downstream stage or by the top-level application itself. In the object inference case:

[source,cpp]
----
if (objects.size())
    completed_request->post_process_metadata.Set("object_detect.results", objects);
----

The `object_detect_draw_cv` postprocessing stage running downstream fetches these results from the metadata and draws the bounding boxes onto the image in the `ObjectDetectDrawCvStage::Process()` function:

[source,cpp]
----
std::vector<Detection> detections;
completed_request->post_process_metadata.Get("object_detect.results", detections);
----

The full list of helper functions provided by `IMX500PostProcessingStage` is given below:

[%header,cols="a,a"]
|===
| Function
| Description

| `Read()`
| Typically called from `<Derived Class>::Read()`, this function reads the config parameters used for input tensor parsing and saving.

This function also reads the neural network model file string (`"network_file"`) and sets up the firmware to be loaded on camera open.

| `Process()`
| Typically called from `<Derived Class>::Process()`, this function processes and saves the input tensor to a file if required by the JSON config file.

| `SetInferenceRoiAbs()`
| Sets an absolute region of interest (ROI) crop rectangle on the sensor image to use for inferencing on the IMX500.

| `SetInferenceRoiAuto()`
| Automatically calculates the region of interest (ROI) crop rectangle on the sensor image that preserves the input tensor aspect ratio for a given neural network.

| `ShowFwProgressBar()`
| Displays a progress bar on the console showing the progress of the neural network firmware upload to the IMX500.

| `ConvertInferenceCoordinates()`
| Converts from the input tensor coordinate space to the final ISP output image space.

There are a number of scaling/cropping/translation operations occurring from the original sensor image to the fully processed ISP output image. This function converts coordinates provided by the output tensor to the equivalent coordinates after performing these operations.

|===
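The mapping performed by `ConvertInferenceCoordinates()` can be illustrated with a simplified Python sketch. This is not the real implementation: it assumes the inference ROI maps linearly onto the full ISP output and ignores any additional ISP crop, and all names and parameters here are hypothetical.

```python
def convert_inference_coords(norm_box, roi, sensor_size, isp_size):
    """Map a box normalised to the input tensor (x, y, w, h in 0..1)
    through the sensor-space ROI into ISP output pixel coordinates.
    Simplified sketch: plain linear scaling, no extra ISP crop."""
    nx, ny, nw, nh = norm_box
    rx, ry, rw, rh = roi              # inference ROI on the sensor, in pixels
    sw, sh = sensor_size              # full sensor resolution
    ow, oh = isp_size                 # ISP output resolution
    # Tensor space -> sensor pixels within the ROI
    x, y = rx + nx * rw, ry + ny * rh
    w, h = nw * rw, nh * rh
    # Sensor pixels -> ISP output pixels
    return (round(x * ow / sw), round(y * oh / sh),
            round(w * ow / sw), round(h * oh / sh))
```

For example, a box at the centre of a full-frame ROI on a 4056×3040 sensor lands at the centre of a 1920×1080 ISP output.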

=== Picamera2

IMX500 integration in `Picamera2` is very similar to what is available in `rpicam-apps`. `Picamera2` has an IMX500 helper class that provides the same functionality as the `rpicam-apps` `IMX500PostProcessingStage` base class. This can be imported into any Python script with:

[source,python]
----
from picamera2.devices.imx500 import IMX500

# This must be called before instantiation of Picamera2
imx500 = IMX500(model_file)
----

To retrieve the output tensors, fetch them from the controls and use them for further processing and/or analysis in the Python script.

For example, in an object inference use case such as https://github.com/raspberrypi/picamera2/tree/main/examples/imx500/imx500_object_detection_demo.py[imx500_object_detection_demo.py], the object bounding boxes and confidence values are extracted in `parse_detections()`, and the boxes are drawn on the image in `draw_detections()`:

[source,python]
----
class Detection:
    def __init__(self, coords, category, conf, metadata):
        """Create a Detection object, recording the bounding box, category and confidence."""
        self.category = category
        self.conf = conf
        obj_scaled = imx500.convert_inference_coords(coords, metadata, picam2)
        self.box = (obj_scaled.x, obj_scaled.y, obj_scaled.width, obj_scaled.height)

def draw_detections(request, detections, stream="main"):
    """Draw the detections for this request onto the ISP output."""
    labels = get_labels()
    with MappedArray(request, stream) as m:
        for detection in detections:
            x, y, w, h = detection.box
            label = f"{labels[int(detection.category)]} ({detection.conf:.2f})"
            cv2.putText(m.array, label, (x + 5, y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
            cv2.rectangle(m.array, (x, y), (x + w, y + h), (0, 0, 255, 0))
        if args.preserve_aspect_ratio:
            b = imx500.get_roi_scaled(request)
            cv2.putText(m.array, "ROI", (b.x + 5, b.y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)
            cv2.rectangle(m.array, (b.x, b.y), (b.x + b.width, b.y + b.height), (255, 0, 0, 0))

def parse_detections(request, stream="main"):
    """Parse the output tensor into a number of detected objects, scaled to the ISP output."""
    metadata = request.get_metadata()
    outputs = imx500.get_outputs(metadata)
    boxes, scores, classes = outputs[0][0], outputs[1][0], outputs[2][0]
    detections = [Detection(box, category, score, metadata)
                  for box, score, category in zip(boxes, scores, classes) if score > threshold]
    draw_detections(request, detections, stream)
----

Note that, unlike the `rpicam-apps` example, no additional hysteresis or temporal filtering is applied to the output. However, such filtering should be easy enough to add to this example if needed.
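One simple form of temporal filtering is to report a detection category only after it has appeared in several consecutive frames. The sketch below is a hypothetical helper, not part of Picamera2 or the demo; the class and its parameters are invented for illustration.

```python
from collections import defaultdict

class DetectionFilter:
    """Report a category only once it has been seen in `min_frames`
    consecutive frames (hypothetical helper, not part of Picamera2)."""

    def __init__(self, min_frames=3):
        self.min_frames = min_frames
        self.streaks = defaultdict(int)   # category -> consecutive-frame count

    def update(self, categories):
        """Feed the categories detected in the current frame; return
        the set of categories that have persisted long enough."""
        categories = set(categories)
        for cat in list(self.streaks):
            if cat not in categories:
                del self.streaks[cat]     # streak broken, reset the count
        for cat in categories:
            self.streaks[cat] += 1
        return {cat for cat, n in self.streaks.items() if n >= self.min_frames}
```

An instance could be fed the categories from each frame's detections, keeping only the returned categories for drawing.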

The IMX500 class in `Picamera2` provides the following helper functions:

[%header,cols="a,a"]
|===
| Function
| Description

| `IMX500.get_full_sensor_resolution()`
| Returns the full sensor resolution of the IMX500.

| `IMX500.config()`
| Returns a dictionary of the neural network configuration.

| `IMX500.convert_inference_coords()`
| Converts from the input tensor coordinate space to the final ISP output image space.

There are a number of scaling/cropping/translation operations occurring from the original sensor image to the fully processed ISP output image. This function converts coordinates provided by the output tensor to the equivalent coordinates after performing these operations.

| `IMX500.show_network_fw_progress_bar()`
| Displays a progress bar on the console showing the progress of the neural network firmware upload to the IMX500.

| `IMX500.get_roi_scaled()`
| Returns the region of interest (ROI) in the ISP output coordinate space.

| `IMX500.get_isp_output_size()`
| Returns the ISP output image size.

| `IMX500.get_input_w_h()`
| Returns the input tensor size based on the neural network model used.

| `IMX500.get_outputs()`
| Returns the output tensors for a given frame request.

| `IMX500.get_output_shapes()`
| Returns the shapes of the output tensors for the neural network model used.

| `IMX500.set_inference_roi_abs()`
| Sets an absolute region of interest (ROI) crop rectangle on the sensor image to use for inferencing on the IMX500.

| `IMX500.set_inference_aspect_ratio()`
| Automatically calculates the region of interest (ROI) crop rectangle on the sensor image that preserves the input tensor aspect ratio for a given neural network.

| `IMX500.get_kpi_info()`
| Returns the frame-level performance indicators logged by the IMX500.

|===