2025/04/23 Meeting Notes #179
himanshunaidu started this conversation in Meeting Notes
Progress Update
Segmentation Post-Processing
Replaced existing segmentation processing with an entirely new segmentation pipeline
This pipeline no longer just performs semantic segmentation on the camera frame and colors each segment based on its class.
This pipeline now does the following:
a. Performs semantic segmentation on the camera frame
b. Gets the objects from each segment using Contour Detection (VNDetectContoursRequest)
c. Gets object features such as the centroid, bounding box, and polygon
d. Performs a homography transformation on the previously detected objects (if present) using the previous and current image frames (VNHomographicImageRegistrationRequest).
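Step (d) boils down to mapping each previously detected object's points through the 3x3 warp matrix that the registration request produces. A minimal NumPy sketch of that mapping (illustrative only; the app itself is Swift, and `warp_points` is a hypothetical helper name):

```python
import numpy as np

def warp_points(H, points):
    """Map 2D points through a 3x3 homography via homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)      # shape (N, 2)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T       # shape (N, 3)
    return homog[:, :2] / homog[:, 2:3]        # perspective divide
```

In the app, the same transform would be applied to each object's centroid and polygon vertices before matching them against the current frame's detections.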
Streamlined and modularized Depth mapping and Object location calculation
The object location can now be calculated by sampling the depth map at the already-obtained centroid.
This can easily be extended to sampling depth within a radius around the centroid.
NOTE: Not sure if we want to extend it to using trimmed mean of depth values for depth calculation.
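The three depth strategies above (single centroid sample, a radius around the centroid, and a trimmed mean within that radius) can be expressed as one function. This is an illustrative NumPy sketch, not the app's Swift implementation, and the function and parameter names are assumptions:

```python
import numpy as np

def centroid_depth(depth_map, centroid, radius=0, trim=0.1):
    """Depth at a centroid; optionally a trimmed mean over a square window."""
    h, w = depth_map.shape
    cx, cy = int(round(centroid[0])), int(round(centroid[1]))
    if radius == 0:
        return float(depth_map[cy, cx])        # single-pixel sample
    y0, y1 = max(0, cy - radius), min(h, cy + radius + 1)
    x0, x1 = max(0, cx - radius), min(w, cx + radius + 1)
    vals = np.sort(depth_map[y0:y1, x0:x1].ravel())
    k = int(len(vals) * trim)                  # drop k lowest and k highest
    core = vals[k:len(vals) - k] if len(vals) > 2 * k else vals
    return float(core.mean())
```

The trimmed mean mainly guards against depth outliers at object boundaries, which may be worth testing before committing to it.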
Computer Vision ML Pipeline
The new metrics we will be getting should be much more accurate.
Other Bug Fixes
Including:
#41
#91
Doubts
Next Steps
Segmentation Post-Processing
Fix orientation issues
While the AnnotationView has been updated to be able to utilize the detected objects for mapping, there are several orientation issues (inconsistencies with the camera frame, segmentation mask, depth map, detected objects, etc.)
This is preventing a smooth usage of the detected objects for location calculation.
More context given here
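Until the root cause is fixed, orientation mismatches between the camera frame, segmentation mask, and depth map usually reduce to some combination of a 90-degree rotation and a mirror. A hypothetical NumPy sketch of the alignment step (the real fix would live in the Swift AnnotationView code, and the rotation/mirror amounts would have to be determined per source):

```python
import numpy as np

def align_to_camera(mask, rotate_quarters=0, mirror=False):
    """Rotate a 2D mask by 90-degree steps (counterclockwise) and
    optionally mirror it so it overlays the camera frame correctly."""
    out = np.rot90(mask, k=rotate_quarters)
    return np.fliplr(out) if mirror else out
```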
Test Homography Transformation issues and fix accordingly
While Apple's VNHomographicImageRegistrationRequest seems to be working in controlled settings, it doesn't seem to be working well in real-time on the application.
Need to check how much of this issue is due to the extremely low frame rate, VNHomographicImageRegistrationRequest itself, and any other factors.
Will be creating a test application that can test this more thoroughly. Based on the results, we can decide whether to continue using VNHomographicImageRegistrationRequest or implement the homography estimation from scratch.
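One concrete check the test application could run: warp a set of known matched points through the estimated homography and measure the mean reprojection error. High error on controlled inputs would point at the registration request itself (or the low frame rate) rather than downstream code. An illustrative NumPy sketch, with assumed names:

```python
import numpy as np

def reprojection_error(H, src_pts, dst_pts):
    """Mean Euclidean distance between H-warped source points and their
    known matches in the destination frame. Low error = good registration."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    homog = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    warped = homog[:, :2] / homog[:, 2:3]
    return float(np.linalg.norm(warped - dst, axis=1).mean())
```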
Fix Performance issues
Some of these include:
Implementing a Metal version of contour detection and comparing with VNDetectContoursRequest.
Assessing the performance of current object tracking when there are many more objects to compare, and determining how to optimize it accordingly.
More are listed in the GitHub issues.
Implement Union of Masks
Will help reduce over-segmentation errors.
Can be implemented once AnnotationView is clicked, provided we have the segmentation frames in the Deque (currently, we only store camera frames).
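The union itself is just a logical OR over the stored segmentation frames. A minimal sketch assuming the Deque holds binary masks (hypothetical, since the app currently stores camera frames rather than segmentation frames):

```python
import numpy as np
from collections import deque

def union_of_masks(mask_deque):
    """Logical OR of all binary segmentation masks in the deque, merging
    fragments of one object that over-segmentation split across frames."""
    frames = deque(mask_deque)
    union = np.zeros_like(frames[0], dtype=bool)
    for m in frames:
        union |= m.astype(bool)
    return union
```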
Computer Vision ML Pipeline
Train a new pedestrian-centric model using the COCO-Stuff and hand-labelled datasets
Analyze conversion issues with ESPNetv2
Get More Specific Performance Metrics
Evaluation Metrics Details:
Datasets: Cityscapes, AnnotatedData for OASIS
Metrics: mIoU (and F1), Class-specific mIoU (and F1), ROM/RUM (Old and New)
Models: BiSeNetv2 (1024x512, 512x256) on CoreML, ESPNetv2 (1024x512, 512x256)
Compare and validate old and new implementations of ROM/RUM
Fix bug created with general torch model evaluation (ESPNetv2 and BiSeNetv2)
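For reference, the mIoU metric listed above reduces to per-class IoU computed from a confusion matrix over label maps. A small NumPy sketch (the real evaluation presumably runs inside the torch pipeline, so this is illustrative only):

```python
import numpy as np

def miou(pred, target, num_classes):
    """Mean intersection-over-union between two flattened label maps."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    ious = inter[union > 0] / union[union > 0]  # skip absent classes
    return float(ious.mean())
```

Restricting the confusion matrix to a subset of rows/columns gives the class-specific variant.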
Other Concerns
When the user returns to Content View after finishing with AnnotationView, we do not know where the new frames are being recorded relative to the previously captured frame.
Thus, in the current implementation, it is still impossible to perform object tracking across captured frames.
Need to figure out how we can track objects across these captured frames.
Here are the possible solutions I can think of:
a. Unfortunately wildly inefficient: Keep the camera running even after going to AnnotationView
Heavy battery impact, and very error-prone.
b. Use location and IMU sensor to detect where the new frames are being recorded relative to the previously captured frame.
Still quite error-prone. But this will probably be the way to go.
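Option (b) essentially amounts to rotating the world-frame displacement between capture positions into the previous camera's local frame using the compass heading. A simplified 2-D sketch under assumed conventions (heading 0 degrees = facing +y; ignores IMU noise and altitude):

```python
import numpy as np

def relative_offset(prev_pos, prev_heading_deg, new_pos):
    """Express the new capture position in the previous camera's local
    coordinates (x = right, y = forward), from position + compass heading."""
    dx, dy = np.subtract(new_pos, prev_pos)
    theta = np.radians(prev_heading_deg)
    # Rotate the world-frame displacement into the previous camera's frame.
    right = dx * np.cos(theta) - dy * np.sin(theta)
    forward = dx * np.sin(theta) + dy * np.cos(theta)
    return right, forward
```

In practice the error in consumer GPS and compass readings would dominate, which is why this path is still expected to be error-prone.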