2025/04/23 Meeting Notes #179
himanshunaidu started this conversation in Meeting Notes
Progress Update
Segmentation Post-Processing
Replaced existing segmentation processing with an entirely new segmentation pipeline
This pipeline no longer just performs semantic segmentation on the camera frame and colors each segment based on its class.
This pipeline now does the following:
a. Performs semantic segmentation on the camera frame
b. Gets the objects from each segment using Contour Detection (VNDetectContoursRequest)
c. Gets object features such as the centroid, bounding box, and polygon
d. Performs a homography transformation on the previously detected objects (if present) using the previous and current image frames (VNHomographicImageRegistrationRequest).
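Step (d) boils down to mapping each previously detected object's points through the 3x3 warp matrix that the registration request produces. A minimal NumPy sketch of that mapping (illustrative only; the app itself is Swift, and `warp_points` is a hypothetical helper name):

```python
import numpy as np

def warp_points(H, points):
    """Map 2D points through a 3x3 homography via homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)      # shape (N, 2)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T       # shape (N, 3)
    return homog[:, :2] / homog[:, 2:3]        # perspective divide
```

In the app, the same transform would be applied to each object's centroid and polygon vertices before matching them against the current frame's detections.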
Streamlined and modularized Depth mapping and Object location calculation
The object location can now be calculated by sampling the depth map at the already-obtained centroid.
This can easily be extended to sampling depth within a radius around the centroid.
NOTE: Not sure if we want to extend it to using trimmed mean of depth values for depth calculation.
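The three depth strategies above (single centroid sample, a radius around the centroid, and a trimmed mean within that radius) can be expressed as one function. This is an illustrative NumPy sketch, not the app's Swift implementation, and the function and parameter names are assumptions:

```python
import numpy as np

def centroid_depth(depth_map, centroid, radius=0, trim=0.1):
    """Depth at a centroid; optionally a trimmed mean over a square window."""
    h, w = depth_map.shape
    cx, cy = int(round(centroid[0])), int(round(centroid[1]))
    if radius == 0:
        return float(depth_map[cy, cx])        # single-pixel sample
    y0, y1 = max(0, cy - radius), min(h, cy + radius + 1)
    x0, x1 = max(0, cx - radius), min(w, cx + radius + 1)
    vals = np.sort(depth_map[y0:y1, x0:x1].ravel())
    k = int(len(vals) * trim)                  # drop k lowest and k highest
    core = vals[k:len(vals) - k] if len(vals) > 2 * k else vals
    return float(core.mean())
```

The trimmed mean mainly guards against depth outliers at object boundaries, which may be worth testing before committing to it.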
Computer Vision ML Pipeline
The new metrics we will be getting should be much more accurate.
Other Bug Fixes
Including:
#41
#91
Doubts
Next Steps
Segmentation Post-Processing
Fix orientation issues
While the AnnotationView has been updated to be able to utilize the detected objects for mapping, there are several orientation issues (inconsistencies with the camera frame, segmentation mask, depth map, detected objects, etc.)
This is preventing a smooth usage of the detected objects for location calculation.
More context given here
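Until the root cause is fixed, orientation mismatches between the camera frame, segmentation mask, and depth map usually reduce to some combination of a 90-degree rotation and a mirror. A hypothetical NumPy sketch of the alignment step (the real fix would live in the Swift AnnotationView code, and the rotation/mirror amounts would have to be determined per source):

```python
import numpy as np

def align_to_camera(mask, rotate_quarters=0, mirror=False):
    """Rotate a 2D mask by 90-degree steps (counterclockwise) and
    optionally mirror it so it overlays the camera frame correctly."""
    out = np.rot90(mask, k=rotate_quarters)
    return np.fliplr(out) if mirror else out
```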
Test Homography Transformation issues and fix accordingly
While Apple's VNHomographicImageRegistrationRequest seems to be working in controlled settings, it doesn't seem to be working well in real-time on the application.
Need to check how much of this issue is due to the extremely low frame rate, VNHomographicImageRegistrationRequest itself, and any other factors.
Will be creating a test application that can test this more thoroughly. Based on the results, we can decide whether to continue using VNHomographicImageRegistrationRequest or implement the homography estimation from scratch.
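One concrete check the test application could run: warp a set of known matched points through the estimated homography and measure the mean reprojection error. High error on controlled inputs would point at the registration request itself (or the low frame rate) rather than downstream code. An illustrative NumPy sketch, with assumed names:

```python
import numpy as np

def reprojection_error(H, src_pts, dst_pts):
    """Mean Euclidean distance between H-warped source points and their
    known matches in the destination frame. Low error = good registration."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    homog = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    warped = homog[:, :2] / homog[:, 2:3]
    return float(np.linalg.norm(warped - dst, axis=1).mean())
```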
Fix Performance issues
Some of these include:
Implementing a Metal version of contour detection and comparing with VNDetectContoursRequest.
Assessing the performance of current object tracking when there are many more objects to compare, and determining how to optimize it accordingly.
More are listed in the GitHub issues.
Implement Union of Masks
Will help reduce over-segmentation errors.
Can be implemented once AnnotationView is clicked, provided we have the segmentation frames in the Deque (currently, we only store camera frames).
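The union itself is just a logical OR over the stored segmentation frames. A minimal sketch assuming the Deque holds binary masks (hypothetical, since the app currently stores camera frames rather than segmentation frames):

```python
import numpy as np
from collections import deque

def union_of_masks(mask_deque):
    """Logical OR of all binary segmentation masks in the deque, merging
    fragments of one object that over-segmentation split across frames."""
    frames = deque(mask_deque)
    union = np.zeros_like(frames[0], dtype=bool)
    for m in frames:
        union |= m.astype(bool)
    return union
```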
Computer Vision ML Pipeline
Train a new pedestrian-centric model using the COCO-Stuff and hand-labelled datasets
Analyze conversion issues with ESPNetv2
Get More Specific Performance Metrics
Evaluation Metrics Details:
Datasets: Cityscapes, AnnotatedData for OASIS
Metrics: mIoU (and F1), Class-specific mIoU (and F1), ROM/RUM (Old and New)
Models: BiSeNetv2 (1024x512, 512x256) on CoreML, ESPNetv2 (1024x512, 512x256)
Compare and validate old and new implementations of ROM/RUM
Fix bug created with general torch model evaluation (ESPNetv2 and BiSeNetv2)
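For reference, the mIoU metric listed above reduces to per-class IoU computed from a confusion matrix over label maps. A small NumPy sketch (the real evaluation presumably runs inside the torch pipeline, so this is illustrative only):

```python
import numpy as np

def miou(pred, target, num_classes):
    """Mean intersection-over-union between two flattened label maps."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    ious = inter[union > 0] / union[union > 0]  # skip absent classes
    return float(ious.mean())
```

Restricting the confusion matrix to a subset of rows/columns gives the class-specific variant.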
Other Concerns
When the user returns to Content View after finishing with AnnotationView, we do not know where the new frames are being recorded relative to the previously captured frame.
Thus, in the current implementation, it is still impossible to perform object tracking across captured frames.
Need to figure out how we can track objects across these captured frames.
Here are the possible solutions I can think of:
a. Unfortunately wildly inefficient: Keep the camera running even after going to AnnotationView
Heavy battery impact, and very error-prone.
b. Use location and IMU sensor to detect where the new frames are being recorded relative to the previously captured frame.
Still quite error-prone. But this will probably be the way to go.
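Option (b) essentially amounts to rotating the world-frame displacement between capture positions into the previous camera's local frame using the compass heading. A simplified 2-D sketch under assumed conventions (heading 0 degrees = facing +y; ignores IMU noise and altitude):

```python
import numpy as np

def relative_offset(prev_pos, prev_heading_deg, new_pos):
    """Express the new capture position in the previous camera's local
    coordinates (x = right, y = forward), from position + compass heading."""
    dx, dy = np.subtract(new_pos, prev_pos)
    theta = np.radians(prev_heading_deg)
    # Rotate the world-frame displacement into the previous camera's frame.
    right = dx * np.cos(theta) - dy * np.sin(theta)
    forward = dx * np.sin(theta) + dy * np.cos(theta)
    return right, forward
```

In practice the error in consumer GPS and compass readings would dominate, which is why this path is still expected to be error-prone.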