2025/04/02 Meeting Notes #150
himanshunaidu started this conversation in Meeting Notes
Meeting Highlights
Discussion on the current application functionality
This included the UI/UX of real-time segmentation, segmentation post-processing (watershed for instance segmentation, which is currently not desired), and location estimation.
Discussion on the possible integration of on-device VLMs
The current segmentation code base is well modularized and can accommodate different models, so we can move from CNNs to VLMs without much friction.
(The one part of the code base that does not seem well modularized is the location estimation logic, which can be addressed when updating that logic to match OASIS.)
Discussion on Performance Metrics
The pixel-wise and region-wise metrics that are currently implemented are a good start. However, we want to verify the implementations, possibly using the OASIS code as a reference.
The focus should now be on integrating these metrics in the training pipeline.
Right now we can show these results on Cityscapes, but we would want this done on our own datasets.
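As a reference point for verifying the metric implementations, pixel-wise mean IoU is commonly computed from a confusion matrix. This is only a minimal sketch with illustrative arrays, not the project's actual code:

```python
# Minimal sketch: pixel-wise mean IoU from a confusion matrix.
# The class count and arrays below are illustrative, not from the repo.
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix."""
    valid = (gt >= 0) & (gt < num_classes)  # ignore out-of-range labels
    return np.bincount(
        num_classes * gt[valid] + pred[valid],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

def mean_iou(cm):
    """Per-class IoU = TP / (TP + FP + FN); mean over non-empty classes."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)
    return iou[union > 0].mean()

gt = np.array([[0, 0, 1], [1, 1, 2]])
pred = np.array([[0, 1, 1], [1, 1, 2]])
cm = confusion_matrix(pred, gt, num_classes=3)
print(mean_iou(cm))  # → 0.75
```

Running the same toy inputs through our implementation and a reference like this is a quick sanity check before trusting the numbers on Cityscapes.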
Discussion on Data Annotation Pipeline
Methods such as Grounded SAM might be plausible methods to make annotation work easier. Further research on this can be done once we have the performance metrics working.
Discussion on Segmentation Post-Processing
It would be prudent to implement logic similar to OASIS for object detection and tracking. We need to obtain the code that uses multiple consecutive frames to process segmentation masks.
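One simple form of multi-frame post-processing (the OASIS approach may well differ; this is only a sketch of the general idea) is a per-pixel majority vote over the last N class-id masks, which suppresses frame-to-frame flicker:

```python
# Sketch of temporal smoothing via per-pixel majority vote.
# Window size and the vote rule are assumptions, not the OASIS method.
from collections import deque
import numpy as np

class TemporalSmoother:
    def __init__(self, window=5):
        self.buffer = deque(maxlen=window)  # last N class-id masks

    def update(self, mask):
        """Add the newest (H, W) class-id mask; return the smoothed mask."""
        self.buffer.append(mask)
        stack = np.stack(self.buffer)                         # (t, H, W)
        num_classes = int(stack.max()) + 1
        one_hot = stack[..., None] == np.arange(num_classes)  # (t, H, W, C)
        # Count votes per class over time, then take the per-pixel winner.
        return one_hot.sum(axis=0).argmax(axis=-1)
```

A deque with `maxlen` keeps only the most recent frames, so the smoother's memory stays bounded during real-time use.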
Concluding Points
Let's not focus on monocular depth estimation for now; instead, aim for a first iteration that works well on iOS Pro devices.
Let's focus first on getting the right performance metrics for our use case, which we can refer to when deciding on next steps.
We need to be able to convey a high-level picture of the project development roadmap, rather than just working on a set of disconnected components and discussing them separately.
Next Steps
Implement object detection and tracking, sidewalk segmentation etc. as a first version on iOS
Reach out to the OASIS developers regarding usage of multiple consecutive frames to process segmentation masks
Implement a basic training pipeline on Cityscapes that outputs pixel-wise and region-wise performance metrics
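For the last item, one minimal shape for such a pipeline is a loop that trains for an epoch and then records validation metrics. Here `train_step` and `evaluate` are stub placeholders, not the project's real API:

```python
# Sketch of a training loop that logs metrics each epoch.
# train_step and evaluate are placeholder callbacks, not real project code.
def fit(train_step, evaluate, epochs):
    """Run training and record validation metrics after every epoch."""
    history = []
    for epoch in range(epochs):
        train_step(epoch)        # one pass over the training data
        metrics = evaluate()     # e.g. {"pixel_acc": ..., "mean_iou": ...}
        history.append({"epoch": epoch, **metrics})
    return history

# Toy usage with stubs standing in for real training and evaluation:
log = fit(train_step=lambda e: None,
          evaluate=lambda: {"mean_iou": 0.5},
          epochs=2)
print(log)
```

Keeping the per-epoch metric history in one place makes it easy to compare runs when we move from Cityscapes to our own datasets.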