GSoC Project_8 : Refining Zero-Shot Object Segmentation by Combining Vision Foundation Models #29541

TempestBirds729804 · 2025-03-18T11:06:52Z

Dear Daan, Klaas, and Samet,

Hello! My name is Yan Zhang, and I am a third-year undergraduate student majoring in Computer Science at Hefei University of Technology. I'm planning to apply to CS master's programs at U.S. universities for Fall 2026. I’ve been particularly interested in computer vision, especially object detection and segmentation tasks. I am familiar with Python and C++ development and have experience working with computer vision frameworks such as OpenCV and PyTorch. I’ve also worked with CLIP-based feature extraction in past projects.

Previously, I was the fourth author on a national invention patent, “A contrastive micro-expression recognition method based on text-position attention”, for which we implemented a CLIP-based micro-expression recognition pipeline. I’m also involved in a pipeline inspection robot project, where I work on the visual environment recognition module in collaboration with the Faculty of Mechanical Engineering at my university.

The GSoC project “Refining Zero-Shot Object Segmentation by Combining Vision Foundation Models” immediately caught my attention. I find the idea of combining models like DINOv2 and SAM to build a more general and robust segmentation pipeline both exciting and meaningful. I’d love to explore this further and contribute to the OpenVINO ecosystem. 🚀

My first PR is #29515.

Looking forward to learning and collaborating with the community!

Best regards,
Yan Zhang

kwonseungchan · 2025-03-19T13:08:28Z

Dear Daan, Klaas, and Samet,

My name is Seung Chan Kwon, and I am currently a second-year master's student at Soongsil University, pursuing studies in computer vision and Python. I am very interested in the GSoC 2025 project with OpenVINO: "Refining Zero-Shot Object Segmentation by Combining Vision Foundation Models."

I am familiar with using segmentation models, having received the Chairman’s Award in the University National Center of Excellence in Software Joint AI Competition for my project on Satellite Image Building Area Segmentation. Additionally, I have experience working with the SAM (Segment Anything Model) in my undergraduate capstone project. I also contributed to solving the SAM tutorial issue, as you can see here:
facebookresearch/segment-anything#585.

Therefore, I am very eager to contribute to this project. I have a few questions about it:

1. You mentioned the goal of generalizing across diverse datasets. Does this also include more extreme datasets, such as TinyPerson or AI-TOD?
2. To achieve better generalization, is the focus on improving the model itself, on designing better augmentation techniques, or on data-centric approaches such as dataset collection and filtering out unnecessary data?
3. Is there anything you would recommend I prepare or study further to better understand and contribute to this project?

Thank you for your time and consideration.

@adrianboguszewski, could you please connect me with the mentors?

1 reply

adrianboguszewski Mar 21, 2025
Collaborator

@Daankrol @samet-akcay

TempestBirds729804 · 2025-03-21T17:49:30Z

Author

@Daankrol @samet-akcay

I hope you're all doing well! I'd like to follow up on my previous message. Although I haven't received a response yet, I've been going through more resources related to the “Refining Zero-Shot Object Segmentation by Combining Vision Foundation Models” project to understand the topic better. After thinking more about how to approach the idea, I have a few questions 🤔:

1. Do I need to manually tweak these models to fit OpenVINO's IR format, or can I convert them directly using OpenVINO?
2. Regarding quantization of these models, is OpenVINO’s automatic quantization or custom quantization suitable for this kind of task? What kind of trade-offs should I expect?
3. When using OpenVINO, what are the current practices for optimizing inference speed and accuracy, particularly on resource-constrained devices like VPUs?

Also, I submitted my first PR a while ago. It addressed a Broadcast operation format issue (f32), and I put it together after looking into some conformance-test source code and running a few tests. However, that task wasn’t really related to the project I’m focusing on now. I’d like to take on something more relevant to the project, so it would be great if you could give me some guidance.

Looking forward to hearing from you.

0 replies

Daankrol · 2025-03-25T13:58:15Z

@TempestBirds729804 @kwonseungchan
The idea is to follow the interface/implementation of Visual Prompting in https://github.com/openvinotoolkit/model_api/tree/master
You are free to choose a method for mask refinement. It could be heuristic-based, ML-based, or something new.
You could check out repos such as SAM, DINO, PerSAM or Matcher.
We currently have no benchmarking framework in mind, so you are free to choose a suitable approach.

2 replies

TempestBirds729804 Mar 25, 2025
Author

@Daankrol
Thanks for the confirmation.
I’ve been working on an implementation plan and drafting ideas for a few days. Just to check: would you recommend putting together a small demo at this stage (based, as you said, on my own understanding of the project), or should I spend the time refining the proposal and planning instead? 🤔

Daankrol Apr 2, 2025

I think it would be wise to focus on the proposal first and then work on your demo if you still have time.
