GSoC 2025 Proposal 4: DeepForest Vision Agent connection with LandingAI #973
Hi @Samia35-2973, Thank you for reaching out and for your interest in contributing to DeepForest through GSoC 2025!
Based on the project details above, here is an example of a demo from the landing-ai service given a text query of
Try out all DeepForest models and their detection classes to understand how they could be leveraged for a user-friendly interface that selects new images based on agent responses. Think about how we can design an interface that allows users to input text-based queries, load a model, and detect objects in images based on the query.
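One way to approach that interface is a thin routing layer that maps a free-text query to a candidate model before any image is loaded. Below is a minimal sketch of that idea; the keyword table and model names are hypothetical placeholders for illustration, not the actual DeepForest model zoo entries, and a real implementation would hand the selected name to DeepForest's model-loading API.

```python
# Sketch: route a free-text user query to a candidate model name.
# The keyword -> model mapping is a hypothetical placeholder; real
# names would come from the DeepForest model zoo.
MODEL_KEYWORDS = {
    "tree": "tree-crown-detector",
    "crown": "tree-crown-detector",
    "bird": "bird-detector",
    "nest": "bird-detector",
    "livestock": "livestock-detector",
    "cow": "livestock-detector",
}

DEFAULT_MODEL = "tree-crown-detector"


def select_model(query: str) -> str:
    """Pick a model based on the first matching keyword in the query."""
    for word in query.lower().split():
        for keyword, model in MODEL_KEYWORDS.items():
            if keyword in word:
                return model
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(select_model("find all bird nests in this drone image"))
    # -> bird-detector
```

An LLM-based agent could replace the keyword table entirely, but a deterministic fallback like this keeps the interface usable when the agent response is ambiguous.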
Hi. I've added a basic integration of the Gemini model with the DeepForest tool in this pull request: weecology/deepforest-agent#2. The goal is to test whether Gemini can effectively interact with and call DeepForest tools. I've also included the current workflow diagram in the README. Could you please run it and test it out?
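The core piece in this kind of integration is a dispatcher that turns a structured function call emitted by the LLM into an actual DeepForest invocation. Here is a minimal sketch of that dispatch step, independent of the Gemini SDK; the tool name, argument schema, and the stubbed detector are assumptions for illustration, and the PR's actual wiring may differ.

```python
import json


def detect_objects(image_path: str, model: str = "tree") -> list:
    """Stub standing in for a DeepForest prediction call.

    A real implementation would load the requested model and return
    predicted bounding boxes for image_path.
    """
    return [{"image": image_path, "model": model, "boxes": []}]


# Hypothetical tool registry: maps tool names the LLM may emit to
# local Python callables.
TOOLS = {"detect_objects": detect_objects}


def dispatch(tool_call_json: str):
    """Execute a tool call of the form {"name": ..., "args": {...}}."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call["name"])
    if func is None:
        raise ValueError(f"Unknown tool: {call['name']}")
    return func(**call.get("args", {}))


if __name__ == "__main__":
    # Simulate a function call the LLM might return.
    print(dispatch('{"name": "detect_objects", "args": {"image_path": "plot.png"}}'))
```

Keeping the dispatcher model-agnostic like this makes it easy to swap Gemini for another LLM later, since only the layer that produces the tool-call JSON would change.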
Hi Henry, Ben, and Ethan,
I am Samia Haque, a recent graduate with a B.Sc. in Software Engineering (major in Data Science). I have been actively working on integrating computer vision with post-hoc interpretability techniques and LLM-based automation, exploring how these technologies can enhance decision-making in various domains.
My Skills and Experience
Object Detection & Post-hoc AI Explainability:
I am currently working on multi-stage traffic anomaly detection, where I utilized YOLOv9 and YOLOv10 to detect traffic-congested regions in images. I am also working on its post-hoc interpretability and LLM integration.
LLM Integration & AI Automation:
I have experience in prompt engineering, dataset curation, and human feedback-driven improvements for LLMs. While I am not deeply involved in LLM fine-tuning or RLHF at a model-training level, my work in structuring datasets and evaluating LLM outputs aligns with Vision Agent’s AI-driven image analysis pipeline.
I am excited to contribute to GSoC 2025, particularly through Proposal 4: DeepForest Vision Agent connection with LandingAI, because it combines object detection with AI-driven automation. The idea of integrating a Vision Agent for automated labeling and active learning complements my ongoing exploration of AI-driven decision-making systems. Currently, I am studying the DeepForest codebase, documentation, and related research papers to better understand its architecture.
I would appreciate guidance on recommended resources or discussions for better understanding Vision Agent’s integration.
Thanks,
Samia
Contact Details: