Replies: 10 comments 7 replies
-
I'm dropping this paper in here as a 2nd step: https://arxiv.org/pdf/2407.04619
-
I'm hijacking this thread to discuss pieces for active learning. In general we have lots of little pieces around our codebases that need to be put together, and then there are opportunities for building further. I see generally four steps. I think step 1 would be inside of deepforest, and steps 2-4 are in a subpackage deepforest[label-studio]; our official position is that deepforest does not depend on any annotation platform or format.
I wrote a tiny example of this for a recent (Spanish) workshop.
-
Here is a promising SAM and Grounding DINO combination for counting, though we would really use it for detection; the localization could be loose, but it could work for annotation. https://github.com/niki-amini-naieni/CountGD/ There is a docker container: https://huggingface.co/spaces/nikigoli/countgd?docker=true and an ML-backend guide: https://labelstud.io/guide/ml_create A user could interactively get the data annotations.
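For context on what an ML backend hands back to Label Studio: bounding-box predictions are expressed as percentages of the image dimensions in a `rectanglelabels` result. A minimal sketch of converting pixel boxes from a detector into that format (the `from_name`/`to_name` values here are assumptions and must match your labeling config):

```python
def box_to_ls_result(xmin, ymin, xmax, ymax, label, img_w, img_h, score):
    """Convert a pixel-space box to a Label Studio rectanglelabels result.

    Label Studio stores x, y, width, height as percentages of the image size.
    """
    return {
        "from_name": "label",  # assumption: must match your labeling config
        "to_name": "image",    # assumption: must match your labeling config
        "type": "rectanglelabels",
        "score": score,
        "value": {
            "x": 100.0 * xmin / img_w,
            "y": 100.0 * ymin / img_h,
            "width": 100.0 * (xmax - xmin) / img_w,
            "height": 100.0 * (ymax - ymin) / img_h,
            "rectanglelabels": [label],
        },
    }
```

A backend's `predict()` would wrap a list of these results per task; the percent-based coordinates are what make predictions render correctly at any zoom level.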
-
Is the unlabeled data the same as what is present in
-
I am facing INTERNAL SERVER ERRORS in my ML backend when I try to do a combination of GroundingDINO + SAM from the guide. The /setup call is failing before I even get to the predict() step, so I think I'm mixing two incompatible patterns. Is it better to build on the GroundingDINO example (https://labelstud.io/tutorials/grounding_dino#Grounding-DINO-backend-integration) or the SAM example (https://github.com/HumanSignal/label-studio-ml-backend/tree/master/label_studio_ml/examples/grounding_sam)? Or should I pick one rather than mixing bits of both?
-
I think you are out in front of the dev cycle. Start much, much simpler. I wouldn't touch any of those more advanced pieces for several weeks (months).
--
Ben Weinstein, Ph.D.
Research Scientist
University of Florida
-
Definitely.
…On Wed, May 7, 2025, Nakshatra wrote: "Okay, I'll focus on a much simpler dummy backend first; I overwhelmed myself with those examples."
-
I have successfully created a basic backend from the SAM model; however, my model is not auto-detecting the bird nests. I tried annotating some sample images from the dataset to try out the auto-annotation part, but it didn't work as expected. I followed this tutorial, but it didn't work: https://labelstud.io/blog/get-started-using-segment-anything/
-
Okay, @naxatra2, I want to review these steps before you get bogged down with SAM or anything; that is really secondary. Let's start slowly and review these slowly. I'm glad you are excited to move forward, but getting out in front of the key aims can make everything confusing. We will definitely get there. I made a brief video outlining the key concepts and their relation to existing code we have developed: https://www.loom.com/share/4f958b46493d468ba648216596538b54?sid=dff05fcc-f8a0-4dc0-81ef-a43513462136
The overall goal is to create an active learning pipeline. That means that if a user has 10,000 images and a trained model, we use the results of that model to guide which images we should annotate next. Therefore the key aspect is not the label-studio part, which is secondary; the key aspect is a workflow that decides what images to annotate.
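The loop described above can be sketched in a few lines. This is a minimal illustration, not DeepForest's actual API; `model.predict(path)` is a hypothetical method assumed to return a list of (label, score) detections for one image:

```python
def pick_next_batch(model, unlabeled_paths, batch_size=10):
    """Rank unlabeled images by mean prediction confidence and return the
    least-confident ones -- the core of a simple active learning step.
    """
    mean_scores = {}
    for path in unlabeled_paths:
        detections = model.predict(path)  # hypothetical API: [(label, score), ...]
        scores = [score for _, score in detections]
        # Images with no detections, or only low-confidence ones, are assumed
        # to be the most informative to annotate next.
        mean_scores[path] = sum(scores) / len(scores) if scores else 0.0
    return sorted(mean_scores, key=mean_scores.get)[:batch_size]
```

The annotation platform only enters at the end of each iteration, to collect labels for the selected batch before retraining; the selection logic itself stays platform-agnostic.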
-
I have added 2 more sampling techniques in select_images():

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def select_images(
    preannotations: pd.DataFrame,
    strategy: str,
    n: int = 10,
    target_labels: list[str] = None,
    min_score: float = 0.3,
    embeddings: dict[str, np.ndarray] = None
) -> tuple[list[str], pd.DataFrame | None]:
    """
    Select images to annotate based on the strategy.
    Supports:
    - random
    - most-detections
    - target-labels
    - rarest
    - uncertainty (selects images closest to the decision boundary)
    - diversity (selects a diverse set via k-means)
    - qbc (combines uncertainty and diversity selections)
    """
    if preannotations is None or preannotations.empty:
        return [], None
    pre = preannotations[preannotations["score"] >= min_score]
    chosen: list[str] = []
    if strategy in ("random", "most-detections", "target-labels", "rarest"):
        ...  # other strategies are written here
    elif strategy == "uncertainty":
        # images with classification scores closest to 0.5 (most uncertain)
        un_df = pre.copy()
        score_col = "cropmodel_score" if "cropmodel_score" in pre.columns else "score"
        un_df["uncertainty_score"] = np.abs(un_df[score_col] - 0.5)
        img_scores = un_df.groupby("image_path")["uncertainty_score"].mean()
        chosen = img_scores.nsmallest(n).index.tolist()
    elif strategy == "diversity":
        if embeddings is None:
            raise ValueError("Embeddings are required for the 'diversity' strategy.")
        unique_imgs = pre["image_path"].unique().tolist()
        matrix = np.vstack([embeddings[img] for img in unique_imgs])
        kmeans = KMeans(n_clusters=min(n, len(unique_imgs))).fit(matrix)
        # pick the image whose embedding is closest to each cluster center
        for c in kmeans.cluster_centers_:
            dists = np.linalg.norm(matrix - c, axis=1)
            chosen.append(unique_imgs[int(np.argmin(dists))])
    elif strategy == "qbc":
        # Query-By-Committee: combine uncertainty + diversity picks
        uncert, _ = select_images(preannotations, "uncertainty", n=n, min_score=min_score, embeddings=embeddings)
        divers, _ = select_images(preannotations, "diversity", n=n, min_score=min_score, embeddings=embeddings)
        combined = list(dict.fromkeys(uncert + divers))
        chosen = combined[:n]
    else:
        raise ValueError(f"Invalid strategy '{strategy}'")
    selected_df = preannotations[preannotations["image_path"].isin(chosen)]
    return chosen, selected_df

In the uncertainty block, we first filter out all detections below the min_score threshold. In the diversity block, we again work on the filtered set of images and require a precomputed embedding vector for each unique image_path. I have also added a third option, qbc, which combines both of these sampling techniques.
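To make the uncertainty ranking concrete, here is a tiny self-contained illustration of the same groupby logic on a toy set of preannotations (the image names and scores are made up):

```python
import numpy as np
import pandas as pd

# Toy preannotations: two detections each for two hypothetical images.
pre = pd.DataFrame({
    "image_path": ["a.png", "a.png", "b.png", "b.png"],
    "score": [0.95, 0.90, 0.55, 0.45],
})

# Distance from the 0.5 decision boundary; smaller means more uncertain.
pre["uncertainty_score"] = np.abs(pre["score"] - 0.5)

# Mean uncertainty per image, then take the n most uncertain images.
img_scores = pre.groupby("image_path")["uncertainty_score"].mean()
most_uncertain = img_scores.nsmallest(1).index.tolist()
# b.png has mean distance 0.05 from the boundary vs 0.425 for a.png,
# so b.png is selected first.
```

Note that ranking by mean distance can favor images with a few borderline detections over images with many; a sum or quantile aggregation would trade off differently.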
-
Hi @bw4sz, @henrykironde, @ethanwhite,
I hope you're all doing well! I'm Nakshatra Piplad, a 3rd-year undergraduate student at IIT Kharagpur, India.
A while ago, I reached out to the mentors about this topic (#947), but since the projects were still being revised at the time, I focused on understanding the workflow better. Now that things have taken shape, I'm really excited about the idea of developing an Active Learning Module for DeepForest (Proposal 2) and would love to discuss it further.
I've been actively engaging with the community by working on some of the "good first issues" and contributing to discussions. This has helped me get comfortable with the codebase and understand how things work. For this proposal, I set up the BOEM repository locally and explored its active learning approach. However, I noticed that apart from the code itself, there isn't much documentation available, which made it hard to grasp everything in detail. If there are any useful resources that could help, I'd really appreciate it! I tried researching on my own, but I still have a few doubts, which could have been resolved if there were some docs in the BOEM repo. For example:
generate_pool_predictions() in BOEM/src/active_learning.py randomly samples images from the pool without accounting for variations in tree density, species, or environmental conditions. Would it make sense to adjust the sampling strategy based on biodiversity levels, so that high-diversity forests get more sampling while more uniform forests get less? Also, since I've already worked on some introductory issues and with the GSoC application period coming up, I'd like to put together a solid proposal and get feedback from the mentors before submission. What's the best way to go about this?
Looking forward to your thoughts and any suggestions you might have!
Thanks and best,
Nakshatra Piplad
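The biodiversity-weighted sampling idea could look something like the sketch below. This is purely hypothetical, not code from BOEM; the per-image `species_richness` counts are assumed to come from the model's preannotations (habitat class or tree density could substitute):

```python
import numpy as np

def weighted_pool_sample(image_paths, species_richness, n, rng=None):
    """Sample images with probability proportional to species richness,
    so high-diversity images are drawn more often than uniform ones.

    species_richness: number of distinct predicted species per image
    (a hypothetical diversity signal, one value per path).
    """
    rng = rng or np.random.default_rng(0)
    weights = np.asarray(species_richness, dtype=float)
    probs = weights / weights.sum()  # normalize to a probability distribution
    idx = rng.choice(len(image_paths), size=n, replace=False, p=probs)
    return [image_paths[i] for i in idx]
```

Compared with uniform random sampling, this concentrates annotation effort where the label space is richest, at the cost of under-sampling uniform stands; capping the maximum weight would bound that bias.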