Flock and group detector #1312

bw4sz · 2026-02-18T21:27:01Z

Maintainer

We often see images containing groups of individuals of the same species. The current crop model approach uses only the information within each detection, which makes it hard to draw on the spatial neighborhood during inference. Even with a multi-class detection algorithm like RetinaNet or DETR, it's not clear that we are structuring enough information to explicitly include predictions from co-occurring objects in the scene. We want some way of structuring these detections more explicitly and formally. I brainstormed a bunch of ideas and think they could make a nice contribution. We could also create a dataset of groups of animals or flocks.

Graph Neural Networks (GNNs) over detections
After the base detector runs, treat each bounding box as a node in a graph. Connect nodes spatially (within some distance threshold) or by visual similarity. A GNN then passes messages between nodes, allowing each detection to "see" its neighbors before making a final prediction. This is probably the most CS-heavy approach and might be overkill for most users. @jveitchmichaelis I've read that DETR kind of does this?
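A minimal sketch of the graph idea, to make the data flow concrete. It is not a trained GNN: it builds the spatial graph from box centers and runs one round of fixed mean aggregation over per-detection class scores. The function names, the `max_dist` threshold, and the `self_weight` blend are all hypothetical; a real GNN would learn the aggregation weights.

```python
import math


def box_center(box):
    """Center (x, y) of a box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


def build_spatial_graph(boxes, max_dist):
    """Connect every pair of boxes whose centers are within max_dist pixels."""
    centers = [box_center(b) for b in boxes]
    neighbors = {i: [] for i in range(len(boxes))}
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if math.dist(centers[i], centers[j]) <= max_dist:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def message_pass(scores, neighbors, self_weight=0.5):
    """One round of mean aggregation (assumed stand-in for a learned GNN layer).

    Each node's class-score vector becomes a blend of its own scores and the
    mean of its neighbors' scores. Isolated nodes are left unchanged.
    """
    updated = []
    for i, own in enumerate(scores):
        nbrs = neighbors[i]
        if not nbrs:
            updated.append(list(own))
            continue
        mean_nbr = [
            sum(scores[j][k] for j in nbrs) / len(nbrs) for k in range(len(own))
        ]
        updated.append(
            [self_weight * o + (1.0 - self_weight) * m for o, m in zip(own, mean_nbr)]
        )
    return updated
```

Edges here come only from spatial distance; connecting by visual similarity would need a per-detection embedding in place of the box centers.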
Attention-based contextual re-scoring
Similar idea, but simpler: extract a feature vector per detection (ROI-pooled features or an embedding), then run a small transformer over the set of detections in the image. Each box attends to all others before a final classification head. This is lightweight and plugs cleanly into any existing detector. Think of it like a "detection-level ViT."
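To illustrate the mechanics, here is an untrained, dependency-free sketch of a single self-attention step over detections. It assumes each detection already has a feature vector and a class-probability vector, uses identity query/key projections, and treats the class probabilities as the values. The function names and the `temperature` parameter are hypothetical; a trained version would learn the projections and a classification head instead.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_rescore(features, class_probs, temperature=1.0):
    """Re-score each detection as an attention-weighted mix of all detections.

    features:    one feature vector per detection (e.g. ROI-pooled features)
    class_probs: one class-probability vector per detection
    Attention weights are scaled dot products between feature vectors, so
    visually similar detections pull each other's predictions together.
    """
    dim = len(features[0])
    n_classes = len(class_probs[0])
    scale = math.sqrt(dim) * temperature
    rescored = []
    for query in features:
        logits = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(logits)
        rescored.append(
            [
                sum(w * probs[c] for w, probs in zip(weights, class_probs))
                for c in range(n_classes)
            ]
        )
    return rescored
```

Because the attention weights sum to one, each re-scored row is still a valid probability vector. Raising `temperature` flattens the weights toward a plain average over the image.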
Clustering + consensus voting
Cluster detections spatially (DBSCAN?), then within each cluster, pool the softmax outputs and vote. Majority class wins, or you weight votes by confidence. I've not done this with high-dimensional data.
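The clustering and voting steps could look roughly like the sketch below. Several assumptions are baked in: detections are plain dicts with `xmin`, `ymin`, `xmax`, `ymax`, `label`, and `score` keys (meant to mirror a detector's per-box output), clustering is simple distance-threshold linking (a stand-in for DBSCAN, equivalent to requiring no minimum neighbor count), and votes use only each box's top label weighted by its score rather than full softmax vectors. The names `cluster_by_distance`, `eps`, and `min_group_size` are hypothetical.

```python
import math
from collections import defaultdict


def cluster_by_distance(centers, eps):
    """Link any two points within eps of each other (union-find)."""
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= eps:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(centers))]


def group_smoother(detections, eps, min_group_size=3):
    """Relabel each spatial cluster with its confidence-weighted majority label.

    Clusters smaller than min_group_size are left untouched. Returns new
    dicts; the input detections are not modified.
    """
    centers = [
        ((d["xmin"] + d["xmax"]) / 2.0, (d["ymin"] + d["ymax"]) / 2.0)
        for d in detections
    ]
    members = defaultdict(list)
    for idx, cluster_id in enumerate(cluster_by_distance(centers, eps)):
        members[cluster_id].append(idx)

    smoothed = [dict(d) for d in detections]
    for idxs in members.values():
        if len(idxs) < min_group_size:
            continue
        votes = defaultdict(float)
        for i in idxs:
            votes[detections[i]["label"]] += detections[i]["score"]
        winner = max(votes, key=votes.get)
        for i in idxs:
            smoothed[i]["label"] = winner
    return smoothed
```

The two tunable thresholds are `eps` (how close boxes must be to count as one group) and `min_group_size` (how many boxes a group needs before its vote overrides individual labels).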

I think we would want something post hoc that could sit on top of a DeepForest model and has a couple of threshold parameters that users could tune by hand, to serve as a 'flock_detector' or 'group_smoother'.

vickysharma-prog · 2026-02-21T15:17:18Z


Interesting problem! A few thoughts:

The attention-based approach seems like a nice middle ground - lightweight enough to be practical but still captures spatial context. Could this work as a simple post-processing module that takes DeepForest predictions + original image features and outputs refined predictions?
For the clustering approach, one thing to consider - DBSCAN might struggle with varying flock densities in the same image. Maybe a hierarchical density-based method (HDBSCAN) could handle that better?
Quick question: are we looking to just improve classification accuracy within groups, or also explicitly output "this is a flock of N birds" as a detection itself?
Would love to explore this further!
