
Commit 6885871

Merge pull request #1889 from roboflow/docs-fusion
Docs fusion
2 parents 7180f56 + fd97276 commit 6885871

File tree

5 files changed: +315 −73 lines changed
  • inference/core/workflows/core_steps/fusion


inference/core/workflows/core_steps/fusion/buffer/v1.py

Lines changed: 62 additions & 10 deletions
@@ -16,11 +16,59 @@
 )
 
 LONG_DESCRIPTION = """
-Returns an array of the last `length` values passed to it. The newest
-elements are added to the beginning of the array.
+Maintains a sliding window buffer of the last N values by storing recent inputs in a FIFO (First-In-First-Out) queue. The newest elements are added to the beginning of the array, and the oldest are removed automatically when the buffer exceeds the specified length. This enables temporal data collection, frame history tracking, batch processing preparation, and sliding window analysis.
 
-Useful for keeping a sliding window of images or detections for
-later processing, visualization, or comparison.
+## How This Block Works
+
+This block maintains a rolling buffer that stores the most recent values passed to it, creating a sliding window of data over time. The block:
+
+1. Receives input data of any type (images, detections, values, etc.) and configuration parameters (buffer length and padding option)
+2. Maintains an internal buffer that persists across workflow executions:
+    - The buffer is initialized as an empty list when the block is first created
+    - Buffer state persists for the lifetime of the workflow execution
+    - Each buffer block instance maintains its own separate buffer
+3. Adds new data to the buffer:
+    - Inserts the newest value at the beginning (index 0) of the buffer array
+    - Most recent values appear first in the buffer
+    - Older values are shifted to later positions in the array
+4. Manages buffer size:
+    - When the buffer exceeds the specified `length`, removes the oldest elements
+    - Keeps only the most recent `length` values
+    - Automatically maintains the sliding window size
+5. Applies optional padding:
+    - If `pad` is True: fills the buffer with `None` values until it reaches exactly `length` elements, ensuring a consistent size even when fewer than `length` values have been received
+    - If `pad` is False: the buffer grows from 0 to `length` elements as values are added, then stays at `length`
+6. Returns the buffered array:
+    - Outputs a list containing the buffered values, newest first
+    - The list length equals `length` (if padding is enabled) or the current buffer size (if padding is disabled)
+    - Values are ordered from most recent (index 0) to oldest (last index)
+
+The buffer implements a sliding window pattern where new data enters at the front and old data exits at the back when capacity is reached. This creates a temporal history of recent values, useful for operations that need to look back at previous frames, detections, or measurements. The buffer works with any data type, making it flexible for images, detections, numeric values, or other workflow outputs.
+
+## Common Use Cases
+
+- **Frame History Tracking**: Maintain a history of recent video frames for temporal analysis (e.g., track frame sequences, maintain recent image history, collect frames for comparison)
+- **Detection History**: Buffer recent detections for trend analysis or comparison (e.g., track detection changes over time, compare current vs. previous detections, analyze detection patterns)
+- **Batch Processing Preparation**: Collect multiple values before processing them together (e.g., batch-process recent images, aggregate multiple detections, prepare data for batch operations)
+- **Sliding Window Analysis**: Perform analysis on a rolling window of data (e.g., analyze trends over recent frames, calculate moving averages, detect changes in sequences)
+- **Visualization Sequences**: Maintain recent data for animation or sequence visualization (e.g., create frame sequences, visualize temporal changes, display recent history)
+- **Temporal Comparison**: Compare current values with recent historical values (e.g., compare the current frame with previous frames, detect changes over time, analyze temporal patterns)
+
+## Connecting to Other Blocks
+
+This block receives data of any type and produces a buffered output array:
+
+- **After any block** that produces values to buffer (e.g., buffer images from image sources, detections from detection models, or values from analytics blocks)
+- **Before blocks that process arrays** to provide batched or historical data (e.g., process buffered images, analyze detection arrays, work with value sequences)
+- **Before visualization blocks** to display sequences or temporal data (e.g., visualize frame sequences, display detection history, show temporal patterns)
+- **Before analysis blocks** that require historical data (e.g., analyze trends over time, compare current vs. historical values, process temporal sequences)
+- **Before aggregation blocks** to provide multiple values for aggregation (e.g., aggregate buffered values, process multiple detections, combine recent data)
+- **In temporal processing pipelines** where maintaining recent history is required (e.g., track changes over time, maintain frame sequences, collect data for temporal analysis)
+
+## Requirements
+
+This block works with any data type (images, detections, values, etc.). The buffer maintains state across workflow executions within the same workflow instance. The `length` parameter determines the maximum number of values to keep. When `pad` is enabled, the buffer always returns exactly `length` elements (padded with `None` if needed). When `pad` is disabled, the buffer grows from 0 to `length` elements as values are added, then maintains `length` elements by removing the oldest values. The buffer persists for the lifetime of the workflow execution and resets when the workflow is restarted.
 """
 
 SHORT_DESCRIPTION = "Returns an array of the last `length` values passed to it."
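To make the buffering semantics described above concrete, here is a minimal Python sketch of the behaviour the new docstring documents. `SlidingBuffer` is an illustrative stand-in, not the block's actual implementation:

```python
from typing import Any, List, Optional


class SlidingBuffer:
    """Minimal sketch of the buffer block's documented behaviour."""

    def __init__(self, length: int, pad: bool = False) -> None:
        self.length = length
        self.pad = pad
        self._buffer: List[Any] = []  # newest value lives at index 0

    def update(self, value: Any) -> List[Optional[Any]]:
        self._buffer.insert(0, value)   # newest first
        del self._buffer[self.length:]  # drop the oldest beyond capacity
        result: List[Optional[Any]] = list(self._buffer)
        if self.pad:                    # pad the tail with None up to `length`
            result += [None] * (self.length - len(result))
        return result


buf = SlidingBuffer(length=3, pad=True)
print(buf.update("frame_1"))  # ['frame_1', None, None]
print(buf.update("frame_2"))  # ['frame_2', 'frame_1', None]
print(buf.update("frame_3"))  # ['frame_3', 'frame_2', 'frame_1']
print(buf.update("frame_4"))  # ['frame_4', 'frame_3', 'frame_2'] -- oldest dropped
```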
@@ -45,17 +93,21 @@ class BlockManifest(WorkflowBlockManifest):
     data: Selector(
         kind=[WILDCARD_KIND, LIST_OF_VALUES_KIND, IMAGE_KIND],
     ) = Field(
-        description="Reference to step outputs at depth level n to be concatenated and moved into level n-1.",
-        examples=["$steps.visualization"],
+        description="Input data of any type to add to the buffer. Can be images, detections, values, or any other workflow output. Newest values are added to the beginning of the buffer array. The buffer maintains a sliding window of the most recent values.",
+        examples=[
+            "$steps.visualization",
+            "$steps.object_detection_model.predictions",
+            "$steps.image",
+        ],
     )
     length: int = Field(
-        description="The number of elements to keep in the buffer. Older elements will be removed.",
-        examples=[5],
+        description="Maximum number of elements to keep in the buffer. When the buffer exceeds this length, the oldest elements are automatically removed. Determines the size of the sliding window. Must be greater than 0. Typical values range from 2-10 for frame sequences, or higher for longer histories.",
+        examples=[5, 10, 3],
     )
     pad: bool = Field(
-        description="If True, the end of the buffer will be padded with `None` values so its size is always exactly `length`.",
+        description="Enable padding to maintain consistent buffer size. If True, the buffer is padded with `None` values until it reaches exactly `length` elements, ensuring the output always has `length` items even when fewer values have been received. If False, the buffer grows from 0 to `length` as values are added, then maintains `length` by removing oldest values. Use padding when downstream blocks require a fixed-size array.",
         default=False,
-        examples=[True],
+        examples=[True, False],
     )
 
     @classmethod
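For orientation, a workflow step built on this manifest might be wired as below. This is a hedged sketch: the `roboflow_core/buffer@v1` type string, the detection model type and `model_id`, and the buffer's `output` selector name are assumptions for illustration, not taken from this diff:

```python
# Hypothetical workflow fragment feeding detection predictions into the
# buffer block. Type strings, model_id, and the buffer's output selector
# are illustrative assumptions -- check the registered block identifiers.
workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "type": "roboflow_core/roboflow_object_detection_model@v2",
            "name": "object_detection_model",
            "image": "$inputs.image",
            "model_id": "yolov8n-640",
        },
        {
            "type": "roboflow_core/buffer@v1",
            "name": "detections_history",
            "data": "$steps.object_detection_model.predictions",
            "length": 5,
            "pad": True,  # downstream consumers always get exactly 5 elements
        },
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "recent_detections",
            "selector": "$steps.detections_history.output",
        }
    ],
}
```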

inference/core/workflows/core_steps/fusion/detections_classes_replacement/v1.py

Lines changed: 80 additions & 14 deletions
@@ -31,12 +31,73 @@
 )
 
 LONG_DESCRIPTION = """
-Combine results of detection model with classification results performed separately for
-each and every bounding box.
+Replaces the class labels of detection bounding boxes with the classes predicted by a classification model applied to cropped regions. Combining generic detection results with specialized classification predictions in this way enables two-stage detection, fine-grained classification, and class refinement workflows in which generic detections are refined with specific labels from specialized classifiers.
 
-Bounding boxes without top class predicted by classification model are discarded,
-for multi-label classification results, most confident label is taken as bounding box
-class.
+## How This Block Works
+
+This block combines results from a detection model (with bounding boxes and generic classes) with classification predictions (from a specialized classifier applied to cropped regions) to replace generic class labels with specific ones. The block:
+
+1. Receives two inputs with different dimensionality levels:
+    - `object_detection_predictions`: Detection results (dimensionality level 1) containing bounding boxes with generic classes (e.g., "dog", "person", "vehicle")
+    - `classification_predictions`: Classification results (dimensionality level 2) from a classifier applied to cropped regions of each detection (e.g., "Golden Retriever", "Labrador" for dog detections)
+2. Matches classifications to detections:
+    - Uses `PARENT_ID_KEY` (detection_id) in classification predictions to link each classification result to its source detection
+    - Creates a mapping from detection IDs to classification results
+3. Extracts the leading class from each classification prediction:
+
+    **For single-label classifications:**
+    - Uses the "top" class (predicted class) from the classification result
+    - Extracts the class name, class ID, and confidence from the classification prediction
+
+    **For multi-label classifications:**
+    - Finds the class with the highest confidence score
+    - Uses the most confident label as the replacement class
+    - Extracts the class name, class ID, and confidence from the highest-confidence prediction
+
+4. Handles missing classifications:
+    - Detections without corresponding classification predictions are discarded by default
+    - If `fallback_class_name` is provided, detections without classifications use the fallback class instead of being discarded
+    - The fallback class ID is set to the provided value, or `sys.maxsize` if not specified or negative
+5. Filters detections:
+    - Keeps only detections that have classification results (or the fallback, if specified)
+    - Removes detections that cannot be matched to classification predictions
+6. Replaces class information:
+    - Replaces class names in detections with classification class names
+    - Replaces class IDs in detections with classification class IDs
+    - Replaces confidence scores in detections with classification confidence scores
+    - Updates all detection metadata to reflect the new class information
+7. Generates new detection IDs:
+    - Creates new unique detection IDs for updated detections (prevents ID conflicts)
+    - Ensures detection IDs are unique after class replacement
+8. Returns updated detections:
+    - Outputs detections with replaced classes, maintaining bounding box coordinates and other properties
+    - Output dimensionality matches the input detection predictions (dimensionality level 1)
+
+The block enables two-stage detection workflows where a generic detection model locates objects and a specialized classification model provides fine-grained labels. This is useful when you need generic localization (e.g., "dog") combined with specific classification (e.g., "Golden Retriever", "German Shepherd") without losing spatial information.
+
+## Common Use Cases
+
+- **Two-Stage Detection and Classification**: Combine generic detection with specialized classification for fine-grained labeling (e.g., detect "dog" then classify the breed, detect "vehicle" then classify the type, detect "person" then classify the age group)
+- **Class Refinement**: Refine generic class labels with specific classifications from specialized models (e.g., refine "animal" to a specific species, refine "vehicle" to a specific model, refine "food" to a specific dish)
+- **Multi-Model Workflows**: Combine detection and classification models to leverage the strengths of both (e.g., use a generic detector for localization and a specialist classifier for identification, combine coarse and fine-grained models)
+- **Hierarchical Classification**: Apply hierarchical classification where detection provides high-level classes and classification provides detailed sub-classes (e.g., detect "mammal" then classify the species, detect "plant" then classify the variety, detect "structure" then classify the type)
+- **Crop-Based Classification**: Use classification results from cropped regions to enhance detection results (e.g., classify crops to improve detection labels, apply specialized classifiers to detected regions, refine detections with crop classifications)
+- **Fine-Grained Object Recognition**: Enable fine-grained recognition by combining localization and detailed classification (e.g., recognize specific product models, identify specific animal breeds, classify specific vehicle types)
+
+## Connecting to Other Blocks
+
+This block receives detection and classification predictions and produces detections with replaced classes:
+
+- **After detection and classification model blocks** to combine generic detection with specialized classification (e.g., object detection plus classification yielding refined detections)
+- **After crop blocks** that create crops from detections for classification (e.g., crop detections, classify the crops, then replace the classes)
+- **Before visualization blocks** to display detections with refined classes (e.g., visualize refined detections, display detections with specific labels)
+- **Before filtering blocks** to filter detections by their refined classes (e.g., filter by specific classes, apply filters to classified detections)
+- **Before analytics blocks** to perform analytics on refined detections (e.g., analyze specific classes, track refined detection metrics)
+- **In workflow outputs** to provide refined detections as the final output (e.g., two-stage detection outputs, classification-enhanced detection results)
+
+## Requirements
+
+This block requires object detection predictions (with bounding boxes) and classification predictions made on crops of those bounding boxes. The classification predictions must carry `PARENT_ID_KEY` (detection_id) to link each classification to its source detection. The block accepts different dimensionality levels: detection predictions at level 1 and classification predictions at level 2 (from crops). For single-label classifications, the "top" class is used; for multi-label classifications, the most confident class is selected. Detections without classification results are discarded unless `fallback_class_name` is provided. The block outputs detections with replaced classes, class IDs, and confidences, with new detection IDs generated. Output dimensionality matches the input detection predictions (level 1).
 """
 
 SHORT_DESCRIPTION = "Replace classes of detections with classes predicted by a chained classification model."
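The matching and replacement steps above can be summarized in a short Python sketch. This is a simplified illustration operating on plain dicts; the real block works with `sv.Detections` and Roboflow prediction structures, and the field names used here (`parent_id`, `predictions`, the fallback confidence of 0.0) are assumptions:

```python
import sys
import uuid
from typing import Dict, List, Optional


def replace_classes(
    detections: List[dict],
    classifications: List[dict],
    fallback_class_name: Optional[str] = None,
    fallback_class_id: Optional[int] = None,
) -> List[dict]:
    # Map each classification to its source detection via the parent id.
    by_parent: Dict[str, dict] = {c["parent_id"]: c for c in classifications}
    updated: List[dict] = []
    for det in detections:
        cls = by_parent.get(det["detection_id"])
        if cls is None:
            if fallback_class_name is None:
                continue  # unmatched detections are discarded by default
            class_name = fallback_class_name
            class_id = (
                fallback_class_id
                if fallback_class_id is not None and fallback_class_id >= 0
                else sys.maxsize  # documented fallback for missing/negative IDs
            )
            confidence = 0.0  # assumption: no classifier confidence available
        elif "top" in cls:  # single-label result: take the "top" class
            class_name = cls["top"]
            class_id = cls["class_id"]
            confidence = cls["confidence"]
        else:  # multi-label result: take the most confident label
            best = max(cls["predictions"], key=lambda p: p["confidence"])
            class_name = best["class"]
            class_id = best["class_id"]
            confidence = best["confidence"]
        updated.append(
            {
                **det,
                "class_name": class_name,
                "class_id": class_id,
                "confidence": confidence,
                "detection_id": str(uuid.uuid4()),  # fresh ID after replacement
            }
        )
    return updated
```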
@@ -70,26 +131,31 @@ class BlockManifest(WorkflowBlockManifest):
         ]
     ) = Field(
         title="Regions of Interest",
-        description="The output of a detection model describing the bounding boxes that will have classes replaced.",
-        examples=["$steps.my_object_detection_model.predictions"],
+        description="Detection predictions (object detection, instance segmentation, or keypoint detection) containing bounding boxes with generic class labels that will be replaced with classification results. These detections should correspond to the regions that were cropped and classified. Detections must have detection IDs that match the PARENT_ID_KEY in classification predictions. Detections at dimensionality level 1.",
+        examples=[
+            "$steps.object_detection_model.predictions",
+            "$steps.instance_segmentation_model.predictions",
+        ],
     )
     classification_predictions: Selector(kind=[CLASSIFICATION_PREDICTION_KIND]) = Field(
         title="Classification results for crops",
-        description="The output of classification model for crops taken based on RoIs pointed as the other parameter",
-        examples=["$steps.my_classification_model.predictions"],
+        description="Classification predictions from a classifier applied to cropped regions of the detections. Each classification result must have PARENT_ID_KEY (detection_id) linking it to its source detection. Supports both single-label (uses 'top' class) and multi-label (uses most confident class) classifications. Classification results at dimensionality level 2 (one classification per crop/detection).",
+        examples=[
+            "$steps.classification_model.predictions",
+            "$steps.breed_classifier.predictions",
+        ],
     )
     fallback_class_name: Union[Optional[str], Selector(kind=[STRING_KIND])] = Field(
         default=None,
         title="Fallback class name",
-        description="The class name to be used as a fallback if no class is predicted for a bounding box",
-        examples=["unknown"],
+        description="Optional class name to use for detections that don't have corresponding classification predictions. If not provided (default None), detections without classifications are discarded. If provided, detections without classifications use this fallback class name instead of being removed. Useful for preserving detections when classification fails or is unavailable.",
+        examples=[None, "unknown", "unclassified"],
     )
     fallback_class_id: Union[Optional[int], Selector(kind=[INTEGER_KIND])] = Field(
         default=None,
         title="Fallback class id",
-        description="The class id to be used as a fallback if no class is predicted for a bounding box;"
-        f"if not specified or negative, the class id will be set to {sys.maxsize}",
-        examples=[77],
+        description="Optional class ID to use with fallback_class_name for detections without classification predictions. If not specified or negative, the class ID is set to sys.maxsize. Only used when fallback_class_name is provided. Should match the class ID mapping used in your model.",
+        examples=[None, 77, 999],
     )
 
     @classmethod