Based on the lifelong learning system framework built by KubeEdge's edgecloud co
2. achieve >0.45 accuracy for open-world object segmentation (e.g., AP, mIoU).
3. use the reproduced algorithms to build a data annotation tool, integrated into the Sedna and Ianvs lifelong learning modules on top of the Ianvs architecture for practical application.

# 3. Expected Metrics

To evaluate the overall performance of the open-world segmentation framework, we will apply the following quantitative and qualitative metrics.


### 3.1 Segmentation Quality

| Metric | Target | Description |
|--------|--------|-------------|
| **mIoU (Mean Intersection over Union)** | ≥ **0.45** | Measures the overlap between predicted masks and ground-truth masks, averaged over all classes. |
| **AP (Average Precision)** | ≥ **0.45** | Summarizes the precision-recall trade-off in object segmentation, which is especially important for detecting small obstacles. |
| **Boundary Accuracy** | Qualitative tracking | Tracks the smoothness and accuracy of segmentation boundaries. |
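As an illustration of the headline metric, mIoU over per-class masks can be computed as follows. This is a minimal sketch representing masks as sets of pixel indices; the two-class example data is illustrative, not from the project's evaluation code:

```python
def iou(pred: set, gt: set) -> float:
    """Intersection over Union for one class, masks given as sets of pixel indices."""
    union = pred | gt
    if not union:
        return 1.0  # both masks empty: treat as perfect agreement
    return len(pred & gt) / len(union)

def mean_iou(preds: dict, gts: dict) -> float:
    """Average IoU over all classes present in the ground truth."""
    classes = gts.keys()
    return sum(iou(preds.get(c, set()), gts[c]) for c in classes) / len(classes)

# Tiny example: class 0 overlaps partially (IoU 0.5), class 1 matches exactly (IoU 1.0).
preds = {0: {1, 2, 3}, 1: {5}}
gts   = {0: {2, 3, 4}, 1: {5}}
print(mean_iou(preds, gts))  # 0.75, which would exceed the 0.45 target
```

In real evaluation the masks come from dense prediction arrays rather than index sets, but the averaging over classes works the same way.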


### 3.2 Annotation Efficiency

| Metric | Expected Effect | Description |
|--------|----------------|-------------|
| **Reduction in Manual Annotation Workload** | Significant improvement | SAM- and SSA-assisted annotation substantially reduces manual labeling effort. |
| **Practical Annotation Usability** | ≥ **70% usable output** | Measures the proportion of SAM-generated annotations that are usable in practice without modification. |
| **Correction Requirement** | Only needed for difficult cases | Manual correction is required only for difficult corner cases and errors, rather than re-annotating from scratch. |

These metrics ensure objective tracking of segmentation progress and annotation efficiency throughout model development.
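The ≥70% usability target can be tracked with a simple counter over human review outcomes. This is a sketch; the review statuses `usable` and `needs_fix` are hypothetical names, not taken from the annotation tool:

```python
def usable_ratio(review_results: list) -> float:
    """Fraction of auto-generated annotations accepted without modification."""
    if not review_results:
        return 0.0
    return review_results.count("usable") / len(review_results)

# A reviewed batch: 8 annotations accepted as-is, 2 flagged for manual fixes.
batch = ["usable"] * 8 + ["needs_fix"] * 2
ratio = usable_ratio(batch)
print(f"{ratio:.0%} usable")           # 80% usable
print("meets target:", ratio >= 0.70)  # meets target: True
```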

### 3.3 System Performance

| Metric | Expected Result |
|--------|-----------------|
| Stable long-batch annotation support | ✔ |
| Edge + Cloud inference compatibility | ✔ |
| Scalable integration with Ianvs Lifelong Learning | ✔ |


# 4. Proposal
Open-world object segmentation is the core task of this project; many basic computer vision tasks can be solved by fine-tuning the pre-trained large-scale open-world segmentation model **SAM**.

This project reproduces the open-world segmentation algorithm **Segment Anything Model (SAM)** on top of Sedna and Ianvs lifelong learning, and fine-tunes it on an autonomous-driving small obstacle detection dataset as the task requires. It uses **SAM and its related models** to automate the annotation of open-domain data, with support for both local and cloud deployment: users send the required data from the edge-side annotation tool to the cloud, where an inference module based on SAM and its related algorithms completes the full data annotation process.
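The edge-to-cloud annotation flow described above can be sketched as two cooperating functions. This is purely illustrative: `cloud_segment` stands in for the real cloud-side SAM inference service, whose API and record schema are not specified in this proposal:

```python
def cloud_segment(image_id: str, pixels: bytes) -> list:
    """Stub for the cloud-side SAM/SSA inference module: returns mask records."""
    # A real deployment would run SAM inference here; we fake one mask per image.
    return [{"image_id": image_id, "mask_id": 0, "label": "unknown_obstacle"}]

def annotate_from_edge(batch: dict) -> list:
    """Edge-side annotation tool: ships images to the cloud, collects annotations."""
    annotations = []
    for image_id, pixels in batch.items():
        annotations.extend(cloud_segment(image_id, pixels))
    return annotations

result = annotate_from_edge({"frame_001": b"...", "frame_002": b"..."})
print(len(result))  # 2
```

In the actual system the transfer would go through the Ianvs/Sedna edge-cloud channel rather than a direct function call.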

## 4.1 Algorithm architecture
The overall algorithm process is illustrated in the following diagram.


In the first step, images are automatically annotated using the SSA model. Next,

The relationship between the algorithm and the Ianvs lifelong learning architecture is depicted in the figure above, with its primary location in the unknown task processing module.

## 4.2 User flow
![sam_annotation_user_flow](images/sam_annotation_user_flow.png)

1. Deploy distributed synergy AI benchmarking Ianvs
6. The cloud further trains and updates the model to achieve a closed loop of annotation-train-reasoning-evaluation
7. Derive the final model that meets the requirements of the actual task for further application
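The steps above form a closed annotation-train-reasoning-evaluation loop, which can be sketched as follows. All functions are stubs: `train_round` and its simulated accuracy gain are illustrative placeholders, not real training code:

```python
def train_round(model_acc: float) -> float:
    """Stub for one cloud-side retraining round; a real round would fine-tune SAM."""
    return min(model_acc + 0.1, 1.0)  # simulate steady improvement per round

def closed_loop(target_miou: float = 0.45, max_rounds: int = 10):
    """Repeat annotate -> train -> evaluate until the metric target is met."""
    acc, rounds = 0.1, 0
    while acc < target_miou and rounds < max_rounds:
        acc = train_round(acc)  # annotate new data, retrain, re-evaluate
        rounds += 1
    return acc, rounds

acc, rounds = closed_loop()
print(acc, rounds)
```

The exit condition mirrors step 7: once the model meets the task's accuracy requirement (here the 0.45 mIoU target), it is derived for application.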

# 5. High-Level Architecture

The system runs as a closed annotation-and-training feedback loop inside Ianvs Lifelong Learning: SAM/SSA perform automatic segmentation, humans review the results through Label Studio, and the model is retrained in the cloud. This produces a continuously evolving model that handles diverse road anomalies in open-world settings.

### 📊 System Architecture Overview
A closed-loop edge-to-cloud learning pipeline where SAM and SSA generate automatic annotations, humans refine only edge cases, and Ianvs retrains and redeploys the improved model continuously.

![High_Level_Architecture](images/High_Level_Architecture_SAM.jpeg)


# 6. Design Details
This section mainly introduces the algorithm models used in the system, the small obstacle detection datasets for autonomous driving, and the principle of algorithm-assisted labeling.

## 6.1 Segment Model
### 6.1.1 Segment Anything Model (SAM)

The paper presents the [Segment Anything](https://arxiv.org/abs/2304.02643) (SA) project: a new task, a new model, and a new dataset for image segmentation. Using the proposed SAM model in a data collection loop, the authors built the largest segmentation dataset to date, **SA-1B**, generating more than 1 billion masks over 11 million images. Since the model is designed and trained to be promptable, it can transfer to new image tasks in a zero-shot fashion.

The authors, influenced by pre-trained large language models in NLP, adopted a prompt-engineering mechanism to create a comparable large-scale vision foundation model with strong zero-shot and few-shot generalization, able to flexibly solve a variety of downstream tasks. SAM has three main modules: the image encoder, the flexible prompt encoder, and the fast mask decoder.

![sam_model](images/sam_model.png)
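The three-module design can be mimicked with stubs to show how image features and prompts combine. This is purely illustrative: the real SAM encoders are transformer networks, not these placeholder functions:

```python
def image_encoder(image: bytes) -> list:
    """Stub: the real module is a heavy ViT, run once per image."""
    return [float(b) for b in image[:4]]

def prompt_encoder(points: list) -> list:
    """Stub: the real module cheaply embeds points/boxes/masks per prompt."""
    return [float(x + y) for x, y in points]

def mask_decoder(img_emb: list, prm_emb: list) -> dict:
    """Stub: the real module fuses both embeddings into a mask almost instantly."""
    return {"mask": "placeholder", "score": 0.9}

emb = image_encoder(b"\x01\x02\x03\x04")  # expensive, computed once
for prompt in [[(10, 20)], [(30, 40)]]:   # cheap, repeated per prompt
    out = mask_decoder(emb, prompt_encoder(prompt))
    print(out["score"])
```

The structure reflects why SAM is interactive in practice: the costly image embedding is computed once and reused across many prompts, so each new prompt only pays for the lightweight prompt encoder and mask decoder.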

### 6.1.2 Semantic Segment Anything Model
![ssa_model](images/ssa_model.png)
While SAM is a powerful model capable of segmenting anything in images, it lacks the ability to predict the semantic categories of each mask. To address this limitation, [Semantic Segment Anything (SSA)](https://github.com/fudan-zvg/Semantic-Segment-Anything) was introduced.
Prior to the introduction of SAM, most semantic segmentation applications already had their own models. These models could provide rough category classifications for regions, but with fuzzy, imprecise edges and no accurate masks. To overcome this, the authors proposed an open framework called SSA, which leverages SAM to enhance the performance of existing models: the original semantic segmentation model provides category predictions, while the powerful SAM provides the masks.
Additionally, the authors introduced an SSA engine, an open-vocabulary automatic annotation engine that benefits from the combined architecture of dense segmentation and open-vocabulary segmentation. This provides satisfactory annotations for most samples and can offer more detailed annotations using image captioning techniques.
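SSA's division of labor (coarse category map from an existing semantic model, precise masks from SAM) can be illustrated with a per-mask majority vote. This is a toy sketch over pixel indices; the real SSA operates on dense predictions rather than dictionaries:

```python
from collections import Counter

def label_mask(mask_pixels: set, coarse_labels: dict) -> str:
    """Assign a SAM mask the majority category among its pixels' coarse labels."""
    votes = Counter(coarse_labels.get(p, "unknown") for p in mask_pixels)
    return votes.most_common(1)[0][0]

# Coarse per-pixel labels from an existing segmentation model (fuzzy edges).
coarse = {0: "road", 1: "road", 2: "obstacle", 3: "obstacle", 4: "obstacle"}

# A sharp SAM mask covering pixels 2-4 inherits the majority label.
print(label_mask({2, 3, 4}, coarse))  # obstacle
```

The combined result keeps SAM's precise boundary while borrowing the semantic category the older model already knew.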

## 6.2 Datasets
The datasets used in this project are mainly small obstacle detection datasets from the field of autonomous driving.

### 6.2.1 StreetHazards
StreetHazards is a component of the 2019 CAOS benchmark. The training set includes three towns from CARLA and the dataset provides a variety of scenarios.

![dataset_sh](images/dataset_sh.png)

### 6.2.2 Lost and Found
The Lost and Found dataset was introduced in 2016 by Pinggera et al. and is the first dataset focusing on small road hazard detection with binocular camera acquisition.

![dataset_laf](images/dataset_laf.png)
It provides stereo masks for semantic segmentation tasks allowing pixel-level an

Blum et al. introduced it to the Fishyscapes (FS) benchmark, the FS Lost and Found dataset, in 2019.

### 6.2.3 RoadAnomaly
RoadAnomaly21 is a part of the SegmentMeIfYouCan benchmark, whose previous version was released in 2019 by Lis et al. It is designed for general anomaly detection scenarios in full street scenes, where the anomalies can be categorized as animals, unknown vehicles and tents, pianos or roadblocks. These anomalies can appear anywhere in the image, even in the sky.

![dataset_ra](images/dataset_ra.png)
| Dataset | Year | Images | Resolution |
|---------|------|--------|------------|
| Lost and Found <br> FS Lost and Found | 2016 <br> 2019 | 2104 <br> 275/100 | 2048×1024 |
| RoadAnomaly21 | 2021 | 100/10 | 2048×1024 <br> 1280×720 |

## 6.3 Ancillary labeling

The principle of assisted annotation is to run inference with an existing model and save the inference results in the label-file format of the annotation software. The labeling software then loads the generated label files, and the annotator only needs to check whether each image's targets are labeled to standard and whether any targets are missed or mislabeled. This project plans to build a visual auxiliary annotation tool around **SAM and its related models** to achieve semi-automatic or fully automatic annotation of image data, with support for manual adjustment of the annotation results.
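Saving inference results as a pre-annotation label file might look like the following sketch. The JSON schema here is hypothetical; a real tool would have to emit the exact import format of the chosen annotation software (e.g. Label Studio):

```python
import json

def masks_to_label_file(image_name: str, masks: list, path: str) -> None:
    """Write model predictions as a pre-annotation file the labeling tool can load."""
    record = {
        "image": image_name,
        "annotations": [
            {"label": m["label"], "pixels": sorted(m["pixels"]), "source": "SAM"}
            for m in masks
        ],
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)

masks = [{"label": "obstacle", "pixels": {7, 8, 9}}]
masks_to_label_file("frame_001.png", masks, "frame_001_prelabel.json")
```

The annotator then opens the image with its pre-filled labels and only corrects mistakes, rather than drawing every mask by hand.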


Since the segmentation result of SAM does not include semantic labels, it hinders the achievement of fully automated labeling. Semantic segmentation is an image segmentation algorithm whose goal is to assign a semantic category label to each pixel in the input image, thereby segmenting the image into regions with semantic information. This segmentation provides an automated method for data annotation, making the process more efficient and accurate. The project plans to integrate an automatic annotation module through SAM-related semantic segmentation algorithms (such as SSA).

# 7. Road Map

The road map is as follows:
