Commit 99bf9e3
update proposal
Signed-off-by: Frank-lilinjie <lilinjie@bupt.edu.cn>
1 parent 43ec9af commit 99bf9e3

File tree

2 files changed: +116 −17 lines changed


docs/proposals/algorithms/lifelong-learning/Unknown Task Recognition Algorithm Reproduction based on Lifelong Learning of Sedna.md

Lines changed: 116 additions & 17 deletions
@@ -1,17 +1,17 @@
# Unknown Task Recognition Algorithm Reproduction based on Lifelong Learning of Ianvs
Traditional machine learning performs test-set inference after training on known samples, to which its knowledge is limited. A model trained this way cannot effectively recognize unknown samples from new classes, so they are processed as known samples. Therefore, the recognition and processing of unknown samples or unknown tasks will become a major research direction of artificial intelligence. This project aims to reproduce the CVPR 2021 paper "Learning Placeholders for Open-Set Recognition" on semantic segmentation datasets. The paper proposes placeholders that imitate the emergence of new classes, thus helping to transform closed-set training into open-set training.

## Goals
1. The reproduction should be completed on the Cityscapes Semantic Segmentation dataset and the SYNTHIA dataset.
2. The reproduced code is successfully merged into the lifelong learning module of Ianvs.
3. The recognition accuracy (e.g. f1_score) of unknown classes is greater than 0.9.
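
As a concrete reading of goal 3, unknown-class recognition can be scored as a binary F1 over known vs. unknown samples. A minimal sketch with hypothetical label vectors (1 = unknown, 0 = known); real evaluation would use a library implementation such as scikit-learn's `f1_score`:

```python
def f1_score(y_true, y_pred):
    """F1 of the positive (unknown) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground truth and recognizer output.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
print(f1_score(y_true, y_pred))  # ≈ 0.909, just above the 0.9 target
```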
## Proposal
The goal of this Ianvs-based project, which reproduces an unknown task identification algorithm for lifelong learning, is to identify unknown and known samples in the inference dataset and categorize them for subsequent task assignment in the lifelong learning system created by Ianvs, after the initial training phase of task definition and model training.

This project needs to complete the task definition part and the unknown task identification part.

@@ -29,11 +29,11 @@ This part of the training model algorithm uses the RFNet method mentioned in the
The entire network architecture of RFNet is shown in the figure below. In the encoder part of the architecture, two independent branches extract features from RGB and depth images separately, with the RGB branch as the main branch and the Depth branch as the subordinate branch. Both branches use ResNet-18 [30] as the backbone to extract features from the inputs, because ResNet-18 has moderate depth and a residual structure, and its small operation footprint is compatible with real-time operation. After each layer of ResNet-18, the output features from the Depth branch are fused into the RGB branch through the Attention Feature Complementary (AFC) module. The spatial pyramid pooling (SPP) block gathers the fused RGB-D features from the two branches and produces feature maps with multi-scale information. Finally, referring to SwiftNet, efficient upsampling modules restore the resolution of these feature maps with skip connections from the RGB branch.
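
The fusion step can be sketched in a simplified form. This is only an illustrative approximation of the AFC idea (a channel-attention weight computed from the Depth features reweights them before addition into the RGB branch); the shapes and the attention form are assumptions, not RFNet's exact module:

```python
import math

def channel_attention(feat):
    """Sigmoid of each channel's global average; feat is [C][H][W]."""
    weights = []
    for channel in feat:
        mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        weights.append(1.0 / (1.0 + math.exp(-mean)))
    return weights

def afc_fuse(rgb_feat, depth_feat):
    """Fuse the Depth branch into the RGB branch: rgb + attention * depth."""
    w = channel_attention(depth_feat)
    return [
        [[r + w[c] * d for r, d in zip(r_row, d_row)]
         for r_row, d_row in zip(rgb_feat[c], depth_feat[c])]
        for c in range(len(rgb_feat))
    ]
```

In the real network this fusion happens after every ResNet-18 stage, on tensors rather than nested lists.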

<img src="images/Overview_of_RFNet.png" style="zoom:50%;" />

The figure below shows some examples from the validation sets of Cityscapes and Lost and Found, demonstrating the excellent segmentation accuracy of RFNet in various scenarios with or without small obstacles.

<img src="images/example.png" style="zoom:33%;" />
@@ -45,19 +45,114 @@ For this project, we use two datasets to train the models separately: cityscape
#### Workflow

<img src="images/task_definition.png" style="zoom:33%;" />
#### Dataset
##### Cityscapes
###### Background
Cityscapes has 5000 images of driving scenes in urban environments (2975 train, 500 val, 1525 test). It is a large-scale dataset containing a diverse set of stereo video sequences of street scenes recorded in 50 different cities.
Below is one example RGB image from the dataset.
<img src="images/city_RGB.png" style="zoom: 33%;" />
###### Data Explorer
The Cityscapes training set contains 2975 images, including street-view images and their corresponding labels. The dataset, jointly provided by three German organizations including Daimler, contains stereo vision data from more than 50 cities.
The directory structure of this dataset is as follows:
```
├─disparity
│ ├─test
│ │ ├─berlin
│ │ ├─bielefeld
│ │ ├─bonn
│ │ ├─...
│ │ └─munich
│ ├─train
│ │ ├─aachen
│ │ ├─bochum
│ │ ├─...
│ │ └─zurich
│ └─val
│   ├─frankfurt
│   ├─lindau
│   └─munster
├─gtFine
│ ├─test
│ │ ├─berlin
│ │ ├─bielefeld
│ │ ├─bonn
│ │ ├─...
│ │ └─munich
│ ├─train
│ │ ├─aachen
│ │ ├─bochum
│ │ ├─...
│ │ └─zurich
│ └─val
│   ├─frankfurt
│   ├─lindau
│   └─munster
└─leftImg8bit
  ├─test
  │ ├─berlin
  │ ├─bielefeld
  │ ├─bonn
  │ ├─...
  │ └─munich
  ├─train
  │ ├─aachen
  │ ├─bochum
  │ ├─...
  │ └─zurich
  └─val
    ├─frankfurt
    ├─lindau
    └─munster
```
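
Given this layout, an image's label and disparity maps can be located by name. A small sketch assuming the standard Cityscapes naming convention (`*_leftImg8bit.png`, `*_gtFine_labelIds.png`, `*_disparity.png`); the helper itself is hypothetical:

```python
def companion_paths(left_img_path):
    """Derive the gtFine label and disparity paths from a leftImg8bit path."""
    label = left_img_path.replace("leftImg8bit/", "gtFine/", 1)
    label = label.replace("_leftImg8bit.png", "_gtFine_labelIds.png")
    disparity = left_img_path.replace("leftImg8bit/", "disparity/", 1)
    disparity = disparity.replace("_leftImg8bit.png", "_disparity.png")
    return label, disparity

label, disp = companion_paths(
    "leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png")
```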
##### SYNTHIA-RAND-CITYSCAPES
###### Background
The SYNTHIA dataset consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes: misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, and lane-marking.
Below is one example RGB image from the dataset.
<img src="images/mn_RGB.png" style="zoom:33%;" />
###### Data Explorer
SYNTHIA-RAND-CITYSCAPES is a set of 9,000 random images with labels compatible with the Cityscapes test set. The list of classes is: void, sky, building, road, sidewalk, fence, vegetation, pole, car, traffic sign, pedestrian, bicycle, motorcycle, parking-slot, road-work, traffic light, terrain, rider, truck, bus, train, wall, and lane-marking. These images are generated as random perturbations of the virtual world, so no temporal consistency is provided.
The directory structure of this dataset is as follows:
```
├─Depth
│ ├─0000000.png
│ ├─...
│ └─0009399.png
├─GT
│ ├─COLOR
│ │ ├─0000000.png
│ │ ├─...
│ │ └─0009399.png
│ └─LABELS
│   ├─0000000.png
│   ├─...
│   └─0009399.png
└─RGB
  ├─0000000.png
  ├─...
  └─0009399.png
```
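
Since every subdirectory shares the same zero-padded 7-digit file names, (RGB, depth, label) path triplets can be assembled by index. A sketch, assuming the RGB image folder is named `RGB` as in the official SYNTHIA release; `synthia_triplets` is a hypothetical helper:

```python
import os

def synthia_triplets(root, indices):
    """Pair RGB, depth and label paths that share one 7-digit file name."""
    triplets = []
    for i in indices:
        name = f"{i:07d}.png"
        triplets.append((
            os.path.join(root, "RGB", name),
            os.path.join(root, "Depth", name),
            os.path.join(root, "GT", "LABELS", name),
        ))
    return triplets
```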
@@ -69,26 +164,26 @@ This project aims to reproduce the CVPR2021 paper "Learning placeholders for ope
The following is the workflow of the unknown task identification module. When faced with an inference task, the unknown task identification algorithm can promptly indicate which data in the dataset are known and which are unknown.

<img src="images/unknow.png" style="zoom:50%;" />
#### Main Work
Data placeholders and classifier placeholders are set up in the paper to handle the unknown-class recognition problem. The data placeholders mimic the emergence of new classes, transforming closed-set training into open-set training. The classifier placeholders reserved for new classes augment the closed-set classifier with a virtual classifier that adaptively outputs class-specific thresholds to distinguish known classes from unknown classes. Specifically, the virtual classifier represents a class-specific threshold between known and unknown, and reserving it for open classes captures invariant information between target and non-target classes. To efficiently predict the distribution of new classes, the paper uses data placeholders, which mimic open classes at a limited complexity cost; this enables the transformation of a closed-set classifier into an open-set classifier that adaptively predicts class-specific thresholds during testing.
#### Algorithm Principle

<img src="images/algorithm.png" style="zoom:50%;" />
##### Learning classifier placeholders
Retaining classifier placeholders aims at setting up an additional virtual classifier and optimizing it to represent the threshold between known and unknown classes. Given a well-trained closed-set classifier W, the paper first augments the output layer with the additional virtual classifier, as shown in the equation:

<img src="images/eq1.png" style="zoom:50%;" />
The closed-set classifier and the virtual classifier share the same embedding (matrix), and only one additional linear layer is created. The augmented logits are passed through the softmax layer to generate posterior probabilities. By fine-tuning the model so that the virtual classifier outputs the second-highest probability for known-class samples, the invariant information between the known-class classifiers and the virtual classifier can be transferred to the detection process. Since the output is expanded by the virtual classifier, the classification loss can be expressed as:

<img src="images/eq2.png" style="zoom:50%;" />
ℓ denotes cross-entropy or another loss function. The first term in the formula corresponds to the output of the augmented classifier; it pushes samples into their corresponding class groups to maintain accurate identification in the closed set. The second term matches each sample to class K+1 after masking out its ground-truth class, which makes the virtual classifier output the second-highest probability; it anchors the position of the virtual classifier in the center space and keeps its distance the second-closest among all class centers. The loss thus seeks a trade-off between correctly classifying closed-set instances and reserving probability for new classes via the classifier placeholder. During training, the virtual classifier is positioned between target and non-target classes. For instances of novel classes, its predicted probability will be high, because all known classes are then non-target classes. It can therefore be regarded as an instance-dependent threshold that adapts well to each known class.
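
The two-term loss can be sketched numerically. A pure-Python, unbatched illustration of the idea (not the authors' implementation); `beta` is the trade-off weight between the two terms:

```python
import math

def cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def proser_loss(logits, target, beta=1.0):
    """logits: K known-class scores plus the virtual classifier's score last."""
    # Term 1: ordinary closed-set classification over the K+1 outputs.
    closed = cross_entropy(logits, target)
    # Term 2: mask out the ground-truth logit, then ask the virtual
    # classifier (last entry) to win, i.e. be the second-highest overall.
    masked = [z for i, z in enumerate(logits) if i != target]
    return closed + beta * cross_entropy(masked, len(masked) - 1)

loss = proser_loss([4.0, 1.0, 0.5, 2.5], target=0)  # 3 known classes + virtual
```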
@@ -99,26 +194,30 @@ L denotes cross entropy or other loss function. The first term in the formula co
The purpose of learning data placeholders is to transform closed-set training into open-set training. The generated data placeholders should have two main characteristics: their distribution should look novel, and the generation process should be fast. The paper simulates new patterns with manifold mixup: its Equation 6 takes two samples from different categories and mixes them in a middle layer.

<img src="images/eq3.png" style="zoom:50%;" />
The mixed result is passed through the later layers to obtain the new embedding φ_post(x̃_pre). Considering that the interpolation between two different clusters is usually a low-confidence prediction region, the paper treats the embedding φ_post(x̃_pre) as an embedding of an open-set class and trains it as a new class.

<img src="images/eq4.png" style="zoom:50%;" />
Clearly, this formulation consumes no additional time complexity while generating novel situations between multiple decision boundaries. In addition, manifold mixup makes better use of interpolation between deeper hidden representations to generate new patterns in the embedding space, which better represent the novel distribution. As illustrated in the figure above, the mixed instances push the decision boundaries of the two classes toward separate locations in the embedding space. With the help of the data placeholders, the embeddings of the known classes become tighter, leaving more room for new classes.
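
The data-placeholder generation can be sketched as follows; `phi_pre` and `phi_post` are toy stand-ins for the front and back halves of the network, split at the mixing layer:

```python
import math
import random

def phi_pre(x):
    """Toy front half of the network (up to the mixing layer)."""
    return [v * 2.0 for v in x]

def phi_post(h):
    """Toy back half of the network (after the mixing layer)."""
    return [math.tanh(v) for v in h]

def data_placeholder(x1, x2, lam=None):
    """Mix two samples' mid-layer embeddings; label the result as class K+1."""
    lam = random.random() if lam is None else lam
    mixed = [lam * a + (1.0 - lam) * b
             for a, b in zip(phi_pre(x1), phi_pre(x2))]
    return phi_post(mixed)
```

In training, `x1` and `x2` would come from different known classes, and the returned embedding would be optimized toward the virtual class K+1.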
**PROSER algorithm training process**

<img src="images/process.png" style="zoom:50%;" />
### Embedded in Ianvs

![](images/workflow_ianvs_unknow_task_recognition.png)
After training, the unknown task recognition model is placed in the Lifelong Learning Paradigm section of the Test Case Controller module as one of the algorithms for unknown task identification in the lifelong learning paradigm.
After users upload their own lifelong learning algorithms to the local Ianvs, the Ianvs component provides the dataset and testing environment, and offers a built-in unknown task recognition algorithm as an aid for testing. The test results are updated in the local leaderboard.
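
How the trained model's output might be used to split an inference set could look like the following sketch; the function name and interface are assumptions for illustration, not the actual Ianvs API:

```python
def split_known_unknown(samples, probabilities):
    """A sample is unknown when the virtual class (last entry) wins."""
    known, unknown = [], []
    for sample, probs in zip(samples, probabilities):
        if max(range(len(probs)), key=probs.__getitem__) == len(probs) - 1:
            unknown.append(sample)
        else:
            known.append(sample)
    return known, unknown

# probabilities are hypothetical (K+1)-way outputs of the PROSER model
known, unknown = split_known_unknown(
    ["img_a", "img_b"], [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
```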
## Roadmap