Commit e7f5ffa

Merge pull request #3552 from wdkwyf/smartlab_models
[Smartlab]Update smartlab models
2 parents e801239 + 6d06462 commit e7f5ffa

10 files changed: +295 -244 lines changed

data/dataset_definitions.yml

Lines changed: 11 additions & 0 deletions
@@ -1462,6 +1462,17 @@ datasets:
 input_suffix: in
 reference_suffix: out
 
+- name: online_mstcn_plus_encoder_dataset
+data_source: annotation
+reader: numpy_reader
+annotation_conversion:
+converter: feature_regression
+reference_dir: image
+input_dir: image
+input_suffix: in
+reference_suffix: out
+use_bin_data: True
+
 - name: smartlab_detection_10cl_top
 data_source: object_detection/streams_1/top/images
 annotation_conversion:

models/intel/device_support.md

Lines changed: 6 additions & 4 deletions
@@ -136,8 +136,10 @@
 | yolo-v2-tiny-ava-sparse-30-0001 | YES | YES | YES |
 | yolo-v2-tiny-ava-sparse-60-0001 | YES | YES | YES |
 | yolo-v2-tiny-vehicle-detection-0001 | YES | YES | YES |
-| smartlab-object-detection-0001 | YES | | |
-| smartlab-object-detection-0002 | YES | | |
-| smartlab-object-detection-0003 | YES | | |
-| smartlab-object-detection-0004 | YES | | |
+| smartlab-object-detection-0001 | YES | YES | |
+| smartlab-object-detection-0002 | YES | YES | |
+| smartlab-object-detection-0003 | YES | YES | |
+| smartlab-object-detection-0004 | YES | YES | |
+| smartlab-action-recognition-0001 | YES | YES | |
 | smartlab-sequence-modelling-0001 | YES | YES | |
+| smartlab-sequence-modelling-0002 | YES | YES | |

models/intel/index.md

Lines changed: 6 additions & 2 deletions
@@ -475,8 +475,12 @@ Deep Learning models for online sequence modeling.
 
 | Model Name | Complexity (GFLOPs) | Size (Mp) |
 |------------|---------------------|-----------|
-| [smartlab-sequence-modelling-0001](./smartlab-sequence-modelling-0001/README.md) | 0.049 | 1.02 |
-
+| [smartlab-sequence-modelling-0001](./smartlab-sequence-modelling-0001/README.md) | 0.11 | 2.537 |
+| [smartlab-sequence-modelling-0002](./smartlab-sequence-modelling-0002/README.md) | 0.049 | 1.02 |
+| [smartlab-action-recognition-0001](./smartlab-action-recognition-0001/README.md) | | |
+| smartlab-action-recognition-0001-encoder-side | 0.611 | 3.387 |
+| smartlab-action-recognition-0001-encoder-top | 0.611 | 3.387 |
+| smartlab-action-recognition-0001-decoder | 0.008 | 4.099 |
 ## See Also
 
 * [Open Model Zoo Demos](../../demos/README.md)
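The index table leaves the Complexity and Size cells of the composite `smartlab-action-recognition-0001` row empty. As a rough worked figure, and assuming each of the two encoders runs once per step and the decoder runs once on their outputs (an assumption not stated in the table), the composite cost is the sum of the listed per-model values: 2 × 0.611 + 0.008 = 1.23 GFLOPs and 2 × 3.387 + 4.099 = 10.873 M parameters. The helper below is hypothetical and only reproduces that arithmetic.

```python
# Per-model figures copied from the index.md table (GFLOPs, Mparams).
ENCODER = {"gflops": 0.611, "mparams": 3.387}  # encoder-side and encoder-top share these values
DECODER = {"gflops": 0.008, "mparams": 4.099}


def composite_cost(num_encoders: int = 2) -> tuple[float, float]:
    """Rough per-step totals, assuming every encoder and the decoder run once."""
    gflops = num_encoders * ENCODER["gflops"] + DECODER["gflops"]
    mparams = num_encoders * ENCODER["mparams"] + DECODER["mparams"]
    # Round to the precision used in the table to hide float noise.
    return round(gflops, 3), round(mparams, 3)


if __name__ == "__main__":
    gflops, mparams = composite_cost()
    print(f"{gflops} GFLOPs, {mparams} Mparams")  # 1.23 GFLOPs, 10.873 Mparams
```

The parameter total holds regardless of scheduling, since it simply counts the weights of all three models; the GFLOPs total changes if, for example, only one view is encoded per step.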
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+# smartlab-action-recognition-0001 (composite)
+
+## Use Case and High-Level Description
+
+There are three models for smartlab action recognition: two encoder models and one decoder model.
+
+These models are fine-tuned on the smartlab dataset to predict actions and can classify three types of action: "noise_action", "put_take", and "adjust_rider".
+
+## Example of the input data
+![](./assets/frame0001.jpg)
+
+## Example of the output
+Output `put_take` action
+
+## Composite model specification
+| Metric | Value |
+| ---------------------------------------------- | ------------------ |
+| Accuracy on the DSI1867 | TODO |
+| Source framework | PyTorch\* |
+
+## Encoder models specification
+
+The smartlab-action-recognition-0001-encoder-* models have a MobileNet-V2-like backbone and form the convolutional encoder part of the action recognition model.
+
+There are two encoder models, `smartlab-action-recognition-0001-encoder-side` and `smartlab-action-recognition-0001-encoder-top`, which have the same structure but different weights.
+
+| Metric | Value |
+| ------- | ----- |
+| GFlops | 0.611 |
+| MParams | 3.387 |
+
+### Inputs
+
+Image, name: `input_image`, shape: `1, 3, 224, 224` in the `B, C, H, W` format, where:
+
+- `B` - batch size
+- `C` - number of channels
+- `H` - image height
+- `W` - image width
+Expected color order is `BGR`.
+
+### Outputs
+
+1. Name: `output_feature`, shape: `1, 1280`. Features from the encoder part of the action recognition head.
+
+## Decoder model specification
+
+The smartlab-action-recognition-0001-decoder is a fully connected decoder that accepts the features computed by the encoders for the top and front views and predicts action scores across the following label list: `no_action`, `noise_action`, `adjust_rider`
+
+| Metric | Value |
+| ------- | ----- |
+| GFlops | 0.008 |
+| MParams | 4.099 |
+
+### Inputs
+
+1. Name: `input_feature_1`, shape: `1, 1280`. Encoded features from the top view.
+2. Name: `input_feature_2`, shape: `1, 1280`. Encoded features from the front view.
+
+### Outputs
+
+1. Name: `decoder_hidden`, shape: `1, 3`. Format: [`has_action_conf_score`, `action_1_logits`, `action_2_logits`], where:
+* `has_action_conf_score` - confidence that the frame contains an action. If it is greater than 0.5, one of the specified actions is present.
+* `action_1_logits` - confidence for the put_take action class
+* `action_2_logits` - confidence for the adjust_rider action class
+
+Classification confidence scores in the [0, 1] range.
+## Demo usage
+The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
+
+- [smartlab_demo/python](../../../demos/smartlab_demo/python/README.md)
+
+## Legal Information
+
+[*] Other names and brands may be claimed as the property of others.
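The README above specifies the encoder input (`input_image`, `1, 3, 224, 224`, `B, C, H, W`, BGR) and the decoder output (`decoder_hidden`, with an action flagged when `has_action_conf_score` exceeds 0.5). The following is a minimal NumPy-only sketch of the glue code around the models. The function names are hypothetical, the nearest-neighbour resize is a simplification, no mean/scale normalization is applied because the README does not mention one, and mapping a below-threshold frame to `noise_action` is an assumption based on the three action types listed in the description.

```python
import numpy as np

# Hypothetical helpers. Only the tensor shapes, the BGR colour order and the
# 0.5 threshold come from the README; everything else is an assumption.
INPUT_SIZE = 224
ACTION_LABELS = ("put_take", "adjust_rider")  # action_1_logits, action_2_logits


def to_encoder_input(frame_bgr: np.ndarray) -> np.ndarray:
    """Turn an H x W x 3 BGR uint8 frame into a 1 x 3 x 224 x 224 float32 tensor."""
    h, w, _ = frame_bgr.shape
    # Nearest-neighbour resize via index arithmetic (a simplification).
    rows = np.arange(INPUT_SIZE) * h // INPUT_SIZE
    cols = np.arange(INPUT_SIZE) * w // INPUT_SIZE
    resized = frame_bgr[rows[:, None], cols[None, :]]
    # HWC -> CHW, keep BGR channel order, no normalization, add batch axis.
    chw = resized.transpose(2, 0, 1).astype(np.float32)
    return chw[np.newaxis]


def interpret_decoder(decoder_hidden: np.ndarray, threshold: float = 0.5) -> str:
    """Map the 1 x 3 `decoder_hidden` output to a single label."""
    has_action, *action_scores = decoder_hidden.reshape(-1).tolist()
    if has_action <= threshold:
        return "noise_action"  # assumed label for frames without a specified action
    # Pick the higher-scoring of the two action classes.
    return ACTION_LABELS[int(np.argmax(action_scores))]


if __name__ == "__main__":
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    print(to_encoder_input(frame).shape)  # (1, 3, 224, 224)
    print(interpret_decoder(np.array([[0.9, 0.2, 1.3]])))  # adjust_rider
```

In a full pipeline, each encoder would receive its own view's frame, and the two `1, 1280` `output_feature` tensors would be passed to the decoder as `input_feature_1` and `input_feature_2`.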
Lines changed: 10 additions & 130 deletions
@@ -1,153 +1,33 @@
 # smartlab-sequence-modelling-0001
 
 ## Use Case and High-Level Description
-This is an online action segmentation network for 16 classes trained on Intel dataset. It is an online version of MSTCN++. The difference between online MSTCN++ and MSTCN++ is that the former accept stream video as input while the latter assume the whole video is given.
-
-For the original MSTCN++ model details see [paper](https://arxiv.org/abs/2006.09220)
+This is a feature extractor based on the MobileNet-v3 network without the original classifier layer. The input is an RGB image and the output is a feature vector.
+For details of the original MobileNet-v3 model, see the [PyTorch\* documentation](https://pytorch.org/vision/stable/models/generated/torchvision.models.mobilenet_v3_small.html#torchvision.models.mobilenet_v3_small) and the [paper](https://arxiv.org/abs/1905.02244).
 
 ## Specification
 
 | Metric | Value |
 |---------------------------------|-------------------------------------------|
-| GOPs | 0.048915 |
-| MParams | 1.018179 |
+| GOPs | 0.11 |
+| MParams | 2.537 |
 | Source framework | PyTorch\* |
 
-## Accuracy
-<table>
-<tr>
-<th colspan="2">Accuracy</th>
-<th>noise/background</th>
-<th>remove_support_sleeve</th>
-<th>adjust_rider</th>
-<th>adjust_nut</th>
-<th>adjust_balancing</th>
-<th>open_box</th>
-<th>close_box</th>
-<th>choose_weight</th>
-<th>put_left</th>
-<th>put_right</th>
-<th>take_left</th>
-<th>take_right</th>
-<th>install support_sleeve</th>
-<th>mean</th>
-<th>mPR (P+R)/2</th>
-</tr>
-<tbody>
-<tr>
-<td rowspan=2>frame-level</td>
-<td rowspan=1>precision</td>
-<td>0.22</td>
-<td>0.84</td>
-<td>0.81</td>
-<td>0.62</td>
-<td>0.67</td>
-<td>0.87</td>
-<td>0.56</td>
-<td>0.52</td>
-<td>0.54</td>
-<td>0.74</td>
-<td>0.62</td>
-<td>0.68</td>
-<td>0.86</td>
-<td>0.66</td>
-<td rowspan=2>0.66</td>
-</tr>
-<tr>
-<td rowspan=1>recall</td>
-<td>0.4</td>
-<td>0.95</td>
-<td>0.83</td>
-<td>0.86</td>
-<td>0.43</td>
-<td>0.8</td>
-<td>0.31</td>
-<td>0.52</td>
-<td>0.68</td>
-<td>0.65</td>
-<td>0.62</td>
-<td>0.51</td>
-<td>0.92</td>
-<td>0.65</td>
-</tr>
-<tr>
-<td rowspan=2>segment IOU</td>
-<td rowspan=1>precision</td>
-<td>0.38</td>
-<td>0.94</td>
-<td>0.77</td>
-<td>0.65</td>
-<td>0.6</td>
-<td>0.85</td>
-<td>0.56</td>
-<td>0.68</td>
-<td>0.74</td>
-<td>0.88</td>
-<td>0.72</td>
-<td>0.78</td>
-<td>0.69</td>
-<td>0.7</td>
-<td rowspan=2>0.77</td>
-</tr>
-<tr>
-<td>recall</td>
-<td>0.64</td>
-<td>1</td>
-<td>0.96</td>
-<td>0.94</td>
-<td>0.62</td>
-<td>0.96</td>
-<td>0.48</td>
-<td>0.77</td>
-<td>0.91</td>
-<td>0.88</td>
-<td>0.83</td>
-<td>0.85</td>
-<td>1</td>
-<td>0.83</td>
-</tr>
-</tbody>
-</table>
 
-Notice: In the accuracy report, feature extraction network is i3d-rgb, you can get this model from `../../public/i3d-rgb-tf/README.md`.
 
 ## Inputs
-The inputs to the network are feature vectors at each video frame, which should be the output of feature extraction network, such as [i3d-rgb-tf](../../public/i3d-rgb-tf/README.md) and [resnet-50-tf](../../public/resnet-50-tf/README.md), and feature outputs of the previous frame.
-
-You can check the i3d-rgb and smartlab-sequence-modelling-0001 usage in demos/smartlab_demo
 
-1. Input feature, name: `input`, shape: `1, 2048, 24`, format: `B, W, H`, where:
+Image, name: `input`, shape: `1, 3, 224, 224`, format: `B, C, H, W`, where:
 
 - `B` - batch size
-- `W` - feature map width
-- `H` - feature map height
-
-2. History feature 1, name: `fhis_in_0`, shape: `12, 64, 2048`, format: `C, H', W`,
-3. History feature 2, name: `fhis_in_1`, shape: `11, 64, 2048`, format: `C, H', W`,
-4. History feature 3, name: `fhis_in_2`, shape: `11, 64, 2048`, format: `C, H', W`,
-5. History feature 4, name: `fhis_in_3`, shape: `11, 64, 2048`, format: `C, H', W`, where:
+- `C` - number of channels
+- `H` - image height
+- `W` - image width
 
-- `C` - the channel number of feature vector
-- `H`- feature map height
-- `W` - feature map width
 
 ## Outputs
 
-The outputs also include two parts: predictions and four feature outputs. Predictions is the action classification and prediction results. Four Feature maps are the model layer features in past frames.
-1. Prediction, name: `output`, shape: `4, 1, 64, 24`, format: `C, B, H, W`,
-- `C` - the channel number of feature vector
-- `B` - batch size
-- `H`- feature map height
-- `W` - feature map width
-After post-process with argmax() function, the prediction result can be used to decide the action type of the current frame.
-2. History feature 1, name: `fhis_out_0`, shape: `12, 64, 2048`, format: `C, H, W`,
-3. History feature 2, name: `fhis_out_1`, shape: `11, 64, 2048`, format: `C, H, W`,
-4. History feature 3, name: `fhis_out_2`, shape: `11, 64, 2048`, format: `C, H, W`,
-5. History feature 4, name: `fhis_out_3`, shape: `11, 64, 2048`, format: `C, H, W`, where:
-
-- `C` - the channel number of feature vector
-- `H`- feature map height
-- `W` - feature map width
+The model has one output, name: `output`, shape: `1, 576, 1, 1`.
+`576` is the length of the feature vector.
 
 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.
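The updated README describes an RGB image input in `B, C, H, W` layout and a `1, 576, 1, 1` output. Below is a minimal NumPy sketch of the conversions this implies. It assumes frames arrive as 224x224 BGR arrays (as with OpenCV-style capture, which is an assumption here) and that per-frame vectors are later collected into a `(T, 576)` array for some downstream consumer (also an assumption; this README does not describe one). All function names are hypothetical.

```python
import numpy as np

FEATURE_LEN = 576  # from the `1, 576, 1, 1` output shape


def to_model_input(frame_bgr: np.ndarray) -> np.ndarray:
    """224 x 224 x 3 BGR frame -> 1 x 3 x 224 x 224 RGB float32 tensor (B, C, H, W)."""
    if frame_bgr.shape != (224, 224, 3):
        raise ValueError(f"expected a 224x224x3 frame, got {frame_bgr.shape}")
    rgb = frame_bgr[..., ::-1]      # reverse the channel axis: BGR -> RGB
    chw = rgb.transpose(2, 0, 1)    # HWC -> CHW
    return np.ascontiguousarray(chw, dtype=np.float32)[np.newaxis]


def to_feature_vector(output: np.ndarray) -> np.ndarray:
    """Flatten the 1 x 576 x 1 x 1 `output` tensor into a 576-element vector."""
    if output.shape != (1, FEATURE_LEN, 1, 1):
        raise ValueError(f"unexpected output shape {output.shape}")
    return output.reshape(FEATURE_LEN)


def stack_features(vectors) -> np.ndarray:
    """Stack per-frame vectors into a (T, 576) array (hypothetical downstream format)."""
    return np.stack(list(vectors), axis=0)
```

The explicit shape checks make a mismatch with the documented `1, 3, 224, 224` input or `1, 576, 1, 1` output fail loudly instead of silently producing a wrongly shaped feature.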

models/intel/smartlab-sequence-modelling-0001/accuracy-check.yml

Lines changed: 6 additions & 48 deletions
@@ -3,62 +3,20 @@ models:
 launchers:
 - framework: openvino
 adapter:
-type: multi_output_regression
-ignore_batch: True
-outputs:
-- output
-- fhis_out_0
-- fhis_out_1
-- fhis_out_2
-- fhis_out_3
+type: regression
 inputs:
 - name: input
 type: INPUT
 value: .*input
-shape: [1, 2048, 24]
-layout: NHWC
-- name: fhis_in_0
-type: INPUT
-value: .*fhis_in_0
-shape: [12, 64, 2048]
-layout: NHWC
-- name: fhis_in_1
-type: INPUT
-value: .*fhis_in_1
-shape: [11, 64, 2048]
-layout: NHWC
-- name: fhis_in_2
-type: INPUT
-value: .*fhis_in_2
-shape: [11, 64, 2048]
-layout: NHWC
-- name: fhis_in_3
-type: INPUT
-value: .*fhis_in_3
-shape: [11, 64, 2048]
-layout: NHWC
-allow_reshape_input: True
+shape: [1, 3, 224, 224]
+layout: NCHW
 datasets:
-- name: online_mstcn_plus_dataset
+- name: online_mstcn_plus_encoder_dataset
 metrics:
 - type: mae
-max_error: True
 presenter: print_vector
 abs_threshold: 0.1
 rel_threshold: 1
 reference:
-output@max_error: 1e-5
-output@mean: 1e-5
-output@std: 0
-fhis_out_0@max_error: 1e-5
-fhis_out_0@mean: 1e-5
-fhis_out_0@std: 0
-fhis_out_1@max_error: 1e-5
-fhis_out_1@mean: 1e-5
-fhis_out_1@std: 0
-fhis_out_2@max_error: 1e-5
-fhis_out_2@mean: 1e-5
-fhis_out_2@std: 0
-fhis_out_3@max_error: 1e-5
-fhis_out_3@mean: 1e-5
-fhis_out_3@std: 0
+mean: 0.05
+std: 0.1
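The new configuration scores the model with the `mae` metric and lists reference values for `mean` and `std` (the old configuration also enabled `max_error`). As a rough illustration of what those statistics summarize, here is a NumPy sketch that pools element-wise absolute errors between predicted and reference feature vectors over all samples. This is one plausible reading, not Accuracy Checker's exact implementation, and it does not reproduce how `abs_threshold` and `rel_threshold` are applied to the reference values. The function name is hypothetical.

```python
import numpy as np


def mae_summary(predictions, references):
    """Pool element-wise absolute errors over all samples and summarize them."""
    errors = np.concatenate([
        np.abs(np.asarray(p, dtype=np.float64) - np.asarray(r, dtype=np.float64)).ravel()
        for p, r in zip(predictions, references)
    ])
    return {
        "mean": float(errors.mean()),      # compared against `mean: 0.05` in the config
        "std": float(errors.std()),        # compared against `std: 0.1` in the config
        "max_error": float(errors.max()),  # only reported by the old config (`max_error: True`)
    }


if __name__ == "__main__":
    summary = mae_summary([[0.0, 1.0], [2.0, 2.0]], [[0.0, 0.5], [2.0, 3.0]])
    print(summary["mean"], summary["max_error"])  # 0.375 1.0
```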
