
Commit d79894a

Change docs for action recognition (#1940)

* Change docs for action recognition
* Fix typo
* Update MoViNet related parts
* Add MoViNet performance
* Revert table include Complexity and Model size

Co-authored-by: Jihwan Eom <[email protected]>

1 parent e654199 · commit d79894a

File tree: 3 files changed, +91 −18 lines changed

docs/source/guide/explanation/algorithms/action/action_classification.rst

Lines changed: 11 additions & 9 deletions
@@ -27,21 +27,23 @@ Refer to our tutorial for more information on how to train, validate, and optimi

 Models
 ******

-We support `X3D <https://arxiv.org/abs/2004.04730>`_ for action classification. X3D is a deep learning model that was proposed in the paper "X3D: Expanding Architectures for Efficient Video Recognition" by Christoph Feichtenhofer. The model is an extension of the popular 2D convolutional neural network (CNN) architectures to the 3D domain, allowing it to efficiently process spatiotemporal information in videos.
+Currently, OpenVINO™ Training Extensions supports `X3D <https://arxiv.org/abs/2004.04730>`_ and `MoViNet <https://arxiv.org/pdf/2103.11511.pdf>`_ for action classification.

-Currenly OpenVINO™ Training Extensions supports X3D-S model with below template:
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+
+| Template ID | Name | Complexity (GFLOPs) | Model size (MB) |
++========================================================================================================================================================================================+=========+=====================+=========================+
+| `Custom_Action_Classification_X3D <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/action/configs/classification/x3d/template.yaml>`_ | X3D | 2.49 | 3.79 |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+
+| `Custom_Action_Classification_MoViNet <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/action/configs/classification/movinet/template.yaml>`_ | MoViNet | 2.71 | 3.10 |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+

-+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+
-| Template ID | Name | Complexity (GFLOPs) | Model size (MB) |
-+===============================================================================================================================================================================+=========+=====================+=========================+
-| `Custom_Action_Classification_X3D <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/action/configs/classification/x3d/template.yaml>`_ | X3D | 2.49 | 3.79 |
-+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+

-
-In the table below the **top-1 accuracy** on some academic datasets are presented. Each model is trained with single Nvidia GeForce RTX3090.
+In the table below, the **top-1 accuracy** on some academic datasets is presented. Each model was trained with a single NVIDIA GeForce RTX 3090.

 +-----------------------+------------+-----------------+
 | Model name            | HMDB51     | UCF101          |
 +=======================+============+=================+
 | X3D                   | 67.19      | 87.89           |
 +-----------------------+------------+-----------------+
+| MoViNet               | 62.74      | 81.32           |
++-----------------------+------------+-----------------+
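Read together, the two tables above capture the trade-off between the templates: X3D wins on top-1 accuracy, while MoViNet is smaller. A minimal sketch of picking a template programmatically (the numbers are copied from the tables above; the helper itself is illustrative and not part of OpenVINO™ Training Extensions):

```python
# Metrics copied from the two tables above: complexity (GFLOPs),
# exported model size (MB), and top-1 accuracy on HMDB51 / UCF101.
TEMPLATES = {
    "X3D":     {"gflops": 2.49, "size_mb": 3.79, "hmdb51": 67.19, "ucf101": 87.89},
    "MoViNet": {"gflops": 2.71, "size_mb": 3.10, "hmdb51": 62.74, "ucf101": 81.32},
}

def best_by(metric: str, maximize: bool = True) -> str:
    """Return the template name that wins on a given metric."""
    pick = max if maximize else min
    return pick(TEMPLATES, key=lambda name: TEMPLATES[name][metric])

print(best_by("ucf101"))                   # -> X3D (highest accuracy)
print(best_by("size_mb", maximize=False))  # -> MoViNet (smallest model)
```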

docs/source/guide/tutorials/base/how_to_train/action_classification.rst

Lines changed: 9 additions & 9 deletions
@@ -74,7 +74,6 @@ According to the `documentation <https://mmaction2.readthedocs.io/en/latest/supp

    │   │   │   │   ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0
    │   │   │   │   ├── ...
    │   │   │   │   ├── winKen_wave_u_cm_np1_ri_bad_1
-   |

 Once you have the dataset structured properly, copy the ``mmaction2/data`` folder, which contains the hmdb51 dataset, to ``training_extensions/data``.
 Then you can convert it to the `CVAT <https://www.cvat.ai/>`_ format using the following command:
@@ -128,17 +127,18 @@ To see the list of supported templates, run the following command:

 .. note::

-    OpenVINO™ Training Extensions is supporting only X3D model template now, other architecture will be supported in near future.
+    OpenVINO™ Training Extensions currently supports the X3D and MoViNet templates; other architectures will be supported in the future.

 .. code-block::

     (otx) ...$ otx find --task action_classification

-    +-----------------------+----------------------------------+------+----------------------------------------------------------------+
-    | TASK                  | ID                               | NAME | BASE PATH                                                      |
-    +-----------------------+----------------------------------+------+----------------------------------------------------------------+
-    | ACTION_CLASSIFICATION | Custom_Action_Classification_X3D | X3D  | otx/algorithms/action/configs/classification/x3d/template.yaml |
-    +-----------------------+----------------------------------+------+----------------------------------------------------------------+
+    +-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+
+    | TASK                  | ID                                   | NAME    | BASE PATH                                                             |
+    +-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+
+    | ACTION_CLASSIFICATION | Custom_Action_Classification_X3D     | X3D     | ../otx/algorithms/action/configs/classification/x3d/template.yaml     |
+    | ACTION_CLASSIFICATION | Custom_Action_Classification_MoViNet | MoViNet | ../otx/algorithms/action/configs/classification/movinet/template.yaml |
+    +-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+

 All commands will be run on the X3D model. It's a light model that achieves competitive accuracy while keeping inference fast.

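If you want to consume the ``otx find`` listing from a script rather than read it by eye, the ASCII table is straightforward to parse. A hypothetical sketch (the ``parse_table`` helper is not part of the otx CLI; the sample text mirrors the output shown above):

```python
FIND_OUTPUT = """\
+---+---+---+---+
| TASK | ID | NAME | BASE PATH |
+---+---+---+---+
| ACTION_CLASSIFICATION | Custom_Action_Classification_X3D | X3D | ../otx/algorithms/action/configs/classification/x3d/template.yaml |
| ACTION_CLASSIFICATION | Custom_Action_Classification_MoViNet | MoViNet | ../otx/algorithms/action/configs/classification/movinet/template.yaml |
+---+---+---+---+
"""

def parse_table(text: str) -> list[dict]:
    """Turn '|'-delimited table rows into dicts keyed by the header row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.lstrip().startswith("|")  # skip the +---+ border lines
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]

templates = parse_table(FIND_OUTPUT)
print([t["NAME"] for t in templates])  # -> ['X3D', 'MoViNet']
```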
@@ -254,7 +254,7 @@ Optimization
 *************

 1. You can further optimize the model with ``otx optimize``.
-Currently, only POT is supported for action classsification. NNCF will be supported in near future.
+Currently, quantization with POT is supported for the X3D template. MoViNet will be supported in the near future.
 Refer to :doc:`optimization explanation <../../../explanation/additional_features/models_optimization>` section for more details on model optimization.

 2. Example command for optimizing
@@ -275,4 +275,4 @@ Keep in mind that POT will take some time (generally less than NNCF optimization

 efficient model representation ready-to-use action classification model.

 The following tutorials provide further steps on how to :doc:`deploy <../deploy>` and use your model in the :doc:`demonstration mode <../demo>` and visualize results.
-The examples are provided with an object detection model, but it is easy to apply them for action classification by substituting the object detection model with classification one.
+The examples are provided with an object detection model, but it is easy to apply them to action classification by substituting the object detection model with a classification one.

docs/source/guide/tutorials/base/how_to_train/action_detection.rst

Lines changed: 71 additions & 0 deletions
@@ -153,3 +153,74 @@ We will get a similar to this validation output after some validation time (abou

 .. note::

     Currently we don't support export and optimize task in action detection. We will support these features very near future.
+
+*********
+Export
+*********
+
+1. ``otx export`` exports a trained PyTorch ``.pth`` model to the OpenVINO™ Intermediate Representation (IR) format.
+It allows running the model on Intel hardware much more efficiently, especially on the CPU. Also, the resulting IR model is required to run POT optimization. An IR model consists of two files: ``openvino.xml`` for the network architecture and ``openvino.bin`` for the weights.
+
+2. Run the command line below to export the trained model
+and save the exported model to the ``openvino_models`` folder.
+
+.. code-block::
+
+    (otx) ...$ otx export
+
+    2023-03-24 15:03:35,993 - mmdeploy - INFO - Export PyTorch model to ONNX: /tmp/OTX-task-ffw8llin/openvino.onnx.
+    2023-03-24 15:03:44,450 - mmdeploy - INFO - Args for Model Optimizer: mo --input_model="/tmp/OTX-task-ffw8llin/openvino.onnx" --output_dir="/tmp/OTX-task-ffw8llin/" --output="bboxes,labels" --input="input" --input_shape="[1, 3, 32, 256, 256]" --mean_values="[123.675, 116.28, 103.53]" --scale_values="[58.395, 57.12, 57.375]" --source_layout=bctwh
+    2023-03-24 15:03:46,707 - mmdeploy - INFO - [ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
+    Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
+    [ SUCCESS ] Generated IR version 11 model.
+    [ SUCCESS ] XML file: /tmp/OTX-task-ffw8llin/openvino.xml
+    [ SUCCESS ] BIN file: /tmp/OTX-task-ffw8llin/openvino.bin
+
+    2023-03-24 15:03:46,707 - mmdeploy - INFO - Successfully exported OpenVINO model: /tmp/OTX-task-ffw8llin/openvino.xml
+    2023-03-24 15:03:46,756 - mmaction - INFO - Exporting completed
+
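The Model Optimizer arguments in the log above bake the input preprocessing into the IR (``--mean_values`` / ``--scale_values``), so the exported model normalizes frames itself. For reference, the per-channel transform they describe is ``(x - mean) / scale``; a pure-Python sketch of that arithmetic (illustrative only, not part of the export pipeline):

```python
# Per-channel normalization constants taken from the "Args for Model
# Optimizer" log line above (--mean_values / --scale_values).
MEAN = (123.675, 116.28, 103.53)
SCALE = (58.395, 57.12, 57.375)

def normalize_pixel(rgb):
    """Apply the (x - mean) / scale transform the IR performs internally."""
    return tuple((x - m) / s for x, m, s in zip(rgb, MEAN, SCALE))

# A pixel exactly at the mean maps to zero in every channel.
print(normalize_pixel((123.675, 116.28, 103.53)))  # -> (0.0, 0.0, 0.0)
```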
+3. Check the accuracy of the IR model and the consistency between the exported model and the PyTorch model,
+using ``otx eval`` and passing the IR model path to the ``--load-weights`` parameter.
+
+.. code-block::
+
+    (otx) ...$ otx eval --test-data-roots ../data/JHMDB_5%/test \
+                        --load-weights model-exported/openvino.xml \
+                        --save-performance model-exported/performance.json
+
+    ...
+
+    Performance(score: 0.0, dashboard: (3 metric groups))
+
+.. note::
+
+    Unfortunately, OpenVINO™ has trouble exporting from ONNX files produced by torch 1.13.
+    You can get a proper OpenVINO™ IR by downgrading torch to 1.12.1 before exporting.
+
+*************
+Optimization
+*************
+
+1. You can further optimize the model with ``otx optimize``.
+Currently, only POT is supported for action detection. NNCF will be supported in the near future.
+Refer to :doc:`optimization explanation <../../../explanation/additional_features/models_optimization>` section for more details on model optimization.
+
+2. Example command for optimizing
+OpenVINO™ model (.xml) with OpenVINO™ POT.
+
+.. code-block::
+
+    (otx) ...$ otx optimize --load-weights openvino_models/openvino.xml \
+                            --save-model-to pot_model
+
+    ...
+
+    Performance(score: 0.0, dashboard: (3 metric groups))
+
+Keep in mind that POT will take some time (generally less than NNCF optimization) without logging to optimize the model.
+
+3. Now you have a fully trained, optimized, and exported
+action detection model that is efficient and ready to use.

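Both tutorials end with ``Performance(score: …)`` lines like the ones above; if you need those numbers in a script (for example, to gate a CI job), they can be scraped from the log. A hypothetical helper (not part of the otx CLI):

```python
import re

# Matches lines like: Performance(score: 0.0, dashboard: (3 metric groups))
PERF_RE = re.compile(
    r"Performance\(score:\s*([0-9.]+),\s*dashboard:\s*\((\d+) metric groups?\)\)"
)

def parse_performance(line: str):
    """Return (score, metric_group_count), or None if the line doesn't match."""
    m = PERF_RE.search(line)
    if not m:
        return None
    return float(m.group(1)), int(m.group(2))

print(parse_performance("Performance(score: 0.0, dashboard: (3 metric groups))"))
```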
0 commit comments