
Commit d79894a

Change docs for action recognition (#1940)

* Change docs for action recognition
* Fix typo
* Update MoViNet related parts
* Add MoViNet performance
* Revert table include Complexity and Model size

Co-authored-by: Jihwan Eom <[email protected]>

1 parent e654199 · commit d79894a

File tree: 3 files changed, +91 −18 lines changed

docs/source/guide/explanation/algorithms/action/action_classification.rst

Lines changed: 11 additions & 9 deletions
@@ -27,21 +27,23 @@ Refer to our tutorial for more information on how to train, validate, and optimi

 Models
 ******

-We support `X3D <https://arxiv.org/abs/2004.04730>`_ for action classification. X3D is a deep learning model that was proposed in the paper "X3D: Expanding Architectures for Efficient Video Recognition" by Christoph Feichtenhofer. The model is an extension of the popular 2D convolutional neural network (CNN) architectures to the 3D domain, allowing it to efficiently process spatiotemporal information in videos.
+Currently, OpenVINO™ Training Extensions supports `X3D <https://arxiv.org/abs/2004.04730>`_ and `MoViNet <https://arxiv.org/pdf/2103.11511.pdf>`_ for action classification.

-Currenly OpenVINO™ Training Extensions supports X3D-S model with below template:
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+
+| Template ID | Name | Complexity (GFLOPs) | Model size (MB) |
++========================================================================================================================================================================================+=========+=====================+=========================+
+| `Custom_Action_Classification_X3D <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/action/configs/classification/x3d/template.yaml>`_ | X3D | 2.49 | 3.79 |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+
+| `Custom_Action_Classification_MoViNet <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/action/configs/classification/movinet/template.yaml>`_ | MoViNet | 2.71 | 3.10 |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+

-+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+
-| Template ID | Name | Complexity (GFLOPs) | Model size (MB) |
-+===============================================================================================================================================================================+=========+=====================+=========================+
-| `Custom_Action_Classification_X3D <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/action/configs/classification/x3d/template.yaml>`_ | X3D | 2.49 | 3.79 |
-+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-------------------------+

-
-In the table below the **top-1 accuracy** on some academic datasets are presented. Each model is trained with single Nvidia GeForce RTX3090.
+In the table below, the **top-1 accuracy** on some academic datasets is presented. Each model was trained with a single NVIDIA GeForce RTX 3090.

 +-----------------------+------------+-----------------+
 | Model name            | HMDB51     | UCF101          |
 +=======================+============+=================+
 | X3D                   | 67.19      | 87.89           |
 +-----------------------+------------+-----------------+
+| MoViNet               | 62.74      | 81.32           |
++-----------------------+------------+-----------------+
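Read together, the two tables above capture the trade-off between the templates: X3D wins on top-1 accuracy, while MoViNet is smaller. A minimal sketch of picking a template programmatically (the numbers are copied from the tables above; the helper itself is illustrative and not part of OpenVINO™ Training Extensions):

```python
# Metrics copied from the two tables above: complexity (GFLOPs),
# exported model size (MB), and top-1 accuracy on HMDB51 / UCF101.
TEMPLATES = {
    "X3D":     {"gflops": 2.49, "size_mb": 3.79, "hmdb51": 67.19, "ucf101": 87.89},
    "MoViNet": {"gflops": 2.71, "size_mb": 3.10, "hmdb51": 62.74, "ucf101": 81.32},
}

def best_by(metric: str, maximize: bool = True) -> str:
    """Return the template name that wins on a given metric."""
    pick = max if maximize else min
    return pick(TEMPLATES, key=lambda name: TEMPLATES[name][metric])

print(best_by("ucf101"))                   # -> X3D (highest accuracy)
print(best_by("size_mb", maximize=False))  # -> MoViNet (smallest model)
```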

docs/source/guide/tutorials/base/how_to_train/action_classification.rst

Lines changed: 9 additions & 9 deletions
@@ -74,7 +74,6 @@ According to the `documentation <https://mmaction2.readthedocs.io/en/latest/supp

    │   │   │   │   ├── 20060723sfjffbartsinger_wave_f_cm_np1_ba_med_0
    │   │   │   │   ├── ...
    │   │   │   │   ├── winKen_wave_u_cm_np1_ri_bad_1
-   |

 Once you have the dataset structured properly, copy the ``mmaction2/data`` folder, which contains the hmdb51 dataset, to ``training_extensions/data``.
 Then you can convert it to the `CVAT <https://www.cvat.ai/>`_ format using the following command:
@@ -128,17 +127,18 @@ To see the list of supported templates, run the following command:

 .. note::

-    OpenVINO™ Training Extensions is supporting only X3D model template now, other architecture will be supported in near future.
+    OpenVINO™ Training Extensions currently supports the X3D and MoViNet templates; other architectures will be supported in the future.

 .. code-block::

     (otx) ...$ otx find --task action_classification

-    +-----------------------+----------------------------------+------+----------------------------------------------------------------+
-    | TASK                  | ID                               | NAME | BASE PATH                                                      |
-    +-----------------------+----------------------------------+------+----------------------------------------------------------------+
-    | ACTION_CLASSIFICATION | Custom_Action_Classification_X3D | X3D  | otx/algorithms/action/configs/classification/x3d/template.yaml |
-    +-----------------------+----------------------------------+------+----------------------------------------------------------------+
+    +-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+
+    | TASK                  | ID                                   | NAME    | BASE PATH                                                             |
+    +-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+
+    | ACTION_CLASSIFICATION | Custom_Action_Classification_X3D     | X3D     | ../otx/algorithms/action/configs/classification/x3d/template.yaml     |
+    | ACTION_CLASSIFICATION | Custom_Action_Classification_MoViNet | MoViNet | ../otx/algorithms/action/configs/classification/movinet/template.yaml |
+    +-----------------------+--------------------------------------+---------+-----------------------------------------------------------------------+

 All commands will be run on the X3D model. It's a light model that achieves competitive accuracy while keeping inference fast.

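If you want to consume the ``otx find`` listing from a script rather than read it by eye, the ASCII table is straightforward to parse. A hypothetical sketch (the ``parse_table`` helper is not part of the otx CLI; the sample text mirrors the output shown above):

```python
FIND_OUTPUT = """\
+---+---+---+---+
| TASK | ID | NAME | BASE PATH |
+---+---+---+---+
| ACTION_CLASSIFICATION | Custom_Action_Classification_X3D | X3D | ../otx/algorithms/action/configs/classification/x3d/template.yaml |
| ACTION_CLASSIFICATION | Custom_Action_Classification_MoViNet | MoViNet | ../otx/algorithms/action/configs/classification/movinet/template.yaml |
+---+---+---+---+
"""

def parse_table(text: str) -> list[dict]:
    """Turn '|'-delimited table rows into dicts keyed by the header row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.lstrip().startswith("|")  # skip the +---+ border lines
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]

templates = parse_table(FIND_OUTPUT)
print([t["NAME"] for t in templates])  # -> ['X3D', 'MoViNet']
```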
@@ -254,7 +254,7 @@ Optimization
 *************

 1. You can further optimize the model with ``otx optimize``.
-Currently, only POT is supported for action classsification. NNCF will be supported in near future.
+Currently, quantization with POT is supported for the X3D template. MoViNet will be supported in the near future.
 Refer to :doc:`optimization explanation <../../../explanation/additional_features/models_optimization>` section for more details on model optimization.

 2. Example command for optimizing
@@ -275,4 +275,4 @@ Keep in mind that POT will take some time (generally less than NNCF optimization

 efficient model representation ready-to-use action classification model.

 The following tutorials provide further steps on how to :doc:`deploy <../deploy>` and use your model in the :doc:`demonstration mode <../demo>` and visualize results.
-The examples are provided with an object detection model, but it is easy to apply them for action classification by substituting the object detection model with classification one.
+The examples are provided with an object detection model, but it is easy to apply them to action classification by substituting the object detection model with a classification one.

docs/source/guide/tutorials/base/how_to_train/action_detection.rst

Lines changed: 71 additions & 0 deletions
@@ -153,3 +153,74 @@ We will get a similar to this validation output after some validation time (abou

 .. note::

     Currently we don't support export and optimize task in action detection. We will support these features very near future.
+
+*********
+Export
+*********
+
+1. ``otx export`` exports a trained PyTorch ``.pth`` model to the OpenVINO™ Intermediate Representation (IR) format.
+It allows running the model on Intel hardware much more efficiently, especially on the CPU. Also, the resulting IR model is required to run POT optimization. An IR model consists of two files: ``openvino.xml`` for the network architecture and ``openvino.bin`` for the weights.
+
+2. Run the command line below to export the trained model
+and save the exported model to the ``openvino_models`` folder.
+
+.. code-block::
+
+    (otx) ...$ otx export
+
+    2023-03-24 15:03:35,993 - mmdeploy - INFO - Export PyTorch model to ONNX: /tmp/OTX-task-ffw8llin/openvino.onnx.
+    2023-03-24 15:03:44,450 - mmdeploy - INFO - Args for Model Optimizer: mo --input_model="/tmp/OTX-task-ffw8llin/openvino.onnx" --output_dir="/tmp/OTX-task-ffw8llin/" --output="bboxes,labels" --input="input" --input_shape="[1, 3, 32, 256, 256]" --mean_values="[123.675, 116.28, 103.53]" --scale_values="[58.395, 57.12, 57.375]" --source_layout=bctwh
+    2023-03-24 15:03:46,707 - mmdeploy - INFO - [ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
+    Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
+    [ SUCCESS ] Generated IR version 11 model.
+    [ SUCCESS ] XML file: /tmp/OTX-task-ffw8llin/openvino.xml
+    [ SUCCESS ] BIN file: /tmp/OTX-task-ffw8llin/openvino.bin
+
+    2023-03-24 15:03:46,707 - mmdeploy - INFO - Successfully exported OpenVINO model: /tmp/OTX-task-ffw8llin/openvino.xml
+    2023-03-24 15:03:46,756 - mmaction - INFO - Exporting completed
+
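The Model Optimizer arguments in the log above bake the input preprocessing into the IR (``--mean_values`` / ``--scale_values``), so the exported model normalizes frames itself. For reference, the per-channel transform they describe is ``(x - mean) / scale``; a pure-Python sketch of that arithmetic (illustrative only, not part of the export pipeline):

```python
# Per-channel normalization constants taken from the "Args for Model
# Optimizer" log line above (--mean_values / --scale_values).
MEAN = (123.675, 116.28, 103.53)
SCALE = (58.395, 57.12, 57.375)

def normalize_pixel(rgb):
    """Apply the (x - mean) / scale transform the IR performs internally."""
    return tuple((x - m) / s for x, m, s in zip(rgb, MEAN, SCALE))

# A pixel exactly at the mean maps to zero in every channel.
print(normalize_pixel((123.675, 116.28, 103.53)))  # -> (0.0, 0.0, 0.0)
```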
+3. Check the accuracy of the IR model and the consistency between the exported model and the PyTorch model,
+using ``otx eval`` and passing the IR model path to the ``--load-weights`` parameter.
+
+.. code-block::
+
+    (otx) ...$ otx eval --test-data-roots ../data/JHMDB_5%/test \
+                        --load-weights model-exported/openvino.xml \
+                        --save-performance model-exported/performance.json
+
+    ...
+
+    Performance(score: 0.0, dashboard: (3 metric groups))
+
+.. note::
+
+    Unfortunately, OpenVINO™ has trouble exporting from ONNX files produced by torch 1.13.
+    You can get a proper OpenVINO™ IR by downgrading torch to 1.12.1 before exporting.
+
+*************
+Optimization
+*************
+
+1. You can further optimize the model with ``otx optimize``.
+Currently, only POT is supported for action detection. NNCF will be supported in the near future.
+Refer to :doc:`optimization explanation <../../../explanation/additional_features/models_optimization>` section for more details on model optimization.
+
+2. Example command for optimizing
+OpenVINO™ model (.xml) with OpenVINO™ POT.
+
+.. code-block::
+
+    (otx) ...$ otx optimize --load-weights openvino_models/openvino.xml \
+                            --save-model-to pot_model
+
+    ...
+
+    Performance(score: 0.0, dashboard: (3 metric groups))
+
+Keep in mind that POT will take some time (generally less than NNCF optimization) without logging to optimize the model.
+
+3. Now you have a fully trained, optimized, and exported
+action detection model that is efficient and ready to use.

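Both tutorials end with ``Performance(score: …)`` lines like the ones above; if you need those numbers in a script (for example, to gate a CI job), they can be scraped from the log. A hypothetical helper (not part of the otx CLI):

```python
import re

# Matches lines like: Performance(score: 0.0, dashboard: (3 metric groups))
PERF_RE = re.compile(
    r"Performance\(score:\s*([0-9.]+),\s*dashboard:\s*\((\d+) metric groups?\)\)"
)

def parse_performance(line: str):
    """Return (score, metric_group_count), or None if the line doesn't match."""
    m = PERF_RE.search(line)
    if not m:
        return None
    return float(m.group(1)), int(m.group(2))

print(parse_performance("Performance(score: 0.0, dashboard: (3 metric groups))"))
```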
0 commit comments