
Commit 72fc455

Galina Zalesskaya, yunchu, and eunwoosh authored
Add explanation for XAI & minor doc fixes (#1923)
* [CI] Updated daily workflow (#1904) - remove if statement to allow running on any branch manually
* [FIX] re-bugfix: ATSS head loss (#1907)
* Fix typos
* Explanation of Explanation
* Add images & typo fixes
* Fixes from comments
* Add accuracy for OD explanation
* Tutorial update
* Add accuracy for BCCD and WGISD
* Fix

---------

Co-authored-by: Yunchu Lee <[email protected]>
Co-authored-by: Eunwoo Shin <[email protected]>
1 parent 0e26106 commit 72fc455

17 files changed: +155 -31 lines changed

docs/source/guide/explanation/additional_features/index.rst

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ Additional Features
    models_optimization
    hpo
    auto_configuration
+   xai

docs/source/guide/explanation/additional_features/xai.rst

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+Explainable AI (XAI)
+====================
+
+**Explainable AI (XAI)** is a field of research that aims to make machine learning models more transparent and interpretable to humans.
+The goal is to help users understand how and why AI systems make decisions and to provide insight into their inner workings. It allows us to detect, analyze, and prevent common mistakes, for example, when the model uses irrelevant features to make a prediction.
+XAI can help build trust in AI, make sure that the model is safe for deployment, and increase its adoption in various domains.
+
+Most XAI methods generate **saliency maps** as a result. A saliency map is a visual representation, suitable for human comprehension, that highlights the most important parts of the image from the model's point of view.
+It looks like a heatmap, where warm-colored areas mark the regions the model focuses on most.
+
+
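+For instance, a saliency map can be blended with the input image to produce such a heatmap. A minimal sketch using OpenCV is shown below; it assumes a ``saliency`` array with values in ``[0, 1]`` and is only an illustration, not part of the OpenVINO™ Training Extensions API:
+
+.. code-block:: python
+
+   import cv2
+   import numpy as np
+
+   def overlay_saliency(image_bgr: np.ndarray, saliency: np.ndarray) -> np.ndarray:
+       """Blend a [0, 1] saliency map over a BGR image as a JET heatmap."""
+       height, width = image_bgr.shape[:2]
+       saliency = cv2.resize(saliency.astype(np.float32), (width, height))
+       heatmap = cv2.applyColorMap((saliency * 255).astype(np.uint8), cv2.COLORMAP_JET)
+       # Equal blend of the original image and the heatmap.
+       return cv2.addWeighted(image_bgr, 0.5, heatmap, 0.5, 0)
+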
+.. figure:: ../../../../utils/images/xai_example.jpg
+  :width: 600
+  :alt: this image shows the result of XAI algorithm
+
+These images are taken from the `D-RISE paper <https://arxiv.org/abs/2006.03204>`_.
+
+
+We can generate saliency maps for a model trained in OpenVINO™ Training Extensions using the ``otx explain`` command. Learn more about its usage in the :doc:`../../tutorials/base/explain` tutorial.
+
+*********************************
+XAI algorithms for classification
+*********************************
+
+.. image:: ../../../../utils/images/xai_cls.jpg
+  :width: 600
+  :align: center
+  :alt: this image shows the comparison of XAI classification algorithms
+
+
+For classification networks the following algorithms are used to generate saliency maps; a rough sketch of the underlying ideas follows the list:
+
+- **Activation Map** - the most basic and naive approach. It takes the output of the model's feature extractor (backbone) and averages it over the channel dimension. The result depends heavily on the backbone and ignores the neck and head computations, but it is fast and usually reasonably informative.
+
+- `Eigen-CAM <https://arxiv.org/abs/2008.00299>`_ uses Principal Component Analysis (PCA). It returns the first principal component of the feature extractor output, which most of the time corresponds to the dominant object. The result likewise depends heavily on the backbone and ignores the neck and head computations.
+
+- `Recipro-CAM <https://arxiv.org/pdf/2209.14074>`_ uses Class Activation Mapping (CAM) to weigh the activation map for each class, so it can generate a different saliency map per class. Recipro-CAM is a fast, gradient-free reciprocal CAM method: it spatially masks the extracted feature maps to exploit the correlation between activation maps and network predictions for the target classes.
+
+
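+Below is a rough, illustrative sketch of these ideas in NumPy, assuming a backbone feature map of shape ``(C, H, W)`` and, for Recipro-CAM, a hypothetical ``neck_and_head`` callable that maps features to class scores. It is a simplified sketch of the concepts, not the actual OpenVINO™ Training Extensions implementation:
+
+.. code-block:: python
+
+   import numpy as np
+
+   def activation_map(features: np.ndarray) -> np.ndarray:
+       """Average a (C, H, W) feature map over channels -> (H, W) saliency."""
+       saliency = features.mean(axis=0)
+       saliency -= saliency.min()                          # normalize to [0, 1]
+       return saliency / (saliency.max() + 1e-12)
+
+   def eigen_cam(features: np.ndarray) -> np.ndarray:
+       """Project a (C, H, W) feature map onto its first principal component."""
+       c, h, w = features.shape
+       flat = features.reshape(c, h * w).T                 # (H*W, C)
+       flat = flat - flat.mean(axis=0, keepdims=True)
+       _, _, vt = np.linalg.svd(flat, full_matrices=False)
+       saliency = np.abs(flat @ vt[0]).reshape(h, w)       # first principal component
+       saliency -= saliency.min()
+       return saliency / (saliency.max() + 1e-12)
+
+   def recipro_cam(features: np.ndarray, neck_and_head) -> np.ndarray:
+       """Recipro-CAM-style sketch: re-infer the neck + head once per spatial
+       position with a spatially masked feature map (H*W forward passes)."""
+       c, h, w = features.shape
+       num_classes = neck_and_head(features[None]).shape[-1]
+       saliency = np.zeros((num_classes, h, w))
+       for i in range(h):
+           for j in range(w):
+               masked = np.zeros_like(features)
+               masked[:, i, j] = features[:, i, j]         # keep a single spatial cell
+               saliency[:, i, j] = neck_and_head(masked[None])[0]
+       return saliency                                      # one (H, W) map per class
+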
+Below we show a comparison of the described algorithms. ``Access to the model internal state`` means the necessity to modify the model's outputs and dump inner features.
+``Per-class explanation support`` means generating different saliency maps for different classes.
+
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Classification algorithm                  | Activation Map | Eigen-CAM      | Recipro-CAM                                                             |
++===========================================+================+================+=========================================================================+
+| Need access to model internal state       | Yes            | Yes            | Yes                                                                     |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Gradient-free                             | Yes            | Yes            | Yes                                                                     |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Single-shot                               | Yes            | Yes            | No (re-infer neck + head H*W times, where HxW – feature map size)       |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Per-class explanation support             | No             | No             | Yes                                                                     |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Execution speed                           | Fast           | Fast           | Medium                                                                  |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+
+
+****************************
+XAI algorithms for detection
+****************************
+
+For detection networks the following algorithms are used to generate saliency maps:
+
+- **Activation Map** - the same approach as for classification networks, using the outputs of the feature extractor. This algorithm is used to generate saliency maps for two-stage detectors.
+
+- **DetClassProbabilityMap** - this approach takes the raw classification head output and uses the class probability maps to calculate regions of interest for each class, so it creates a different saliency map for each class. This algorithm is implemented for single-stage detectors only; a rough sketch of the idea is shown below.
+
+.. image:: ../../../../utils/images/xai_det.jpg
+  :width: 600
+  :align: center
+  :alt: this image shows the detailed description of XAI detection algorithm
+
+
+The main limitation of this method is that, due to the training loss design of most single-stage detectors, activation values drift towards the center of the object while propagating through the network.
+This prevents getting a clear explanation in the input image space from intermediate activations.
+
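+A minimal sketch of the per-class map computation, assuming the raw classification head output is a logit tensor of shape ``(num_classes, H, W)``; the function and variable names are hypothetical and this is not the actual implementation:
+
+.. code-block:: python
+
+   import numpy as np
+
+   def det_class_probability_map(cls_logits: np.ndarray) -> np.ndarray:
+       """Turn raw per-class logits of shape (num_classes, H, W) into one
+       normalized saliency map per class."""
+       probs = 1.0 / (1.0 + np.exp(-cls_logits))        # per-location class probabilities
+       mins = probs.min(axis=(1, 2), keepdims=True)     # normalize each class map to [0, 1]
+       maxs = probs.max(axis=(1, 2), keepdims=True)
+       return (probs - mins) / (maxs - mins + 1e-12)    # (num_classes, H, W)
+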
+Below we show a comparison of the described algorithms. ``Access to the model internal state`` means the necessity to modify the model's outputs and dump inner features.
+``Per-class explanation support`` means generating different saliency maps for different classes. ``Per-box explanation support`` means generating standalone saliency maps for each detected prediction.
+
+
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Detection algorithm                       | Activation Map             | DetClassProbabilityMap                     |
++===========================================+============================+============================================+
+| Need access to model internal state       | Yes                        | Yes                                        |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Gradient-free                             | Yes                        | Yes                                        |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Single-shot                               | Yes                        | Yes                                        |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Per-class explanation support             | No                         | Yes                                        |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Per-box explanation support               | No                         | No                                         |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Execution speed                           | Fast                       | Fast                                       |
++-------------------------------------------+----------------------------+--------------------------------------------+

docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Lines changed: 19 additions & 14 deletions
@@ -95,20 +95,25 @@ To see which public backbones are available for the task, the following command
 
    $ otx find --backbone {torchvision, pytorchcv, mmcls, omz.mmcls}
 
-.. In the table below the test mAP on some academic datasets using our :ref:`supervised pipeline <od_supervised_pipeline>` is presented.
-.. The results were obtained on our templates without any changes.
-.. For hyperparameters, please, refer to the related template.
-.. We trained each model with a single Nvidia GeForce RTX3090.
+In the table below the test mAP on several academic datasets using our :ref:`supervised pipeline <od_supervised_pipeline>` is presented.
 
-.. +-----------+------------+-----------+-----------+
-.. | Model name| COCO       | PASCAL VOC| MinneApple|
-.. +===========+============+===========+===========+
-.. | YOLOX     | N/A        | N/A       | 24.5      |
-.. +-----------+------------+-----------+-----------+
-.. | SSD       | N/A        | N/A       | 31.2      |
-.. +-----------+------------+-----------+-----------+
-.. | ATSS      | N/A        | N/A       | 42.5      |
-.. +-----------+------------+-----------+-----------+
+For the `COCO <https://cocodataset.org/#home>`__ dataset the accuracy of the pretrained weights is shown, which means the weights are undertrained for COCO and do not achieve the best possible result.
+This is because the purpose of the pretrained models is to learn basic features from a dataset as large and diverse as COCO and to use these weights as a good starting point for other custom datasets.
+
+The results on `Pascal VOC <http://host.robots.ox.ac.uk/pascal/VOC/voc2012/>`_, `BCCD <https://public.roboflow.com/object-detection/bccd/3>`_, `MinneApple <https://rsn.umn.edu/projects/orchard-monitoring/minneapple>`_ and `WGISD <https://github.com/thsant/wgisd>`_ were obtained on our templates without any changes.
+BCCD is an easy dataset with large, in-focus objects, while MinneApple and WGISD contain small objects that are hard to distinguish from the background.
+For hyperparameters, please refer to the related template.
+We trained each model with a single Nvidia GeForce RTX3090.
+
++-----------+------------+-----------+-----------+-----------+-----------+
+| Model name| COCO       | PASCAL VOC| BCCD      | MinneApple| WGISD     |
++===========+============+===========+===========+===========+===========+
+| YOLOX     | 32.0       | 66.6      | 60.3      | 24.5      | 44.1      |
++-----------+------------+-----------+-----------+-----------+-----------+
+| SSD       | 13.5       | 50.0      | 54.2      | 31.2      | 45.9      |
++-----------+------------+-----------+-----------+-----------+-----------+
+| ATSS      | 32.5       | 68.7      | 61.5      | 42.5      | 57.5      |
++-----------+------------+-----------+-----------+-----------+-----------+
 
 
 
@@ -133,7 +138,7 @@ Overall, OpenVINO™ Training Extensions utilizes powerful techniques for improv
 
 Please, refer to the :doc:`tutorial <../../../tutorials/advanced/semi_sl>` how to train semi supervised learning.
 
-In the table below the mAP on toy data sample from `COCO <https://cocodataset.org/#home>`_ dataset using our pipeline is presented.
+In the table below the mAP on toy data sample from `COCO <https://cocodataset.org/#home>`__ dataset using our pipeline is presented.
 
 We sample 400 images that contain one of [person, car, bus] for labeled train images. And 4000 images for unlabeled images. For validation 100 images are selected from val2017.
 

docs/source/guide/get_started/quick_start_guide/cli_commands.rst

Lines changed: 1 addition & 1 deletion
@@ -399,7 +399,7 @@ The command below will evaluate the trained model on the provided dataset:
 Explanation
 ***********
 
-``otx explain`` runs the explanation algorithm of a model on the specific dataset. It helps explain the model's decision-making process in a way that is easily understood by humans.
+``otx explain`` runs the explainable AI (XAI) algorithm of a model on the specific dataset. It helps explain the model's decision-making process in a way that is easily understood by humans.
 
 With the ``--help`` command, you can list additional information, such as its parameters common to all model templates:
 
docs/source/guide/tutorials/advanced/self_sl.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The process has been tested on the following configuration:
 Setup virtual environment
 *************************
 
-1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
+1. You can follow the installation process from a :doc:`quick start guide <../../get_started/quick_start_guide/installation>`
 to create a universal virtual environment for OpenVINO™ Training Extensions.
 
 2. Activate your virtual

docs/source/guide/tutorials/advanced/semi_sl.rst

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ This tutorial explains how to train a model in semi-supervised learning mode and
 Setup virtual environment
 *************************
 
-1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
+1. You can follow the installation process from a :doc:`quick start guide <../../get_started/quick_start_guide/installation>`
 to create a universal virtual environment for OpenVINO™ Training Extensions.
 
 2. Activate your virtual
@@ -128,7 +128,7 @@ Enable via ``otx train``
 ***************************
 
 1. To enable semi-supervised learning directly via ``otx train``, we need to add arguments ``--unlabeled-data-roots`` and ``--algo_backend.train_type``
-which is one of template-specific parameters (details are provided in `quick start guide <../../get_started/quick_start_guide/cli_commands.html#training>`__.)
+which is one of template-specific parameters (details are provided in `quick start guide <../../get_started/quick_start_guide/cli_commands.html#training>`__).
 
 .. code-block::
 
docs/source/guide/tutorials/base/demo.rst

Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@ It allows you to apply the model on the custom data or the online footage from a
 
 This tutorial uses an object detection model for example, however for other tasks the functionality remains the same - you just need to replace the input dataset with your own.
 
-For visualization you use images from WGISD dataset from the :doc: `object detection tutorial <how_to_train/detection>`.
+For visualization you use images from WGISD dataset from the :doc:`object detection tutorial <how_to_train/detection>`.
 
 1. Activate the virtual environment
 created in the previous step.
@@ -69,8 +69,8 @@ You can check a list of camera devices by running the command line below on Linu
 
 .. code-block::
 
-   sudo apt-get install v4l-utils
-   v4l2-ctl --list-devices
+   (demo) ...$ sudo apt-get install v4l-utils
+   (demo) ...$ v4l2-ctl --list-devices
 
 The output will look like this:
 

docs/source/guide/tutorials/base/explain.rst

Lines changed: 21 additions & 2 deletions
@@ -26,9 +26,28 @@ at the path specified by ``--save-explanation-to``.
 
 .. code-block::
 
-   otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ --save-explanation-to outputs/explanation --load-weights outputs/weights.pth
+   otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ \
+               --save-explanation-to outputs/explanation \
+               --load-weights outputs/weights.pth
 
-3. As a result we will get a folder with a pair of generated
+3. To specify the algorithm used for saliency map creation for classification,
+we can set the ``--explain-algorithm`` parameter:
+
+- ``activationmap`` - the Activation Map classification algorithm
+- ``eigencam`` - the Eigen-CAM classification algorithm
+- ``classwisesaliencymap`` - the Recipro-CAM classification algorithm (the default)
+
+For the detection task, we can choose between the following methods:
+
+- ``activationmap`` - the Activation Map detection algorithm
+- ``classwisesaliencymap`` - the DetClassProbabilityMap algorithm (works for single-stage detectors only)
+
+.. note::
+
+   Learn more about Explainable AI and its algorithms in the :doc:`XAI explanation section <../../explanation/additional_features/xai>`.
+
+
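+For example, to compare both detection methods on the same data, one could loop over them with a small helper script (a hypothetical sketch that simply shells out to the ``otx explain`` command shown above):
+
+.. code-block:: python
+
+   import subprocess
+
+   # Hypothetical helper: run `otx explain` once per supported detection algorithm.
+   for algo in ("activationmap", "classwisesaliencymap"):
+       subprocess.run(
+           [
+               "otx", "explain",
+               "--explain-data-roots", "otx-workspace-DETECTION/splitted_dataset/val/",
+               "--save-explanation-to", f"outputs/explanation-{algo}",
+               "--load-weights", "outputs/weights.pth",
+               "--explain-algorithm", algo,
+           ],
+           check=True,
+       )
+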
+4. As a result we will get a folder with a pair of generated
 images for each image in ``--explain-data-roots``:
 
 - saliency map - where red color means more attention of the model

docs/source/guide/tutorials/base/how_to_train/classification.rst

Lines changed: 2 additions & 1 deletion
@@ -56,6 +56,7 @@ with the following command:
   cd ..
 
 |
+
 .. image:: ../../../../../utils/images/flowers_example.jpg
   :width: 600
 
@@ -120,7 +121,7 @@ Let's prepare an OpenVINO™ Training Extensions classification workspace runnin
 
   (otx) ...$ cd ./otx-workspace-CLASSIFICATION
 
-It will create **otx-workspace-CLASSIFICATION** with all necessery configs for MobileNet-V3-large-1x, prepared ``data.yaml`` to simplify CLI commands launch and splitted dataset named ``splitted_dataset``.
+It will create **otx-workspace-CLASSIFICATION** with all necessary configs for MobileNet-V3-large-1x, a prepared ``data.yaml`` to simplify launching CLI commands, and a split dataset named ``splitted_dataset``.
 
 3. To start training you need to call ``otx train``
 command in our workspace:

docs/source/guide/tutorials/base/how_to_train/detection.rst

Lines changed: 6 additions & 4 deletions
@@ -60,7 +60,7 @@ Dataset preparation
 
 .. code-block::
 
-   cd data
+   mkdir data ; cd data
    git clone https://github.com/thsant/wgisd.git
    cd wgisd
    git checkout 6910edc5ae3aae8c20062941b1641821f0c30127
@@ -107,7 +107,7 @@ We can do that by running these commands:
 .. code-block::
 
    # format images folder
-   mkdir data images
+   mv data images
 
   # format annotations folder
   mv coco_annotations annotations
@@ -116,6 +116,8 @@ We can do that by running these commands:
   mv annotations/train_bbox_instances.json annotations/instances_train.json
   mv annotations/test_bbox_instances.json annotations/instances_val.json
 
+   cd ../..
+
 *********
 Training
 *********
@@ -183,9 +185,9 @@ Let's prepare the object detection workspace running the following command:
 
 
 
-.. note::
+.. warning::
 
-   If you want to update your current workspace by running ``otx build`` with other parameters, it's better to delete the original workplace before that to prevent mistakes.
+   If you want to rebuild your current workspace by running ``otx build`` with other parameters, it's better to delete the original workspace first to prevent mistakes.
 
 Check ``otx-workspace-DETECTION/data.yaml`` to ensure, which data subsets will be used for training and validation, and update it if necessary.
 