
Commit 72fc455

Galina Zalesskaya, yunchu, and eunwoosh authored
Add explanation for XAI & minor doc fixes (#1923)
* [CI] Updated daily workflow (#1904) - remove if statement to allow running on any branch manually
* [FIX] re-bugfix: ATSS head loss (#1907)
* Fix typos
* Explanation of Explanation
* Add images & typo fixes
* Fixes from comments
* Add accuracy for OD explanation
* Tutorial update
* Add accuracy for BCCD and WGISD
* Fix

---------

Co-authored-by: Yunchu Lee <[email protected]>
Co-authored-by: Eunwoo Shin <[email protected]>
1 parent 0e26106 commit 72fc455

17 files changed: +155 -31 lines changed

docs/source/guide/explanation/additional_features/index.rst

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ Additional Features
    models_optimization
    hpo
    auto_configuration
+   xai

docs/source/guide/explanation/additional_features/xai.rst

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+Explainable AI (XAI)
+====================
+
+**Explainable AI (XAI)** is a field of research that aims to make machine learning models more transparent and interpretable to humans.
+The goal is to help users understand how and why AI systems make decisions and to provide insight into their inner workings. It allows us to detect, analyze, and prevent common mistakes, for example, when the model uses irrelevant features to make a prediction.
+XAI can help build trust in AI, make sure that the model is safe for deployment, and increase its adoption in various domains.
+
+Most XAI methods generate **saliency maps** as a result. A saliency map is a visual representation, suitable for human comprehension, that highlights the most important parts of the image from the model's point of view.
+It looks like a heatmap, where warm-colored areas mark the regions the model focuses on most.
+
+
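+For instance, a saliency map can be blended with the input image to produce such a heatmap. A minimal sketch using OpenCV is shown below; it assumes a ``saliency`` array with values in ``[0, 1]`` and is only an illustration, not part of the OpenVINO™ Training Extensions API:
+
+.. code-block:: python
+
+   import cv2
+   import numpy as np
+
+   def overlay_saliency(image_bgr: np.ndarray, saliency: np.ndarray) -> np.ndarray:
+       """Blend a [0, 1] saliency map over a BGR image as a JET heatmap."""
+       height, width = image_bgr.shape[:2]
+       saliency = cv2.resize(saliency.astype(np.float32), (width, height))
+       heatmap = cv2.applyColorMap((saliency * 255).astype(np.uint8), cv2.COLORMAP_JET)
+       # Equal blend of the original image and the heatmap.
+       return cv2.addWeighted(image_bgr, 0.5, heatmap, 0.5, 0)
+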
+.. figure:: ../../../../utils/images/xai_example.jpg
+  :width: 600
+  :alt: this image shows the result of XAI algorithm
+
+These images are taken from the `D-RISE paper <https://arxiv.org/abs/2006.03204>`_.
+
+
+We can generate saliency maps for a model trained in OpenVINO™ Training Extensions using the ``otx explain`` command. Learn more about its usage in the :doc:`../../tutorials/base/explain` tutorial.
+
+*********************************
+XAI algorithms for classification
+*********************************
+
+.. image:: ../../../../utils/images/xai_cls.jpg
+  :width: 600
+  :align: center
+  :alt: this image shows the comparison of XAI classification algorithms
+
+
+For classification networks the following algorithms are used to generate saliency maps; a rough sketch of the underlying ideas follows the list:
+
+- **Activation Map** - the most basic and naive approach. It takes the output of the model's feature extractor (backbone) and averages it over the channel dimension. The result depends heavily on the backbone and ignores the neck and head computations, but it is fast and usually reasonably informative.
+
+- `Eigen-CAM <https://arxiv.org/abs/2008.00299>`_ uses Principal Component Analysis (PCA). It returns the first principal component of the feature extractor output, which most of the time corresponds to the dominant object. The result likewise depends heavily on the backbone and ignores the neck and head computations.
+
+- `Recipro-CAM <https://arxiv.org/pdf/2209.14074>`_ uses Class Activation Mapping (CAM) to weigh the activation map for each class, so it can generate a different saliency map per class. Recipro-CAM is a fast, gradient-free reciprocal CAM method: it spatially masks the extracted feature maps to exploit the correlation between activation maps and network predictions for the target classes.
+
+
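+Below is a rough, illustrative sketch of these ideas in NumPy, assuming a backbone feature map of shape ``(C, H, W)`` and, for Recipro-CAM, a hypothetical ``neck_and_head`` callable that maps features to class scores. It is a simplified sketch of the concepts, not the actual OpenVINO™ Training Extensions implementation:
+
+.. code-block:: python
+
+   import numpy as np
+
+   def activation_map(features: np.ndarray) -> np.ndarray:
+       """Average a (C, H, W) feature map over channels -> (H, W) saliency."""
+       saliency = features.mean(axis=0)
+       saliency -= saliency.min()                          # normalize to [0, 1]
+       return saliency / (saliency.max() + 1e-12)
+
+   def eigen_cam(features: np.ndarray) -> np.ndarray:
+       """Project a (C, H, W) feature map onto its first principal component."""
+       c, h, w = features.shape
+       flat = features.reshape(c, h * w).T                 # (H*W, C)
+       flat = flat - flat.mean(axis=0, keepdims=True)
+       _, _, vt = np.linalg.svd(flat, full_matrices=False)
+       saliency = np.abs(flat @ vt[0]).reshape(h, w)       # first principal component
+       saliency -= saliency.min()
+       return saliency / (saliency.max() + 1e-12)
+
+   def recipro_cam(features: np.ndarray, neck_and_head) -> np.ndarray:
+       """Recipro-CAM-style sketch: re-infer the neck + head once per spatial
+       position with a spatially masked feature map (H*W forward passes)."""
+       c, h, w = features.shape
+       num_classes = neck_and_head(features[None]).shape[-1]
+       saliency = np.zeros((num_classes, h, w))
+       for i in range(h):
+           for j in range(w):
+               masked = np.zeros_like(features)
+               masked[:, i, j] = features[:, i, j]         # keep a single spatial cell
+               saliency[:, i, j] = neck_and_head(masked[None])[0]
+       return saliency                                      # one (H, W) map per class
+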
+Below we show a comparison of the described algorithms. ``Access to the model internal state`` means the necessity to modify the model's outputs and dump inner features.
+``Per-class explanation support`` means generating different saliency maps for different classes.
+
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Classification algorithm                  | Activation Map | Eigen-CAM      | Recipro-CAM                                                             |
++===========================================+================+================+=========================================================================+
+| Need access to model internal state       | Yes            | Yes            | Yes                                                                     |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Gradient-free                             | Yes            | Yes            | Yes                                                                     |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Single-shot                               | Yes            | Yes            | No (re-infer neck + head H*W times, where HxW – feature map size)       |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Per-class explanation support             | No             | No             | Yes                                                                     |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+| Execution speed                           | Fast           | Fast           | Medium                                                                  |
++-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
+
+
+****************************
+XAI algorithms for detection
+****************************
+
+For detection networks the following algorithms are used to generate saliency maps:
+
+- **Activation Map** - the same approach as for classification networks, using the outputs of the feature extractor. This algorithm is used to generate saliency maps for two-stage detectors.
+
+- **DetClassProbabilityMap** - this approach takes the raw classification head output and uses the class probability maps to calculate regions of interest for each class, so it creates a different saliency map for each class. This algorithm is implemented for single-stage detectors only; a rough sketch of the idea is shown below.
+
+.. image:: ../../../../utils/images/xai_det.jpg
+  :width: 600
+  :align: center
+  :alt: this image shows the detailed description of XAI detection algorithm
+
+
+The main limitation of this method is that, due to the training loss design of most single-stage detectors, activation values drift towards the center of the object while propagating through the network.
+This prevents getting a clear explanation in the input image space from intermediate activations.
+
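+A minimal sketch of the per-class map computation, assuming the raw classification head output is a logit tensor of shape ``(num_classes, H, W)``; the function and variable names are hypothetical and this is not the actual implementation:
+
+.. code-block:: python
+
+   import numpy as np
+
+   def det_class_probability_map(cls_logits: np.ndarray) -> np.ndarray:
+       """Turn raw per-class logits of shape (num_classes, H, W) into one
+       normalized saliency map per class."""
+       probs = 1.0 / (1.0 + np.exp(-cls_logits))        # per-location class probabilities
+       mins = probs.min(axis=(1, 2), keepdims=True)     # normalize each class map to [0, 1]
+       maxs = probs.max(axis=(1, 2), keepdims=True)
+       return (probs - mins) / (maxs - mins + 1e-12)    # (num_classes, H, W)
+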
+Below we show a comparison of the described algorithms. ``Access to the model internal state`` means the necessity to modify the model's outputs and dump inner features.
+``Per-class explanation support`` means generating different saliency maps for different classes. ``Per-box explanation support`` means generating standalone saliency maps for each detected prediction.
+
+
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Detection algorithm                       | Activation Map             | DetClassProbabilityMap                     |
++===========================================+============================+============================================+
+| Need access to model internal state       | Yes                        | Yes                                        |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Gradient-free                             | Yes                        | Yes                                        |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Single-shot                               | Yes                        | Yes                                        |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Per-class explanation support             | No                         | Yes                                        |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Per-box explanation support               | No                         | No                                         |
++-------------------------------------------+----------------------------+--------------------------------------------+
+| Execution speed                           | Fast                       | Fast                                       |
++-------------------------------------------+----------------------------+--------------------------------------------+

docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Lines changed: 19 additions & 14 deletions
@@ -95,20 +95,25 @@ To see which public backbones are available for the task, the following command
 
    $ otx find --backbone {torchvision, pytorchcv, mmcls, omz.mmcls}
 
-.. In the table below the test mAP on some academic datasets using our :ref:`supervised pipeline <od_supervised_pipeline>` is presented.
-.. The results were obtained on our templates without any changes.
-.. For hyperparameters, please, refer to the related template.
-.. We trained each model with a single Nvidia GeForce RTX3090.
+In the table below the test mAP on several academic datasets using our :ref:`supervised pipeline <od_supervised_pipeline>` is presented.
 
-.. +-----------+------------+-----------+-----------+
-.. | Model name| COCO       | PASCAL VOC| MinneApple|
-.. +===========+============+===========+===========+
-.. | YOLOX     | N/A        | N/A       | 24.5      |
-.. +-----------+------------+-----------+-----------+
-.. | SSD       | N/A        | N/A       | 31.2      |
-.. +-----------+------------+-----------+-----------+
-.. | ATSS      | N/A        | N/A       | 42.5      |
-.. +-----------+------------+-----------+-----------+
+For the `COCO <https://cocodataset.org/#home>`__ dataset the accuracy of the pretrained weights is shown, which means the weights are undertrained for COCO and do not achieve the best possible result.
+This is because the purpose of the pretrained models is to learn basic features from a dataset as large and diverse as COCO and to use these weights as a good starting point for other custom datasets.
+
+The results on `Pascal VOC <http://host.robots.ox.ac.uk/pascal/VOC/voc2012/>`_, `BCCD <https://public.roboflow.com/object-detection/bccd/3>`_, `MinneApple <https://rsn.umn.edu/projects/orchard-monitoring/minneapple>`_ and `WGISD <https://github.com/thsant/wgisd>`_ were obtained on our templates without any changes.
+BCCD is an easy dataset with large, in-focus objects, while MinneApple and WGISD contain small objects that are hard to distinguish from the background.
+For hyperparameters, please refer to the related template.
+We trained each model with a single Nvidia GeForce RTX3090.
+
++-----------+------------+-----------+-----------+-----------+-----------+
+| Model name| COCO       | PASCAL VOC| BCCD      | MinneApple| WGISD     |
++===========+============+===========+===========+===========+===========+
+| YOLOX     | 32.0       | 66.6      | 60.3      | 24.5      | 44.1      |
++-----------+------------+-----------+-----------+-----------+-----------+
+| SSD       | 13.5       | 50.0      | 54.2      | 31.2      | 45.9      |
++-----------+------------+-----------+-----------+-----------+-----------+
+| ATSS      | 32.5       | 68.7      | 61.5      | 42.5      | 57.5      |
++-----------+------------+-----------+-----------+-----------+-----------+
 
 
 
@@ -133,7 +138,7 @@ Overall, OpenVINO™ Training Extensions utilizes powerful techniques for improv
 
 Please, refer to the :doc:`tutorial <../../../tutorials/advanced/semi_sl>` how to train semi supervised learning.
 
-In the table below the mAP on toy data sample from `COCO <https://cocodataset.org/#home>`_ dataset using our pipeline is presented.
+In the table below the mAP on toy data sample from `COCO <https://cocodataset.org/#home>`__ dataset using our pipeline is presented.
 
 We sample 400 images that contain one of [person, car, bus] for labeled train images. And 4000 images for unlabeled images. For validation 100 images are selected from val2017.
 

docs/source/guide/get_started/quick_start_guide/cli_commands.rst

Lines changed: 1 addition & 1 deletion
@@ -399,7 +399,7 @@ The command below will evaluate the trained model on the provided dataset:
 Explanation
 ***********
 
-``otx explain`` runs the explanation algorithm of a model on the specific dataset. It helps explain the model's decision-making process in a way that is easily understood by humans.
+``otx explain`` runs the explainable AI (XAI) algorithm of a model on the specific dataset. It helps explain the model's decision-making process in a way that is easily understood by humans.
 
 With the ``--help`` command, you can list additional information, such as its parameters common to all model templates:
 
docs/source/guide/tutorials/advanced/self_sl.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The process has been tested on the following configuration:
 Setup virtual environment
 *************************
 
-1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
+1. You can follow the installation process from a :doc:`quick start guide <../../get_started/quick_start_guide/installation>`
 to create a universal virtual environment for OpenVINO™ Training Extensions.
 
 2. Activate your virtual

docs/source/guide/tutorials/advanced/semi_sl.rst

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ This tutorial explains how to train a model in semi-supervised learning mode and
 Setup virtual environment
 *************************
 
-1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
+1. You can follow the installation process from a :doc:`quick start guide <../../get_started/quick_start_guide/installation>`
 to create a universal virtual environment for OpenVINO™ Training Extensions.
 
 2. Activate your virtual
@@ -128,7 +128,7 @@ Enable via ``otx train``
 ***************************
 
 1. To enable semi-supervised learning directly via ``otx train``, we need to add arguments ``--unlabeled-data-roots`` and ``--algo_backend.train_type``
-which is one of template-specific parameters (details are provided in `quick start guide <../../get_started/quick_start_guide/cli_commands.html#training>`__.)
+which is one of template-specific parameters (details are provided in `quick start guide <../../get_started/quick_start_guide/cli_commands.html#training>`__).
 
 .. code-block::
 
docs/source/guide/tutorials/base/demo.rst

Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@ It allows you to apply the model on the custom data or the online footage from a
 
 This tutorial uses an object detection model for example, however for other tasks the functionality remains the same - you just need to replace the input dataset with your own.
 
-For visualization you use images from WGISD dataset from the :doc: `object detection tutorial <how_to_train/detection>`.
+For visualization you use images from WGISD dataset from the :doc:`object detection tutorial <how_to_train/detection>`.
 
 1. Activate the virtual environment
 created in the previous step.
@@ -69,8 +69,8 @@ You can check a list of camera devices by running the command line below on Linu
 
 .. code-block::
 
-   sudo apt-get install v4l-utils
-   v4l2-ctl --list-devices
+   (demo) ...$ sudo apt-get install v4l-utils
+   (demo) ...$ v4l2-ctl --list-devices
 
 The output will look like this:
 

docs/source/guide/tutorials/base/explain.rst

Lines changed: 21 additions & 2 deletions
@@ -26,9 +26,28 @@ at the path specified by ``--save-explanation-to``.
 
 .. code-block::
 
-   otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ --save-explanation-to outputs/explanation --load-weights outputs/weights.pth
+   otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ \
+               --save-explanation-to outputs/explanation \
+               --load-weights outputs/weights.pth
 
-3. As a result we will get a folder with a pair of generated
+3. To specify the algorithm used for saliency map creation for classification,
+we can set the ``--explain-algorithm`` parameter:
+
+- ``activationmap`` - the Activation Map classification algorithm
+- ``eigencam`` - the Eigen-CAM classification algorithm
+- ``classwisesaliencymap`` - the Recipro-CAM classification algorithm (the default)
+
+For the detection task, we can choose between the following methods:
+
+- ``activationmap`` - the Activation Map detection algorithm
+- ``classwisesaliencymap`` - the DetClassProbabilityMap algorithm (works for single-stage detectors only)
+
+.. note::
+
+   Learn more about Explainable AI and its algorithms in the :doc:`XAI explanation section <../../explanation/additional_features/xai>`.
+
+
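+For example, to compare both detection methods on the same data, one could loop over them with a small helper script (a hypothetical sketch that simply shells out to the ``otx explain`` command shown above):
+
+.. code-block:: python
+
+   import subprocess
+
+   # Hypothetical helper: run `otx explain` once per supported detection algorithm.
+   for algo in ("activationmap", "classwisesaliencymap"):
+       subprocess.run(
+           [
+               "otx", "explain",
+               "--explain-data-roots", "otx-workspace-DETECTION/splitted_dataset/val/",
+               "--save-explanation-to", f"outputs/explanation-{algo}",
+               "--load-weights", "outputs/weights.pth",
+               "--explain-algorithm", algo,
+           ],
+           check=True,
+       )
+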
+4. As a result we will get a folder with a pair of generated
 images for each image in ``--explain-data-roots``:
 
 - saliency map - where red color means more attention of the model

docs/source/guide/tutorials/base/how_to_train/classification.rst

Lines changed: 2 additions & 1 deletion
@@ -56,6 +56,7 @@ with the following command:
   cd ..
 
 |
+
 .. image:: ../../../../../utils/images/flowers_example.jpg
   :width: 600
 
@@ -120,7 +121,7 @@ Let's prepare an OpenVINO™ Training Extensions classification workspace runnin
 
   (otx) ...$ cd ./otx-workspace-CLASSIFICATION
 
-It will create **otx-workspace-CLASSIFICATION** with all necessery configs for MobileNet-V3-large-1x, prepared ``data.yaml`` to simplify CLI commands launch and splitted dataset named ``splitted_dataset``.
+It will create **otx-workspace-CLASSIFICATION** with all necessary configs for MobileNet-V3-large-1x, a prepared ``data.yaml`` to simplify launching CLI commands, and a split dataset named ``splitted_dataset``.
 
 3. To start training you need to call ``otx train``
 command in our workspace:

docs/source/guide/tutorials/base/how_to_train/detection.rst

Lines changed: 6 additions & 4 deletions
@@ -60,7 +60,7 @@ Dataset preparation
 
 .. code-block::
 
-   cd data
+   mkdir data ; cd data
    git clone https://github.com/thsant/wgisd.git
    cd wgisd
    git checkout 6910edc5ae3aae8c20062941b1641821f0c30127
@@ -107,7 +107,7 @@ We can do that by running these commands:
 .. code-block::
 
    # format images folder
-   mkdir data images
+   mv data images
 
   # format annotations folder
   mv coco_annotations annotations
@@ -116,6 +116,8 @@ We can do that by running these commands:
   mv annotations/train_bbox_instances.json annotations/instances_train.json
   mv annotations/test_bbox_instances.json annotations/instances_val.json
 
+   cd ../..
+
 *********
 Training
 *********
@@ -183,9 +185,9 @@ Let's prepare the object detection workspace running the following command:
 
 
 
-.. note::
+.. warning::
 
-   If you want to update your current workspace by running ``otx build`` with other parameters, it's better to delete the original workplace before that to prevent mistakes.
+   If you want to rebuild your current workspace by running ``otx build`` with other parameters, it's better to delete the original workspace first to prevent mistakes.
 
 Check ``otx-workspace-DETECTION/data.yaml`` to ensure, which data subsets will be used for training and validation, and update it if necessary.
 