Commit c5b41d9

[RELEASE][DOC] Fix wrong info in documentation (#1849)
* updated dataset formats info. Fix for multilabel classification
* revert file
* revert file. minor
* added warning to instance segmentation
* revert changes
1 parent 4f1a47c commit c5b41d9

File tree

5 files changed, +31 / -27 lines changed


docs/source/guide/explanation/algorithms/classification/multi_label_classification.rst

Lines changed: 7 additions & 3 deletions
@@ -23,11 +23,15 @@ For supervised learning we use the following algorithms components:
 Dataset Format
 **************

-As it is a common practice to use object detection datasets in the academic area, we support the most popular object detection formats: `COCO <https://cocodataset.org/#format-data>`_.
-Specifically, these formats will be converted in our `internal representation <https://github.com/openvinotoolkit/training_extensions/tree/develop/data/datumaro_multilabel>`_ via the `Datumaro <https://github.com/openvinotoolkit/datumaro>`_ dataset handler.
+As it is common practice to use object detection datasets in the academic area, we support the most popular object detection format: `COCO <https://cocodataset.org/#format-data>`_.
+Specifically, this format should first be converted to our `internal representation <https://github.com/openvinotoolkit/training_extensions/tree/develop/data/datumaro_multilabel>`_. We provide a `script <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/classification/utils/convert_coco_to_multilabel.py>`_ to help with the conversion.
+To convert the COCO data format to our internal one, run the script as follows:
+
+.. code-block::
+    python convert_coco_to_multilabel.py --ann_file_path <path to .json COCO annotations> --data_root_dir <path to images folder> --output <output path to save annotations>

 .. note::
-    Names of the annotations files and overall dataset structure should be the same as the original `COCO <https://cocodataset.org/#format-data>`_.
+    Names of the annotation files and the overall dataset structure should be the same as in the original `COCO <https://cocodataset.org/#format-data>`_. Note that the train and validation sets need to be converted separately.

 Please refer to our :doc:`dedicated tutorial <../../../tutorials/base/how_to_train/classification>` for more information on how to train, validate and optimize classification models.
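
Since the note above says the train and validation sets have to be converted separately, the conversion step simply runs the script once per split. A minimal sketch, assuming standard COCO annotation file names; the paths are illustrative only:

.. code-block::

    # convert the training split (illustrative paths)
    python convert_coco_to_multilabel.py --ann_file_path annotations/instances_train.json --data_root_dir images/train --output multilabel_train
    # convert the validation split
    python convert_coco_to_multilabel.py --ann_file_path annotations/instances_val.json --data_root_dir images/val --output multilabel_val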

docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Lines changed: 10 additions & 11 deletions
@@ -1,7 +1,7 @@
 Object Detection
 ================

-Object detection is a computer vision task where it's needed to locate objects, finding their bounding boxes coordinates together with defining class.
+Object detection is a computer vision task in which objects are located by predicting their bounding box coordinates and class labels.
 The input is an image, and the output is a pair of coordinates for bounding box corners and a class number for each detected object.

 The common approach to building an object detection architecture is to take a feature extractor (backbone), which can be inherited from the classification task.
@@ -22,25 +22,24 @@ For the supervised training we use the following algorithms components:

 - ``Additional training techniques``
 - ``Early stopping``: To add adaptability to the training pipeline and prevent overfitting. You can enable early stopping with the command below.
-
+
   .. code-block::

     $ otx train {TEMPLATE} ... \
       params \
       --learning_parameters.enable_early_stopping=True

-- `Anchor clustering for SSD <https://arxiv.org/abs/2211.17170>`_: This model highly relies on predefined anchor boxes hyperparameter that impacts the size of objects, which can be detected. So before training, we collect object statistics within dataset, cluster them and modify anchor boxes sizes to fit the most for objects the model is going to detect.
-
+- `Anchor clustering for SSD <https://arxiv.org/abs/2211.17170>`_: This model relies heavily on the predefined anchor box hyperparameters, which determine the sizes of objects that can be detected. So before training, we collect object statistics within the dataset, cluster them and adjust the anchor box sizes to best fit the objects the model is going to detect.
+
 - ``Backbone pretraining``: we pretrained the MobileNetV2 backbone on the large `ImageNet21k <https://github.com/Alibaba-MIIL/ImageNet21K>`_ dataset to improve the feature extractor and learn better and faster.


 **************
 Dataset Format
 **************

-At the current point we support `COCO <https://cocodataset.org/#format-data>`_,
-`Pascal-VOC <https://openvinotoolkit.github.io/datumaro/docs/formats/pascal_voc/>`_ and
-`YOLO <https://openvinotoolkit.github.io/datumaro/docs/formats/yolo/>`_ dataset format.
+At the current point we support the `COCO <https://cocodataset.org/#format-data>`_ and
+`Pascal-VOC <https://openvinotoolkit.github.io/datumaro/docs/formats/pascal_voc/>`_ dataset formats.
 Learn more about the formats by following the links above. Here is an example of the expected format for a COCO dataset:

 .. code::
@@ -82,7 +81,7 @@ We support the following ready-to-use model templates:
 | `Custom_Object_Detection_Gen3_ATSS <https://github.com/openvinotoolkit/training_extensions/blob/develop/otx/algorithms/detection/configs/detection/mobilenetv2_atss/template.yaml>`_ | ATSS | 20.6 | 9.1 |
 +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------+-----------------+

-`ATSS <https://arxiv.org/abs/1912.02424>`_ is a good medium-range model that works well and fast in most cases.
+`ATSS <https://arxiv.org/abs/1912.02424>`_ is a good medium-range model that works well and fast in most cases.
 `SSD <https://arxiv.org/abs/1512.02325>`_ and `YOLOX <https://arxiv.org/abs/2107.08430>`_ are light models that are perfect for the fastest inference on low-power hardware.
 YOLOX achieves the same accuracy as SSD and even runs about 1.5 times faster at inference on CPU, but requires 3 times more training time due to `Mosaic augmentation <https://arxiv.org/pdf/2004.10934.pdf>`_, which is even more than for ATSS.
 So if you have resources for a long training, you can pick the YOLOX model.
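
For reference, launching training with one of the templates above follows the usual ``otx train`` pattern. This is a minimal sketch, assuming the ``--train-data-roots``/``--val-data-roots`` options used in the instance segmentation tutorial later in this commit; the dataset paths are placeholders:

.. code-block::

    # check `otx train --help` for the exact options in your version
    $ otx train Custom_Object_Detection_Gen3_ATSS \
                --train-data-roots <path/to/coco/train/data> \
                --val-data-roots <path/to/coco/val/data>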
@@ -132,14 +131,14 @@ Overall, OpenVINO™ Training Extensions utilizes powerful techniques for improv

 - ``Additional training techniques``: Other than that, we use several solutions that apply to supervised learning (No bias Decay, Augmentations, Early stopping, LR conditioning).

-Please, refer to the :doc:`tutorial <../../../tutorials/advanced/semi_sl>` how to train semi supervised learning.
+Please refer to the :doc:`tutorial <../../../tutorials/advanced/semi_sl>` on how to train with semi-supervised learning.

-In the table below the mAP on toy data sample from `COCO <https://cocodataset.org/#home>`_ dataset using our pipeline is presented.
+The table below presents the mAP on a toy data sample from the `COCO <https://cocodataset.org/#home>`_ dataset using our pipeline.

 We sample 400 images that contain one of [person, car, bus] as labeled train images, and 4000 images as unlabeled images. For validation, 100 images are selected from val2017.

 +---------+--------------------------------------------+
-| Dataset | Sampled COCO dataset                       |
+| Dataset | Sampled COCO dataset                       |
 +=========+=====================+======================+
 |         | SL                  | Semi-SL              |
 +---------+---------------------+----------------------+

docs/source/guide/explanation/algorithms/segmentation/instance_segmentation.rst

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ For the supervised training we use the following algorithms components:
 Dataset Format
 **************

-For the dataset handling inside OpenVINO™ Training Extensions, we use `Dataset Management Framework (Datumaro) <https://github.com/openvinotoolkit/datumaro>`_. For instance segmentation we support `COCO <https://cocodataset.org/#format-data>`_ and `Pascal-VOC <https://openvinotoolkit.github.io/datumaro/docs/formats/pascal_voc/>`_ dataset formats.
+For the dataset handling inside OpenVINO™ Training Extensions, we use the `Dataset Management Framework (Datumaro) <https://github.com/openvinotoolkit/datumaro>`_. For instance segmentation we support the `COCO <https://cocodataset.org/#format-data>`_ dataset format.
 If you have your dataset in this format, you can simply run training with one line of code:

 .. code-block::

docs/source/guide/explanation/algorithms/segmentation/semantic_segmentation.rst

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ Dataset Format

 For the dataset handling inside OpenVINO™ Training Extensions, we use `Dataset Management Framework (Datumaro) <https://github.com/openvinotoolkit/datumaro>`_.

-At this end we support `Pascal VOC <https://openvinotoolkit.github.io/datumaro/docs/formats/pascal_voc/>`_ and `Common Semantic Segmentation <https://openvinotoolkit.github.io/datumaro/docs/formats/common_semantic_segmentation/>`_ data formats.
+To this end we support the `Common Semantic Segmentation <https://openvinotoolkit.github.io/datumaro/docs/formats/common_semantic_segmentation/>`_ data format.
 If your dataset is organized in the supported format, starting training is very simple. We just need to pass a path to the root folder and the desired model template to start training:

 .. code-block::
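
The contents of this code block fall outside the diff context shown above. For reference, the Common Semantic Segmentation layout that the data root is expected to follow is roughly the following sketch; the linked Datumaro documentation is the authoritative description:

.. code-block::

    dataset/
    ├── dataset_meta.json   # optional list of custom labels
    ├── images/
    │   ├── image_1.png
    │   └── image_2.png
    └── masks/              # per-pixel annotation masks, one per image
        ├── image_1.png
        └── image_2.png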
@@ -90,7 +90,7 @@ Whereas the ``Lite-HRNet-s-mod2`` is the lightweight architecture for fast infer
 Semi-supervised Learning
 ************************

-To solve :ref:`Semi-supervised learning <semi_sl_explanation>` problem for the semantic segmentation we use the `Mean Teacher algorithm <https://arxiv.org/abs/1703.01780>`_.
+To solve the :ref:`Semi-supervised learning <semi_sl_explanation>` problem for semantic segmentation we use the `Mean Teacher algorithm <https://arxiv.org/abs/1703.01780>`_.

 The basic idea of this approach is to use two models during training: a "student" model, which is the main model being trained, and a "teacher" model, which acts as a guide for the student model.
 The student model is updated based on the ground truth annotations (for the labeled data) and pseudo-labels (for the unlabeled data), which are the predictions of the teacher model.
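
For reference, in the Mean Teacher paper the teacher is not trained by gradient descent; its weights track an exponential moving average (EMA) of the student weights:

.. math::

    \theta'_t = \alpha \theta'_{t-1} + (1 - \alpha) \theta_t

where :math:`\theta_t` are the student weights at training step :math:`t`, :math:`\theta'_t` are the teacher weights, and :math:`\alpha` is the EMA decay, a smoothing coefficient close to 1.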

docs/source/guide/tutorials/base/how_to_train/instance_segmentation.rst

Lines changed: 11 additions & 10 deletions
@@ -26,10 +26,10 @@ The process has been tested on the following configuration.
 Setup virtual environment
 *************************

-1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
+1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
 to create a universal virtual environment for OpenVINO™ Training Extensions.

-2. Activate your virtual
+2. Activate your virtual
 environment:

 .. code-block::
@@ -43,7 +43,7 @@ environment:
 Dataset preparation
 ***************************

-1. Let's use the simple toy dataset `Car, Tree, Bug dataset <https://github.com/openvinotoolkit/training_extensions/tree/develop/tests/assets/car_tree_bug>`_
+1. Let's use the simple toy `Car, Tree, Bug dataset <https://github.com/openvinotoolkit/training_extensions/tree/develop/tests/assets/car_tree_bug>`_
 provided by OpenVINO™ Training Extensions.

 This dataset contains images of a simple car, tree and bug with annotations for instance segmentation.
@@ -73,7 +73,7 @@ we will need the following file structure:
 ...

 .. warning::
-    There may be features that don't work properly with the current toy dataset. We recommend that you proceed with a proper training and validation dataset,
+    There may be features that don't work properly with the current toy dataset. We recommend that you proceed with a proper training and validation dataset;
     the tutorial and dataset here are for reference only.

 We will update this tutorial with larger public datasets soon.
@@ -102,7 +102,7 @@ The list of supported templates for instance segmentation is available with the
 | INSTANCE_SEGMENTATION | Custom_Counting_Instance_Segmentation_MaskRCNN_EfficientNetB2B | MaskRCNN-EfficientNetB2B | otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml |
 +-----------------------+----------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------------------------------+

-2. We need to create
+2. We need to create an
 OpenVINO™ Training Extensions workspace first.

 Let's prepare an OpenVINO™ Training Extensions instance segmentation workspace by running the following command:
@@ -142,10 +142,11 @@ It will create **otx-workspace-INSTANCE_SEGMENTATION** with all necessary config

 For more information, see the :doc:`quick start guide <../../../get_started/quick_start_guide/cli_commands>` or the :ref:`detection example <detection_workspace>`.

-3. Next, we need to update
-train/validation set configuration in ``data.yaml``.
+.. warning::
+    Note that we can't run CLI commands for instance segmentation via the model name, since the same models are used for different algorithms and the behavior can be unpredictable.
+    Please use the template path or template ID instead.

-To simplify the command line functions calling, we may create a ``data.yaml`` file with annotations info and pass it as a ``--data`` parameter.
+To simplify calling the command line functions, we may create a ``data.yaml`` file with annotation info and pass it as the ``--data`` parameter.
 The content of ``otx-workspace-INSTANCE_SEGMENTATION/data.yaml`` should use absolute paths and will look similar to this:

 .. note::
@@ -255,7 +256,7 @@ The output of ``./outputs/performance.json`` consists of a dict with target metr
 Export
 *********

-1. ``otx export`` exports a trained Pytorch `.pth` model to the
+1. ``otx export`` exports a trained PyTorch `.pth` model to the
 OpenVINO™ Intermediate Representation (IR) format.

 It allows running the model on Intel hardware much more efficiently, especially on the CPU. Also, the resulting IR model is required to run POT optimization. The IR model consists of 2 files: ``openvino.xml`` for the architecture and ``openvino.bin`` for the weights.
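
The export command itself is outside the diff context shown above. As a rough sketch, it points ``otx export`` at the trained checkpoint and an output folder; the option names below are an assumption about the OpenVINO™ Training Extensions CLI of this period and should be double-checked against ``otx export --help``:

.. code-block::

    # option names are assumed; verify with `otx export --help`
    $ otx export otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml \
                 --load-weights models/weights.pth \
                 --save-model-to openvino_model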
@@ -277,7 +278,7 @@ and save the exported model to the ``openvino_model`` folder.
 2023-02-21 22:38:21,894 | INFO : run task done.
 2023-02-21 22:38:21,940 | INFO : Exporting completed

-3. We can check the accuracy of the IR model and the consistency between
+3. We can check the accuracy of the IR model and the consistency between
 the exported model and the PyTorch model.

 You can use ``otx train`` directly without ``otx build``. It will be required to add ``--train-data-roots`` and ``--val-data-roots`` in the command line:
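
The command line itself falls outside the diff context shown above. Roughly, with the toy ``car_tree_bug`` dataset and the MaskRCNN-EfficientNetB2B template referenced earlier in this tutorial, the direct invocation would look like the sketch below (run from the repository root; pointing both splits at the same toy folder is for illustration only):

.. code-block::

    $ otx train otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml \
                --train-data-roots tests/assets/car_tree_bug \
                --val-data-roots tests/assets/car_tree_bug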
