Set "Auto" as default input size mode (#2515)

Songki Choi · web-flow · commit 007896762b49 · 2023-09-25T14:07:30.000+09:00
* Refine input size adaptation

* Enable auto input size to integration tests

* Set 'Auto' as default mode of input size config

* Fix rotated detection config

* Update README.md

* Update CHANGELOG.md

---------

Signed-off-by: Songki Choi <songki.choi@intel.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -25,6 +25,7 @@ All notable changes to this project will be documented in this file.
 - Adapt timeout value of initialization for distributed training (<https://github.com/openvinotoolkit/training_extensions/pull/2422>)
 - Optimize data loading by merging load & resize operations w/ caching support for cls/det/iseg/sseg (<https://github.com/openvinotoolkit/training_extensions/pull/2438>, <https://github.com/openvinotoolkit/training_extensions/pull/2453>, <https://github.com/openvinotoolkit/training_extensions/pull/2460>)
 - Support torch==2.0.1 (<https://github.com/openvinotoolkit/training_extensions/pull/2465>)
+- Set "Auto" as default input size mode (<https://github.com/openvinotoolkit/training_extensions/pull/2515>)
 
 ### Bug fixes
 
diff --git a/README.md b/README.md
@@ -33,15 +33,16 @@
 ## Introduction
 
 OpenVINO™ Training Extensions is a low-code transfer learning framework for Computer Vision.
-The CLI commands of the framework allows users to train, infer, optimize and deploy models easily and quickly even with low expertise in the deep learning field. OpenVINO™ Training Extensions offers diverse combinations of model architectures, learning methods, and task types based on [PyTorch](https://pytorch.org) and [OpenVINO™
-toolkit](https://software.intel.com/en-us/openvino-toolkit).
+The CLI commands of the framework allows users to train, infer, optimize and deploy models easily and quickly even with low expertise in the deep learning field.
+OpenVINO™ Training Extensions offers diverse combinations of model architectures, learning methods, and task types based on [PyTorch](https://pytorch.org) and [OpenVINO™ toolkit](https://software.intel.com/en-us/openvino-toolkit).
 
 OpenVINO™ Training Extensions provides a "model template" for every supported task type, which consolidates necessary information to build a model.
 Model templates are validated on various datasets and serve one-stop shop for obtaining the best models in general.
-If you are an experienced user, you can configure your own model based on [torchvision](https://pytorch.org/vision/stable/index.html), [pytorchcv](https://github.com/osmr/imgclsmob), [mmcv](https://github.com/open-mmlab/mmcv) and [OpenVINO Model Zoo (OMZ)](https://github.com/openvinotoolkit/open_model_zoo).
+If you are an experienced user, you can configure your own model based on [torchvision](https://pytorch.org/vision/stable/index.html), [mmcv](https://github.com/open-mmlab/mmcv), [timm](https://github.com/huggingface/pytorch-image-models) and [OpenVINO Model Zoo (OMZ)](https://github.com/openvinotoolkit/open_model_zoo).
 
-Furthermore, OpenVINO™ Training Extensions provides automatic configuration of task types and hyperparameters.
-The framework will identify the most suitable model template based on your dataset, and choose the best hyperparameter configuration. The development team is continuously extending functionalities to make training as simple as possible so that single CLI command can obtain accurate, efficient and robust models ready to be integrated into your project.
+Furthermore, OpenVINO™ Training Extensions provides automatic configuration for ease of use.
+The framework analyzes your dataset to identify the most suitable model and to determine the best input size setting and other hyper-parameters.
+The development team is continuously extending this [Auto-configuration](https://openvinotoolkit.github.io/training_extensions/latest/guide/explanation/additional_features/auto_configuration.html) functionality to make training as simple as possible, so that a single CLI command can obtain accurate, efficient and robust models ready to be integrated into your project.
 
 ### Key Features
 
@@ -60,13 +61,13 @@ OpenVINO™ Training Extensions supports the [following learning methods](https:
 - **Semi-supervised learning**
 - **Self-supervised learning**
 
-OpenVINO™ Training Extensions will provide the following features in coming releases:
+OpenVINO™ Training Extensions provides the following usability features:
 
+- [Auto-configuration](https://openvinotoolkit.github.io/training_extensions/latest/guide/explanation/additional_features/auto_configuration.html). OpenVINO™ Training Extensions analyzes provided dataset and selects the proper task and model with appropriate input size to provide the best accuracy/speed trade-off. It will also make a random auto-split of your dataset if there is no validation set provided.
+- [Datumaro](https://openvinotoolkit.github.io/datumaro/stable/index.html) data frontend: OpenVINO™ Training Extensions supports the most common academic field dataset formats for each task. We are constantly working to extend supported formats to give more freedom of datasets format choice.
 - **Distributed training** to accelerate the training process when you have multiple GPUs
-- **Half-precision training** to save GPUs memory and use larger batch sizes
+- **Mixed-precision training** to save GPUs memory and use larger batch sizes
 - Integrated, efficient [hyper-parameter optimization module (HPO)](https://openvinotoolkit.github.io/training_extensions/latest/guide/explanation/additional_features/hpo.html). Through dataset proxy and built-in hyper-parameter optimizer, you can get much faster hyper-parameter optimization compared to other off-the-shelf tools. The hyperparameter optimization is dynamically scheduled based on your resource budget.
-- OpenVINO™ Training Extensions uses [Datumaro](https://openvinotoolkit.github.io/datumaro/stable/index.html) as the backend to hadle datasets. Thanks to that, OpenVINO™ Training Extensions supports the most common academic field dataset formats for each task. We constantly working to extend supported formats to give more freedom of datasets format choice.
-- [Auto-configuration functionality](https://openvinotoolkit.github.io/training_extensions/latest/guide/explanation/additional_features/auto_configuration.html). OpenVINO™ Training Extensions analyzes provided dataset and selects the proper task and model template to provide the best accuracy/speed trade-off. It will also make a random auto-split of your dataset if there is no validation set provided.
 
 ---
 
diff --git a/src/otx/algorithms/classification/configs/configuration.yaml b/src/otx/algorithms/classification/configs/configuration.yaml
@@ -277,11 +277,11 @@ learning_parameters:
     warning: null
   input_size:
     affects_outcome_of: INFERENCE
-    default_value: Default
+    default_value: Auto
     description:
       The input size of the given model could be configured to one of the predefined resolutions.
       Reduced training and inference time could be expected by using smaller input size.
-      Defaults to per-model default resolution.
+      Defaults to Auto, in which input size is automatically determined based on dataset statistics.
     editable: true
     enum_name: InputSizePreset
     header: Configure model input size.
diff --git a/src/otx/algorithms/common/adapters/mmcv/utils/config_utils.py b/src/otx/algorithms/common/adapters/mmcv/utils/config_utils.py
@@ -641,6 +641,7 @@ class InputSizeManager:
 
     MIN_RECOGNIZABLE_OBJECT_SIZE = 32  # Minimum object size recognizable by NNs: typically 16 ~ 32
     # meaning NxN input pixels being downscaled to 1x1 on feature map
+    MIN_DETECTION_INPUT_SIZE = 256  # Minimum input size for object detection
 
     def __init__(
         self,
@@ -960,6 +961,12 @@ def adapt_input_size_to_dataset(
         if min_object_size is not None and min_object_size > 0:
             image_size = round(image_size * self.MIN_RECOGNIZABLE_OBJECT_SIZE / min_object_size)
             logger.info(f"-> Based on typical small object size {min_object_size}: {image_size}")
+            if image_size > max_image_size:
+                image_size = max_image_size
+                logger.info(f"-> Restrict to max image size: {image_size}")
+            if image_size < self.MIN_DETECTION_INPUT_SIZE:
+                image_size = self.MIN_DETECTION_INPUT_SIZE
+                logger.info(f"-> Based on minimum object detection input size: {image_size}")
 
         input_size = (round(image_size), round(image_size))
 
diff --git a/src/otx/algorithms/detection/configs/detection/configuration.yaml b/src/otx/algorithms/detection/configs/detection/configuration.yaml
@@ -245,11 +245,11 @@ learning_parameters:
     warning: null
   input_size:
     affects_outcome_of: INFERENCE
-    default_value: Default
+    default_value: Auto
     description:
       The input size of the given model could be configured to one of the predefined resolutions.
       Reduced training and inference time could be expected by using smaller input size.
-      Defaults to per-model default resolution.
+      Defaults to Auto, in which input size is automatically determined based on dataset statistics.
     editable: true
     enum_name: InputSizePreset
     header: Configure model input size.
diff --git a/src/otx/algorithms/detection/configs/instance_segmentation/configuration.yaml b/src/otx/algorithms/detection/configs/instance_segmentation/configuration.yaml
@@ -245,11 +245,11 @@ learning_parameters:
     warning: null
   input_size:
     affects_outcome_of: INFERENCE
-    default_value: Default
+    default_value: Auto
     description:
       The input size of the given model could be configured to one of the predefined resolutions.
       Reduced training and inference time could be expected by using smaller input size.
-      Defaults to per-model default resolution.
+      Defaults to Auto, in which input size is automatically determined based on dataset statistics.
     editable: true
     enum_name: InputSizePreset
     header: Configure model input size.
diff --git a/src/otx/algorithms/detection/configs/rotated_detection/configuration.yaml b/src/otx/algorithms/detection/configs/rotated_detection/configuration.yaml
@@ -245,11 +245,11 @@ learning_parameters:
     warning: null
   input_size:
     affects_outcome_of: INFERENCE
-    default_value: Default
+    default_value: Auto
     description:
       The input size of the given model could be configured to one of the predefined resolutions.
       Reduced training and inference time could be expected by using smaller input size.
-      Defaults to per-model default resolution.
+      Defaults to Auto, in which input size is automatically determined based on dataset statistics.
     editable: true
     enum_name: InputSizePreset
     header: Configure model input size.
diff --git a/src/otx/algorithms/segmentation/configs/configuration.yaml b/src/otx/algorithms/segmentation/configs/configuration.yaml
@@ -232,11 +232,11 @@ learning_parameters:
     warning: null
   input_size:
     affects_outcome_of: INFERENCE
-    default_value: Default
+    default_value: Auto
     description:
       The input size of the given model could be configured to one of the predefined resolutions.
       Reduced training and inference time could be expected by using smaller input size.
-      Defaults to per-model default resolution.
+      Defaults to Auto, in which input size is automatically determined based on dataset statistics.
     editable: true
     enum_name: InputSizePreset
     header: Configure model input size.
diff --git a/tests/integration/cli/classification/test_classification.py b/tests/integration/cli/classification/test_classification.py
@@ -52,7 +52,13 @@
 args_selfsl = {
     "--train-data-roots": "tests/assets/classification_dataset",
     "--train-type": "Selfsupervised",
-    "train_params": ["params", "--learning_parameters.num_iters", "1", "--learning_parameters.batch_size", "4"],
+    "train_params": [
+        "params",
+        "--learning_parameters.num_iters",
+        "1",
+        "--learning_parameters.batch_size",
+        "4",
+    ],
 }
 
 # Training params for resume, num_iters*2
diff --git a/tests/integration/cli/detection/test_detection.py b/tests/integration/cli/detection/test_detection.py
@@ -36,7 +36,13 @@
     "--val-data-roots": "tests/assets/car_tree_bug",
     "--test-data-roots": "tests/assets/car_tree_bug",
     "--input": "tests/assets/car_tree_bug/images/train",
-    "train_params": ["params", "--learning_parameters.num_iters", "1", "--learning_parameters.batch_size", "4"],
+    "train_params": [
+        "params",
+        "--learning_parameters.num_iters",
+        "1",
+        "--learning_parameters.batch_size",
+        "4",
+    ],
 }
 
 args_semisl = {
@@ -45,7 +51,13 @@
     "--test-data-roots": "tests/assets/car_tree_bug",
     "--unlabeled-data-roots": "tests/assets/car_tree_bug",
     "--input": "tests/assets/car_tree_bug/images/train",
-    "train_params": ["params", "--learning_parameters.num_iters", "1", "--learning_parameters.batch_size", "4"],
+    "train_params": [
+        "params",
+        "--learning_parameters.num_iters",
+        "1",
+        "--learning_parameters.batch_size",
+        "4",
+    ],
 }
 
 # Training params for resume, num_iters*2
@@ -118,7 +130,15 @@ def test_otx_resume(self, template, tmp_dir_path):
         _resume_params = resume_params.copy()
         # FIXME: remove this block once Issue#2504 resolved
         if "DINO" in template.name:
-            _args["train_params"].extend(["--learning_parameters.input_size", "Default"])
+            _args["train_params"] = [
+                "params",
+                "--learning_parameters.num_iters",
+                "1",
+                "--learning_parameters.batch_size",
+                "4",
+                "--learning_parameters.input_size",
+                "Default",
+            ]
             _resume_params.extend(["--learning_parameters.input_size", "Default"])
         otx_resume_testing(template, tmp_dir_path, otx_dir, _args)
         template_work_dir = get_template_dir(template, tmp_dir_path)
diff --git a/tests/integration/cli/instance_segmentation/test_instance_segmentation.py b/tests/integration/cli/instance_segmentation/test_instance_segmentation.py
@@ -33,7 +33,13 @@
     "--val-data-roots": "tests/assets/car_tree_bug",
     "--test-data-roots": "tests/assets/car_tree_bug",
     "--input": "tests/assets/car_tree_bug/images/train",
-    "train_params": ["params", "--learning_parameters.num_iters", "1", "--learning_parameters.batch_size", "2"],
+    "train_params": [
+        "params",
+        "--learning_parameters.num_iters",
+        "1",
+        "--learning_parameters.batch_size",
+        "2",
+    ],
 }
 
 args_semisl = {
@@ -42,7 +48,13 @@
     "--test-data-roots": "tests/assets/car_tree_bug",
     "--unlabeled-data-roots": "tests/assets/car_tree_bug",
     "--input": "tests/assets/car_tree_bug/images/train",
-    "train_params": ["params", "--learning_parameters.num_iters", "1", "--learning_parameters.batch_size", "2"],
+    "train_params": [
+        "params",
+        "--learning_parameters.num_iters",
+        "1",
+        "--learning_parameters.batch_size",
+        "2",
+    ],
 }
 
 # Training params for resume, num_iters*2
@@ -51,7 +63,7 @@
     "--learning_parameters.num_iters",
     "2",
     "--learning_parameters.batch_size",
-    "4",
+    "2",
 ]
 
 otx_dir = os.getcwd()
diff --git a/tests/integration/cli/semantic_segmentation/test_segmentation.py b/tests/integration/cli/semantic_segmentation/test_segmentation.py
@@ -46,13 +46,29 @@
     "--val-data-roots": "tests/assets/common_semantic_segmentation_dataset/val",
     "--test-data-roots": "tests/assets/common_semantic_segmentation_dataset/val",
     "--unlabeled-data-roots": "tests/assets/common_semantic_segmentation_dataset/train",
-    "train_params": ["params", "--learning_parameters.num_iters", "1", "--learning_parameters.batch_size", "4"],
+    "train_params": [
+        "params",
+        "--learning_parameters.learning_rate_warmup_iters",
+        "1",
+        "--learning_parameters.num_iters",
+        "1",
+        "--learning_parameters.batch_size",
+        "4",
+    ],
 }
 
 args_selfsl = {
     "--train-data-roots": "tests/assets/common_semantic_segmentation_dataset/train/images",
     "--input": "tests/assets/segmentation/custom/images/training",
-    "train_params": ["params", "--learning_parameters.num_iters", "1", "--learning_parameters.batch_size", "4"],
+    "train_params": [
+        "params",
+        "--learning_parameters.learning_rate_warmup_iters",
+        "1",
+        "--learning_parameters.num_iters",
+        "1",
+        "--learning_parameters.batch_size",
+        "4",
+    ],
 }
 
 # Training params for resume, num_iters*2
diff --git a/tests/unit/algorithms/common/adapters/mmcv/test_configurer.py b/tests/unit/algorithms/common/adapters/mmcv/test_configurer.py
@@ -14,7 +14,7 @@ def test_get_input_size_to_fit_dataset(self, mocker):
         assert input_size is None
 
         cfg = Config({"data": {"train": {"otx_dataset": True}}})
-        input_size_manager = InputSizeManager(cfg, base_input_size=128)
+        input_size_manager = InputSizeManager(cfg, base_input_size=512)
         mock_stat = mocker.patch.object(configurer, "compute_robust_dataset_statistics")
 
         mock_stat.return_value = {}
@@ -37,10 +37,28 @@ def test_get_input_size_to_fit_dataset(self, mocker):
         assert input_size == (128, 128)
 
         mock_stat.return_value = dict(
-            image=dict(robust_max=150),
+            image=dict(robust_max=256),
+            annotation=dict(size_of_shape=dict(robust_min=64)),
+        )
+        input_size = configurer.BaseConfigurer.adapt_input_size_to_dataset(
+            cfg, input_size_manager, use_annotations=True
+        )
+        assert input_size == (256, 256)
+
+        mock_stat.return_value = dict(
+            image=dict(robust_max=1024),
+            annotation=dict(size_of_shape=dict(robust_min=64)),
+        )
+        input_size = configurer.BaseConfigurer.adapt_input_size_to_dataset(
+            cfg, input_size_manager, use_annotations=True
+        )
+        assert input_size == (512, 512)
+
+        mock_stat.return_value = dict(
+            image=dict(robust_max=2045),
             annotation=dict(size_of_shape=dict(robust_min=64)),
         )
         input_size = configurer.BaseConfigurer.adapt_input_size_to_dataset(
             cfg, input_size_manager, use_annotations=True
         )
-        assert input_size == (64, 64)
+        assert input_size == (512, 512)
diff --git a/tests/unit/algorithms/common/adapters/mmcv/utils/test_config_utils.py b/tests/unit/algorithms/common/adapters/mmcv/utils/test_config_utils.py
@@ -455,39 +455,39 @@ def test_select_closest_size(self):
         assert manager.select_closest_size(input_size, preset_sizes) == (128, 128)
 
     def test_adapt_input_size_to_dataset(self):
-        base_input_size = (128, 128)
+        base_input_size = (512, 512)
         manager = InputSizeManager({}, base_input_size)
         input_size = manager.adapt_input_size_to_dataset(
             max_image_size=-1,
         )
         assert input_size == base_input_size
 
         input_size = manager.adapt_input_size_to_dataset(
-            max_image_size=200,
-        )  # 200 -> 128
+            max_image_size=1024,
+        )  # 1024 -> 512
         assert input_size == base_input_size
 
         input_size = manager.adapt_input_size_to_dataset(
-            max_image_size=200,
+            max_image_size=1024,
             downscale_only=False,
-        )  # 200 -> 224
-        assert input_size == (224, 224)
+        )  # 512 -> 1024
+        assert input_size == (1024, 1024)
 
         input_size = manager.adapt_input_size_to_dataset(
-            max_image_size=200,
+            max_image_size=1024,
             min_object_size=128,
-        )  # 50 -> 64
-        assert input_size == (64, 64)
+        )  # 1024 -> 256
+        assert input_size == (256, 256)
 
         input_size = manager.adapt_input_size_to_dataset(
-            max_image_size=200,
+            max_image_size=1024,
             min_object_size=16,
-        )  # 400 -> 128
+        )  # 1024 -> 2048 -> 512
         assert input_size == base_input_size
 
         input_size = manager.adapt_input_size_to_dataset(
-            max_image_size=200,
+            max_image_size=1024,
             min_object_size=16,
             downscale_only=False,
-        )  # 400 -> 384
-        assert input_size == (384, 384)
+        )  # 1024 -> 2048 -> 1024
+        assert input_size == (1024, 1024)
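
The clamping added to `adapt_input_size_to_dataset` can be illustrated with a minimal standalone sketch. It is a simplification, not the actual `InputSizeManager` method: the function name `adapt_image_size`, the scalar `base_size` argument, and the handling of non-positive `max_image_size` are assumptions made for illustration, and the preset-size selection step of the real method is omitted. The two constants and the order of the clamps follow the diff above.

```python
# Constants taken from the diff of InputSizeManager.
MIN_RECOGNIZABLE_OBJECT_SIZE = 32  # NxN input pixels downscaled to 1x1 on the feature map
MIN_DETECTION_INPUT_SIZE = 256  # lower bound introduced by this commit


def adapt_image_size(max_image_size, min_object_size=None, base_size=512, downscale_only=True):
    """Hypothetical, simplified version of the adaptation logic (no preset selection)."""
    # Assumption: without valid image statistics, fall back to the base size.
    if max_image_size <= 0:
        return (base_size, base_size)

    image_size = max_image_size
    if min_object_size is not None and min_object_size > 0:
        # Scale so that the typical small object maps to the recognizable size.
        image_size = round(image_size * MIN_RECOGNIZABLE_OBJECT_SIZE / min_object_size)
        # New in this commit: never exceed the dataset's own image size ...
        image_size = min(image_size, max_image_size)
        # ... and never go below the minimum detection input size.
        image_size = max(image_size, MIN_DETECTION_INPUT_SIZE)

    # Assumption: downscale_only keeps the result at or below the base size.
    if downscale_only:
        image_size = min(image_size, base_size)

    return (image_size, image_size)


if __name__ == "__main__":
    # Mirrors the updated unit test values with base_input_size = 512.
    print(adapt_image_size(1024, min_object_size=128))  # (256, 256)
    print(adapt_image_size(1024, min_object_size=16))  # (512, 512)
    print(adapt_image_size(1024, min_object_size=16, downscale_only=False))  # (1024, 1024)
```

Because the floor is applied after the cap, a dataset whose images are smaller than 256 pixels still ends up at 256 whenever object-size statistics are used, which matches the order of the two `if` blocks in the diff.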