
Commit 06d1bfc

[poc] pytorch model converter tool (#421)
* tools/model_converter * fix bandit * checks * fix * style * semgrep * sem
1 parent 6b54085 commit 06d1bfc

File tree

3 files changed: +837 −0 lines changed


tools/model_converter/README.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# Model Converter Tool

A command-line utility to download PyTorch models and convert them to OpenVINO format.

## Overview

This tool reads a JSON configuration file containing model specifications, downloads PyTorch weights from URLs, loads the models, and exports them to OpenVINO Intermediate Representation (IR) format.
## Features

- **Automatic Download**: Downloads model weights from HTTP/HTTPS URLs with caching support
- **Dynamic Model Loading**: Dynamically imports and instantiates model classes from Python paths
- **Metadata Embedding**: Embeds custom metadata into OpenVINO models
- **Input/Output Naming**: Configurable input and output tensor names
- **Batch Processing**: Process multiple models from a single configuration file
- **Selective Conversion**: Convert specific models using the `--model` flag
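The dynamic loading of model classes from dotted Python paths can be sketched with the standard `importlib` module. This is a minimal illustration, not the tool's actual source; the `resolve_class` helper name is hypothetical:

```python
import importlib

def resolve_class(dotted_path: str):
    """Resolve a dotted path such as 'torchvision.models.resnet.resnet50'
    by importing the containing module and looking up the final attribute."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)

# Stdlib stand-in for a torchvision factory path:
loads = resolve_class("json.loads")
print(loads('{"ok": true}'))  # → {'ok': True}
```

The same pattern works for any `model_class_name` in the configuration, as long as the module is importable in the current environment.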
## Installation

### Prerequisites

```bash
# Required packages
uv pip install torch torchvision openvino
```
## Usage

### Basic Usage

```bash
uv run python model_converter.py config.json -o ./output_models
```
### Command-Line Options

```text
positional arguments:
  config                Path to JSON configuration file

options:
  -h, --help            Show help message and exit
  -o OUTPUT, --output OUTPUT
                        Output directory for converted models
                        (default: ./converted_models)
  -c CACHE, --cache CACHE
                        Cache directory for downloaded weights
                        (default: ~/.cache/torch/hub/checkpoints)
  --model MODEL         Process only the specified model (by model_short_name)
  --list                List all models in the configuration file and exit
  -v, --verbose         Enable verbose logging
```
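The documented interface maps directly onto Python's `argparse`. The following is a sketch of how these options could be declared, reconstructed from the help text above rather than taken from the tool's actual source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI surface documented above (defaults as listed)."""
    parser = argparse.ArgumentParser(
        description="Download PyTorch models and convert them to OpenVINO IR.")
    parser.add_argument("config", help="Path to JSON configuration file")
    parser.add_argument("-o", "--output", default="./converted_models",
                        help="Output directory for converted models")
    parser.add_argument("-c", "--cache", default="~/.cache/torch/hub/checkpoints",
                        help="Cache directory for downloaded weights")
    parser.add_argument("--model",
                        help="Process only the specified model (by model_short_name)")
    parser.add_argument("--list", action="store_true",
                        help="List all models in the configuration file and exit")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser

args = build_parser().parse_args(["config.json", "-o", "./out", "--model", "resnet50"])
print(args.model)  # → resnet50
```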
### Examples

**List all models in configuration:**

```bash
uv run python model_converter.py example_config.json --list
```

**Convert all models:**

```bash
uv run python model_converter.py example_config.json -o ./converted_models
```

**Convert a specific model:**

```bash
uv run python model_converter.py example_config.json -o ./converted_models --model resnet50
```

**Use custom cache directory:**

```bash
uv run python model_converter.py example_config.json -o ./output -c ./my_cache
```

**Enable verbose logging:**

```bash
uv run python model_converter.py example_config.json -o ./output -v
```
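The caching behavior behind the `-c`/`--cache` option can be sketched with the standard library: download into the cache directory only when no file with that name is already present. This is an illustrative sketch keyed on the URL's trailing filename; the `fetch_weights` helper is hypothetical, and the real tool may additionally verify checksums:

```python
import urllib.request
from pathlib import Path

def fetch_weights(url: str, cache_dir: str) -> Path:
    """Download a weights file into cache_dir, skipping the download
    when a same-named file is already cached."""
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / url.rsplit("/", 1)[-1]  # e.g. resnet50-0676ba61.pth
    if not target.exists():  # cache hit: reuse the existing copy
        urllib.request.urlretrieve(url, target)
    return target
```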
## Configuration File Format

The configuration file is a JSON file with the following structure:

```json
{
  "models": [
    {
      "model_short_name": "resnet50",
      "model_class_name": "torchvision.models.resnet.resnet50",
      "model_full_name": "ResNet-50",
      "description": "ResNet-50 image classification model",
      "weights_url": "https://download.pytorch.org/models/resnet50-0676ba61.pth",
      "input_shape": [1, 3, 224, 224],
      "input_names": ["images"],
      "output_names": ["output"],
      "model_params": null,
      "model_type": "Classification"
    }
  ]
}
```
**Important**: The `model_type` field enables automatic model detection when using [Intel's model_api](https://github.com/openvinotoolkit/model_api). When specified, this metadata is embedded in the OpenVINO IR, allowing `Model.create_model()` to automatically select the correct model wrapper class.

Common `model_type` values:

- `"Classification"` - Image classification models
- `"DetectionModel"` - Object detection models
- `"YOLOX"` - YOLOX detection models
- `"SegmentationModel"` - Segmentation models
### Configuration Fields

#### Required Fields

- **`model_short_name`** (string): Short identifier for the model (used for the output filename)
- **`model_class_name`** (string): Full Python path to the model class (e.g., `torchvision.models.resnet.resnet50`)
- **`weights_url`** (string): URL to download the PyTorch weights (.pth file)

#### Optional Fields

- **`model_full_name`** (string): Full descriptive name of the model
- **`description`** (string): Description of the model
- **`input_shape`** (array of integers): Input tensor shape (default: `[1, 3, 224, 224]`)
- **`input_names`** (array of strings): Names for input tensors (default: `["input"]`)
- **`output_names`** (array of strings): Names for output tensors (default: auto-generated)
- **`model_params`** (object): Parameters to pass to the model constructor (default: `null`)
- **`model_type`** (string): Model type for model_api auto-detection (e.g., `"Classification"`, `"DetectionModel"`, `"YOLOX"`, etc.)

tools/model_converter/config.json

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
{
  "models": [
    {
      "model_short_name": "mobilenet_v3_small",
      "model_class_name": "torchvision.models.mobilenetv3.mobilenet_v3_small",
      "model_full_name": "MobileNetV3-Small",
      "description": "MobileNetV3 Small - Efficient convolutional neural network for mobile and embedded vision applications",
      "docs": "https://docs.pytorch.org/vision/main/models/generated/torchvision.models.mobilenet_v3_small.html#torchvision.models.mobilenet_v3_small",
      "weights_url": "https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth",
      "input_shape": [1, 3, 224, 224],
      "input_names": ["image"],
      "output_names": ["output1"],
      "model_params": null,
      "model_type": "Classification",
      "reverse_input_channels": false,
      "mean_values": "123.675 116.28 103.53",
      "scale_values": "58.395 57.12 57.375",
      "labels": "IMAGENET1K_V1"
    },
    {
      "model_short_name": "efficientnet_b0",
      "model_class_name": "torchvision.models.efficientnet.efficientnet_b0",
      "model_full_name": "EfficientNet-B0",
      "description": "EfficientNet-B0 - Efficient convolutional neural network with compound scaling",
      "docs": "https://docs.pytorch.org/vision/main/models/generated/torchvision.models.efficientnet_b0.html#torchvision.models.efficientnet_b0",
      "weights_url": "https://download.pytorch.org/models/efficientnet_b0_rwightman-3dd342df.pth",
      "input_shape": [1, 3, 224, 224],
      "input_names": ["image"],
      "output_names": ["logits"],
      "model_params": null,
      "model_type": "Classification",
      "reverse_input_channels": true,
      "mean_values": "123.675 116.28 103.53",
      "scale_values": "58.395 57.12 57.375",
      "labels": "IMAGENET1K_V1"
    },
    {
      "model_short_name": "resnet18",
      "model_class_name": "torchvision.models.resnet.resnet18",
      "model_full_name": "ResNet-18",
      "description": "ResNet-18 - 18-layer residual learning network for image classification",
      "weights_url": "https://download.pytorch.org/models/resnet18-f37072fd.pth",
      "input_shape": [1, 3, 224, 224],
      "input_names": ["image"],
      "output_names": ["output"],
      "model_params": null,
      "model_type": "Classification",
      "reverse_input_channels": true,
      "mean_values": "123.675 116.28 103.53",
      "scale_values": "58.395 57.12 57.375",
      "labels": "IMAGENET1K_V1"
    },
    {
      "model_short_name": "resnet50",
      "model_class_name": "torchvision.models.resnet.resnet50",
      "model_full_name": "ResNet-50",
      "description": "ResNet-50 - 50-layer residual learning network for image classification",
      "weights_url": "https://download.pytorch.org/models/resnet50-0676ba61.pth",
      "input_shape": [1, 3, 224, 224],
      "input_names": ["image"],
      "output_names": ["output"],
      "model_params": null,
      "model_type": "Classification",
      "reverse_input_channels": true,
      "mean_values": "123.675 116.28 103.53",
      "scale_values": "58.395 57.12 57.375",
      "labels": "IMAGENET1K_V1"
    },
    {
      "model_short_name": "squeezenet1_0",
      "model_class_name": "torchvision.models.squeezenet.squeezenet1_0",
      "model_full_name": "SqueezeNet 1.0",
      "description": "SqueezeNet 1.0 - Small CNN with AlexNet-level accuracy and 50x fewer parameters",
      "weights_url": "https://download.pytorch.org/models/squeezenet1_0-b66bff10.pth",
      "input_shape": [1, 3, 224, 224],
      "input_names": ["image"],
      "output_names": ["output"],
      "model_params": null,
      "model_type": "Classification",
      "reverse_input_channels": true,
      "mean_values": "123.675 116.28 103.53",
      "scale_values": "58.395 57.12 57.375",
      "labels": "IMAGENET1K_V1"
    },
    {
      "model_short_name": "vgg16",
      "model_class_name": "torchvision.models.vgg.vgg16",
      "model_full_name": "VGG-16",
      "description": "VGG-16 - 16-layer deep convolutional network",
      "weights_url": "https://download.pytorch.org/models/vgg16-397923af.pth",
      "input_shape": [1, 3, 224, 224],
      "input_names": ["image"],
      "output_names": ["output"],
      "model_params": null,
      "model_type": "Classification",
      "reverse_input_channels": true,
      "mean_values": "123.675 116.28 103.53",
      "scale_values": "58.395 57.12 57.375",
      "labels": "IMAGENET1K_V1"
    }
  ]
}
