
πŸ“± Mobile Model Optimization Guide

Comprehensive guide to optimizing YOLOv8 models for mobile and edge deployment.

🎯 Overview

The optimization script provides multiple techniques to reduce model size and improve inference speed for mobile devices:

  • INT8 Quantization: Reduce model size by ~75% with minimal accuracy loss
  • FP16 Precision: Reduce model size by ~50%
  • TensorFlow Lite: Optimized for Android/iOS
  • CoreML: Native iOS framework
  • ONNX: Cross-platform deployment
  • NCNN: Optimized for ARM processors

πŸ“Š Expected Results

Format          Size Reduction    Accuracy Loss    Best For
TFLite INT8     ~75%              1-3%             Android
CoreML INT8     ~75%              1-3%             iOS
ONNX INT8       ~75%              1-3%             Cross-platform
NCNN            ~70%              <1%              ARM devices
PyTorch FP16    ~50%              <0.5%            Python/C++

Example: YOLOv8n Model

  • Original: ~6 MB (FP32)
  • TFLite INT8: ~1.5 MB (75% reduction)
  • ONNX INT8: ~1.6 MB (73% reduction)
  • CoreML INT8: ~1.5 MB (75% reduction)
  • PyTorch FP16: ~3 MB (50% reduction)

πŸš€ Quick Start

Optimize for All Platforms

python models/yolov8/optimize_for_mobile.py \
  --model runs/train/best.pt \
  --optimize all \
  --output ./mobile_models

This will generate optimized models for all platforms and print a comparison report.

Platform-Specific Optimization

Android (TensorFlow Lite)

python models/yolov8/optimize_for_mobile.py \
  --model runs/train/best.pt \
  --optimize tflite \
  --output ./mobile_models

Output: best_int8.tflite (~1.5 MB)

Integration:

// Android Kotlin
import org.tensorflow.lite.Interpreter
import java.nio.MappedByteBuffer

val interpreter = Interpreter(loadModelFile("best_int8.tflite"))

iOS (CoreML)

python models/yolov8/optimize_for_mobile.py \
  --model runs/train/best.pt \
  --optimize coreml \
  --output ./mobile_models

Output: best_int8.mlpackage (~1.5 MB)

Integration:

// iOS Swift
import CoreML
import Vision

guard let mlModel = try? best_int8(configuration: MLModelConfiguration()),
      let model = try? VNCoreMLModel(for: mlModel.model) else {
    return
}
let request = VNCoreMLRequest(model: model)

Cross-Platform (ONNX)

python models/yolov8/optimize_for_mobile.py \
  --model runs/train/best.pt \
  --optimize onnx \
  --output ./mobile_models

Output: best_int8.onnx (~1.6 MB)

Integration:

import onnxruntime as ort

session = ort.InferenceSession("best_int8.onnx")
outputs = session.run(None, {"images": input_tensor})
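The session above expects a normalized NCHW float32 tensor. A minimal preprocessing sketch in pure NumPy (resizing to 640x640 is assumed to happen beforehand, e.g. with OpenCV):

```python
import numpy as np

def to_model_input(img_rgb: np.ndarray) -> np.ndarray:
    """HWC uint8 RGB frame -> 1x3xHxW float32 in [0, 1]."""
    x = img_rgb.astype(np.float32) / 255.0   # normalize pixel values
    return x.transpose(2, 0, 1)[None]        # HWC -> NCHW, add batch dim

frame = np.zeros((640, 640, 3), dtype=np.uint8)  # stand-in for a real image
input_tensor = to_model_input(frame)
# input_tensor.shape == (1, 3, 640, 640)
```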

ARM Devices (NCNN)

python models/yolov8/optimize_for_mobile.py \
  --model runs/train/best.pt \
  --optimize ncnn \
  --output ./mobile_models

Output: best_ncnn/ directory with .param and .bin files

Integration:

// C++
#include "net.h"

ncnn::Net net;
net.load_param("best_ncnn/model.param");
net.load_model("best_ncnn/model.bin");

πŸ”§ Optimization Techniques Explained

1. INT8 Quantization

Converts 32-bit floating point weights to 8-bit integers.

Benefits:

  • 75% size reduction
  • 2-4x faster inference on mobile CPUs
  • Lower memory usage

Trade-offs:

  • 1-3% accuracy loss (typically acceptable)
  • Requires post-training quantization

How it works:

FP32 weight: 0.123456789 (4 bytes)
↓ Quantization
INT8 weight: 123 (1 byte)
Scale factor: 0.001
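The mapping above can be sketched in a few lines of NumPy. This is a simplified symmetric, per-tensor scheme; real exporters typically use per-channel scales and calibrated zero points:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map max |w| to 127."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from INT8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.1234, -0.5, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Per-weight reconstruction error is bounded by scale / 2
assert np.all(np.abs(w - w_hat) <= scale / 2 + 1e-6)
```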

2. FP16 (Half Precision)

Uses 16-bit floating point instead of 32-bit.

Benefits:

  • 50% size reduction
  • 1.5-2x faster on GPUs with FP16 support
  • Minimal accuracy loss (<0.5%)

Best for:

  • Devices with GPU acceleration
  • When accuracy is critical
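The 50% figure follows directly from storage width: each weight drops from 4 bytes to 2. A quick NumPy check (illustrative only, not the exporter's code):

```python
import numpy as np

w32 = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)  # stand-in weights
w16 = w32.astype(np.float16)

print(w32.nbytes, w16.nbytes)  # 4000 vs 2000 bytes: exactly half
```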

3. Dynamic Shape Optimization

Exports the model with a fixed (static) input shape instead of dynamic dimensions, so the runtime can pre-plan memory and kernel selection for exactly one shape.

Benefits:

  • Faster model loading
  • Better mobile CPU optimization
  • Reduced memory overhead

πŸ“± Platform-Specific Integration

Android Integration

1. Add Dependencies (build.gradle)

dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.14.0'
    implementation 'org.tensorflow:tensorflow-lite-gpu:2.14.0'
    implementation 'org.tensorflow:tensorflow-lite-support:0.4.4'
}

2. Load and Run Model

import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.image.TensorImage
import java.nio.ByteBuffer

class UIElementDetector(context: Context) {
    private val interpreter: Interpreter

    init {
        val model = loadModelFile(context, "best_int8.tflite")
        interpreter = Interpreter(model, Interpreter.Options().apply {
            setNumThreads(4) // Use 4 CPU threads
        })
    }

    fun detect(bitmap: Bitmap): List<Detection> {
        val tensorImage = TensorImage.fromBitmap(bitmap)
        // Output shape depends on your model's class count and input size
        val output = Array(1) { Array(25200) { FloatArray(25) } }

        interpreter.run(tensorImage.buffer, output)
        return parseDetections(output)
    }
}

3. GPU Acceleration (Optional)

val options = Interpreter.Options().apply {
    addDelegate(GpuDelegate())
}
val interpreter = Interpreter(model, options)

iOS Integration

1. Add CoreML Model to Xcode

  • Drag best_int8.mlpackage into your Xcode project
  • Xcode automatically generates Swift interface

2. Run Inference

import CoreML
import Vision
import UIKit

class UIElementDetector {
    private var model: VNCoreMLModel?

    init() {
        guard let mlModel = try? best_int8(configuration: MLModelConfiguration()) else {
            fatalError("Failed to load model")
        }
        model = try? VNCoreMLModel(for: mlModel.model)
    }

    func detect(image: UIImage, completion: @escaping ([Detection]) -> Void) {
        guard let model = model else { return }

        let request = VNCoreMLRequest(model: model) { request, error in
            guard let results = request.results as? [VNRecognizedObjectObservation] else {
                return
            }
            let detections = self.parseResults(results)
            completion(detections)
        }

        let handler = VNImageRequestHandler(cgImage: image.cgImage!)
        try? handler.perform([request])
    }
}

3. Metal GPU Acceleration (Automatic)

CoreML automatically uses Metal GPU when available.

React Native Integration

Using ONNX Runtime

import { InferenceSession } from 'onnxruntime-react-native';

const session = await InferenceSession.create('./best_int8.onnx');

const detect = async (imageData) => {
  const feeds = { images: new Tensor('float32', imageData, [1, 3, 640, 640]) };
  const results = await session.run(feeds);
  return parseDetections(results);
};

πŸ” Benchmarking

Compare Model Performance

# Install dependencies
pip install onnxruntime opencv-python

# Run benchmark
python models/yolov8/benchmark_mobile.py \
  --original runs/train/best.pt \
  --optimized mobile_models/best_int8.onnx \
  --test_images ./test_images/ \
  --runs 100

Expected Performance (YOLOv8n)

Device            Format         Inference Time    FPS
iPhone 13 Pro     CoreML INT8    15 ms             66
Pixel 6           TFLite INT8    18 ms             55
Raspberry Pi 4    NCNN           45 ms             22
Desktop CPU       ONNX INT8     8 ms              125
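Numbers like these come from a simple warmup-then-average timing loop. A minimal harness sketch (the ONNX Runtime lines are commented out and assume the file name from the Quick Start):

```python
import time

def benchmark(run_fn, runs=100, warmup=10):
    """Average latency of an inference callable, in ms, plus FPS."""
    for _ in range(warmup):              # warm caches and lazy initialization
        run_fn()
    start = time.perf_counter()
    for _ in range(runs):
        run_fn()
    mean_ms = (time.perf_counter() - start) / runs * 1000
    return mean_ms, 1000.0 / mean_ms

# Example with ONNX Runtime (assumes onnxruntime is installed):
# import numpy as np, onnxruntime as ort
# session = ort.InferenceSession("mobile_models/best_int8.onnx")
# x = np.random.rand(1, 3, 640, 640).astype(np.float32)
# print(benchmark(lambda: session.run(None, {"images": x})))
```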

βš™οΈ Advanced Configuration

Custom Quantization

For better accuracy, provide calibration data:

from ultralytics import YOLO

model = YOLO('runs/train/best.pt')

# Export with calibration data
model.export(
    format='tflite',
    int8=True,
    data='./data/dataset.yaml',  # Use your dataset for calibration
    imgsz=640
)

Optimize for Specific Input Size

If you always use a specific input size, optimize for it:

python models/yolov8/optimize_for_mobile.py \
  --model runs/train/best.pt \
  --optimize onnx \
  --input_size 320  # Smaller = faster

Pruning (Further Size Reduction)

Remove redundant weights for even smaller models. Note that Ultralytics' train() does not accept a prune argument; use PyTorch's pruning utilities and then fine-tune to recover accuracy:

import torch
import torch.nn.utils.prune as prune
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Zero the 30% smallest-magnitude weights in each conv layer
for module in model.model.modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name='weight', amount=0.3)
        prune.remove(module, 'weight')  # make the pruning permanent

# Fine-tune so the remaining weights compensate
model.train(data='./data/dataset.yaml', epochs=100)

Pruned weights are zeroed in place, so the on-disk savings come from compression or sparse-aware formats rather than a smaller tensor count.

πŸ“Š Size Comparison Tool

Generate a visual comparison of all formats:

python models/yolov8/optimize_for_mobile.py \
  --model runs/train/best.pt \
  --optimize all

Output:

================================================================================
OPTIMIZATION RESULTS SUMMARY
================================================================================

Format               Size (MB)    Reduction    Best For
────────────────────────────────────────────────────────────────────────────
Original             6.00         -            Reference
TFLite INT8          1.50         ↓75.0%       Android, iOS
ONNX INT8            1.60         ↓73.3%       Cross-platform
CoreML INT8          1.55         ↓74.2%       iOS, macOS
NCNN                 1.80         ↓70.0%       Android, iOS (ARM)
PyTorch FP16         3.00         ↓50.0%       Python, C++

────────────────────────────────────────────────────────────────────────────
πŸ† Smallest model: TFLite INT8 (1.50 MB)
   Size reduction: 75.0%

🎯 Choosing the Right Format

Decision Tree

Need to deploy on mobile?
β”œβ”€β”€ Yes
β”‚   β”œβ”€β”€ Android only? β†’ TFLite INT8
β”‚   β”œβ”€β”€ iOS only? β†’ CoreML INT8
β”‚   β”œβ”€β”€ Both? β†’ TFLite INT8 + CoreML INT8
β”‚   └── React Native? β†’ ONNX INT8
β”‚
└── No
    β”œβ”€β”€ Edge device (ARM)? β†’ NCNN
    β”œβ”€β”€ Python inference? β†’ PyTorch FP16
    └── Cross-platform? β†’ ONNX INT8

Recommendations by Use Case

Use Case              Recommended Format        Reason
Android App           TFLite INT8               Native framework, smallest size
iOS App               CoreML INT8               Native framework, GPU acceleration
Cross-platform App    ONNX INT8                 Works everywhere, good size
Raspberry Pi          NCNN or ONNX              ARM optimized
Web Browser           ONNX (ONNX Runtime Web)   Browser compatible
Server Inference      PyTorch FP16              Good balance

πŸ› Troubleshooting

CoreML Export: "RuntimeError: BlobWriter not loaded"

Problem: CoreML export fails with BlobWriter error.

Solution:

# Option 1: Update coremltools
pip install --upgrade coremltools

# Option 2: Use TFLite instead (works on iOS too!)
python models/yolov8/optimize_for_mobile.py --model runs/train/best.pt --optimize tflite

# Option 3: Use ONNX (works on iOS with ONNX Runtime)
python models/yolov8/optimize_for_mobile.py --model runs/train/best.pt --optimize onnx

Why this happens: CoreML export in YOLOv8 sometimes has compatibility issues with certain coremltools versions.

Good news: TFLite and ONNX also run well on iOS and give the same size reduction.

ONNX Export: "TypeError: quantize_dynamic() got unexpected keyword"

Problem: ONNX quantization fails with unexpected keyword error.

Solution: This is fixed in the script. The optimize_model parameter has been removed. Update your script if you see this error.

Model Size Still Too Large?

  1. Use a smaller base model: YOLOv8n instead of YOLOv8m
  2. Apply pruning: Remove 30-50% of weights during training
  3. Reduce input size: Use 320x320 instead of 640x640
  4. Distillation: Train a smaller model to mimic larger one

Accuracy Loss Too High?

  1. Use FP16 instead of INT8: Better accuracy, still 50% smaller
  2. Provide calibration data: Better quantization
  3. Post-training fine-tuning: Fine-tune after quantization
  4. QAT (Quantization-Aware Training): Train with quantization in mind

Slow Inference on Device?

  1. Enable GPU acceleration: Use GPU delegate (Android) or Metal (iOS)
  2. Reduce input size: Smaller images = faster inference
  3. Use multi-threading: Set num_threads parameter
  4. Optimize model architecture: Use YOLOv8n instead of YOLOv8m

πŸ“š Additional Resources

Example Projects

  • examples/android_app/ - Android TFLite integration
  • examples/ios_app/ - iOS CoreML integration
  • examples/react_native/ - React Native ONNX integration

Benchmarking Tools

  • models/yolov8/benchmark_mobile.py - Performance testing
  • models/yolov8/accuracy_test.py - Accuracy comparison

πŸ”„ Update Requirements

Add to your requirements.txt:

# Model optimization dependencies
onnx>=1.14.0
onnxruntime>=1.16.0
onnxruntime-tools>=1.7.0
coremltools>=7.0  # For CoreML export (macOS only)
tensorflow>=2.14.0  # For TFLite export

Install:

pip install onnx onnxruntime onnxruntime-tools tensorflow
# macOS only for CoreML:
pip install coremltools

βœ… Validation

After optimization, validate your model:

# Test on sample images
python models/yolov8/validate_optimized.py \
  --original runs/train/best.pt \
  --optimized mobile_models/best_int8.onnx \
  --test_dir ./test_images/ \
  --compare_accuracy

This ensures your optimized model maintains acceptable accuracy.
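If you roll your own accuracy check instead, a box-overlap agreement metric is a reasonable starting point. A minimal sketch (hypothetical helpers, not part of the scripts above), where boxes are [x1, y1, x2, y2]:

```python
def box_iou(a, b):
    """Intersection-over-union between two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def agreement(orig_boxes, opt_boxes, iou_thresh=0.5):
    """Fraction of original detections matched by the optimized model."""
    if not orig_boxes:
        return 1.0
    matched = sum(
        any(box_iou(o, p) >= iou_thresh for p in opt_boxes)
        for o in orig_boxes
    )
    return matched / len(orig_boxes)
```

Run both models over the same test images and flag images where agreement drops below your tolerance.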


Ready to deploy? Choose your platform and follow the integration guide above! πŸš€