| 1 | +# MedASR Medical Speech Recognition with OpenVINO |
| 2 | + |
| 3 | +This notebook demonstrates how to convert Google's MedASR (Medical Automatic Speech Recognition) model to OpenVINO IR in FP16 precision and then apply INT8 quantization for efficient medical speech-to-text transcription. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +MedASR is a specialized speech recognition model optimized for medical terminology. This tutorial shows how to: |
| 8 | + |
| 9 | +- Load the MedASR model from HuggingFace |
| 10 | +- Convert it to OpenVINO IR format for optimal inference performance |
| 11 | +- Apply INT8 quantization using NNCF for model compression |
| 12 | +- Compare accuracy and performance across PyTorch, FP16, and INT8 versions |
| 13 | + |
| 14 | +## Key Features |
| 15 | + |
| 16 | +- **Model Compression**: 3.9x size reduction (402 MB → 102 MB) with INT8 quantization |
| 17 | +- **High Accuracy**: 97.98% token-level agreement with the PyTorch baseline after INT8 quantization |
| 18 | +- **Medical Terminology**: Optimized for accurate medical speech recognition |
| 19 | + |
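The 97.98% figure compares tokens produced by the INT8 model with those from the PyTorch model. The notebook's exact metric is not shown here, so the sketch below assumes a simple position-wise match over two token-ID sequences, normalized by the longer one. The helper name `token_match_rate` and the sample token IDs are hypothetical.

```python
from typing import Sequence


def token_match_rate(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Fraction of positions where both sequences hold the same token ID.

    Assumption: position-wise comparison, normalized by the longer
    sequence so that missing or extra tokens count as mismatches.
    """
    longest = max(len(reference), len(candidate))
    if longest == 0:
        return 1.0  # two empty transcripts agree trivially
    matches = sum(1 for ref, cand in zip(reference, candidate) if ref == cand)
    return matches / longest


# Hypothetical token IDs standing in for PyTorch and INT8 outputs.
pytorch_tokens = [12, 7, 7, 431, 98, 5]
int8_tokens = [12, 7, 9, 431, 98, 5]
print(f"token match: {token_match_rate(pytorch_tokens, int8_tokens):.2%}")
```

A position-wise metric is sensitive to insertions and deletions, since one extra token shifts everything after it. An edit-distance-based score would be a reasonable alternative if the two outputs differ in length.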
| 20 | +## Tutorial Contents |
| 21 | + |
| 22 | +1. **Installation** - Install required packages (OpenVINO, NNCF, Transformers, etc.) |
| 23 | +2. **Load Model** - Load Google's MedASR model from HuggingFace |
| 24 | +3. **Prepare Audio Data** - Download and preprocess test audio (optimized for 10s chunks) |
| 25 | +4. **PyTorch Inference** - Establish baseline accuracy with the original model |
| 26 | +5. **Convert to OpenVINO FP16** - Convert using `torch.export` and `ov.convert_model` |
| 27 | +6. **INT8 Quantization** - Apply NNCF quantization with real audio calibration |
| 28 | +7. **Accuracy Comparison** - Validate quantization quality across all versions |
| 29 | +8. **Performance Benchmarking** - Measure inference speed on CPU and GPU |
| 30 | + |
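Step 3 prepares audio in 10-second chunks. A minimal sketch of that chunking, assuming 16 kHz mono audio and zero-padding of the final chunk (neither is stated in this README), might look like the following. `split_into_chunks` is a hypothetical helper, and plain Python lists stand in for the audio array.

```python
from typing import List

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono input
CHUNK_SECONDS = 10     # from the tutorial: 10-second chunks


def split_into_chunks(samples: List[float],
                      sample_rate: int = SAMPLE_RATE,
                      chunk_seconds: int = CHUNK_SECONDS) -> List[List[float]]:
    """Split a waveform into equal-length chunks, zero-padding the last one."""
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        # Pad the tail with silence so every chunk has the same length.
        chunk = chunk + [0.0] * (chunk_len - len(chunk))
        chunks.append(chunk)
    return chunks


# 25 seconds of silence as placeholder audio, which yields three chunks.
audio = [0.0] * (25 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[0]))
```

In practice the samples would come from an audio loader such as `librosa` or `soundfile`, both of which appear in the installation command.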
| 31 | +## Results |
| 32 | + |
| 33 | +- **Model Size**: 402 MB (FP16) → 102 MB (INT8) = **3.9x compression** |
| 34 | +- **Accuracy**: 97.98% token match between INT8 and PyTorch |
| 35 | +- **Model Shape**: Static [1, 998, 128] optimized for 10-second audio chunks |
| 36 | + |
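The figures above can be sanity-checked with a little arithmetic. The sketch assumes 16 kHz audio framed with a 25 ms window and a 10 ms hop, without padding. These are common speech front-end settings, not values confirmed by the notebook, but they do reproduce the 998-frame axis for a 10-second chunk. `num_frames` is a hypothetical helper.

```python
def num_frames(num_samples: int, win_length: int, hop_length: int) -> int:
    """Number of full analysis windows that fit without padding."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


sample_rate = 16_000                     # assumption: 16 kHz audio
num_samples = 10 * sample_rate           # one 10-second chunk
win_length = sample_rate * 25 // 1000    # assumption: 25 ms window = 400 samples
hop_length = sample_rate * 10 // 1000    # assumption: 10 ms hop = 160 samples

frames = num_frames(num_samples, win_length, hop_length)
print(frames)  # 998

# Compression ratio from the reported file sizes (in MB).
ratio = 402 / 102
print(f"{ratio:.1f}x")  # 3.9x
```

Under these assumptions, the shape `[1, 998, 128]` reads as one chunk, 998 frames, and 128 features per frame.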
| 37 | +## Installation |
| 38 | + |
| 39 | +```bash |
| 40 | +pip install -q "openvino>=2024.4.0" "nncf>=2.13.0" "torch>=2.1" "transformers>=5.4.0" "librosa" "soundfile" "huggingface_hub" |
| 41 | +``` |
| 42 | + |
| 43 | +## Important Notes |
| 44 | + |
| 45 | +⚠️ **Gated Model Access**: The MedASR model is gated on HuggingFace. You must: |
| 46 | +1. Request access at https://huggingface.co/google/medasr |
| 47 | +2. Authenticate with your HuggingFace token before running the notebook |
| 48 | + |
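One way to handle the authentication step is to read the token from an environment variable and pass it to `huggingface_hub.login`. The sketch below assumes the token is stored in `HF_TOKEN`. The helper names `read_hf_token` and `authenticate` are hypothetical, and the `huggingface_hub` import is deferred so the token check works even before the package is installed.

```python
import os
from typing import Optional


def read_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token from the environment, or None if unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def authenticate(env_var: str = "HF_TOKEN") -> None:
    """Log in to the Hugging Face Hub using a token from the environment."""
    token = read_hf_token(env_var)
    if token is None:
        raise RuntimeError(
            f"Set {env_var} to a token that has access to google/medasr"
        )
    # Imported lazily so read_hf_token() works without huggingface_hub.
    from huggingface_hub import login

    login(token=token)
```

Calling `authenticate()` once, before the model-loading cell, should be enough. Keeping the token in the environment rather than in the notebook avoids committing it by accident.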
| 49 | +## Use Cases |
| 50 | + |
| 51 | +- Medical transcription systems |
| 52 | +- Clinical documentation automation |
| 53 | +- Healthcare voice assistants |
| 54 | +- Medical education and training platforms |