
Commit 93c9058

Add MedASR medical speech recognition notebook

This notebook demonstrates converting Google's MedASR model to OpenVINO with FP16 and INT8 quantization for efficient medical speech recognition.

Features:
- HuggingFace authentication with notebook_login for gated model access
- Model conversion using torch.export and ov.convert_model
- INT8 quantization with NNCF using real audio calibration data
- Comprehensive accuracy validation (97.98% token-level accuracy)
- Performance benchmarking on CPU and GPU
- Model compression: 402 MB -> 102 MB (3.9x reduction)

The notebook includes the complete workflow from model loading to deployment, with support for 10-second audio chunks (static shape [1, 998, 128]).
1 parent 25f4d2e commit 93c9058

File tree: 2 files changed (+1002, -0)

Lines changed: 54 additions & 0 deletions
# MedASR Medical Speech Recognition with OpenVINO

This notebook demonstrates converting Google's MedASR (Medical Automatic Speech Recognition) model to OpenVINO format with FP16 and INT8 quantization for efficient medical speech-to-text transcription.

## Overview

MedASR is a specialized speech recognition model optimized for medical terminology. This tutorial shows how to:

- Load the MedASR model from HuggingFace
- Convert it to OpenVINO IR format for optimal inference performance
- Apply INT8 quantization using NNCF for model compression
- Compare accuracy and performance across PyTorch, FP16, and INT8 versions

## Key Features

- **Model Compression**: 3.9x size reduction (402 MB → 102 MB) with INT8 quantization
- **High Accuracy**: 97.98% token-level accuracy maintained after INT8 quantization
- **Medical Terminology**: Optimized for accurate medical speech recognition
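The token-level accuracy quoted above can be computed with a very simple metric. A minimal sketch, assuming accuracy is the fraction of aligned positions where the INT8 and PyTorch token ID sequences agree (the notebook's exact definition may differ; `token_accuracy` is an illustrative name, not taken from the notebook):

```python
def token_accuracy(ref_ids: list[int], hyp_ids: list[int]) -> float:
    """Fraction of aligned positions where two token ID sequences agree.

    Compares position by position up to the shorter sequence; any
    length mismatch counts against the score.
    """
    if not ref_ids and not hyp_ids:
        return 1.0
    matches = sum(r == h for r, h in zip(ref_ids, hyp_ids))
    return matches / max(len(ref_ids), len(hyp_ids))
```

For example, `token_accuracy([1, 2, 3, 4], [1, 2, 9, 4])` returns `0.75`.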
19+
20+
## Tutorial Contents
21+
22+
1. **Installation** - Install required packages (OpenVINO, NNCF, Transformers, etc.)
23+
2. **Load Model** - Load Google's MedASR model from HuggingFace
24+
3. **Prepare Audio Data** - Download and preprocess test audio (optimized for 10s chunks)
25+
4. **PyTorch Inference** - Establish baseline accuracy with original model
26+
5. **Convert to OpenVINO FP16** - Convert using torch.export and ov.convert_model
27+
6. **INT8 Quantization** - Apply NNCF quantization with real audio calibration
28+
7. **Accuracy Comparison** - Validate quantization quality across all versions
29+
8. **Performance Benchmarking** - Measure inference speed on CPU and GPU
## Results

- **Model Size**: 402 MB (FP16) → 102 MB (INT8) = **3.9x compression**
- **Accuracy**: 97.98% token match between INT8 and PyTorch
- **Model Shape**: Static [1, 998, 128] optimized for 10-second audio chunks
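Because the input shape is static, every clip must cover exactly 10 seconds of audio. A minimal sketch of the pad-or-truncate step applied to raw samples before feature extraction (the 16 kHz sample rate is an assumption about the feature extractor, not stated above):

```python
TARGET_SR = 16_000               # assumed sample rate (Hz)
CHUNK_SAMPLES = TARGET_SR * 10   # 10-second chunk -> 160,000 samples

def to_fixed_chunk(samples: list[float]) -> list[float]:
    """Zero-pad short clips and truncate long ones to CHUNK_SAMPLES."""
    if len(samples) >= CHUNK_SAMPLES:
        return samples[:CHUNK_SAMPLES]
    return samples + [0.0] * (CHUNK_SAMPLES - len(samples))
```

The fixed-length waveform is then what gets turned into the [1, 998, 128] mel-feature tensor.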
## Installation

```bash
pip install -q "openvino>=2024.4.0" "nncf>=2.13.0" "torch>=2.1" "transformers>=5.4.0" "librosa" "soundfile" "huggingface_hub"
```

## Important Notes

⚠️ **Gated Model Access**: The MedASR model is gated on HuggingFace. You must:

1. Request access at https://huggingface.co/google/medasr
2. Authenticate with your HuggingFace token before running the notebook
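A minimal authentication helper matching the two steps above, a sketch only (the `HF_TOKEN` environment variable is an assumption about your setup, not a requirement of the library):

```python
import os
from huggingface_hub import login, notebook_login

def authenticate() -> None:
    """Log in to HuggingFace before loading the gated MedASR checkpoint."""
    token = os.environ.get("HF_TOKEN")  # assumed env var for scripted runs
    if token:
        login(token=token)   # non-interactive (scripts, CI)
    else:
        notebook_login()     # interactive token prompt in Jupyter
```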
## Use Cases

- Medical transcription systems
- Clinical documentation automation
- Healthcare voice assistants
- Medical education and training platforms
