Add MedASR medical speech recognition notebook #3416
padatta wants to merge 2 commits into openvinotoolkit:latest
Have a look at, e.g., "https://github.com/padatta/openvino_notebooks/blob/6736abef648d34b96f51b7ebcf5b4801ae24fe35/notebooks/llm-chatbot/llm-chatbot.ipynb". Could you add a note regarding the HuggingFace access token? This could help improve usability.
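A note of the kind requested here could be accompanied by a small authentication cell. The following is only a minimal sketch, not the notebook's actual code: it assumes the token may be supplied through the `HF_TOKEN` environment variable and otherwise falls back to the interactive `notebook_login` widget from `huggingface_hub`. The helper names `resolve_hf_token` and `authenticate` are hypothetical.

```python
import os


def resolve_hf_token(env=None):
    """Return the token stored under HF_TOKEN, or None if it is unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def authenticate():
    """Log in to the Hugging Face Hub so that gated models can be downloaded."""
    # Imported lazily so resolve_hf_token() works without huggingface_hub installed.
    from huggingface_hub import login, notebook_login

    token = resolve_hf_token()
    if token:
        login(token=token)   # non-interactive: token taken from the environment
    else:
        notebook_login()     # interactive widget inside Jupyter
```

Calling `authenticate()` once near the top of a notebook would cover both unattended runs (token in the environment) and interactive sessions. Access to the gated model still has to be granted on the model page beforehand.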
6736abe to ed695b7
This notebook demonstrates converting Google's MedASR model to OpenVINO with FP16 and INT8 quantization for efficient medical speech recognition.

Features:
- HuggingFace authentication with notebook_login for gated model access
- Model conversion using torch.export and ov.convert_model
- INT8 quantization with NNCF using real audio calibration data
- Comprehensive accuracy validation (97.98% token-level accuracy)
- Performance benchmarking on CPU and GPU
- Model compression: 402 MB -> 102 MB (3.9x reduction)

The notebook includes the complete workflow from model loading to deployment, with support for 10-second audio chunks (static shape [1, 998, 128]).
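The two figures quoted in this commit message can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the sample rate, window length, and hop length are assumptions (a common 16 kHz / 25 ms / 10 ms front end without padding), not values confirmed by the PR, and they are chosen simply because they reproduce the stated 998 frames for a 10-second chunk.

```python
# Hypothetical front-end parameters (assumptions, not taken from the PR).
SAMPLE_RATE_HZ = 16_000
WINDOW_SAMPLES = 400   # 25 ms at 16 kHz
HOP_SAMPLES = 160      # 10 ms at 16 kHz
N_MEL_BINS = 128       # last dimension of the static shape quoted in the PR


def num_frames(duration_s: float) -> int:
    """Count the full analysis windows that fit into the chunk (no padding)."""
    n_samples = int(duration_s * SAMPLE_RATE_HZ)
    if n_samples < WINDOW_SAMPLES:
        return 0
    return 1 + (n_samples - WINDOW_SAMPLES) // HOP_SAMPLES


def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Size reduction factor between two model files."""
    return original_mb / compressed_mb


static_shape = [1, num_frames(10.0), N_MEL_BINS]
ratio = compression_ratio(402, 102)

print(static_shape)      # [1, 998, 128]
print(f"{ratio:.1f}x")   # 3.9x
```

Under these assumed parameters, 160,000 samples yield 1 + (160000 - 400) // 160 = 998 frames, and 402 / 102 rounds to the 3.9x reduction stated above.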
ed695b7 to 93c9058
I have pushed the changes. The notebook now includes a HuggingFace login section for gated model access, updated instructions, and improved usability.
The Intel GNA (Gaussian & Neural Accelerator) was declared deprecated, and the GNA plugin was removed several major releases ago (as of v2024). Could the NPU device be used for this task, as the GNA was in the past, to offload the CPU/GPU/SoC?
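One way to try this without hard-coding a device is to pick the first preferred device that OpenVINO reports. This is a sketch under assumptions: `pick_device` and `available_openvino_devices` are hypothetical helpers, the preference order is arbitrary, and whether MedASR actually compiles and runs on the NPU is not established in this thread.

```python
def pick_device(available, preferred=("NPU", "GPU", "CPU")):
    """Return the first device in `available` that matches a preferred family."""
    for family in preferred:
        for name in available:
            # OpenVINO may report indexed names such as "GPU.0"; match on the family.
            if name == family or name.startswith(family + "."):
                return name
    raise RuntimeError(f"none of {preferred} found in {list(available)}")


def available_openvino_devices():
    """Query the local OpenVINO runtime for its device list."""
    # Imported lazily so pick_device() can be used without OpenVINO installed.
    import openvino as ov

    return ov.Core().available_devices
```

With OpenVINO installed, `pick_device(available_openvino_devices())` would return a name that can be passed as the device argument when compiling the model. On a machine without an NPU it falls back to GPU, then CPU.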
Summary
This PR adds a comprehensive notebook demonstrating Google's MedASR (Medical Automatic Speech Recognition) model optimization with OpenVINO and NNCF quantization.
What's New
Features
Model Conversion & Optimization
torch.export
ModelType.TRANSFORMER preset
Key Results
Notebook Contents
Use Case
Medical speech recognition for: