
Add MedASR medical speech recognition notebook #3416

Open
padatta wants to merge 2 commits into openvinotoolkit:latest from padatta:add-medasr-medical-asr-notebook


@padatta

@padatta padatta commented Apr 15, 2026

Summary

This PR adds a notebook that demonstrates optimizing Google's MedASR (Medical Automatic Speech Recognition) model with OpenVINO, including INT8 quantization with NNCF.

What's New

  • Demonstrates medical speech recognition with INT8 quantization
  • Complete workflow from PyTorch to optimized OpenVINO models

Features

Model Conversion & Optimization

  • Convert MedASR PyTorch model to OpenVINO FP16 using torch.export
  • Apply INT8 quantization with NNCF using model_type=ModelType.TRANSFORMER
  • Use real audio calibration data for accurate quantization
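
The conversion and quantization steps listed above might look roughly like the sketch below. It is not the notebook's code: the names `convert_and_quantize` and `model_paths`, the output directory and file names, and the assumption that the model's forward pass takes a single feature tensor are all hypothetical. The heavy imports are deferred into the function so the path helper can be used without the toolchain installed.

```python
from pathlib import Path


def model_paths(out_dir="medasr_ov"):
    """Hypothetical output layout for the FP16 and INT8 IR files."""
    root = Path(out_dir)
    return {"fp16": root / "medasr_fp16.xml", "int8": root / "medasr_int8.xml"}


def convert_and_quantize(torch_model, example_features, calibration_features,
                         out_dir="medasr_ov"):
    """Export a PyTorch model, save it as FP16 IR, then quantize it to INT8.

    Assumes forward() accepts a single feature tensor (not confirmed by the
    PR). calibration_features is a list of such tensors from real audio.
    """
    # Imported lazily: these packages are only needed when the function runs.
    import torch
    import openvino as ov
    import nncf

    paths = model_paths(out_dir)
    paths["fp16"].parent.mkdir(parents=True, exist_ok=True)

    torch_model.eval()
    with torch.no_grad():
        exported = torch.export.export(torch_model, (example_features,))
    ov_model = ov.convert_model(exported)
    ov.save_model(ov_model, str(paths["fp16"]), compress_to_fp16=True)

    # Each calibration item is converted to a NumPy array before it is fed
    # to the single-input OpenVINO model.
    dataset = nncf.Dataset(calibration_features, lambda t: t.numpy())
    int8_model = nncf.quantize(
        ov_model,
        dataset,
        model_type=nncf.ModelType.TRANSFORMER,
        subset_size=len(calibration_features),
    )
    ov.save_model(int8_model, str(paths["int8"]))
    return paths
```

If the real model takes additional inputs, such as an attention mask, both the export call and the calibration transform would need to supply them.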

Key Results

  • Model Compression: 3.9x size reduction (402 MB → 102 MB)
  • Accuracy: 97.98% token match accuracy (INT8 vs PyTorch)
  • Medical Terminology: Optimized for clinical documentation
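
For readers who want to check the figures above, here is a minimal sketch of the arithmetic. The compression ratio follows directly from the reported sizes. The PR does not define "token match accuracy", so the positional comparison in `token_match_accuracy` is an assumed definition, and the token IDs are toy values rather than MedASR output.

```python
def compression_ratio(original_mb, compressed_mb):
    """Size reduction factor between two model files."""
    return original_mb / compressed_mb


def token_match_accuracy(reference_ids, candidate_ids):
    """Share of positions where both sequences hold the same token ID.

    Assumed definition: positions missing from the shorter sequence count
    as mismatches.
    """
    longest = max(len(reference_ids), len(candidate_ids))
    if longest == 0:
        return 1.0
    matches = sum(r == c for r, c in zip(reference_ids, candidate_ids))
    return matches / longest


print(f"{compression_ratio(402, 102):.1f}x")  # sizes reported in this PR
print(f"{token_match_accuracy([5, 9, 2, 7], [5, 9, 3, 7]):.2%}")  # toy IDs
```

With the reported sizes, the first line prints `3.9x`, which agrees with the figure in the PR description.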

Notebook Contents

  1. Installation & Setup
  2. Load MedASR model from HuggingFace
  3. Prepare audio data (10 s chunks, reported as optimal for GPU)
  4. PyTorch inference baseline
  5. OpenVINO FP16 conversion
  6. INT8 quantization with real audio calibration
  7. Accuracy comparison (PyTorch/FP16/INT8)
  8. Performance benchmarking (CPU/GPU)
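
The benchmarking step in the list above could be approximated with a small timing harness like this one. It is a sketch under assumptions, not the notebook's method: the `benchmark` name, the warm-up count, and the choice of the median are illustrative, and the workload is a stand-in for a call into a compiled model.

```python
import statistics
import time


def benchmark(infer, warmup=3, runs=20):
    """Return the median latency of infer() in milliseconds."""
    for _ in range(warmup):
        infer()  # warm-up runs are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Stand-in workload; in practice this would run one inference request.
latency_ms = benchmark(lambda: sum(range(10_000)))
print(f"median latency: {latency_ms:.3f} ms")
```

The median is used here because it is less sensitive than the mean to occasional slow runs, which matters when comparing CPU and GPU timings on a shared machine.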

Use Case

Medical speech recognition for:

  • Clinical documentation
  • Medical transcription services
  • Healthcare voice assistants
  • Edge deployment in medical devices

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@brmarkus

Have a look into e.g. "https://github.com/padatta/openvino_notebooks/blob/6736abef648d34b96f51b7ebcf5b4801ae24fe35/notebooks/llm-chatbot/llm-chatbot.ipynb" - could you add a note regarding HuggingFace access token? This could help improve usability.

Note: to run the model with this demo, you will need to accept the license agreement. You must be a registered user on the 🤗 Hugging Face Hub. Please visit the HuggingFace model card, carefully read the terms of usage, and click the accept button. You will need to use an access token for the code below to run. For more information on access tokens, refer to this section of the documentation. You can log in to the Hugging Face Hub in a notebook environment using the following code:

## login to huggingfacehub to get access to pretrained model 

from huggingface_hub import notebook_login, whoami

try:
    whoami()
    print('Authorization token already provided')
except OSError:
    notebook_login()

@padatta padatta force-pushed the add-medasr-medical-asr-notebook branch from 6736abe to ed695b7 Compare April 15, 2026 12:02
This notebook demonstrates converting Google's MedASR model to OpenVINO
with FP16 and INT8 quantization for efficient medical speech recognition.

Features:
- HuggingFace authentication with notebook_login for gated model access
- Model conversion using torch.export and ov.convert_model
- INT8 quantization with NNCF using real audio calibration data
- Comprehensive accuracy validation (97.98% token-level accuracy)
- Performance benchmarking on CPU and GPU
- Model compression: 402 MB -> 102 MB (3.9x reduction)

The notebook includes complete workflow from model loading to deployment,
with support for 10-second audio chunks (static shape [1, 998, 128]).
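
One way to see how a 10-second chunk could map to 998 frames is the sketch below. The sample rate, window, and hop are assumptions (16 kHz, 25 ms, and 10 ms with no padding, which are common ASR front-end settings); the PR states only the resulting shape. The helper names `num_frames` and `split_into_chunks` are hypothetical.

```python
SAMPLE_RATE = 16_000   # assumed; not stated in the PR
WINDOW = 400           # assumed 25 ms analysis window, in samples
HOP = 160              # assumed 10 ms hop, in samples


def num_frames(num_samples, window=WINDOW, hop=HOP):
    """Frame count for a sliding window with no padding."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


def split_into_chunks(samples, chunk_len):
    """Split samples into fixed-length chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


chunk_len = 10 * SAMPLE_RATE
print(num_frames(chunk_len))  # 998 under the assumptions above
chunks = split_into_chunks([0.1] * (25 * SAMPLE_RATE), chunk_len)
print(len(chunks), len(chunks[-1]))
```

Padding the final chunk keeps every input at the same length, which is what a model converted with a static input shape requires.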
@padatta padatta force-pushed the add-medasr-medical-asr-notebook branch from ed695b7 to 93c9058 Compare April 15, 2026 12:06
@padatta
Author

padatta commented Apr 15, 2026


"I have pushed the changes. The notebook now includes a HuggingFace login section for gated model access, updated instructions, and improved usability."

@brmarkus

The Intel GNA (Gaussian & Neural Accelerator) was declared deprecated, and the GNA plugin was removed several major releases ago (as of v2024).

Could the NPU device be used for this task, similar to the GNA in the past, to offload the CPU/GPU/SoC?
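
As a starting point for trying this, a device-selection sketch might look like the following. Whether MedASR actually compiles and runs on NPU is not established in this thread; the helper names and the preference order are hypothetical, and only `ov.Core().available_devices` and `compile_model` are OpenVINO calls.

```python
def pick_device(available, preferred=("NPU", "GPU", "CPU")):
    """Return the first available device matching the preference order.

    Names such as "GPU.0" match the "GPU" prefix.
    """
    for family in preferred:
        for name in available:
            if name == family or name.startswith(family + "."):
                return name
    raise RuntimeError(f"none of {preferred} found in {list(available)}")


def compile_on_best_device(model_xml):
    """Compile an IR file on the preferred device (requires openvino)."""
    import openvino as ov  # imported lazily so pick_device works without it

    core = ov.Core()
    device = pick_device(core.available_devices)
    return device, core.compile_model(model_xml, device)
```

For example, `pick_device(["CPU", "GPU.0"])` returns `"GPU.0"`, and the same call falls back to `"CPU"` when no accelerator is listed.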
