Conversation

@larryliu0820 (Collaborator)

This pull request improves device and dtype handling across the ExecuTorch export pipeline, ensuring that models, tensors, and modules are consistently placed on the correct device and use the appropriate data type. It also adds a test that exports a Whisper model with bfloat16 precision and checks the resulting file size.

Device and dtype consistency improvements:

  • Updated initialization and export logic in optimum/exporters/executorch/integrations.py to use model.device and model.dtype for all relevant tensors and modules, replacing hardcoded "cpu" and torch.float32 values. This ensures exported models and caches are created on the correct device with the correct data type (see the first sketch after this list).
  • Modified load_seq2seq_speech_model in optimum/exporters/executorch/tasks/asr.py to accept device and dtype as keyword arguments and pass them through to the underlying model-loading and export logic (see the second sketch after this list).
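
A minimal sketch of the device/dtype propagation pattern described above, assuming a Hugging Face model (which exposes .device and .dtype); the wrapper class and method names here are hypothetical, not the actual classes in integrations.py:

```python
import torch


class Seq2SeqExportWrapper(torch.nn.Module):
    """Hypothetical export wrapper illustrating device/dtype propagation."""

    def __init__(self, model):
        super().__init__()
        self.model = model
        # Derive placement from the loaded model instead of hardcoding
        # "cpu" and torch.float32, so e.g. bfloat16 models export as-is.
        self.device = model.device
        self.dtype = model.dtype

    def example_inputs(self, batch_size: int = 1, seq_len: int = 128):
        # Example tensors (and any static caches) are created on the
        # model's device with the model's dtype before export runs.
        return torch.zeros(
            batch_size,
            seq_len,
            self.model.config.hidden_size,
            device=self.device,
            dtype=self.dtype,
        )
```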
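
And a hedged sketch of the revised task-loading signature in asr.py; the keyword names follow the description above, while the body (from_pretrained plus .to()) is illustrative only:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq


def load_seq2seq_speech_model(model_name_or_path: str, **kwargs):
    # device and dtype now arrive as keyword arguments with CPU/float32
    # defaults, and are forwarded into model loading and export.
    device = kwargs.get("device", "cpu")
    dtype = kwargs.get("dtype", torch.float32)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_name_or_path, torch_dtype=dtype
    ).to(device)
    return model.eval()
```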

Testing enhancements:

  • Added a new slow test test_whisper_large_v3_turbo_export_bfloat16 in tests/models/test_modeling_whisper.py that exports the Whisper large-v3-turbo model with bfloat16 precision, verifies the output file exists, and checks that its size is approximately 1.2 GB with a 10% tolerance (see the sketch below).
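
The size assertion can be sketched as below; the path parameter, expected size, and tolerance mirror the description above, not the exact test code:

```python
import math
import os


def check_exported_model_size(pte_path: str) -> None:
    # The exported .pte file must exist...
    assert os.path.exists(pte_path)
    # ...and weigh roughly 1.2 GB (bfloat16 weights), within 10%.
    expected_bytes = 1.2 * 1024**3
    actual_bytes = os.path.getsize(pte_path)
    assert math.isclose(actual_bytes, expected_bytes, rel_tol=0.10)
```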
