Use faster-whisper for transcription and update transcription dependencies for Python 3.11 compatibility #124
Open
SoldierSacha wants to merge 17 commits into ManimCommunity:main from
- Replace openai-whisper and stable-ts dependencies with faster-whisper
- Update transcription implementation in base.py to use faster-whisper API
- Update timestamps_to_word_boundaries to handle faster-whisper segments
- Update documentation references to point to faster-whisper repository
- Update error messages and docstrings
faster-whisper provides significant performance improvements through CTranslate2 optimization while maintaining compatibility with OpenAI Whisper models.
https://claude.ai/code/session_01Paz1JXifQu8F3npTJpx2TT
…r-AFaND Replace OpenAI Whisper with faster-whisper for transcription
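As a rough illustration of the new word-boundary step: with word_timestamps=True, faster-whisper's WhisperModel.transcribe() returns a lazy generator of segments plus an info object, and each segment's words list holds items with start and end (in seconds), word and probability. The sketch below flattens such segments into boundary dicts. The Word and Segment stand-ins, the segments_to_word_boundaries name and the dict layout (millisecond offsets, text_offset, word_length) are assumptions for illustration, not the actual timestamps_to_word_boundaries in base.py.

```python
from typing import Iterable, List, NamedTuple, Optional


class Word(NamedTuple):
    # Stand-in mirroring the fields of faster_whisper's Word
    start: float  # seconds
    end: float  # seconds
    word: str  # usually carries a leading space, e.g. " world"
    probability: float


class Segment(NamedTuple):
    # Stand-in for faster_whisper's Segment; only `words` is used here
    text: str
    words: Optional[List[Word]]


def segments_to_word_boundaries(segments: Iterable[Segment]) -> List[dict]:
    """Hypothetical converter from word-level timestamps to boundary dicts."""
    boundaries = []
    text_offset = 0
    for segment in segments:  # works for a lazy generator as well as a list
        for w in segment.words or []:  # words is None without word_timestamps
            token = w.word.strip()
            if not token:
                continue
            boundaries.append(
                {
                    "audio_offset": int(round(w.start * 1000)),  # ms (assumed unit)
                    "duration_milliseconds": int(round((w.end - w.start) * 1000)),
                    "text_offset": text_offset,
                    "word_length": len(token),
                    "text": token,
                }
            )
            # Assumes the transcript joins words with single spaces
            text_offset += len(token) + 1
    return boundaries


# With the real library (not executed here):
# from faster_whisper import WhisperModel
# model = WhisperModel("base")
# segments, info = model.transcribe("voiceover.mp3", word_timestamps=True)
# boundaries = segments_to_word_boundaries(segments)
```

Consuming the generator directly keeps memory flat for long recordings; it can only be iterated once, so a caller that needs the segments twice would have to wrap them in list() first.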
Allow passing the OpenAI API key directly to the constructor instead of requiring it to be set via environment variable. Falls back to OPENAI_API_KEY env var if not provided.
https://claude.ai/code/session_01WkxRgXzowc15j1FFFmFP2r
…YPY0L Allow OpenAI API key to be passed as constructor parameter
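The key-resolution order described in this commit (explicit argument first, then the OPENAI_API_KEY environment variable) can be sketched as follows. OpenAIServiceSketch is a hypothetical stand-in, not the real OpenAIService, and raising ValueError when neither source is set is an assumption about how a missing key might be reported.

```python
import os
from typing import Optional


class OpenAIServiceSketch:
    """Hypothetical stand-in showing only the API-key resolution order."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        # An explicit argument wins; otherwise fall back to the environment.
        resolved = api_key or os.environ.get("OPENAI_API_KEY")
        if not resolved:
            raise ValueError(
                "No OpenAI API key: pass api_key=... or set OPENAI_API_KEY."
            )
        self.api_key = resolved
        self.kwargs = kwargs
        # A real service could then build openai.OpenAI(api_key=self.api_key).
```

Passing the key explicitly is handy in notebooks or CI jobs where setting process-wide environment variables is awkward, while existing setups that export OPENAI_API_KEY keep working unchanged.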
Add validation in OpenAIService.generate_from_text() to check if the
input text is empty after removing bookmarks. This prevents sending
empty strings to OpenAI's TTS API, which returns a 400 error with
'loc': ('body', 'input').
The empty string issue can occur when:
- User passes whitespace-only text
- User passes text containing only bookmark tags
https://claude.ai/code/session_01LH2yMANh1zTUAdXv9hWeu4
…0K01j Add validation for empty text after bookmark removal in OpenAI service
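A minimal sketch of the empty-input guard, assuming bookmarks use the <bookmark mark="..."/> tag syntax. The regex, remove_bookmarks_sketch and prepare_tts_input are hypothetical local helpers; the project has its own bookmark-stripping code, and the exact error type raised by generate_from_text() may differ.

```python
import re

# Hypothetical re-implementation of bookmark stripping for this sketch only.
_BOOKMARK_RE = re.compile(r"""<bookmark\s+mark\s*=\s*['"][^'"]*['"]\s*/>""")


def remove_bookmarks_sketch(text: str) -> str:
    """Delete every <bookmark mark="..."/> tag from the text."""
    return _BOOKMARK_RE.sub("", text)


def prepare_tts_input(text: str) -> str:
    """Strip bookmarks and refuse input the TTS endpoint would reject."""
    cleaned = remove_bookmarks_sketch(text)
    # Whitespace-only or bookmark-only text ends up empty here, which the
    # OpenAI TTS API would answer with a 400 error on ('body', 'input').
    if not cleaned.strip():
        raise ValueError(
            "Input text is empty after removing bookmarks; "
            "nothing to send to the OpenAI TTS API."
        )
    return cleaned
```

Failing locally with a clear message is cheaper than a network round trip that ends in a less descriptive 400 response.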
- Add transcription_model_kwargs parameter to SpeechService.__init__() to allow passing kwargs (like compute_type) to WhisperModel
- Add model_kwargs parameter to set_transcription() method
- Suppress ctranslate2 warning logs by default
https://claude.ai/code/session_011zmzHd9JQbjQ8n1vyKkDoH
…zvJE Add support for WhisperModel constructor kwargs in transcription
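The kwargs plumbing from this commit can be sketched as below: constructor kwargs are stored, set_transcription() can replace them, and they are forwarded unchanged when the model is built. SpeechServiceSketch, get_whisper_model and the model_factory injection point are hypothetical (the factory exists only so the sketch can be exercised without downloading a model); the default factory forwards to faster_whisper.WhisperModel, whose constructor accepts options such as device and compute_type. The ctranslate2 log suppression is not modelled here.

```python
from typing import Any, Callable, Dict, Optional


def _default_factory(model_name: str, **kwargs: Any):
    # Imported lazily so the sketch runs without the optional extra installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_name, **kwargs)


class SpeechServiceSketch:
    """Hypothetical stand-in; only the kwargs forwarding mirrors the commit."""

    def __init__(
        self,
        transcription_model: Optional[str] = None,
        transcription_model_kwargs: Optional[Dict[str, Any]] = None,
        model_factory: Callable[..., Any] = _default_factory,
    ):
        self._model_factory = model_factory
        self._whisper_model = None
        self.set_transcription(transcription_model, transcription_model_kwargs)

    def set_transcription(
        self,
        model: Optional[str] = None,
        model_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.transcription_model = model
        # Copy so later changes to the caller's dict have no effect.
        self.transcription_model_kwargs = dict(model_kwargs or {})
        self._whisper_model = None  # rebuild lazily with the new settings

    def get_whisper_model(self):
        if self.transcription_model is None:
            return None  # transcription disabled
        if self._whisper_model is None:
            self._whisper_model = self._model_factory(
                self.transcription_model, **self.transcription_model_kwargs
            )
        return self._whisper_model


# e.g. SpeechServiceSketch("base", {"compute_type": "int8"}) would build
# WhisperModel("base", compute_type="int8") on first use.
```

Building the model lazily means merely constructing a service does not load weights, and changing settings through set_transcription() simply invalidates the cached instance.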
Summary
This PR improves the transcription workflow by migrating from openai-whisper and stable-ts to faster-whisper.
This migration also happens to resolve an installation issue on Python 3.11 caused by outdated openai-whisper and llvmlite version requirements.
Background / Motivation
The current transcribe extra depends on:
This version of whisper pulls in old numba/llvmlite versions that only support Python < 3.10. As a result, manim-voiceover[transcribe] fails to install on Python 3.11+ (especially on macOS and in CI/Linux environments). This caused me repeated dependency-resolution and build failures, so I decided to fix it myself.
Benefits
Notes: